GATC Logo

Reference Genomes in Galaxy

Slides: @blankenberg, @Slugger70

#usegalaxy #GAT2017 / @galaxyproject

1 / 26

Please interrupt

We are here to answer questions!

2 / 26

Overview

  • Intro to built-in datasets
  • Built-in data hierarchy
  • Some problems
  • Data Managers

3 / 26

Built-in Data

List_of_data.png

4 / 26

Data, what data?

  • Some genomes are large! Human, Mouse, Coral
  • Some tools require indices of the genomes.
  • The indices take a long time to build!
  • Better to pre-build the indices.

5 / 26

Overview

  • Intro to built-in datasets
  • Built-in data hierarchy
  • Some problems
  • Data Managers
6 / 26

Data schematics in Galaxy

schematic

7 / 26

Using reference data in a tool

bwa.xml

<conditional name="reference_source">
    <param name="reference_source_selector" type="select" label="Will you select a reference genome from your history or use a built-in index?" help="Built-ins were indexed using default options. See 'Indexes' section of help below">
        <option value="cached">Use a built-in genome index</option>
        <option value="history">Use a genome from history and build index</option>
    </param>
    <when value="cached">
        <param name="ref_file" type="select" label="Using reference genome" help="Select genome from the list">
            <options from_data_table="bwa_mem_indexes">
                <filter type="sort_by" column="2" />
                <validator type="no_options" message="No indexes are available" />
            </options>
            <validator type="no_options" message="A built-in reference genome is not available for the build associated with the selected input file"/>
        </param>
    </when>
    <when value="history">
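
To make the `<options from_data_table=...>` block a little more concrete, here is a rough Python sketch of how rows from a data table could end up as entries in the select list. It is not Galaxy's implementation: `build_select_options` is a made-up helper, and it assumes the `sort_by` column number is a zero-based index into `value, dbkey, name, path`, so that column 2 is the display name.

```python
# Illustrative sketch only; Galaxy's real option handling is more involved.
# Column order as declared in the <columns> element of the data table.
COLUMNS = ["value", "dbkey", "name", "path"]


def build_select_options(rows, sort_column=2):
    """Turn data table rows into (label, value) pairs for a select list.

    sort_column is treated as a zero-based column index (an assumption),
    so 2 means the "name" column.
    """
    if not rows:
        # Mirrors the intent of the no_options validator in the tool XML.
        raise ValueError("No indexes are available")
    ordered = sorted(rows, key=lambda row: row[sort_column])
    name_idx = COLUMNS.index("name")
    value_idx = COLUMNS.index("value")
    return [(row[name_idx], row[value_idx]) for row in ordered]


# A few rows taken from the .loc example in these slides.
rows = [
    ("mm10", "mm10", "Mouse (mm10)",
     "/mnt/galaxyIndices/genomes/mm10/bwa_mem_index/mm10/mm10.fa"),
    ("hg19", "hg19", "Human (hg19)",
     "/mnt/galaxyIndices/genomes/hg19/bwa_mem_index/hg19/hg19.fa"),
    ("bosTau7", "bosTau7", "Cow (bosTau7)",
     "/mnt/galaxyIndices/genomes/bosTau7/bwa_mem_index/bosTau7/bosTau7.fa"),
]

options = build_select_options(rows)
for label, value in options:
    print(f"{label} -> {value}")
```

The user sees the label (display name), while the tool receives the value, which it can then use to look up the index path.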

8 / 26

Where are the data tables?

tool_data_table_conf.xml

(Usually located in galaxy/config/)

<tables>
    <!-- Locations of indexes in the BWA mapper format -->
    <table name="bwa_indexes" comment_char="#" allow_duplicate_entries="False">
        <columns>value, dbkey, name, path</columns>
        <file path="tool-data/bwa_index.loc" />
    </table>
</tables>
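
As a small illustration of what this file declares, the sketch below reads the snippet above with Python's standard library and lists each table's columns and .loc file. The helper name `read_table_definitions` is invented for this example; Galaxy has its own loader.

```python
import xml.etree.ElementTree as ET

# The tool_data_table_conf.xml snippet from this slide.
CONF = """<tables>
    <!-- Locations of indexes in the BWA mapper format -->
    <table name="bwa_indexes" comment_char="#" allow_duplicate_entries="False">
        <columns>value, dbkey, name, path</columns>
        <file path="tool-data/bwa_index.loc" />
    </table>
</tables>"""


def read_table_definitions(xml_text):
    """Return {table name: {columns, comment_char, loc_files}} (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    tables = {}
    for table in root.findall("table"):
        # <columns> holds a comma-separated list of column names.
        columns = [c.strip() for c in table.findtext("columns", "").split(",")]
        tables[table.get("name")] = {
            "columns": columns,
            "comment_char": table.get("comment_char", "#"),
            "loc_files": [f.get("path") for f in table.findall("file")],
        }
    return tables


tables = read_table_definitions(CONF)
for name, info in tables.items():
    print(name, info["columns"], info["loc_files"])
```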

9 / 26

"loc" files - Short for location!

Not "sending me loco"

bwa_index.loc

...
#
#<unique_build_id> <dbkey> <display_name> <file_path>
#
...
bosTau7  bosTau7  Cow (bosTau7)                                  /mnt/galaxyIndices/genomes/bosTau7/bwa_mem_index/bosTau7/bosTau7.fa
ce10     ce10     C. elegans (ce10)                              /mnt/galaxyIndices/genomes/ce10/bwa_mem_index/ce10/ce10.fa
danRer7  danRer7  Zebrafish (danRer7)                            /mnt/galaxyIndices/genomes/danRer7/bwa_mem_index/danRer7/danRer7.fa
dm3      dm3      D. melanogaster Apr. 2006 (BDGP R5/dm3) (dm3)  /mnt/galaxyIndices/genomes/dm3/bwa_mem_index/dm3/dm3.fa
hg19     hg19     Human (hg19)                                   /mnt/galaxyIndices/genomes/hg19/bwa_mem_index/hg19/hg19.fa
hg38     hg38     Human (hg38)                                   /mnt/galaxyIndices/genomes/hg38/bwa_mem_index/hg38/hg38.fa
mm10     mm10     Mouse (mm10)                                   /mnt/galaxyIndices/genomes/mm10/bwa_mem_index/mm10/mm10.fa
...
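
A minimal sketch of reading such a file, assuming the fields are separated by tabs (the slide shows them flattened to spaces) and that lines starting with the comment character are skipped. `parse_loc` is a hypothetical helper for illustration, not Galaxy's parser.

```python
# Two entries from the slide, written with explicit tab separators (assumed).
LOC_TEXT = (
    "#<unique_build_id>\t<dbkey>\t<display_name>\t<file_path>\n"
    "#\n"
    "dm3\tdm3\tD. melanogaster Apr. 2006 (BDGP R5/dm3) (dm3)\t"
    "/mnt/galaxyIndices/genomes/dm3/bwa_mem_index/dm3/dm3.fa\n"
    "hg38\thg38\tHuman (hg38)\t"
    "/mnt/galaxyIndices/genomes/hg38/bwa_mem_index/hg38/hg38.fa\n"
)

COLUMNS = ["value", "dbkey", "name", "path"]


def parse_loc(text, columns, comment_char="#"):
    """Parse .loc text into a list of dicts keyed by column name."""
    entries = []
    for line in text.splitlines():
        if not line.strip() or line.startswith(comment_char):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        if len(fields) < len(columns):
            continue  # skip malformed lines rather than guessing
        entries.append(dict(zip(columns, fields)))
    return entries


entries = parse_loc(LOC_TEXT, COLUMNS)
for entry in entries:
    print(entry["value"], "->", entry["path"])
```

Splitting on tabs rather than on whitespace is what lets a display name such as "D. melanogaster Apr. 2006 (BDGP R5/dm3) (dm3)" stay in one field.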

10 / 26

Overview

  • Intro to built-in datasets
  • Built-in data hierarchy
  • Some problems
  • Data Managers
11 / 26

Some Problems!

  • Time consuming!

    • ~30 minutes of work just to add a new genome to one tool!
  • Administrator needs to know:

    • how to build the index for every tool
    • expected format of the reference data
    • format of the .loc file
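
To give a feel for the bookkeeping involved, here is a hedged sketch of just the last manual step: writing a well-formed .loc line once the index has already been built by hand. Both helper names and the sacCer3 entry (including its path) are hypothetical, and the duplicate check assumes that "duplicate" means an identical whole line.

```python
def format_loc_line(value, dbkey, name, path):
    """Build one tab-separated .loc line in the order value, dbkey, name, path."""
    fields = [value, dbkey, name, path]
    for field in fields:
        if "\t" in field or "\n" in field:
            raise ValueError(f"field may not contain tabs or newlines: {field!r}")
    return "\t".join(fields)


def add_entry(loc_lines, new_line):
    """Return a new list with new_line appended unless an identical line exists."""
    if new_line in loc_lines:
        return list(loc_lines)
    return list(loc_lines) + [new_line]


existing = [
    format_loc_line(
        "hg19", "hg19", "Human (hg19)",
        "/mnt/galaxyIndices/genomes/hg19/bwa_mem_index/hg19/hg19.fa",
    )
]

# Hypothetical new genome; the path only imitates the layout used on the slides.
new_line = format_loc_line(
    "sacCer3", "sacCer3", "Yeast (sacCer3)",
    "/mnt/galaxyIndices/genomes/sacCer3/bwa_mem_index/sacCer3/sacCer3.fa",
)

updated = add_entry(existing, new_line)
updated_again = add_entry(updated, new_line)  # second attempt changes nothing
print(len(existing), len(updated), len(updated_again))
```

Everything before this step (fetching the FASTA file, running the indexer with the right options) still has to be done and remembered separately for each tool.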
12 / 26

Typical conversation

ref-problem-1.png

13 / 26

Typical conversation

ref-problem-2.png

14 / 26

Typical conversation

ref-problem-3.png

15 / 26

Typical conversation

ref-problem-4.png

16 / 26

Other concerns

  • Accessible?
    • Manually download genome FASTA files
    • Download, compile, run bwa index; which options?
  • Reproducible?
    • Only if the person performing manual steps keeps good notes
  • Transparent?
    • Send email to sysadmin asking for notes
    • Restart Galaxy server for new entries

17 / 26

Overview

  • Intro to built-in datasets
  • Built-in data hierarchy
  • Some problems
  • Data Managers
(now we're onto the good stuff!)
18 / 26

Data Managers

  • Allow for the creation of built-in (reference) data

    • underlying data
    • data tables
    • *.loc files
  • Specialized Galaxy tools that can only be accessed by an admin

  • Defined locally or installed from ToolShed

19 / 26

Data Managers

  • Flexible framework
    • Not just genomic data
    • Run Data Managers through UI
    • Workflow compatible
    • API
  • Examples
    • Adding new genome builds (dbkeys)
    • Fetching genome (fasta) sequences
    • Building short read mapper indices for genomes
20 / 26

Special class of Galaxy tool

Looks just like a normal Galaxy tool!

Data-manager-ui.png

21 / 26

What does it do?

The output of the data manager is a JSON description of the new data table entry

data_table_JSON.png

This gets turned into a new data table entry

data_table_entry.png

The index files themselves get placed in the appropriate location.
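
The JSON on this slide survives only as an image, so the sketch below uses a made-up payload of the general shape a data manager emits: a `data_tables` object mapping a table name to one or more row objects. It shows, under that assumption, how such output could be turned into tab-separated table lines; `json_to_table_lines` and the sacCer3 values are hypothetical.

```python
import json

# Hypothetical data manager output; the values are invented for illustration.
DATA_MANAGER_OUTPUT = """
{
  "data_tables": {
    "bwa_mem_indexes": [
      {
        "value": "sacCer3",
        "dbkey": "sacCer3",
        "name": "Yeast (sacCer3)",
        "path": "/mnt/galaxyIndices/genomes/sacCer3/bwa_mem_index/sacCer3/sacCer3.fa"
      }
    ]
  }
}
"""

COLUMNS = ["value", "dbkey", "name", "path"]


def json_to_table_lines(json_text, columns):
    """Map each table name to the tab-separated lines described by the JSON."""
    payload = json.loads(json_text)
    lines = {}
    for table_name, rows in payload.get("data_tables", {}).items():
        if isinstance(rows, dict):
            rows = [rows]  # accept a single row object as well as a list
        lines[table_name] = [
            "\t".join(str(row[column]) for column in columns) for row in rows
        ]
    return lines


table_lines = json_to_table_lines(DATA_MANAGER_OUTPUT, COLUMNS)
for table_name, rows in table_lines.items():
    for row in rows:
        print(table_name, row.split("\t"))
```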

22 / 26

Data Managers Admin

  • Located on Galaxy's Admin tab under Local Data
data_managers_tool_list.png

23 / 26

Data Managers Admin

  • UI tools to fetch reference genomes/build indices
  • View progress of index build jobs
  • View contents of tool data tables
data_table_ui.png

24 / 26

Resources / further reading

  • Galaxy Wiki Page on Data Managers
    • Details
    • Building
    • Examples

https://wiki.galaxyproject.org/Admin/Tools/DataManagers

25 / 26

Exercise Time!

26 / 26
