Question: Reference genome download: a step-by-step guide, please?
Nicholas.Blackburn wrote (3.5 years ago):


I know this has been addressed in various posts at different times. However, after extensively reviewing these posts and the Galaxy wiki, and spending several hours on this, I am still no closer to my goal. I'm running a local instance of Galaxy.

I need step-by-step instructions on how to obtain reference genomes for the various Tool Shed tools. As admin, I've been able to download the hg19 reference genome through 'Manage local data (beta)', which ran this:

python /home/gxfinal/shed_tools/ "/home/gxfinal/galaxy/database/files/000/dataset_322.dat" --dbkey_description 'Human Feb. 2009 (GRCh37/hg19) (hg19)'

I believe this has to be indexed for the various tools before it will appear in the tools' drop-down menus. If this needs to be done on the command line, that's fine, I can do that, but I need to know where it needs to be done and which files must be present for a reference genome to appear in the tool options.

I feel a detailed step-by-step guide would be invaluable for other users with the same problem. Given Galaxy's easy interface, I feel it should be simpler than I'm making it out to be.

Could someone please help, or point me towards what I need?

Kind regards,



(question modified 3.5 years ago by Jennifer Hillman Jackson)

Just to say, I have the very same question: I can't get my reference genome to show up in the drop-down menus.

(reply by marcelo.navarrete, 3.5 years ago)
Jennifer Hillman Jackson (United States) wrote (3.5 years ago):

Hi all,

Bjoern's advice and link are a good place to start to better understand installing and indexing reference genomes.

That same wiki page also links to others that describe the command-line methods in detail. However, you won't need those instructions unless no Data Manager exists for the target tool and you do not wish to write one.

Is the question about the order in which to run the Data Manager tools? Most can be run in any order, with a few exceptions.

  1. Start off in a local Galaxy instance with an admin account, as explained here:
  2. Use one of the "fetch fasta" Data Managers (DMs).
    Which one is best? It depends on whether your target reference genome is already listed in the builds.txt file. How can you tell? Reviewing the genome drop-down menu of the Upload tool in the UI is one easy way. For example, many human genome builds are already included, among them hg19 (and you know where to find it: UCSC), but the reconstructed Pteranodon genome you want to work with is not (though you know where to find it publicly).
  3. Once that has finished, proceed to the other DMs. For each genome, load a DM tool and execute it, as described in the originally shared link. The admin account has a history created specifically for Data Manager tool runs, which makes it easy to track success or failure and see which indexes have already been created.
  4. Does the order matter? A little bit.
    1. Get the FASTA sequence (and add its short label, termed a dbkey, to the list of reference genomes if it is not already present).
    2. Generate SAMtools indexes. These are used by many tools.
    3. Generate Picard indexes. These are also used by many tools.
    4. From here, use any of the others: BWA, Bowtie2, and so on.
    5. Generate a 2bit version of the genome (optional). This is used by "Extract Genomic DNA" and a few other tools. The DM for this index is in final development and will likely be published to the Tool Shed this week. (Note that this is not really an index but rather a compressed version of a FASTA sequence; certain tools require the genome in this input format.)
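If you prefer checking builds.txt directly instead of scrolling the Upload tool's genome menu, here is a minimal Python sketch. It assumes builds.txt is plain text with one build per line in the form `dbkey<TAB>description`, plus optional `#` comment lines. The file location is also an assumption: in many Galaxy checkouts it sits under tool-data/shared/ucsc/, but verify that in your own instance. The function names are hypothetical helpers, not part of Galaxy.

```python
import sys
from typing import Dict, Iterable


def parse_builds(lines: Iterable[str]) -> Dict[str, str]:
    """Map dbkey -> description from builds.txt-style lines.

    Assumes each non-comment line is 'dbkey<TAB>description'.
    Blank lines and lines starting with '#' are skipped.
    """
    builds: Dict[str, str] = {}
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue
        dbkey, _, description = line.partition("\t")
        builds[dbkey.strip()] = description.strip()
    return builds


def has_build(lines: Iterable[str], dbkey: str) -> bool:
    """True if the dbkey is already listed in the builds file."""
    return dbkey in parse_builds(lines)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (path is an assumption): python check_build.py tool-data/shared/ucsc/builds.txt hg19
    path, wanted = sys.argv[1], sys.argv[2]
    with open(path, encoding="utf-8") as handle:
        found = has_build(handle, wanted)
    print(f"{wanted}: {'listed' if found else 'not listed'} in {path}")
```

If the dbkey is listed (like hg19), you only need the FASTA itself; if it is not (like the Pteranodon example), the dbkey also has to be added to the list of reference genomes, as in step 4.1.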
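The ordering in step 4 can be pictured as a small dependency graph: the FASTA comes first, and every index is built from it. The sketch below is only an illustration. The step names are hypothetical labels (not Data Manager tool IDs), and the edges loosely encode the recommended order from the list above rather than strict technical requirements. It uses Python's standard `graphlib` module (Python 3.9+) to produce one valid run order.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical step labels; each maps to the steps recommended to run before it.
RECOMMENDED_ORDER: Dict[str, Set[str]] = {
    "fetch_fasta": set(),                              # 1. FASTA + dbkey first
    "samtools_fai": {"fetch_fasta"},                   # 2. SAMtools indexes
    "picard_dict": {"fetch_fasta"},                    # 3. Picard indexes
    "bwa_index": {"samtools_fai", "picard_dict"},      # 4. others, e.g. BWA
    "bowtie2_index": {"samtools_fai", "picard_dict"},  # 4. ... or Bowtie2
    "twobit": {"fetch_fasta"},                         # 5. optional 2bit
}


def run_order(graph: Dict[str, Set[str]]) -> List[str]:
    """Return one order in which every step follows all of its prerequisites."""
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    for position, step in enumerate(run_order(RECOMMENDED_ORDER), start=1):
        print(position, step)
```

Steps with no edge between them (for example SAMtools and Picard) can be run in either order, which matches the point that most DMs can be run in any order once the FASTA is in place.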

Good luck and please let us know if any issues pop up! Jen, Galaxy team

(answer modified 21 months ago)

The video tutorial was instrumental in helping me understand the process. In particular, learning about the hidden admin history from the video was a game changer.

That said, I had to reboot my Galaxy instance each time a data_manager tool was installed, and again after each index was built, before it would appear. All good though, as it's now working :-)

(reply by Nicholas.Blackburn, 3.5 years ago)

Thanks for the feedback about the install/reboot issue. We'll test more here and see if something is going on. To help us reproduce it, if you have time, please send a bit more about your configuration:

  1. local or cloud?
  2. a new clone from Bitbucket or GitHub?
  3. an updated instance from Bitbucket or GitHub?
  4. installed or updated to release_15.05, or to a different branch (dev?)
  5. if local, was anything done beyond the basic setup in no. 1 above (additional production instance configuration?)
  6. if cloud, anything extra?

Thanks! Jen, Galaxy team

(reply by Jennifer Hillman Jackson, 3.5 years ago)

Hi Jen,
Our sysadmin set up a fair bit of this and has finished with us for now, so I can only make educated guesses based on what I know:

- Local instance
- Recent clone from GitHub, latest release
- We installed tools in a roundabout way, following the details you linked us to here

If there are any system stats I can pull out for you, let me know where and how.


(reply by Nicholas.Blackburn, 3.5 years ago; modified 3.5 years ago)
Bjoern Gruening wrote (3.5 years ago):


If you are running your own instance, you can install various Data Managers from the Galaxy Tool Shed. More information is available in the Galaxy wiki:




Note to anyone else reading this thread: watch the tutorial video at this link; it is extremely helpful.

(reply by Nicholas.Blackburn, 3.5 years ago)

Hi, thanks for getting back to me. I've read that link and several others and am still thoroughly confused. I really need a step-by-step guide, as I explained above.

(reply by Nicholas.Blackburn, 3.5 years ago)