Question: Cannot add hg19 reference genome to bowtie2 on galaxy
leontp587 (United States) wrote 3.8 years ago:

I tried to bring up a quick instance of Galaxy on my own Linux server to align some fastq reads to the hg19 genome. For some reason, I cannot get the hg19 reference genome to show up in the bowtie2 reference genome dropdown in Galaxy. Below are the steps I took, but after restarting the server the reference genome still did not show up. What am I doing wrong?

What I did:

1) Unzipped ftp://ftp.cbcb.umd.edu/pub/data/bowtie2_indexes/incl/hg19.zip to /home/leon/ref_data/bowtie2/hg19:

[leon@gal ~]$ ls -l /home/leon/ref_data/bowtie2/hg19
total 3975260
-rw-r--r--. 1 leon leon 960018873 May  2  2012 hg19.1.bt2
-rw-r--r--. 1 leon leon 716863572 May  2  2012 hg19.2.bt2
-rw-r--r--. 1 leon leon      3833 May  2  2012 hg19.3.bt2
-rw-r--r--. 1 leon leon 716863565 May  2  2012 hg19.4.bt2
-rw-r--r--. 1 leon leon 960018873 May  3  2012 hg19.rev.1.bt2
-rw-r--r--. 1 leon leon 716863572 May  3  2012 hg19.rev.2.bt2
-rwxr-xr-x. 1 leon leon      3189 May  2  2012 make_hg19.sh

2) I added this genome to the bowtie2_indices.loc file:

# In ~/galaxy-dist/tool-data/bowtie2_indices.loc:
hg19    hg19    Human (hg19)    /home/leon/ref_data/bowtie2/hg19/hg19
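
A quick sanity check on the path column may help here. The last field of the .loc line is an index prefix rather than a directory, so the six .bt2 files from the listing above should all exist when their suffixes are appended to it. A minimal Python sketch, where `missing_bt2_files` is a hypothetical helper name and not part of Galaxy or Bowtie2:

```python
import os

# Suffixes of the six index files shown in the directory listing above.
BT2_SUFFIXES = [".1.bt2", ".2.bt2", ".3.bt2", ".4.bt2", ".rev.1.bt2", ".rev.2.bt2"]


def missing_bt2_files(prefix):
    """Return the expected index files that do not exist for this prefix."""
    return [prefix + s for s in BT2_SUFFIXES if not os.path.isfile(prefix + s)]


if __name__ == "__main__":
    # Prefix copied from the .loc line above; adjust for your own server.
    prefix = "/home/leon/ref_data/bowtie2/hg19/hg19"
    missing = missing_bt2_files(prefix)
    if missing:
        print("missing index files:", missing)
    else:
        print("index looks complete")
```

An empty list only means the files are present under that prefix; it says nothing about whether Galaxy has actually loaded the .loc file.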

 

modified 3.8 years ago by Jennifer Hillman Jackson • written 3.8 years ago by leontp587
leontp587 (United States) wrote 3.8 years ago:

Finally got it to work!

Apparently the solution is to modify the tool_data_table_conf.xml file and add the lines:

<table name="bowtie2_indexes" comment_char="#">
    <columns>value, dbkey, name, path</columns>
    <file path="tool-data/bowtie2_indices.loc" />
</table>

 

Then open universe_wsgi.ini and uncomment the line that points to tool_data_table_conf.xml.
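
To confirm that the table config actually declares the bowtie2 table, and which .loc file it points at, something like the following Python sketch could be used. It assumes the `<table>` entries sit under a `<tables>` root element, and `table_loc_files` is a made-up helper name, not a Galaxy function.

```python
import xml.etree.ElementTree as ET


def table_loc_files(conf_xml_text, table_name):
    """Return the .loc paths declared for the named data table."""
    root = ET.fromstring(conf_xml_text)
    paths = []
    for table in root.iter("table"):
        if table.get("name") == table_name:
            paths.extend(f.get("path") for f in table.findall("file"))
    return paths


if __name__ == "__main__":
    # The <table> block from this post, wrapped in an assumed <tables> root.
    sample = """
    <tables>
        <table name="bowtie2_indexes" comment_char="#">
            <columns>value, dbkey, name, path</columns>
            <file path="tool-data/bowtie2_indices.loc" />
        </table>
    </tables>
    """
    print(table_loc_files(sample, "bowtie2_indexes"))
```

To run it against a real install, read the contents of tool_data_table_conf.xml into a string and pass that in instead of `sample`. An empty result would suggest the table entry is missing or named differently.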

 

 


Hi Leon, Yes, that's needed now, sorry about that. We need to document the config portion for data tables in this process better. It is a newer change, but no more excuses, action instead! I created a Trello ticket for this on the public board: http://trello.com/c/FinBeDet

Thanks for following up and posting the solution here. We'll include full instructions for all modifications in the updated wiki. Jen

written 3.8 years ago by Jennifer Hillman Jackson

Hi there,

I realize this post is really old, but I am desperately trying to manually add genomes to my local Galaxy instance. Specifically, I have done exactly as you did above. I have modified my ~/galaxy-dist/tool-data/bowtie2_indices.loc, and the reference genome is in the builds.txt file. I did not modify the tool_data_table_conf.xml file. Why is this part not in the wiki? Where is this file located? Thanks so much for the help. I am so frustrated.

written 14 months ago by gkuffel22170
Jennifer Hillman Jackson (United States) wrote 3.8 years ago:

Hello,

Did this reference genome get added to the builds.txt file?

The genome also needs to be included in the alignseq.loc and ideally the all_fasta.loc files. A symbolic link in the index directory pointing to the reference genome .fa file is also standard.

Other items to check are that there are tabs in your .loc file separating the columns and that there are no extra spaces or lines present.
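
Tabs versus spaces are hard to tell apart by eye, so a small script can do the checking. The Python sketch below is only an illustration: `check_loc_lines` is a hypothetical helper, and the four-column layout is an assumption taken from the bowtie2 table definition (value, dbkey, name, path) posted elsewhere in this thread.

```python
def check_loc_lines(lines, n_columns=4):
    """Return (line_number, problem) tuples for suspicious .loc lines."""
    problems = []
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if line.startswith("#"):
            continue  # comment lines are ignored
        if not line.strip():
            problems.append((number, "blank or whitespace-only line"))
            continue
        fields = line.split("\t")
        if len(fields) != n_columns:
            problems.append(
                (number, "expected %d tab-separated fields, found %d"
                 % (n_columns, len(fields)))
            )
        elif any(f != f.strip() for f in fields):
            problems.append((number, "leading or trailing spaces in a field"))
    return problems


if __name__ == "__main__":
    good = "hg19\thg19\tHuman (hg19)\t/home/leon/ref_data/bowtie2/hg19/hg19\n"
    # Same entry, but with runs of spaces where the tabs should be.
    bad = "hg19    hg19    Human (hg19)    /home/leon/ref_data/bowtie2/hg19/hg19\n"
    print(check_loc_lines([good, bad]))
```

To check a real file, pass the open file handle (or `handle.readlines()`) in place of the demo list.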

Full instructions for adding genomes & indexes are here:
http://wiki.galaxyproject.org/Admin/DataIntegration

Hopefully this helps to sort out the issue, Jen, Galaxy team

leontp587 (United States) wrote 3.8 years ago:

Hi Jennifer,

Thanks for your reply, but I'm even more confused.

1) For the builds.txt file, I checked /home/leon/galaxy-dist/tool-data/shared/ucsc/builds.txt, which already has a line for hg19:
hg19    Human Feb. 2009 (GRCh37/hg19) (hg19)

Given that it already has a line hg19 from the default installation, what should I do?

2) alignseq.loc has a message inside that says something about needing axt files. How do I make axt files for the hg19 genome? Is that something I download, or do I need to build them using some tool?

3) I put the hg19.fa file directly into the folder with the indexes. Do I still need a symbolic link? What should I name the symbolic link?

4) I checked carefully there are indeed tabs in the .loc files and no extra spaces or lines present.

5) Sorry for the dumb and detailed questions. The DataIntegration instructions you linked didn't help much. For example, they didn't mention the symbolic link to the .fa file or all_fasta.loc, and they did not talk about alignseq.loc or the axt files needed. I'm also not clear on whether the instructions are for adding general genomes or for reference genomes needed by specific alignment algorithms like bowtie2.

Thanks for your help! It seems like there is so much work needed to get galaxy to do something very simple like align reads to a standard human genome!

Jennifer Hillman Jackson (United States) wrote 3.8 years ago:

Hi Leon,

Your builds file looks fine, so that is OK. So is putting the reference genome directly into the same directory as the indexes. The page I sent you links to several other wiki pages with more details that list out exact instructions for getting set up. In particular, these two should help:

See the sections for general set-up and Bowtie2:
http://wiki.galaxyproject.org/Admin/DataPreparation

You can download copies of our .loc files and compare to see how to format/organize them, or alternatively, use these indexes/.loc files as starting places when needed ("axt" is an older format; ".fa" and "2bit" are recommended now). The "location" directory contains the .loc files, and each genome has a directory named by the dbkey (for example, "hg19").
http://wiki.galaxyproject.org/Admin/UseGalaxyRsync

And another alternative altogether is to use Data Managers, also linked and explained here:
http://wiki.galaxyproject.org/Admin/Tools/DataManagers
With more here, see the "Tutorial" link:
http://wiki.galaxyproject.org/Events/GCC2014/TrainingDay#Tool_Development_from_bright_idea_to_toolshed_-_Data_Managers

After you have gone through this one time, and have the basics set up, adding more genomes will become simpler. Jen, Galaxy team
