Question: Loading human hg38 onto local instance
Mark Lindsay (United Kingdom) wrote, 4.1 years ago:

Dear Galaxy Team

I have a local Galaxy instance installed on my computer and have been trying to create a dbkey and install the latest human genome, hg38, using the 'Create DBkey and Reference Genome' data manager. However, I keep receiving the following error message (the same procedure worked fine for the mouse mm10 genome).

Exception( 'Unable to determine filename for UCSC Genome for %s: %s' % ( ucsc_dbkey, path_contents ) )
Exception: Unable to determine filename for UCSC Genome for hg38: ['.', '..', 'est.fa.gz', 'est.fa.gz.md5', 'mrna.fa.gz', 'mrna.fa.gz.md5', 'refMrna.fa.gz', 'refMrna.fa.gz.md5', 'xenoMrna.fa.gz', 'xenoMrna.fa.gz.md5', 'xenoRefMrna.fa.gz', 'xenoRefMrna.fa.gz.md5', 'analysisSet', 'README.txt', 'hg38.2bit', 'hg38.agp.gz', 'hg38.chromFa.tar.gz', 'hg38.chromFaMasked.tar.gz', 'hg38.fa.align.gz', 'hg38.fa.gz', 'hg38.fa.masked.gz', 'hg38.fa.out.gz', 'hg38.trf.bed.gz', 'md5sum.txt']

I can download hg38 using the reference genome data manager, but I cannot assign it to hg38 because hg38 is not available in the pull-down list of dbkeys.

I have also loaded hg38.fa directly into a Galaxy history and then tried to run the 'Create DBkey and Reference Genome' data manager on it, which gives the following error:

IndexError: list index out of range

Do you have any suggestions on a way forward? 

Many thanks for your help.

Mark

Jennifer Hillman Jackson (United States) wrote, 4.1 years ago:

Hello,

The error occurs because the data manager expects a file with this name, which is not present in the hg38 download directory:

chromFa.tar.gz

For hg38, the file is instead named:

hg38.chromFa.tar.gz

Compare the directory contents for hg19 and hg38:
http://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/
http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/

Two solutions:

1) the data manager is altered to accept either type of file name (by you, or possibly by our team, although that likely will not be immediate)

2) we ask UCSC to include a symbolic link that will point the standard name to the new name. 

chromFa.tar.gz -> hg38.chromFa.tar.gz

Both would be best, especially if the new naming method is going to become the standard from now on. I can understand why they changed it: labels matter, and the new name makes the contents of the data file clearer when it is outside of the UCSC downloads directory structure.
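A minimal sketch of what solution 1 could look like, for illustration only. It is not the data manager's actual code: the function name `find_ucsc_genome_file`, the `COMPRESSED_EXTENSIONS` tuple, and the order in which names are tried are all assumptions. The idea is simply to try the prefixed name (hg38.chromFa.tar.gz) as well as the older unprefixed name (chromFa.tar.gz) against the directory listing before giving up.

```python
# Hypothetical sketch; names and extension list are illustrative assumptions,
# not taken from the real data manager.
COMPRESSED_EXTENSIONS = (".tar.gz", ".tar.bz2", ".zip", ".fa.gz", ".fa.bz2")


def find_ucsc_genome_file(dbkey, path_contents):
    """Return the first entry in path_contents that matches a known genome archive name."""
    # Try the new dbkey-prefixed name first, then the older unprefixed name,
    # then a single-file FASTA named after the dbkey itself.
    base_names = (f"{dbkey}.chromFa", "chromFa", dbkey)
    for base in base_names:
        for ext in COMPRESSED_EXTENSIONS:
            candidate = base + ext
            if candidate in path_contents:
                return candidate
    # Same message format as the error reported in the question.
    raise Exception(
        "Unable to determine filename for UCSC Genome for %s: %s"
        % (dbkey, path_contents)
    )


if __name__ == "__main__":
    # Subset of the hg38 listing from the error message above.
    hg38_listing = ["README.txt", "hg38.2bit", "hg38.chromFa.tar.gz",
                    "hg38.chromFaMasked.tar.gz", "hg38.fa.gz", "md5sum.txt"]
    print(find_ucsc_genome_file("hg38", hg38_listing))
```

Because the match is on the exact file name, the masked archive (hg38.chromFaMasked.tar.gz) is never picked up by accident, and directories that still use the old unprefixed name keep working.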

I will send UCSC an email and point our dev group to this question. I am not going to create a Trello ticket quite yet, as I would like feedback from both of these sources first. If I do create a ticket (that is, if a solution is not available in the next week or so), I will post it back here.

To get you going immediately, changing the data manager is the best bet. Are you able to do this yourself, or do you need programming help with the change?

Thanks, Jen, Galaxy team


Update: Our team has discussed this, and an update to this Data Manager is now planned. Please see this Trello ticket for the details: http://trello.com/c/kPkwDHmi. Thank you so much for reporting the issue. Jen, Galaxy team

Reply written 4.1 years ago by Jennifer Hillman Jackson

Just a quick update that this is fixed now, and the data manager for creating dbkeys is also available in the main Tool Shed: https://toolshed.g2.bx.psu.edu/view/devteam/data_manager_fetch_genome_dbkeys_all_fasta

Reply written 3.8 years ago by Daniel Blankenberg