Question: Create DBKey and Reference Genome
14 months ago by
jasonschultz10
jasonschultz10 wrote:

Hello again,

I am trying to load hg38 into my local Galaxy instance. I get errors at the end, and I don't know why. The file downloads from UCSC, so I believe the issue is in populating the dbkey, but I don't know what I am doing wrong.

I chose the option New for "Use existing dbkey or create a new one". Then it asks for dbkey, Display name for dbkey, Name of sequence, ID for Sequence.
How do I fill these in?

I picked UCSC for "Choose the source for the reference genome" and used "hg38" for the "UCSC's DBKEY for source FASTA".

This is the error I get:

Traceback (most recent call last):
  File "/Users/jasonschultz/shed_tools/toolshed.g2.bx.psu.edu/repos/devteam/data_manager_fetch_genome_dbkeys_all_fasta/86fa71e9b427/data_manager_fetch_genome_dbkeys_all_fasta/data_manager/data_manager_fetch_genome_all_fasta_dbkeys.py", line 454, in <module>
    main()
  File "/Users/jasonschultz/shed_tools/toolshed.g2.bx.psu.edu/repos/devteam/data_manager_fetch_genome_dbkeys_all_fasta/86fa71e9b427/data_manager_fetch_genome_dbkeys_all_fasta/data_manager/data_manager_fetch_genome_all_fasta_dbkeys.py", line 447, in main
    REFERENCE_SOURCE_TO_DOWNLOAD[ params['param_dict']['reference_source']['reference_source_selector'] ]( data_manager_dict, params, target_directory, dbkey, dbkey_name, sequence_id, sequence_name, tmp_dir )
  File "/Users/jasonschultz/shed_tools/toolshed.g2.bx.psu.edu/repos/devteam/data_manager_fetch_genome_dbkeys_all_fasta/86fa71e9b427/data_manager_fetch_genome_dbkeys_all_fasta/data_manager/data_manager_fetch_genome_all_fasta_dbkeys.py", line 278, in download_from_ucsc
    add_fasta_to_table(data_manager_dict, fasta_readers, target_directory, dbkey, dbkey_name, sequence_id, sequence_name, params)
  File "/Users/jasonschultz/shed_tools/toolshed.g2.bx.psu.edu/repos/devteam/data_manager_fetch_genome_dbkeys_all_fasta/86fa71e9b427/data_manager_fetch_genome_dbkeys_all_fasta/data_manager/data_manager_fetch_genome_all_fasta_dbkeys.py", line 270, in add_fasta_to_table
    for data_table_name, data_table_entry in _stream_fasta_to_file( fasta_readers, target_directory, dbkey, dbkey_name, sequence_id, sequence_name, params ):
  File "/Users/jasonschultz/shed_tools/toolshed.g2.bx.psu.edu/repos/devteam/data_manager_fetch_genome_dbkeys_all_fasta/86fa71e9b427/data_manager_fetch_genome_dbkeys_all_fasta/data_manager/data_manager_fetch_genome_all_fasta_dbkeys.py", line 366, in _stream_fasta_to_file
    compute_fasta_length( fasta_filename, os.path.join( target_directory, len_base_name ), keep_first_word=True )
  File "/Users/jasonschultz/shed_tools/toolshed.g2.bx.psu.edu/repos/devteam/data_manager_fetch_genome_dbkeys_all_fasta/86fa71e9b427/data_manager_fetch_genome_dbkeys_all_fasta/data_manager/data_manager_fetch_genome_all_fasta_dbkeys.py", line 400, in compute_fasta_length
    fasta_title = fasta_title.split()[0]
IndexError: list index out of range

Any help is greatly appreciated. I do have the whole genome on my computer; I used rsync to get it. But I don't know how to get that into Galaxy. If that is easier, that would help too.

Jason
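
The traceback above ends at `fasta_title.split()[0]`, and one way that line can fail is when a FASTA header has nothing after the `>` (or only whitespace), since splitting an empty string gives an empty list. Here is a minimal sketch of that failure and a quick check for such headers. It assumes `fasta_title` is the header text after `>`; the rest of `compute_fasta_length` is not shown in the thread, and the function names below are made up for illustration.

```python
import io


def first_word_of_title(header_line):
    """Mimic the failing step: drop the leading '>' and keep the first word.

    Assumption: this mirrors what the Data Manager does with the header;
    only the final split()[0] call is taken from the traceback.
    """
    fasta_title = header_line[1:].strip()
    return fasta_title.split()[0]  # IndexError when the title is empty


def find_empty_headers(handle):
    """Return 1-based line numbers of header lines with no title after '>'."""
    empty = []
    for line_number, line in enumerate(handle, start=1):
        if line.startswith(">") and not line[1:].strip():
            empty.append(line_number)
    return empty


# Tiny in-memory FASTA (hypothetical data): one good header, one empty header.
example = io.StringIO(">chr1 test\nACGT\n>\nTTGA\n")
print(find_empty_headers(example))
```

To check a downloaded genome, pass an open file handle for the FASTA instead of the in-memory example. An empty result would suggest the headers are not the cause and the problem lies elsewhere.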

14 months ago by
United States
Jennifer Hillman Jackson23k wrote:

Hello,

1) Are you using the most current Galaxy release? If not, upgrade. http://galaxy.readthedocs.io/en/master/releases/15.10_announce.html

2) The hg38 reference genome should be in the list of existing builds/dbkeys. You don't need to add it as a new one, and in fact doing that can cause conflicts. It is difficult to roll back Data Manager changes (this requires a fresh install in most cases), but it sounds like this didn't go through (which is good news).

Implementing solution 2)

Try using the other Data Manager that makes use of existing dbkeys: data_manager_fetch_genome_all_fasta. Also, do not run more than three UCSC genome fetching jobs concurrently. Additions will be blocked or fail and will require a re-run.

Form options:

  • "DBKEY to assign to data": Type in hg38 and select the known build from the search list results

  • "Choose the source for the reference genome": UCSC

  • "UCSC's DBKEY for source FASTA": hg38

  • Leave the rest at default

Hopefully this helps, Jen, Galaxy team

14 months ago by
jasonschultz10
jasonschultz10 wrote:

Is the most recent release 16.07 or 15.10? I am running 15.10 now. I don't see hg38 in the build/dbkey list.

Where is this list? I am using the Data Manager tool you mentioned, and hg38 is not populating. Is it possible that an older version of Galaxy that I had and tried to remove is interfering? How do I do a clean uninstall of Galaxy? It looks like shed_tools is automatically in the root, as is galaxy. Are there other files that need to be removed when uninstalling and reinstalling Galaxy?

Thanks!


Update: My mistake, hg38 is not included in the list of builds by default. Add the genome using this Data Manager tool:

data_manager_fetch_genome_dbkeys_all_fasta: optionally defines a new DBKEY, retrieves a FASTA file, and populates the all_fasta.loc data table.
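
For reference, an entry in all_fasta.loc is a single tab-separated line. The sketch below assumes the column order `value, dbkey, name, path` used in Galaxy's sample data table configuration; check the `all_fasta` table in your own tool_data_table_conf.xml before relying on it. The function name, display name, and path are placeholders, and the Data Manager normally writes this line for you.

```python
def all_fasta_loc_line(value, dbkey, name, path):
    """Build one tab-separated all_fasta.loc entry.

    Assumed column order: value, dbkey, name, path.
    """
    fields = [value, dbkey, name, path]
    for field in fields:
        # Tabs or newlines inside a field would break the column layout.
        if "\t" in field or "\n" in field:
            raise ValueError("loc fields must not contain tabs or newlines")
    return "\t".join(fields)


# Hypothetical values; the path is a placeholder, not a real Galaxy location.
print(all_fasta_loc_line("hg38", "hg38", "Human (hg38)", "/path/to/hg38.fa"))
```

Comparing a line like this against what the Data Manager actually wrote can help confirm whether the dbkey was populated.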

Written 12 months ago by Jennifer Hillman Jackson