Question: Error downloading Kraken database with data manager (local galaxy)
vebaev wrote (22 months ago):

I'm trying to download the Kraken database for Bacteria via the data manager and got the following error:

2017-01-20 17:50:04 (1.45 MB/s) - ‘taxdump.tar.gz’ saved [38178181]

--2017-01-20 18:02:19--  ftp://ftp.ncbi.nih.gov/genomes/Bacteria/all.fna.tar.gz
           => ‘all.fna.tar.gz’
Resolving ftp.ncbi.nih.gov (ftp.ncbi.nih.gov)... 130.14.250.7, 2607:f220:41e:250::13
Connecting to ftp.ncbi.nih.gov (ftp.ncbi.nih.gov)|130.14.250.7|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done.    ==> PWD ... done.
==> TYPE I ... done.  ==> CWD (1) /genomes/Bacteria ... 
No such directory ‘genomes/Bacteria’.

Related to https://biostar.usegalaxy.org/p/21290/

Reply written 22 months ago by Jennifer Hillman Jackson

Thanks, can you update the thread when the data manager is updated with the correct paths from NCBI and is working so I can run it on my Galaxy?

Reply written 22 months ago by vebaev

Yes, the testing results, correct protocol, and any open tickets for changes (as needed) will be part of the final reply. Jen

Reply written 22 months ago by Jennifer Hillman Jackson

There was a newer version of the data manager, and after updating I got an error related to memory:

2017-01-23 14:18:11 (2.14 MB/s) - ‘all.fna.tar.gz’ saved [2934042455]

terminate called after throwing an instance of 'jellyfish::invertible_hash::ErrorAllocation'
  what():  Failed to allocate 91625968992 bytes of memory
xargs: cat: terminated by signal 13
/export/tool_deps/kraken/0.10.6-eaf8fb68/iuc/package_kraken_0_10_6_eaf8fb68/0743afe4dcb8/bin/build_kraken_db.sh: line 96: 24633 Broken pipe             find library/ '(' -name '*.fna' -o -name '*.fa' -o -name '*.ffn' ')' -print0
     24634 Exit 125                | xargs -0 cat
     24635 Aborted                 (core dumped) | jellyfish count -m $KRAKEN_KMER_LEN -s $KRAKEN_HASH_SIZE -C -t $KRAKEN_THREAD_CT -o database /dev/fd/0

I think it is trying to allocate about 90 GB of memory, which my VM does not have. Is there a way to download a pre-built database? For example, this one from https://ccb.jhu.edu/software/kraken/: MiniKraken DB (2.7 GB): A pre-built 4 GB database constructed from complete bacterial, archaeal, and viral genomes in RefSeq (as of Dec. 8, 2014). This can be used by users without the computational resources needed to build a Kraken database.
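
For reference, the byte count in the error message can be converted to more familiar units. A minimal Python sketch (the function name is made up for illustration; the number is copied from the jellyfish error above):

```python
def describe_allocation(num_bytes: int) -> str:
    """Express a byte count in decimal gigabytes and binary gibibytes."""
    gb = num_bytes / 10**9   # decimal gigabytes
    gib = num_bytes / 2**30  # binary gibibytes
    return f"{gb:.1f} GB ({gib:.1f} GiB)"


# Value reported by jellyfish in the failed build.
requested = 91625968992
print(describe_allocation(requested))  # prints: 91.6 GB (85.3 GiB)
```

So the build asked for roughly 90 GB in a single allocation, which is why it fails on a small VM.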

Reply written 22 months ago by vebaev
Jennifer Hillman Jackson (United States) wrote (22 months ago):

Hello,

Using the updated Data Manager was the first-pass solution. The memory error is due to the size of the data.

Using a pre-computed index is possible in Galaxy, but the configuration must be the same as if it had been installed with a Data Manager. This involves file transfer, installation, and manipulation on the command line. It is about the same as setting up other datasets/genomes manually. Those processes are documented here (somewhat outdated since it is not often used, but the general concepts are correct): https://wiki.galaxyproject.org/Admin/DataPreparation

Please note that the Kraken databases used at http://usegalaxy.org are not on the rsync data server mentioned in the wiki (but may be added in the future).

In short, do the same steps the Data Manager would do after building the index: place the index in the appropriate directory, modify the .loc file to point to it, and restart the instance.
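
The ".loc" step above might look roughly like this in Python. This is only a sketch under assumptions: the .loc file name, its tab-separated column layout (value, name, path), and every path below are hypothetical and should be checked against the sample .loc file that ships with the Kraken tool in your Galaxy. The helper names are made up for illustration.

```python
import tempfile
from pathlib import Path


def loc_entry(value: str, name: str, path: str) -> str:
    """Build one tab-separated .loc line.

    Assumed column order is (value, name, path); confirm this against the
    tool's sample .loc file before relying on it.
    """
    fields = (value, name, path)
    if any("\t" in field or "\n" in field for field in fields):
        raise ValueError("fields must not contain tabs or newlines")
    return "\t".join(fields) + "\n"


def register_database(loc_file: Path, value: str, name: str, db_dir: Path) -> bool:
    """Append an entry for db_dir; return False if value is already listed."""
    existing = loc_file.read_text() if loc_file.exists() else ""
    for line in existing.splitlines():
        if line and not line.startswith("#") and line.split("\t")[0] == value:
            return False
    # Make sure the new entry starts on its own line.
    separator = "" if existing == "" or existing.endswith("\n") else "\n"
    with loc_file.open("a") as handle:
        handle.write(separator + loc_entry(value, name, str(db_dir)))
    return True


# Demo in a throwaway directory so no real Galaxy install is touched.
with tempfile.TemporaryDirectory() as tmp:
    demo_loc = Path(tmp) / "kraken_databases.loc"  # hypothetical file name
    demo_db = Path(tmp) / "minikraken"             # hypothetical index directory
    demo_db.mkdir()
    added = register_database(demo_loc, "minikraken", "MiniKraken (pre-built)", demo_db)
    print(added)
    print(demo_loc.read_text(), end="")
```

After editing the real .loc file, Galaxy still needs a restart before the new entry appears in the tool form.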

Your other option is to move to a cloud Galaxy with more dedicated memory resources (Amazon offers educational grants that are intended to cover costs for those doing research or training).

Hope this works out! Jen, Galaxy team


Thanks, that seems quite complicated. What about building custom Kraken databases from the well-known SILVA and Greengenes databases, which would probably be less demanding in size?

Reply written 22 months ago by vebaev

This could be submitted as an enhancement request for the tool authors to consider. The link to the GitHub tool repository is on the tool's form within the Tool Shed. Follow the link and enter the request as an "Issue", making sure to note the exact tool version (including the repository's Tool Shed link is one way to do this).

Reply written 22 months ago by Jennifer Hillman Jackson