Error downloading Kraken database with data manager (local galaxy)


22 months ago, vebaev • 130 wrote:

I'm trying to download the Kraken database for Bacteria via the data manager and got the following error:

2017-01-20 17:50:04 (1.45 MB/s) - ‘taxdump.tar.gz’ saved [38178181]

--2017-01-20 18:02:19--  ftp://ftp.ncbi.nih.gov/genomes/Bacteria/all.fna.tar.gz
           => ‘all.fna.tar.gz’
Resolving ftp.ncbi.nih.gov (ftp.ncbi.nih.gov)... 130.14.250.7, 2607:f220:41e:250::13
Connecting to ftp.ncbi.nih.gov (ftp.ncbi.nih.gov)|130.14.250.7|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done.    ==> PWD ... done.
==> TYPE I ... done.  ==> CWD (1) /genomes/Bacteria ... 
No such directory ‘genomes/Bacteria’.



modified 22 months ago by Jennifer Hillman Jackson ♦ 25k • written 22 months ago by vebaev • 130

modified 22 months ago • written 22 months ago by Jennifer Hillman Jackson ♦ 25k

Thanks, can you update the thread when the data manager is updated with the correct paths from NCBI and is working so I can run it on my Galaxy?

written 22 months ago by vebaev • 130

Yes, the testing results, correct protocol, and any open tickets for changes (as needed) will be part of the final reply. Jen

modified 22 months ago • written 22 months ago by Jennifer Hillman Jackson ♦ 25k

There was a newer version of the data manager, and after updating I got a memory-related error:

2017-01-23 14:18:11 (2.14 MB/s) - ‘all.fna.tar.gz’ saved [2934042455]

terminate called after throwing an instance of 'jellyfish::invertible_hash::ErrorAllocation'
  what():  Failed to allocate 91625968992 bytes of memory
xargs: cat: terminated by signal 13
/export/tool_deps/kraken/0.10.6-eaf8fb68/iuc/package_kraken_0_10_6_eaf8fb68/0743afe4dcb8/bin/build_kraken_db.sh: line 96: 24633 Broken pipe             find library/ '(' -name '*.fna' -o -name '*.fa' -o -name '*.ffn' ')' -print0
     24634 Exit 125                | xargs -0 cat
     24635 Aborted                 (core dumped) | jellyfish count -m $KRAKEN_KMER_LEN -s $KRAKEN_HASH_SIZE -C -t $KRAKEN_THREAD_CT -o database /dev/fd/0

I think it wants to allocate about 90 GB of memory, which my VM does not have. Is there a way to download a pre-built database? For example, this one from https://ccb.jhu.edu/software/kraken/: MiniKraken DB (2.7 GB): A pre-built 4 GB database constructed from complete bacterial, archaeal, and viral genomes in RefSeq (as of Dec. 8, 2014). This can be used by users without the computational resources needed to build a Kraken database.

written 22 months ago by vebaev • 130

22 months ago, Jennifer Hillman Jackson ♦ 25k (United States) wrote:

Hello,

Using the updated Data Manager was the first pass solution. The memory error is due to the size of the data.
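To put the figure from the log in perspective, the allocation that jellyfish attempted can be converted to gigabytes and compared with the machine's physical memory. This is only a minimal sketch: the byte count is taken from the error message in the thread, the function names are illustrative, and the memory lookup assumes a Linux-like system where `os.sysconf` exposes `SC_PHYS_PAGES` and `SC_PAGE_SIZE`.

```python
import os


def to_gb_and_gib(num_bytes):
    """Return the size in decimal gigabytes (10**9) and binary gibibytes (2**30)."""
    return num_bytes / 10**9, num_bytes / 2**30


def physical_memory_bytes():
    """Return total physical memory in bytes, or None if it cannot be determined.

    Assumption: a Linux-like system; other platforms may not support these names.
    """
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    except (ValueError, OSError, AttributeError):
        return None


# Value copied from "Failed to allocate 91625968992 bytes of memory".
requested = 91_625_968_992
gb, gib = to_gb_and_gib(requested)
print(f"Requested: {gb:.1f} GB ({gib:.1f} GiB)")  # Requested: 91.6 GB (85.3 GiB)

available = physical_memory_bytes()
if available is None:
    print("Could not determine physical memory on this platform.")
elif available < requested:
    print("The build would not fit in physical memory on this machine.")
else:
    print("Physical memory is at least as large as the requested allocation.")
```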

Using a pre-computed index is possible in Galaxy, but the configuration must be the same as if it had been installed with a Data Manager. This involves file transfer, installation, and manipulation at the command line. It is about the same as setting up other datasets/genomes manually. Those processes are documented here (somewhat outdated since not often used, but the general concepts are correct): https://wiki.galaxyproject.org/Admin/DataPreparation

Please note that the Kraken databases used at http://usegalaxy.org are not on the rsync data server mentioned in the wiki (but may be added in the future).

In short, do the same steps as the Data Manager would do after building the index - place the index into the appropriate directory, modify the .loc file to point to it, and restart the instance.
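The ".loc" step above could be sketched roughly as follows. This is not the Data Manager's own code: it assumes the Kraken data table uses three tab-separated columns (value, name, path), which should be checked against the `.loc.sample` file shipped with the tool, and the helper name `add_loc_entry` and all names and paths in the usage comment are hypothetical.

```python
from pathlib import Path


def add_loc_entry(loc_file, value, name, db_path):
    """Append one tab-separated entry to a .loc file.

    Assumed column order (verify against the tool's .loc.sample):
    value <TAB> name <TAB> path
    Returns False without writing if `value` is already listed.
    """
    loc_file = Path(loc_file)
    text = loc_file.read_text() if loc_file.exists() else ""

    for line in text.splitlines():
        # Skip comments and blank lines; compare only the first column.
        if not line.strip() or line.startswith("#"):
            continue
        if line.split("\t")[0] == value:
            return False

    with loc_file.open("a") as handle:
        # Keep the new entry on its own line if the file lacks a final newline.
        if text and not text.endswith("\n"):
            handle.write("\n")
        handle.write("\t".join([value, name, str(db_path)]) + "\n")
    return True


# Hypothetical usage (file name and paths are placeholders, not real defaults):
# add_loc_entry("tool-data/kraken_databases.loc",
#               "minikraken", "MiniKraken (pre-built)", "/data/kraken/minikraken")
```

After the entry is in place, the instance still needs a restart for the new database to appear, as described above.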

Your other option is to move to a cloud Galaxy with more dedicated memory resources (Amazon offers educational grants that are intended to cover costs for those doing research or training).

Hope this works out! Jen, Galaxy team

written 22 months ago by Jennifer Hillman Jackson ♦ 25k

Thanks, that seems quite complicated. What about building custom Kraken databases from the well-known SILVA and Greengenes databases, which would probably be less demanding in size?

written 22 months ago by vebaev • 130

This could be asked as an enhancement request for the tool authors to consider. The link to the Github tool repository is on the tool's form within the Tool Shed. Follow the link and enter the request as an "Issue", making sure to note the exact tool version (including the repo's link to the tool shed is one way).

written 22 months ago by Jennifer Hillman Jackson ♦ 25k
