Question: Genbank Database Files
Mike Dyall-Smith wrote, 5.6 years ago:
This should be easy (but it hasn't been for me so far). I want to run local BLAST searches, so I downloaded the premade nr protein BLAST database from GenBank. It is split into 10 .tar.gz files. I've decompressed them all, and now I want to put the parts back together. Can I simply concatenate all files of the same type (e.g. all 10 parts of the .phd files)? The Readme mentions the use of an alias file, but I did not find this at all clear. A set of step-by-step decompression and restoration instructions would be useful; I could not find any. Thanks for any assistance, Mike DS
Peter Cock (European Union) wrote, 5.6 years ago:
On Sun, Apr 28, 2013 at 11:22 PM, Mike Dyall-Smith wrote:

Don't cat anything - just download all the nr.*.tar.gz files and decompress them. You'll have a load of files, including a special alias file called nr.pal, which is how BLAST knows how to deal with the combined 'nr' database.

Peter
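To see why no concatenation is needed, it can help to look at which volumes the alias file names. The sketch below assumes nr.pal is a small text file containing a DBLIST line that lists the volumes (nr.00, nr.01, and so on), which is the usual layout of a BLAST alias file; the function names and the check for a .psq file per volume are illustrative choices, not something stated in the reply above.

```shell
# Print the volume names found on the DBLIST line of a BLAST alias file
# (for example nr.pal), one per line. Assumes a line of the form
# "DBLIST nr.00 nr.01 ..."; any double quotes around names are stripped.
list_alias_volumes() {
    awk '$1 == "DBLIST" { for (i = 2; i <= NF; i++) { gsub(/"/, "", $i); print $i } }' "$1"
}

# Hypothetical helper: report every volume named in the alias file whose
# protein sequence file (<volume>.psq) is missing from the same folder.
# Returns 0 if all volumes are present, 1 otherwise.
check_alias_volumes() {
    alias_file="$1"
    alias_dir="$(dirname "$alias_file")"
    missing=0
    for vol in $(list_alias_volumes "$alias_file"); do
        if [ ! -e "$alias_dir/$vol.psq" ]; then
            echo "missing: $vol.psq"
            missing=1
        fi
    done
    return "$missing"
}
```

Calling `check_alias_volumes /path/to/nr.pal` after extraction would then flag any tar-ball that was skipped or unpacked into a different folder.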
Peter Cock (European Union) wrote, 5.6 years ago:
On Mon, Apr 29, 2013 at 3:27 AM, Mike Dyall-Smith wrote:

Hi Mike,

Unless you're using a graphical decompression tool which is trying to be too helpful, each tar-ball does *not* decompress into its own folder. The files should all be in the *same* folder.

I use this to verify the checksums,

$ md5sum --check nr.00.tar.gz.md5
nr.00.tar.gz: OK

Then I use this to decompress the tar-balls,

$ tar -zxvf nr.00.tar.gz

etc. (Actually I don't do this personally any more - it has been set up to happen automatically when the NCBI update the databases.)

We keep all our NCBI databases in the same folder,

$ ls /data/blastdb/ncbi/nr.*
/data/blastdb/ncbi/nr.00.phd
/data/blastdb/ncbi/nr.00.phi
/data/blastdb/ncbi/nr.00.phr
/data/blastdb/ncbi/nr.00.pin
/data/blastdb/ncbi/nr.00.pnd
/data/blastdb/ncbi/nr.00.pni
/data/blastdb/ncbi/nr.00.pog
/data/blastdb/ncbi/nr.00.ppd
/data/blastdb/ncbi/nr.00.ppi
/data/blastdb/ncbi/nr.00.psd
/data/blastdb/ncbi/nr.00.psi
/data/blastdb/ncbi/nr.00.psq
/data/blastdb/ncbi/nr.00.tar.gz
/data/blastdb/ncbi/nr.00.tar.gz.md5
...
/data/blastdb/ncbi/nr.10.phd
/data/blastdb/ncbi/nr.10.phi
/data/blastdb/ncbi/nr.10.phr
/data/blastdb/ncbi/nr.10.pin
/data/blastdb/ncbi/nr.10.pnd
/data/blastdb/ncbi/nr.10.pni
/data/blastdb/ncbi/nr.10.pog
/data/blastdb/ncbi/nr.10.ppd
/data/blastdb/ncbi/nr.10.ppi
/data/blastdb/ncbi/nr.10.psd
/data/blastdb/ncbi/nr.10.psi
/data/blastdb/ncbi/nr.10.psq
/data/blastdb/ncbi/nr.10.tar.gz
/data/blastdb/ncbi/nr.10.tar.gz.md5
/data/blastdb/ncbi/nr.pal

We can then refer to the NR database at the command line as /data/blastdb/ncbi/nr, or as just nr if the BLAST database path is configured to check this folder.
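The check-then-extract steps above could be looped over every volume along these lines. This is only a sketch: the function name `verify_and_extract` is made up for illustration, it assumes GNU `md5sum` and `tar` are available, and it assumes each tar-ball sits next to its .md5 file in the database folder.

```shell
# Verify each <prefix>.NN.tar.gz against its .md5 file, then extract it
# into the same folder. Stops at the first archive that fails either step.
verify_and_extract() {
    db_dir="$1"    # folder holding the downloads, e.g. /data/blastdb/ncbi
    prefix="$2"    # database name, e.g. nr or nt
    # Run in a subshell so the caller's working directory is left unchanged;
    # md5sum --check resolves the file name in the .md5 file relative to here.
    (
        cd "$db_dir" || exit 1
        for archive in "$prefix".*.tar.gz; do
            # If the glob matched nothing, the literal pattern is left behind.
            [ -e "$archive" ] || { echo "no archives found for $prefix" >&2; exit 1; }
            md5sum --check "$archive.md5" || exit 1
            tar -zxvf "$archive" || exit 1
        done
    )
}

# Example use (paths as in the listing above; query.fasta is a placeholder):
#   verify_and_extract /data/blastdb/ncbi nr
#   blastp -query query.fasta -db /data/blastdb/ncbi/nr -out results.txt
# or, with the BLASTDB environment variable pointing at the folder:
#   export BLASTDB=/data/blastdb/ncbi
#   blastp -query query.fasta -db nr -out results.txt
```

Setting BLASTDB is one way to have the BLAST database path check this folder; a BLAST configuration file is another.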
In this folder we also have other NCBI databases, like NT:

$ ls /data/blastdb/ncbi/nt.*
/data/blastdb/ncbi/nt.00.nhd
/data/blastdb/ncbi/nt.00.nhi
/data/blastdb/ncbi/nt.00.nhr
/data/blastdb/ncbi/nt.00.nin
/data/blastdb/ncbi/nt.00.nnd
/data/blastdb/ncbi/nt.00.nni
/data/blastdb/ncbi/nt.00.nog
/data/blastdb/ncbi/nt.00.nsd
/data/blastdb/ncbi/nt.00.nsi
/data/blastdb/ncbi/nt.00.nsq
/data/blastdb/ncbi/nt.00.tar.gz
/data/blastdb/ncbi/nt.00.tar.gz.md5
...
/data/blastdb/ncbi/nt.13.nhd
/data/blastdb/ncbi/nt.13.nhi
/data/blastdb/ncbi/nt.13.nhr
/data/blastdb/ncbi/nt.13.nin
/data/blastdb/ncbi/nt.13.nnd
/data/blastdb/ncbi/nt.13.nni
/data/blastdb/ncbi/nt.13.nog
/data/blastdb/ncbi/nt.13.nsd
/data/blastdb/ncbi/nt.13.nsi
/data/blastdb/ncbi/nt.13.nsq
/data/blastdb/ncbi/nt.13.tar.gz
/data/blastdb/ncbi/nt.13.tar.gz.md5
/data/blastdb/ncbi/nt.nal

Note you don't need to keep the *.tar.gz and the *.md5 files once you've verified the checksum (using md5sum to detect any data corruption during download) and decompressed the tar-ball.

Peter

P.S. This galaxy-users list is meant for discussion of using the tools within Galaxy from an end-user perspective. Although there is talk about creating a new Galaxy mailing list specifically for deployment questions like this, currently galaxy-devel is preferred for this kind of discussion.
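On the note about not needing to keep the *.tar.gz and *.md5 files: a cautious clean-up might look like the sketch below. It assumes the checksums were already verified, and it uses the presence of an extracted index file (.pin for protein volumes, .nin for nucleotide volumes, as in the listings above) as a rough sign that a tar-ball was unpacked. The function name `remove_extracted_archives` is hypothetical.

```shell
# Remove <prefix>.NN.tar.gz and its .md5 file, but only when the matching
# extracted index file (<prefix>.NN.pin or <prefix>.NN.nin) is present.
# Archives without an extracted index are kept and reported on stderr.
remove_extracted_archives() {
    db_dir="$1"    # e.g. /data/blastdb/ncbi
    prefix="$2"    # e.g. nr or nt
    for archive in "$db_dir/$prefix".*.tar.gz; do
        # Skip the literal pattern left behind when nothing matches.
        [ -e "$archive" ] || continue
        volume="${archive%.tar.gz}"    # e.g. /data/blastdb/ncbi/nr.00
        if [ -e "$volume.pin" ] || [ -e "$volume.nin" ]; then
            rm -f "$archive" "$archive.md5"
        else
            echo "keeping $archive (no extracted index found)" >&2
        fi
    done
}
```

For example, `remove_extracted_archives /data/blastdb/ncbi nt` would leave only the nt.NN.n* files and nt.nal in place.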