Question: Genbank Database Files
This should be easy (but not for me so far). I want to do local BLAST searches, so I downloaded the premade nr protein BLAST database from GenBank. It is split into 10 .tar.gz files. I've decompressed them all, and now I want to put all the file parts together. Can I simply concatenate all similar files (e.g. all 10 parts of the .phd files)? The Readme mentions use of an alias file, but I did not find this at all clear. A set of step-by-step decompression and restoration instructions would be useful; I could not find any.

Thanks for any assistance,
Mike DS
Peter Cock
On Sun, Apr 28, 2013 at 11:22 PM, Mike Dyall-Smith wrote:

Don't cat anything - just download all nr.*.tar.gz files, and decompress them. You'll have a load of files including a special alias file called nr.pal which is how BLAST knows how to deal with the combined 'nr' database.

Peter
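One way to confirm that every volume named by the alias file really is present after decompression is sketched below. It is only an illustration: the function name check_alias_volumes is made up here, and it assumes the alias file is plain text with a line beginning "DBLIST" that names the volumes (nr.00, nr.01, ...), which is worth checking against your own nr.pal before relying on it.

```shell
#!/bin/sh
# Hypothetical helper (not part of BLAST): read the volume names from a
# protein alias file and report any whose sequence file (.psq) is missing
# from the same folder.
# Assumption: the alias file has a line starting with "DBLIST".
check_alias_volumes() {
    alias_file="$1"
    db_dir="$(dirname "$alias_file")"
    missing=0
    # Keep the text after the DBLIST keyword and strip optional quotes.
    # Volume names contain no spaces, so word splitting is intended here.
    for vol in $(sed -n 's/^DBLIST[[:space:]]*//p' "$alias_file" | tr -d '"'); do
        if [ -f "$db_dir/$vol.psq" ]; then
            echo "found   $vol"
        else
            echo "MISSING $vol"
            missing=1
        fi
    done
    # Exit status 0 means nothing was reported missing.
    return "$missing"
}
```

After sourcing the function, it could be called as check_alias_volumes /data/blastdb/ncbi/nr.pal (the path used in the reply below). If the alias file has no DBLIST line, nothing is printed and the function returns 0, so an empty result should not be read as "all present".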
Peter Cock
On Mon, Apr 29, 2013 at 3:27 AM, Mike Dyall-Smith wrote:

Hi Mike,

Unless you're using a graphical decompression tool which is trying to be too helpful, each tar-ball does *not* decompress into its own folder. The files should all be in the *same* folder.

I use this to verify the checksums,

    $ md5sum --check nr.00.tar.gz.md5
    nr.00.tar.gz: OK

Then I use this to decompress the tar-balls,

    $ tar -zxvf nr.00.tar.gz

etc.

(Actually I don't do this personally any more - it has been set up to happen automatically when the NCBI update the databases.)

We keep all our NCBI databases in the same folder,

    $ ls /data/blastdb/ncbi/nr.*
    /data/blastdb/ncbi/
    /data/blastdb/ncbi/nr.00.phi
    /data/blastdb/ncbi/nr.00.phr
    /data/blastdb/ncbi/
    /data/blastdb/ncbi/nr.00.pnd
    /data/blastdb/ncbi/nr.00.pni
    /data/blastdb/ncbi/nr.00.pog
    /data/blastdb/ncbi/nr.00.ppd
    /data/blastdb/ncbi/nr.00.ppi
    /data/blastdb/ncbi/nr.00.psd
    /data/blastdb/ncbi/nr.00.psi
    /data/blastdb/ncbi/nr.00.psq
    /data/blastdb/ncbi/nr.00.tar.gz
    /data/blastdb/ncbi/nr.00.tar.gz.md5
    ...
    /data/blastdb/ncbi/
    /data/blastdb/ncbi/nr.10.phi
    /data/blastdb/ncbi/nr.10.phr
    /data/blastdb/ncbi/
    /data/blastdb/ncbi/nr.10.pnd
    /data/blastdb/ncbi/nr.10.pni
    /data/blastdb/ncbi/nr.10.pog
    /data/blastdb/ncbi/nr.10.ppd
    /data/blastdb/ncbi/nr.10.ppi
    /data/blastdb/ncbi/nr.10.psd
    /data/blastdb/ncbi/nr.10.psi
    /data/blastdb/ncbi/nr.10.psq
    /data/blastdb/ncbi/nr.10.tar.gz
    /data/blastdb/ncbi/nr.10.tar.gz.md5
    /data/blastdb/ncbi/nr.pal

We can then refer to the NR database at the command line as /data/blastdb/ncbi/nr, or as just nr if the BLAST database path is configured to check this folder.
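The verify-then-decompress steps above could be wrapped in a small loop over all the volumes, roughly as follows. This is a sketch rather than Peter's actual automation: the function name unpack_blast_db is invented for illustration, and it assumes each tar-ball sits next to a matching .md5 file in the same folder, as in the listing.

```shell
#!/bin/sh
# Hypothetical helper: verify and unpack every volume of one database
# (for example "nr") inside a single folder, so that all the extracted
# files and the alias file end up side by side.
unpack_blast_db() {
    db_dir="$1"
    db_name="$2"
    # Work in a subshell so the caller's working directory is unchanged.
    (
        cd "$db_dir" || exit 1
        for tarball in "$db_name".*.tar.gz; do
            # If the glob matched nothing, the literal pattern is left over.
            [ -e "$tarball" ] || { echo "no $db_name tar-balls in $db_dir" >&2; exit 1; }
            # Stop at the first checksum mismatch rather than unpack bad data.
            md5sum --check "$tarball.md5" || exit 1
            tar -zxf "$tarball" || exit 1
        done
    )
}
```

For the layout shown above this would be called as unpack_blast_db /data/blastdb/ncbi nr. To use the short name afterwards, one option is to set the BLASTDB environment variable to that folder (export BLASTDB=/data/blastdb/ncbi) and run something like blastp -db nr -query query.fasta, where query.fasta stands in for your own query file.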
In this folder we also have other NCBI databases, such as NT:

    $ ls /data/blastdb/ncbi/nt.*
    /data/blastdb/ncbi/nt.00.nhd
    /data/blastdb/ncbi/nt.00.nhi
    /data/blastdb/ncbi/nt.00.nhr
    /data/blastdb/ncbi/nt.00.nin
    /data/blastdb/ncbi/nt.00.nnd
    /data/blastdb/ncbi/nt.00.nni
    /data/blastdb/ncbi/nt.00.nog
    /data/blastdb/ncbi/nt.00.nsd
    /data/blastdb/ncbi/nt.00.nsi
    /data/blastdb/ncbi/nt.00.nsq
    /data/blastdb/ncbi/nt.00.tar.gz
    /data/blastdb/ncbi/nt.00.tar.gz.md5
    ...
    /data/blastdb/ncbi/nt.13.nhd
    /data/blastdb/ncbi/nt.13.nhi
    /data/blastdb/ncbi/nt.13.nhr
    /data/blastdb/ncbi/nt.13.nin
    /data/blastdb/ncbi/nt.13.nnd
    /data/blastdb/ncbi/nt.13.nni
    /data/blastdb/ncbi/nt.13.nog
    /data/blastdb/ncbi/nt.13.nsd
    /data/blastdb/ncbi/nt.13.nsi
    /data/blastdb/ncbi/nt.13.nsq
    /data/blastdb/ncbi/nt.13.tar.gz
    /data/blastdb/ncbi/nt.13.tar.gz.md5
    /data/blastdb/ncbi/nt.nal

Note you don't need to keep the *.tar.gz and the *.md5 files once you've verified the checksum (using md5sum to detect any data corruption during download) and decompressed the tar-ball.

Peter

P.S. This galaxy-users list is meant for discussion of using the tools within Galaxy from an end user perspective. Although there is talk about creating a new Galaxy mailing list specifically for deployment questions like this, currently galaxy-devel is preferred for this kind of discussion.
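The clean-up mentioned in the note above (dropping the *.tar.gz and *.md5 files once they are no longer needed) could be done cautiously along these lines. Again this is only a sketch with an invented function name, remove_unpacked_tarballs. It does not re-run the checksum; it only deletes a tar-ball when every file listed inside it already exists in the folder, so it is meant to be run after the verify and decompress steps.

```shell
#!/bin/sh
# Hypothetical helper: delete each tar-ball and its .md5 file, but only
# when every file listed in the archive is already present in the folder.
remove_unpacked_tarballs() {
    db_dir="$1"
    db_name="$2"
    (
        cd "$db_dir" || exit 1
        for tarball in "$db_name".*.tar.gz; do
            [ -e "$tarball" ] || continue
            # Keep anything tar cannot read; it may need downloading again.
            members="$(tar -tzf "$tarball")" || { echo "kept    $tarball (unreadable)"; continue; }
            complete=1
            # Database file names contain no spaces, so word splitting is safe.
            for member in $members; do
                [ -e "$member" ] || complete=0
            done
            if [ "$complete" -eq 1 ]; then
                rm -f "$tarball" "$tarball.md5"
                echo "removed $tarball"
            else
                echo "kept    $tarball (not fully unpacked)"
            fi
        done
    )
}
```

With the folder from the listings, remove_unpacked_tarballs /data/blastdb/ncbi nt would clear the NT tar-balls that have been fully extracted and leave any others in place.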
