Question: Indexing Files Every Time - Performance Issue
6.4 years ago by Praveen Raj Somarajan (100) wrote:
All,

We have noticed that Galaxy/GATK re-indexes the reference FASTA and the dbSNP file every time it runs. Re-indexing takes time (~10 min), so it adds to the overall run time whenever the tool is used repeatedly. This could be avoided by reusing the indexes that already exist. Here is a snapshot of the log:

INFO 11:43:57,365 HelpFormatter - The Genome Analysis Toolkit (GATK) v1.4-21-g30b937d, Compiled 2012/02/01 19:01:14
INFO 11:43:57,365 HelpFormatter - Copyright (c) 2010 The Broad Institute
INFO 11:43:57,365 HelpFormatter - Please view our documentation at
INFO 11:43:57,366 HelpFormatter - For support, please view our support site at
INFO 11:43:57,367 HelpFormatter - ---------------------------------------------------------------------------------
INFO 11:43:57,429 GenomeAnalysisEngine - Strictness is STRICT
INFO 11:43:57,432 ReferenceDataSource - Index file /tmp/tmp-gatk-6jlUfH/gatk_input.fasta.fai does not exist. Trying to create it now.
PROGRESS UPDATE: file is 15 percent complete
PROGRESS UPDATE: file is 28 percent complete
PROGRESS UPDATE: file is 91 percent complete
INFO 11:45:32,231 ReferenceDataSource - Dict file /tmp/tmp-gatk-6jlUfH/gatk_input.dict does not exist. Trying to create it now.
INFO 11:45:54,262 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO 11:45:54,280 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.02
INFO 11:45:54,304 RMDTrackBuilder - Creating Tribble index in memory for file /tmp/tmp-gatk-6jlUfH/input_dbsnp_0.vcf
INFO 11:48:05,910 RMDTrackBuilder - Writing Tribble index to disk for file /tmp/tmp-gatk-6jlUfH/input_dbsnp_0.vcf.idx

Is there any option or alternative in Galaxy to avoid this re-indexing in /tmp, given that I have already built the indexes for the reference and dbSNP?

I look forward to any suggestions.

Thanks,
Raj
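For illustration, the idea of reusing existing indexes can be sketched as follows. The log above suggests GATK looks for index files next to the staged inputs (`gatk_input.fasta.fai`, `gatk_input.dict`, `input_dbsnp_0.vcf.idx`), so one possible approach is to link prebuilt indexes into the working directory under those names. This is only a sketch under that assumption, not how the Galaxy wrapper works: `index_names` and `stage` are hypothetical helpers, and the thread does not confirm whether GATK would accept indexes staged this way (for example, with respect to timestamps or index versions).

```python
import os
from pathlib import Path


def index_names(staged_name):
    """Return the index file names the log shows for a staged input.

    Assumption: naming follows the log in this thread
    (<name>.fai and <stem>.dict for FASTA, <name>.idx for VCF).
    """
    p = Path(staged_name)
    if p.suffix in (".fasta", ".fa"):
        return [p.name + ".fai", p.stem + ".dict"]
    if p.suffix == ".vcf":
        return [p.name + ".idx"]
    raise ValueError(f"no known index naming for {staged_name!r}")


def stage(data_file, index_files, work_dir, staged_name):
    """Symlink a data file and its prebuilt indexes into work_dir.

    index_files must be given in the same order as index_names() returns.
    Returns the list of created links (data file first).
    """
    names = index_names(staged_name)
    if len(names) != len(index_files):
        raise ValueError(f"expected {len(names)} index files, got {len(index_files)}")
    work = Path(work_dir)
    links = [work / staged_name] + [work / n for n in names]
    targets = [Path(data_file)] + [Path(f) for f in index_files]
    for link, target in zip(links, targets):
        # Link instead of copying so the prebuilt files are reused as-is.
        os.symlink(target.resolve(), link)
    return links
```

A call such as `stage("hg19.fasta", ["hg19.fasta.fai", "hg19.dict"], work_dir, "gatk_input.fasta")` would place all three files in `work_dir` under the names seen in the log; the source file names here are placeholders.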
galaxy • 1.0k views
6.4 years ago by Jennifer Hillman Jackson (25k), United States, wrote:
Hi Raj,

The GATK tool wrappers are still in Beta and are currently under redesign. Since this question is about a local install, it is probably better suited to the mailing list.

When you write in (new thread, please), could you clarify a bit more? Do you need help with installing native indexes for GATK? Or do you want to re-use indexes generated after a custom genome or other inputs are used (not a current feature, but maybe you want feedback from other developers)?

Thanks!

Jen
Galaxy team