Question: Indexing Files Every Time - Performance Issue
5.9 years ago by
Praveen Raj Somarajan wrote:
All,

I have noticed that Galaxy/GATK indexes the reference FASTA and dbSNP files every time it runs. Re-indexing takes time (~10 min), so it adds to the overall run time when the tool is run repeatedly. This could be avoided by reusing the existing indexes. Here is a snapshot of the log:

INFO 11:43:57,365 HelpFormatter - The Genome Analysis Toolkit (GATK) v1.4-21-g30b937d, Compiled 2012/02/01 19:01:14
INFO 11:43:57,365 HelpFormatter - Copyright (c) 2010 The Broad Institute
INFO 11:43:57,365 HelpFormatter - Please view our documentation at
INFO 11:43:57,366 HelpFormatter - For support, please view our support site at
INFO 11:43:57,367 HelpFormatter - ---------------------------------------------------------------------------------
INFO 11:43:57,429 GenomeAnalysisEngine - Strictness is STRICT
INFO 11:43:57,432 ReferenceDataSource - Index file /tmp/tmp-gatk-6jlUfH/gatk_input.fasta.fai does not exist. Trying to create it now.
PROGRESS UPDATE: file is 15 percent complete
PROGRESS UPDATE: file is 28 percent complete
PROGRESS UPDATE: file is 91 percent complete
INFO 11:45:32,231 ReferenceDataSource - Dict file /tmp/tmp-gatk-6jlUfH/gatk_input.dict does not exist. Trying to create it now.
INFO 11:45:54,262 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO 11:45:54,280 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.02
INFO 11:45:54,304 RMDTrackBuilder - Creating Tribble index in memory for file /tmp/tmp-gatk-6jlUfH/input_dbsnp_0.vcf
INFO 11:48:05,910 RMDTrackBuilder - Writing Tribble index to disk for file /tmp/tmp-gatk-6jlUfH/input_dbsnp_0.vcf.idx

Is there any option or alternative in Galaxy to avoid this re-indexing in /tmp? I have already built the indexes for the reference and dbSNP. I look forward to any suggestions.

Thanks,
Raj
5.9 years ago by
United States
Jennifer Hillman Jackson wrote:
Hi Raj,

The GATK tool wrappers are still in Beta and are currently under redesign. Since this question is about a local install, it is probably better suited to the mailing list.

When you write in (new thread, please), could you please clarify a bit more? Do you need help with installing native indexes for GATK? Or do you want to re-use indexes generated after a custom genome/other inputs are used (not a current feature, but maybe you want feedback from other developers)?

Thanks!

Jen
Galaxy team

--
Jennifer Jackson
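For a local install, one possible workaround can be sketched in Python. This is not a Galaxy feature and not the GATK wrapper's actual code. It only illustrates the idea from the question: if pre-built index files are placed next to the staged inputs under the names shown in the log (`<file>.fai`, `<basename>.dict`, `<file>.idx`), the "does not exist" branch might be skipped. The function names `index_candidates` and `stage_with_indexes` are hypothetical, and whether GATK accepts symlinked indexes (for example, whether it checks index timestamps) is an untested assumption.

```python
import os
from pathlib import Path
from typing import List


def index_candidates(data_file: Path) -> List[Path]:
    """Index paths matching the naming pattern seen in the GATK log."""
    return [
        data_file.with_name(data_file.name + ".fai"),  # FASTA index
        data_file.with_suffix(".dict"),                # sequence dictionary
        data_file.with_name(data_file.name + ".idx"),  # Tribble index (VCF)
    ]


def stage_with_indexes(source: Path, workdir: Path, staged_name: str) -> List[Path]:
    """Symlink `source` into `workdir` as `staged_name`, along with any
    pre-built index files that already exist next to `source`."""
    workdir.mkdir(parents=True, exist_ok=True)
    staged = workdir / staged_name
    os.symlink(source.resolve(), staged)
    created = [staged]
    # Pair each source-side index name with its staged-side equivalent.
    for src_idx, dst_idx in zip(index_candidates(source), index_candidates(staged)):
        if src_idx.exists():
            os.symlink(src_idx.resolve(), dst_idx)
            created.append(dst_idx)
    return created
```

A call such as `stage_with_indexes(Path("/data/ref/hg19.fasta"), Path("/tmp/tmp-gatk-work"), "gatk_input.fasta")` (both paths are made up for illustration) would link the FASTA plus whichever of `hg19.fasta.fai` and `hg19.dict` already exist. Indexes that are missing are simply skipped, so the tool would still build those itself.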
