Hi all,
I am trying to follow the recommended GATK workflow for SNP analysis on RNA-seq data (https://www.broadinstitute.org/gatk/guide/topic?name=methods) on the public Galaxy server (usegalaxy.org). All went well until the Count Covariates step, where I received this error:
Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/tmp
[Sat Sep 20 14:02:43 CDT 2014] net.sf.picard.sam.CreateSequenceDictionary REFERENCE=/tmp/tmp-gatk-YxuKgH/gatk_input.fasta OUTPUT=/tmp/tmp-gatk-YxuKgH/dict8496086377301159622.tmp TRUNCATE_NAMES_AT_WHITESPACE=true NUM_SEQUENCES=2147483647 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false
[Sat Sep 20 14:02:43 CDT 2014] Executing as g2main@roundup50.tacc.utexas.edu on Linux 2.6.32-358.23.2.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.7.0_40-b43; Picard version: 1.58(1057)
[Sat Sep 20 14:02:58 CDT 2014] net.sf.picard.sam.CreateSequenceDictionary done. Elapsed time: 0.25 minutes.
Runtime.totalMemory()=2553806848
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR stack trace
java.lang.OutOfMemoryError: GC overhead limit exceeded
    at org.broad.tribble.index.linear.LinearIndexCreator.addFeature(LinearIndexCreator.java:58)
    at org.broad.tribble.index.DynamicIndexCreator.addFeature(DynamicIndexCreator.java:159)
    at org.broad.tribble.index.IndexFactory.createIndex(IndexFactory.java:136)
    at org.broad.tribble.index.IndexFactory.createIndex(IndexFactory.java:123)
    at org.broadinstitute.sting.gatk.refdata.tracks.RMDTrackBuilder.createIndexInMemory(RMDTrackBuilder.java:361)
    at org.broadinstitute.sting.gatk.refdata.tracks.RMDTrackBuilder.loadIndex(RMDTrackBuilder.java:258)
    at org.broadinstitute.sting.gatk.refdata.tracks.RMDTrackBuilder.getFeatureSource(RMDTrackBuilder.java:199)
    at org.broadinstitute.sting.gatk.refdata.tracks.RMDTrackBuilder.createInstanceOfTrack(RMDTrackBuilder.java:128)
    at org.broadinstitute.sting.gatk.datasources.rmd.ReferenceOrderedQueryDataPool.<init>(ReferenceOrderedDataSource.java:205)
    at org.broadinstitute.sting.gatk.datasources.rmd.ReferenceOrderedDataSource.<init>(ReferenceOrderedDataSource.java:85)
    at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.getReferenceOrderedDataSources(GenomeAnalysisEngine.java:783)
    at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.initializeDataSources(GenomeAnalysisEngine.java:653)
    at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:212)
    at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:122)
    at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:236)
    at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:146)
    at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:90)
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A GATK RUNTIME ERROR has occurred (version exported):
##### ERROR
##### ERROR Please visit the wiki to see if this is a known problem
##### ERROR If not, please post the error, with stack trace, to the GATK forum
##### ERROR Visit our wiki for extensive documentation http://www.broadinstitute.org/gsa/wiki
##### ERROR Visit our forum to view answers to commonly asked questions http://getsatisfaction.com/gsa
##### ERROR
##### ERROR MESSAGE: GC overhead limit exceeded
##### ERROR ------------------------------------------------------------------------------------------
After some research I learnt that my BAM file may be too big (I have 50M single-end reads), so I used the downsample option in Count Covariates to downsample all reads to a fraction of 0.25, so that 75% of my reads are discarded and covariates are counted on the remaining 25%. This seemed to be working but took a long time, so I left it running overnight. However, this morning I returned to a similar error, specifically:
Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/tmp
[Sat Sep 20 14:41:46 CDT 2014] net.sf.picard.sam.CreateSequenceDictionary REFERENCE=/tmp/tmp-gatk-jPeTaW/gatk_input.fasta OUTPUT=/tmp/tmp-gatk-jPeTaW/dict7016368597036568962.tmp TRUNCATE_NAMES_AT_WHITESPACE=true NUM_SEQUENCES=2147483647 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false
[Sat Sep 20 14:41:46 CDT 2014] Executing as g2main@roundup50.tacc.utexas.edu on Linux 2.6.32-358.23.2.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.7.0_40-b43; Picard version: 1.58(1057)
[Sat Sep 20 14:42:00 CDT 2014] net.sf.picard.sam.CreateSequenceDictionary done. Elapsed time: 0.23 minutes.
Runtime.totalMemory()=2553806848
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR stack trace
java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Unknown Source)
    at java.util.ArrayList.grow(Unknown Source)
    at java.util.ArrayList.ensureExplicitCapacity(Unknown Source)
    at java.util.ArrayList.ensureCapacityInternal(Unknown Source)
    at java.util.ArrayList.add(Unknown Source)
    at org.broad.tribble.index.linear.LinearIndex$ChrIndex.addBlock(LinearIndex.java:161)
    at org.broad.tribble.index.linear.LinearIndexCreator.addFeature(LinearIndexCreator.java:46)
    at org.broad.tribble.index.DynamicIndexCreator.addFeature(DynamicIndexCreator.java:159)
    at org.broad.tribble.index.IndexFactory.createIndex(IndexFactory.java:136)
    at org.broad.tribble.index.IndexFactory.createIndex(IndexFactory.java:123)
    at org.broadinstitute.sting.gatk.refdata.tracks.RMDTrackBuilder.createIndexInMemory(RMDTrackBuilder.java:361)
    at org.broadinstitute.sting.gatk.refdata.tracks.RMDTrackBuilder.loadIndex(RMDTrackBuilder.java:258)
    at org.broadinstitute.sting.gatk.refdata.tracks.RMDTrackBuilder.getFeatureSource(RMDTrackBuilder.java:199)
    at org.broadinstitute.sting.gatk.refdata.tracks.RMDTrackBuilder.createInstanceOfTrack(RMDTrackBuilder.java:128)
    at org.broadinstitute.sting.gatk.datasources.rmd.ReferenceOrderedQueryDataPool.<init>(ReferenceOrderedDataSource.java:205)
    at org.broadinstitute.sting.gatk.datasources.rmd.ReferenceOrderedDataSource.<init>(ReferenceOrderedDataSource.java:85)
    at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.getReferenceOrderedDataSources(GenomeAnalysisEngine.java:783)
    at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.initializeDataSources(GenomeAnalysisEngine.java:653)
    at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:212)
    at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:122)
    at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:236)
    at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:146)
    at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:90)
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A GATK RUNTIME ERROR has occurred (version exported):
##### ERROR
##### ERROR Please visit the wiki to see if this is a known problem
##### ERROR If not, please post the error, with stack trace, to the GATK forum
##### ERROR Visit our wiki for extensive documentation http://www.broadinstitute.org/gsa/wiki
##### ERROR Visit our forum to view answers to commonly asked questions http://getsatisfaction.com/gsa
##### ERROR
##### ERROR MESSAGE: Java heap space
##### ERROR ------------------------------------------------------------------------------------------
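To put rough numbers on the two runs, here is a minimal Python sketch of the arithmetic. The 50M read count and the 0.25 fraction are the ones I described, and the heap figure is the Runtime.totalMemory() value printed in both logs. The function names are only illustrative and are not part of GATK or Galaxy, and the read counts are expected values, since I assume the actual downsampling is random.

```python
def reads_after_downsampling(total_reads: int, fraction: float) -> tuple[int, int]:
    """Return the expected (kept, discarded) read counts for a downsampling fraction."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    kept = round(total_reads * fraction)
    return kept, total_reads - kept


def bytes_to_mib(n_bytes: int) -> float:
    """Convert a byte count to mebibytes (1 MiB = 2**20 bytes)."""
    return n_bytes / 2**20


# Values taken from the post: 50M single-end reads, fraction 0.25.
kept, discarded = reads_after_downsampling(50_000_000, 0.25)

# Value taken from the "Runtime.totalMemory()=2553806848" line in both logs.
heap_mib = bytes_to_mib(2_553_806_848)

print(f"expected reads kept: {kept:,}, discarded: {discarded:,}")
print(f"JVM heap reported in the log: {heap_mib:.1f} MiB")
```

The heap value is identical in both logs, so downsampling the reads did not change the memory available to the job.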
Any idea how to fix this?
This is pilot data, so I am testing the analysis on the public Galaxy server. I can't install a local Galaxy because Windows is not supported. I am working on upgrading my laptop from 4 GB to 16 GB of memory and then installing Linux and a local Galaxy for future analysis, but for now this is not an option.
Thanks for your help!