Question: Error java.lang.OutOfMemoryError: Java heap space on GATK Count Covariates
sekalazare (Netherlands) wrote, 3.3 years ago:

Hi all,

I am trying to follow the recommended GATK workflow for SNP calling on RNA-seq data (https://www.broadinstitute.org/gatk/guide/topic?name=methods) on the public Galaxy server (usegalaxy.org). Everything works until the Count Covariates step, where I get this error:

Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/tmp
[Sat Sep 20 14:02:43 CDT 2014] net.sf.picard.sam.CreateSequenceDictionary REFERENCE=/tmp/tmp-gatk-YxuKgH/gatk_input.fasta OUTPUT=/tmp/tmp-gatk-YxuKgH/dict8496086377301159622.tmp    TRUNCATE_NAMES_AT_WHITESPACE=true NUM_SEQUENCES=2147483647 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false
[Sat Sep 20 14:02:43 CDT 2014] Executing as g2main@roundup50.tacc.utexas.edu on Linux 2.6.32-358.23.2.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.7.0_40-b43; Picard version: 1.58(1057)
[Sat Sep 20 14:02:58 CDT 2014] net.sf.picard.sam.CreateSequenceDictionary done. Elapsed time: 0.25 minutes.
Runtime.totalMemory()=2553806848
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR stack trace 
java.lang.OutOfMemoryError: GC overhead limit exceeded
	at org.broad.tribble.index.linear.LinearIndexCreator.addFeature(LinearIndexCreator.java:58)
	at org.broad.tribble.index.DynamicIndexCreator.addFeature(DynamicIndexCreator.java:159)
	at org.broad.tribble.index.IndexFactory.createIndex(IndexFactory.java:136)
	at org.broad.tribble.index.IndexFactory.createIndex(IndexFactory.java:123)
	at org.broadinstitute.sting.gatk.refdata.tracks.RMDTrackBuilder.createIndexInMemory(RMDTrackBuilder.java:361)
	at org.broadinstitute.sting.gatk.refdata.tracks.RMDTrackBuilder.loadIndex(RMDTrackBuilder.java:258)
	at org.broadinstitute.sting.gatk.refdata.tracks.RMDTrackBuilder.getFeatureSource(RMDTrackBuilder.java:199)
	at org.broadinstitute.sting.gatk.refdata.tracks.RMDTrackBuilder.createInstanceOfTrack(RMDTrackBuilder.java:128)
	at org.broadinstitute.sting.gatk.datasources.rmd.ReferenceOrderedQueryDataPool.<init>(ReferenceOrderedDataSource.java:205)
	at org.broadinstitute.sting.gatk.datasources.rmd.ReferenceOrderedDataSource.<init>(ReferenceOrderedDataSource.java:85)
	at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.getReferenceOrderedDataSources(GenomeAnalysisEngine.java:783)
	at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.initializeDataSources(GenomeAnalysisEngine.java:653)
	at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:212)
	at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:122)
	at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:236)
	at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:146)
	at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:90)
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A GATK RUNTIME ERROR has occurred (version exported):
##### ERROR
##### ERROR Please visit the wiki to see if this is a known problem
##### ERROR If not, please post the error, with stack trace, to the GATK forum
##### ERROR Visit our wiki for extensive documentation http://www.broadinstitute.org/gsa/wiki
##### ERROR Visit our forum to view answers to commonly asked questions http://getsatisfaction.com/gsa
##### ERROR
##### ERROR MESSAGE: GC overhead limit exceeded
##### ERROR ------------------------------------------------------------------------------------------
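The log line `Runtime.totalMemory()=2553806848` is a byte count: it shows the JVM running this job had roughly 2.4 GiB of heap reserved. "GC overhead limit exceeded" and "Java heap space" are both `OutOfMemoryError`s. The first is thrown by HotSpot when the JVM spends nearly all its time in garbage collection while recovering very little heap. The heap ceiling comes from `-Xmx` (or the JVM default), which on a public server is normally fixed by the administrators rather than by the user. The sketch below (the class name `HeapReport` is hypothetical) shows how a JVM can report its own limits via the standard `java.lang.Runtime` methods and converts the logged value:

```java
/**
 * Hypothetical helper: prints this JVM's heap limits and converts the
 * "Runtime.totalMemory()=2553806848" value from the Galaxy log to MiB/GiB.
 */
public class HeapReport {
    private static final double MIB = 1024.0 * 1024.0;
    private static final double GIB = MIB * 1024.0;

    static double toMiB(long bytes) { return bytes / MIB; }

    static double toGiB(long bytes) { return bytes / GIB; }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory(): ceiling set by -Xmx (or the JVM default)
        // totalMemory(): heap currently reserved; freeMemory(): unused part of it
        System.out.printf("max   = %.1f MiB%n", toMiB(rt.maxMemory()));
        System.out.printf("total = %.1f MiB%n", toMiB(rt.totalMemory()));
        System.out.printf("free  = %.1f MiB%n", toMiB(rt.freeMemory()));

        long logged = 2_553_806_848L; // value printed in the Galaxy job log
        // prints: logged totalMemory = 2435.5 MiB (2.38 GiB)
        System.out.printf("logged totalMemory = %.1f MiB (%.2f GiB)%n",
                toMiB(logged), toGiB(logged));
    }
}
```

Running it locally with, say, `java -Xmx8g HeapReport` should show the `max` line change accordingly; on a shared server only the administrators can raise that limit.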

 

After some research I learnt that my BAM file may be too big (I have about 50M single-end reads), so I used the downsampling option in Count Covariates to downsample the reads to a fraction of 0.25, so that 75% of my reads were discarded and covariates were counted on the remaining 25%. This seemed to work and took a long time, so I left it running overnight. However, this morning I came back to a similar error:
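Downsampling about 50M reads to a fraction of 0.25 keeps about 12.5M reads in expectation. A common way to implement fraction-based downsampling is to keep each read independently with probability f. The sketch below illustrates that idea; it is not GATK's code, and the class `FractionDownsampler` and its methods are hypothetical:

```java
import java.util.Random;

/**
 * Hypothetical sketch of fraction-based downsampling: each read is kept
 * independently with probability `fraction`. Not GATK's implementation.
 */
public class FractionDownsampler {
    private final double fraction;
    private final Random rng;

    public FractionDownsampler(double fraction, long seed) {
        if (fraction < 0.0 || fraction > 1.0) {
            throw new IllegalArgumentException("fraction must be in [0, 1]");
        }
        this.fraction = fraction;
        this.rng = new Random(seed); // seeded so runs are reproducible
    }

    /** Returns true if the next read should be kept (nextDouble() is in [0, 1)). */
    public boolean keep() {
        return rng.nextDouble() < fraction;
    }

    /** Counts how many of nReads reads survive downsampling. */
    public long countKept(long nReads) {
        long kept = 0;
        for (long i = 0; i < nReads; i++) {
            if (keep()) {
                kept++;
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        long nReads = 50_000_000L; // ~50M single-end reads, as in the question
        double fraction = 0.25;
        System.out.printf("expected kept: %.0f%n", nReads * fraction); // 12500000
        FractionDownsampler ds = new FractionDownsampler(fraction, 42L);
        System.out.println("simulated kept out of 1M: " + ds.countKept(1_000_000L));
    }
}
```

Because each read is sampled independently, the number kept varies slightly around f times the total; note that this only reduces the reads, not any other inputs the tool loads.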

Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/tmp
[Sat Sep 20 14:41:46 CDT 2014] net.sf.picard.sam.CreateSequenceDictionary REFERENCE=/tmp/tmp-gatk-jPeTaW/gatk_input.fasta OUTPUT=/tmp/tmp-gatk-jPeTaW/dict7016368597036568962.tmp    TRUNCATE_NAMES_AT_WHITESPACE=true NUM_SEQUENCES=2147483647 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false
[Sat Sep 20 14:41:46 CDT 2014] Executing as g2main@roundup50.tacc.utexas.edu on Linux 2.6.32-358.23.2.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.7.0_40-b43; Picard version: 1.58(1057)
[Sat Sep 20 14:42:00 CDT 2014] net.sf.picard.sam.CreateSequenceDictionary done. Elapsed time: 0.23 minutes.
Runtime.totalMemory()=2553806848
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR stack trace 
java.lang.OutOfMemoryError: Java heap space
	at java.util.Arrays.copyOf(Unknown Source)
	at java.util.ArrayList.grow(Unknown Source)
	at java.util.ArrayList.ensureExplicitCapacity(Unknown Source)
	at java.util.ArrayList.ensureCapacityInternal(Unknown Source)
	at java.util.ArrayList.add(Unknown Source)
	at org.broad.tribble.index.linear.LinearIndex$ChrIndex.addBlock(LinearIndex.java:161)
	at org.broad.tribble.index.linear.LinearIndexCreator.addFeature(LinearIndexCreator.java:46)
	at org.broad.tribble.index.DynamicIndexCreator.addFeature(DynamicIndexCreator.java:159)
	at org.broad.tribble.index.IndexFactory.createIndex(IndexFactory.java:136)
	at org.broad.tribble.index.IndexFactory.createIndex(IndexFactory.java:123)
	at org.broadinstitute.sting.gatk.refdata.tracks.RMDTrackBuilder.createIndexInMemory(RMDTrackBuilder.java:361)
	at org.broadinstitute.sting.gatk.refdata.tracks.RMDTrackBuilder.loadIndex(RMDTrackBuilder.java:258)
	at org.broadinstitute.sting.gatk.refdata.tracks.RMDTrackBuilder.getFeatureSource(RMDTrackBuilder.java:199)
	at org.broadinstitute.sting.gatk.refdata.tracks.RMDTrackBuilder.createInstanceOfTrack(RMDTrackBuilder.java:128)
	at org.broadinstitute.sting.gatk.datasources.rmd.ReferenceOrderedQueryDataPool.<init>(ReferenceOrderedDataSource.java:205)
	at org.broadinstitute.sting.gatk.datasources.rmd.ReferenceOrderedDataSource.<init>(ReferenceOrderedDataSource.java:85)
	at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.getReferenceOrderedDataSources(GenomeAnalysisEngine.java:783)
	at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.initializeDataSources(GenomeAnalysisEngine.java:653)
	at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:212)
	at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:122)
	at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:236)
	at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:146)
	at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:90)
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A GATK RUNTIME ERROR has occurred (version exported):
##### ERROR
##### ERROR Please visit the wiki to see if this is a known problem
##### ERROR If not, please post the error, with stack trace, to the GATK forum
##### ERROR Visit our wiki for extensive documentation http://www.broadinstitute.org/gsa/wiki
##### ERROR Visit our forum to view answers to commonly asked questions http://getsatisfaction.com/gsa
##### ERROR
##### ERROR MESSAGE: Java heap space
##### ERROR ------------------------------------------------------------------------------------------
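Both stack traces fail at the same point: `GenomeAnalysisEngine.initializeDataSources` calls `RMDTrackBuilder.createIndexInMemory`, which ends in `LinearIndexCreator.addFeature` (and, in the second run, `LinearIndex$ChrIndex.addBlock` growing an `ArrayList`). That is, GATK ran out of memory while building an in-memory Tribble index for a reference-ordered data track (for Count Covariates most likely the known-sites file, such as dbSNP), during start-up and before any reads were traversed, which would explain why downsampling the reads did not help. The sketch below is a hypothetical, much simplified model of a linear index (fixed-width bins per contig, each holding a file offset). It is not Tribble's `LinearIndex`; the class `LinearIndexSketch` is invented purely to illustrate that such an index grows with the feature file and contig lengths, not with the BAM:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical, simplified linear index: each contig is split into bins of
 * binWidth bp, and each bin stores the file offset of the first feature at or
 * after it. NOT Tribble's LinearIndex; illustration only.
 */
public class LinearIndexSketch {
    private final int binWidth;
    private final Map<String, List<Long>> blocks = new LinkedHashMap<>();

    public LinearIndexSketch(int binWidth) {
        if (binWidth <= 0) {
            throw new IllegalArgumentException("binWidth must be positive");
        }
        this.binWidth = binWidth;
    }

    /** Records a feature starting at 1-based position `start`, found at `fileOffset`. */
    public void addFeature(String contig, int start, long fileOffset) {
        List<Long> bins = blocks.computeIfAbsent(contig, c -> new ArrayList<>());
        int bin = (start - 1) / binWidth;
        // Grow the per-contig list until it covers this bin; every skipped
        // (empty) bin still costs one entry pointing at this feature.
        while (bins.size() <= bin) {
            bins.add(fileOffset);
        }
    }

    public int blockCount(String contig) {
        List<Long> bins = blocks.get(contig);
        return bins == null ? 0 : bins.size();
    }

    public int totalBlocks() {
        int total = 0;
        for (List<Long> bins : blocks.values()) {
            total += bins.size();
        }
        return total;
    }

    public static void main(String[] args) {
        LinearIndexSketch index = new LinearIndexSketch(1000);
        index.addFeature("chr1", 1, 0L);
        index.addFeature("chr1", 249_000_000, 12_345L);
        System.out.println(index.blockCount("chr1")); // 249000
    }
}
```

Two features on one long contig already produce 249,000 entries here; the real index structures are more involved, but their size likewise depends on the indexed feature file, which is why reducing reads leaves it unchanged.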

 

Any idea how to fix this? 

This is pilot data, so I am testing the analysis on the public Galaxy server. I can't install a local Galaxy because you don't support Windows. I am working on upgrading my laptop from 4 GB to 16 GB of memory and then installing Linux and a local Galaxy for future analyses, but for now this is not an option.

Thanks for your help!

Jennifer Hillman Jackson (United States) wrote, 3.3 years ago:

Hello,

The jobs appear to be exceeding the memory available to them on the public server. If you want to keep trying to reduce the job load by sampling or adjusting parameters, the GATK forum is a great place to review specific errors that others have reported (or just google the error) and see what others have tried.

That said, if you just want to run the data as-is, a cloud Galaxy instance is an option, and probably the best one for large jobs, since you can really scale up the memory. Tools, including updated GATK tools, can be installed from the Tool Shed:
http://wiki.galaxyproject.org/BigPicture/Choices
http://galaxyproject.org/toolshed

Best, Jen, Galaxy team
