I'm trying to run HaplotypeCaller from GATK4 in GVCF mode on my computer, using this WDL: https://github.com/gatk-workflows/gatk4-germline-snps-indels/blob/master/haplotypecaller-gvcf-gatk4.wdl. The WDL script runs GATK in Docker containers to execute GATK tools such as HaplotypeCaller and MergeGVCF. I'm using Cromwell in "run mode" to run the WDL script. I'm running it locally, with the exact inputs listed in the haplotypecaller-gvcf-gatk4.hg38.wgs.inputs.json file.
However, I keep getting out-of-memory errors. It seems that 50 GATK Docker containers are being spun up, all running HaplotypeCaller in parallel. This is due to the number of interval lists declared in hg38_wgs_scattered_calling_intervals.txt.
I'm running it on a machine with 32 GB of RAM and 512 GB of disk space. This seems to be more than enough, no? My questions are basically:
- How much RAM/memory is needed to run this workflow?
- How much disk space is needed?
- Should I set a limit on how much memory each docker container can use in the Cromwell configuration file, and if so, how much should I set it to?
- In general, how much memory do HaplotypeCaller and MergeGVCF need to run?
- What should the Java heap size be set to?
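To put rough numbers on the situation described above, here is a back-of-the-envelope sketch of what 50 parallel containers on 32 GB works out to. The per-task memory figure and the OS reserve are hypothetical placeholders I picked for illustration, not values taken from the WDL or from GATK documentation.

```python
# Rough memory arithmetic for the setup described in this post.
# ASSUMED_TASK_GB and OS_RESERVE_GB are hypothetical placeholders,
# not values read from the WDL or from GATK documentation.

def per_container_budget_gb(total_ram_gb: float, n_containers: int) -> float:
    """RAM available to each container if all of them run at once."""
    return total_ram_gb / n_containers


def max_concurrent(total_ram_gb: float, per_task_gb: float,
                   reserve_gb: float = 0.0) -> int:
    """How many tasks of a given size fit in RAM after a reserve."""
    usable_gb = max(0.0, total_ram_gb - reserve_gb)
    return int(usable_gb // per_task_gb)


TOTAL_RAM_GB = 32       # from the post
SCATTER_COUNT = 50      # containers observed running in parallel
ASSUMED_TASK_GB = 4.0   # hypothetical memory per HaplotypeCaller task
OS_RESERVE_GB = 4.0     # hypothetical headroom for the OS and Cromwell

budget = per_container_budget_gb(TOTAL_RAM_GB, SCATTER_COUNT)
limit = max_concurrent(TOTAL_RAM_GB, ASSUMED_TASK_GB, OS_RESERVE_GB)

print(f"Budget per container with all {SCATTER_COUNT} running: {budget:.2f} GB")
print(f"Tasks that fit at {ASSUMED_TASK_GB} GB each: {limit}")
```

If the real per-task requirement is anywhere near the assumed figure, only a handful of shards would fit in memory at once, which would be consistent with the out-of-memory errors I'm seeing.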
Any insight would be greatly appreciated. I've been struggling with this for weeks. Thank you!
Hello crisagazzola!
We believe that this post does not fit the main topic of this site.
This forum focuses on Galaxy usage. The correct support contact for this question is: https://gatkforums.broadinstitute.org/gatk
For this reason we have closed your question. This allows us to keep the site focused on the topics that the community can help with.
If you disagree, please tell us why in a reply below; we'll be happy to talk about it.
Cheers!