Hi Claire,
Thanks for sending in the bug reports. It was helpful to examine the actual data for this one.
The dbSNP ROD file used in the analysis lacks the full header annotation that the pipeline requires. In particular, the ##contig header lines that define the reference genome's sequences are missing, which triggered the empty array error you encountered. The missing lines look like this:
##contig=<ID=1,length=249250621,assembly=b37>
##contig=<ID=10,length=135534747,assembly=b37>
##contig=<ID=11,length=135006516,assembly=b37>
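If it helps to check a VCF before running it through the pipeline, here is a minimal sketch in Python that lists the contig IDs declared in a file's header. The function names (contig_ids, contig_ids_from_path) are made up for this illustration and are not part of Galaxy or GATK, and the parsing is deliberately simple: it assumes no commas inside quoted field values.

```python
import gzip
import re

# Matches a VCF header line of the form ##contig=<ID=1,length=...,assembly=b37>
CONTIG_RE = re.compile(r"^##contig=<(.*)>\s*$")


def contig_ids(lines):
    """Return the IDs declared in ##contig header lines, in file order."""
    ids = []
    for line in lines:
        if not line.startswith("#"):
            break  # the header is over; variant records begin here
        match = CONTIG_RE.match(line)
        if match:
            # Naive split: assumes no commas inside quoted values.
            fields = dict(
                pair.split("=", 1)
                for pair in match.group(1).split(",")
                if "=" in pair
            )
            if "ID" in fields:
                ids.append(fields["ID"])
    return ids


def contig_ids_from_path(path):
    """Hypothetical helper: read a plain or gzip-compressed VCF from disk."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return contig_ids(handle)
```

An empty list from contig_ids_from_path("your_dbsnp.vcf") would mean the header has no ##contig lines, which is the situation described above.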
A better choice for a dbSNP ROD dataset is the one provided with the GATK bundle. You can obtain this directly from the Broad or, when working on the public Main Galaxy instance, use the copy under 'Shared Data -> Data Library -> GATK' bundle datasets.
As a secondary issue, I believe it is important to use the dbSNP vcf file directly in any analysis (instead of previously processed vcf datasets, as I saw in one of your runs). If you would like to merge vcf files later, there is a tool for that: NGS: VCF Manipulation -> VCFcombine
Hopefully this helps resolve your issue, and helps guide others who are learning the GATK pipeline toward the resources available on Main (http://usegalaxy.org). If our team has more to add specific to your situation, we will comment again and/or reply via email.
Take care, Jen, Galaxy team