Question: False negative results in GATK unified genotyper output
2.3 years ago by
European Union
henryrobins30 wrote:

Hi all,

I'm having some problems with the output of the GATK unified genotyper. 

Essentially, I ran a series of positive controls to check the validity of the output, but the output contained no SNPs or indels at the sites of known genetic variation for these samples. At no point earlier in the history did I provide Galaxy with any known sites of genetic variation, and I didn't use a dbSNP file until the Unified Genotyper step. There were no error messages anywhere in the workflow and the data looked normal, except that the mutations I know to be present in that individual were missing from the output.

It seems to me that this is probably not due to the dbSNP file, as my understanding is that the purpose of this file is purely to annotate known SNPs/indels. By that logic, if there were a problem with the dbSNP file, there would still be a SNP/indel at the correct chromosome and position; it just wouldn't have an rs number. However, this was not the case: there were no mutations in the affected genes in two of my three controls.

I've tried re-running these with hg38, but I can't get past the local realignment stage.

Is this likely to be a problem with Galaxy, or a problem with the raw data that I have?

Any thoughts/ideas are very much appreciated,


2.3 years ago by
United States
Jennifer Hillman Jackson23k wrote:


There are many factors that can influence these results, so a definite answer is not possible with the given information. However, there are certainly ways that the analysis can be modified and/or compared to other tools to determine what is going on.

I would first review the "Basic or Advanced Analysis options" and make certain that you have used parameters that permit discovery, especially if the sites you are looking for are not in dbSNP. Other thresholds (various quality filters and such) can also be reviewed and set to be less strict. This will help if coverage of the variant is shallow or the supporting reads are of lower quality.

Apart from that, have you compared the Galaxy results to the results from the command-line version of the tool? Or, if you just want to use the public server, have you compared these results to what other variant callers would report (Mpileup, Naive Variant Caller, FreeBayes, etc.)? This type of testing will help you to understand how the settings for each tool can affect which variants are identified.
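As a rough illustration of that kind of comparison, here is a minimal Python sketch that checks which known control sites show up in the VCF output of each caller. It is only a sketch under assumptions: the function names, the toy VCF text, and the coordinates are all made up for the example, and it matches on chromosome and position only, so it will not handle indel left-alignment differences between tools.

```python
import io


def read_vcf_sites(handle):
    """Return the set of (chrom, pos) pairs with a non-reference call in a VCF stream."""
    sites = set()
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, alt = fields[0], int(fields[1]), fields[4]
        if alt != ".":  # "." in ALT means a reference-only record
            sites.add((chrom, pos))
    return sites


def compare_to_known(known_sites, calls_by_tool):
    """Map each known site to the sorted list of tools that called it."""
    return {
        site: sorted(tool for tool, sites in calls_by_tool.items() if site in sites)
        for site in known_sites
    }


# Hypothetical toy data standing in for real VCF files downloaded from Galaxy.
VCF_HEADER = "##fileformat=VCFv4.1\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
UG_VCF = VCF_HEADER + "chr1\t1000\t.\tG\tA\t50\tPASS\t.\n"
FB_VCF = VCF_HEADER + "chr1\t1000\t.\tG\tA\t60\tPASS\t.\nchr2\t2000\t.\tC\tT\t15\tPASS\t.\n"

# Hypothetical positive-control sites; use the same chromosome naming as the VCFs
# (e.g. "chr1" versus "1"), or nothing will match.
known = [("chr1", 1000), ("chr2", 2000)]

calls = {
    "UnifiedGenotyper": read_vcf_sites(io.StringIO(UG_VCF)),
    "FreeBayes": read_vcf_sites(io.StringIO(FB_VCF)),
}
report = compare_to_known(known, calls)

for site, tools in report.items():
    print(site, "called by:", ", ".join(tools) if tools else "no tool")
```

With real data you would pass open file handles instead of `io.StringIO`. A known site that one caller reports and another misses points toward parameter or threshold differences, while a site that no caller reports points toward the reads themselves (coverage, mapping, or reference build).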

Hopefully this helps! Jen, Galaxy team


