Hi all,
I'm having some problems with the output of the GATK Unified Genotyper.
Essentially, I ran a series of positive controls through the workflow to check the validity of the output, but the output contained no SNPs or indels at the sites of known genetic variation for these samples. At no point earlier in the history did I provide Galaxy with any known sites of genetic variation, and I didn't use a dbSNP file until the Unified Genotyper step. There were no error messages anywhere in the workflow and the data looked normal, except that the mutations I know to be present in these samples were missing from the output.
I don't think this is due to the dbSNP file, as my understanding is that its only purpose is to annotate known SNPs/indels. By that logic, if there were a problem with the dbSNP file, the SNP/indel would still be called at the correct chromosome and position; it just wouldn't have an rs number. That was not the case: there were no mutations at all in the affected genes in two of my three controls.
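To make that distinction concrete, here is a minimal sketch of the check in plain Python (no GATK or Galaxy calls). The function name `find_expected_sites`, the coordinates, and the rs number are all hypothetical placeholders, not values from my run, and it assumes an uncompressed VCF with the standard tab-separated CHROM, POS and ID columns.

```python
def find_expected_sites(vcf_lines, expected_sites):
    """For each expected (chrom, pos) site, report what the VCF contains there.

    Returns a dict mapping each site to the record's ID column
    ('.' means a variant was called but not annotated),
    or None if there is no record at that site at all.
    """
    wanted = set(expected_sites)
    found = {site: None for site in wanted}
    for line in vcf_lines:
        # Skip meta-information lines, the header line, and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue
        site = (fields[0], int(fields[1]))
        if site in wanted:
            found[site] = fields[2]
    return found


# Hypothetical example data for illustration only.
example_vcf = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t1000\trs0000001\tA\tG\t50\tPASS\t.",
    "chr2\t2000\t.\tC\tT\t45\tPASS\t.",
]
result = find_expected_sites(
    example_vcf, [("chr1", 1000), ("chr2", 2000), ("chr3", 3000)]
)
```

A site that comes back as `.` would point to an annotation (dbSNP) problem, whereas a site that comes back as `None` means no variant was called there at all, which is the situation I'm describing.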
I've tried re-running these against hg38, but I can't get past the local realignment stage.
Is this likely to be a problem with Galaxy, or a problem with the raw data that I have?
Any thoughts/ideas are very much appreciated,
Henry