I often want to do the same thing, but I change the question slightly to: 'I want to remove all sites where both parents are probable heterozygotes'.
It could be that parent A is called a confident 0/1 but parent B is a low-quality 0/0 (meaning that it could well be 0/1).
I use SnpSift, which has very good documentation available. The program is installed on usegalaxy.org (though I am not sure if it is the latest version).
I use an expression like this in SnpSift to filter the VCF:
! ((GEN[5].GL[1] > -4) & (GEN[6].GL[1] > -4))
You can probably work out what it does from the documentation, but here are some notes:
! = not (sites that match the expression are removed from the output .vcf).
& = and; in this case a site is removed only if both statements are true, i.e. both parents are potential heterozygotes.
GEN[5] = one parent, which is the 6th sample in the VCF (samples are numbered from 0, so the first sample is 0, the second is 1, ...). GEN[6] = the other parent, the 7th sample in the .vcf file.
GL[1] = the 2nd number in the genotype likelihood (GL) field, which is the likelihood of being heterozygous (the values are numbered 0, 1, 2, and number 0 is the GL of 0/0).
> -4 = if the GL for being heterozygous is greater than this, I consider the sample a questionable heterozygote. This is my rule of thumb, and it also depends on how your GL is scaled (mine comes from FreeBayes), so you need to establish the threshold yourself. Other programs may use GP instead of GL.
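If it helps to see the logic spelled out, here is a rough Python sketch of the same rule applied to plain VCF text. It is not how SnpSift works internally; the function names are made up for illustration, and it assumes biallelic sites, comma-separated GL values ordered 0/0, 0/1, 1/1, and the parents at sample positions 5 and 6 (counting from 0). Treating a missing GL as "not a potential heterozygote" is my assumption as well.

```python
# Sketch only: drop a site when BOTH parents have a heterozygous GL above
# the threshold. Function names and defaults are hypothetical.

def het_gl(format_field, sample_field):
    """Return the heterozygous GL (2nd value) for one sample, or None if missing."""
    keys = format_field.split(":")
    values = sample_field.split(":")
    if "GL" not in keys:
        return None
    idx = keys.index("GL")
    if idx >= len(values):
        return None
    parts = values[idx].split(",")
    if len(parts) < 2 or parts[1] in (".", ""):
        return None
    return float(parts[1])


def both_parents_maybe_het(vcf_line, parent_a=5, parent_b=6, threshold=-4.0):
    """True if both parents look like potential heterozygotes at this site."""
    cols = vcf_line.rstrip("\n").split("\t")
    fmt = cols[8]        # FORMAT column
    samples = cols[9:]   # sample columns, numbered from 0
    gls = [het_gl(fmt, samples[i]) for i in (parent_a, parent_b)]
    # Assumption: a missing GL counts as "not a potential heterozygote".
    return all(g is not None and g > threshold for g in gls)


def filter_lines(lines, **kwargs):
    """Yield header lines and every site that is NOT flagged (the '!' case)."""
    for line in lines:
        if line.startswith("#") or not both_parents_maybe_het(line, **kwargs):
            yield line
```

Passing the lines of an open VCF file to filter_lines gives back the header plus the sites that survive the filter. Calling both_parents_maybe_het on its own gives the inverse set, which is handy for checking the threshold.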
To test, remove the ! and you should generate a file containing only the sites where both parents are potential heterozygotes. Hope this helps.
Please see these resources:
Thanks! Jen, Galaxy team