I have 256 Recombinant Inbred Lines (RILs) at the F7 generation. I have VCF files for the RILs, the mother and the father. I would like to filter out SNPs that are heterozygous in the parents. How can I do this in Galaxy? Thank you
I often want to do the same thing, but I slightly change the question to be 'I want to remove all sites where both parents are probable heterozygotes'.
It could be that parent A is called a confident 0/1 but parent B is a low-quality 0/0 (meaning that it could well be 0/1).
I use SnpSift, which has very good documentation available. The program is installed on usegalaxy.org (though I am not sure if it is the latest version).
I use an expression like this in SnpSift to filter the VCF:
! ((GEN[5].GL[1] > -4 ) & ( GEN[6].GL[1] > -4 ))
I think you can figure out what it does from the documentation, but here are some notes:
- ! = not (sites that match the expression are removed from the output .vcf).
- & = and; in this case a site is removed only if both statements are true, i.e. both parents are potential heterozygotes.
- GEN[5] = one parent is the 6th sample in the VCF (samples are numbered from 0: the first sample is 0, the second is 1, ...). The other parent, GEN[6], is the 7th sample in the .vcf file.
- GL[1] = the 2nd number in the genotype likelihood (GL) field, which is the likelihood of being heterozygous (values are numbered from 0: GL[0] is 0/0, GL[1] is 0/1, GL[2] is 1/1).
- > -4 = if the GL for the heterozygous genotype is greater than this, I consider the sample a questionable heterozygote. This is my rule of thumb, and it also depends on how your GL is scaled (mine comes from FreeBayes), so you need to establish this yourself. Other programs may use GP instead of GL.
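If it helps to see the logic spelled out, here is a minimal Python sketch of what the expression does, line by line, on a plain-text VCF. It is not SnpSift and not how SnpSift is implemented; the function names (het_gl, keep_site, filter_vcf_lines) are made up for illustration, and the choice to keep a site when GL is missing for a parent is my assumption, so check how your own tool treats missing values.

```python
def het_gl(sample_field, format_keys):
    """Return GL[1] (heterozygous likelihood) for one sample, or None if absent."""
    values = dict(zip(format_keys, sample_field.split(":")))
    gl = values.get("GL")
    if gl is None or gl == ".":
        return None
    parts = gl.split(",")
    if len(parts) < 2 or parts[1] == ".":
        return None
    return float(parts[1])


def keep_site(line, parent_a=5, parent_b=6, threshold=-4.0):
    """Mirror of: ! ((GEN[5].GL[1] > -4) & (GEN[6].GL[1] > -4)).

    parent_a and parent_b are 0-based sample indices, as in GEN[n].
    Assumption: a parent with no GL value is not counted as a
    potential heterozygote, so the site is kept.
    """
    cols = line.rstrip("\n").split("\t")
    format_keys = cols[8].split(":")   # FORMAT column, e.g. GT:GL
    samples = cols[9:]                 # sample columns start at column 10
    gl_a = het_gl(samples[parent_a], format_keys)
    gl_b = het_gl(samples[parent_b], format_keys)
    both_maybe_het = (
        gl_a is not None and gl_b is not None
        and gl_a > threshold and gl_b > threshold
    )
    return not both_maybe_het


def filter_vcf_lines(lines, **kwargs):
    """Yield header lines unchanged and only the data lines that pass keep_site."""
    for line in lines:
        if line.startswith("#") or keep_site(line, **kwargs):
            yield line
```

With this sketch, a site where parent A has GL 0/1 = 0 and parent B is called 0/0 but has GL 0/1 = -1.5 would be dropped, because both values are above -4. The threshold and sample indices are parameters so you can adjust them to your own VCF.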
To test, get rid of the ! and you should generate a file containing only the sites where both parents are potential heterozygotes. Hope this helps
Guy
Since writing this I looked on usegalaxy.org and see that SnpSift is no longer there, which is a shame as it is a really useful program. I know that in the distant past (https://biostar.usegalaxy.org/p/14003/) there was an issue with it not 'behaving' as the documentation indicated, but this was due to an old version being on usegalaxy.org. I have a current version on my Galaxy instance and it works great. I think it should be put back on usegalaxy.org! Cheers
Hello,
Please see these resources:
- https://github.com/nekrut/galaxy/wiki/Diploid-variant-calling
- https://wiki.galaxyproject.org/Learn/GalaxyNGS101#Finding_variants
- https://wiki.galaxyproject.org/Learn#Other_Tutorials
Thanks! Jen, Galaxy team