Question: Filtering Variants using SnpSift filter and SnpEff
22 months ago, braveen.joseph wrote:

Hi,

I have a set of variants in a VCF file that I called with the UnifiedGenotyper tool. I now want to keep only the "homozygous" variants (100% alternate allele frequency). However, the SnpSift filter and SnpEff tools do not seem to have an option to change the basis on which variants are annotated as "homozygous" or "heterozygous". Is there at least a way to filter variants by the frequency of alternate (non-reference) alleles?

In my case I need only the variants with 100% alternate (non-reference) alleles. Can someone help me with this? It would be a great help.

Thanks in advance.

snp galaxy • 1.9k views
22 months ago, Guy Reeves wrote:

I find it hard to follow what you are trying to do. Do you want to output a file with only those sites where all samples are 1/1? If so, try ((countRef() < 1) AND (countHet() < 1)). To be honest, I have not tested it (and I can see that any sites with missing data would make it into the output file). The commands are explained at http://snpeff.sourceforge.net/SnpSift.html


Hi,

Thanks a lot for the reply. I think what you suggested is working. What I actually wanted is this: say there are 30 reads covering a nucleotide site in the WGS alignment. If a mutation is present at that site, I want it in my output file only if the mutation is present in all 30 reads (100% alternate allele frequency). I tested your code and cross-checked a few variants against the alignment. It looks good.

I'm still finding it really hard to understand the tool's documentation (the code). Would you be able to explain it briefly in layman's terms? Thanks again for the great help.
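
The read-level criterion described above (the alternate allele in every read at a site) can be checked directly on read counts. This is a separate check from a 1/1 genotype call, which can still include a few reference reads. Below is a minimal Python sketch, not a SnpSift feature. It assumes the VCF carries a per-sample AD field in the FORMAT column (allelic depths, reference allele first), which GATK callers commonly emit; if AD is absent the function returns None. The function names alt_read_fraction and all_reads_alternate are made up for this illustration.

```python
def alt_read_fraction(vcf_line, sample_index=0):
    """Return the fraction of reads supporting non-reference alleles for one
    sample on one VCF data line, or None if it cannot be computed.

    Assumption: FORMAT contains an AD field listing the reference depth first.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    fmt = fields[8].split(":")
    if "AD" not in fmt:
        return None
    values = fields[9 + sample_index].split(":")
    ad_pos = fmt.index("AD")
    # Trailing fields may be dropped, and missing depths are written as "."
    if ad_pos >= len(values) or "." in values[ad_pos]:
        return None
    depths = [int(x) for x in values[ad_pos].split(",")]
    total = sum(depths)
    if total == 0:
        return None
    return (total - depths[0]) / total


def all_reads_alternate(vcf_line, sample_index=0):
    """True only when every counted read carries a non-reference allele."""
    fraction = alt_read_fraction(vcf_line, sample_index)
    return fraction is not None and fraction == 1.0


if __name__ == "__main__":
    # Hypothetical one-sample records, for illustration only.
    all_alt = "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT:AD:DP\t1/1:0,30:30"
    one_ref = "chr1\t200\t.\tC\tT\t50\tPASS\t.\tGT:AD:DP\t1/1:1,29:30"
    print(all_reads_alternate(all_alt))
    print(all_reads_alternate(one_ref))
```

Note that the second record is called 1/1 but has one reference read, so it would pass a genotype-based filter and fail this read-based one.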

Reply written 22 months ago by braveen.joseph
22 months ago, Guy Reeves wrote:

Hi,

(countRef() < 1) = keep the site if the total number of samples with genotype 0/0 is less than 1, i.e. zero.
(countHet() < 1) = keep the site if the total number of samples with genotype 0/1 is less than 1, i.e. zero.
AND ensures that only sites with no 0/0 and no 0/1 genotypes are kept, i.e. sites with only 1/1.

If by '30 reads' you mean 30 samples, you could also filter out sites with any missing data with:

((countRef() < 1) AND (countHet() <1) AND (countHom() > 29))
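
To make the logic of these expressions concrete, here is a minimal Python sketch that counts genotypes on a single VCF data line and applies the same three conditions. It mirrors the intent explained above, not SnpSift's own implementation; how SnpSift itself treats missing or multi-allelic genotypes has not been checked here. The function names genotype_counts and keep_site, and the example records, are made up for this illustration.

```python
def genotype_counts(vcf_line):
    """Count reference, heterozygous, homozygous-alternate and missing
    genotypes across all samples on one tab-separated VCF data line."""
    fields = vcf_line.rstrip("\n").split("\t")
    gt_index = fields[8].split(":").index("GT")
    counts = {"ref": 0, "het": 0, "hom_alt": 0, "missing": 0}
    for sample in fields[9:]:
        gt = sample.split(":")[gt_index]
        alleles = gt.replace("|", "/").split("/")
        if "." in alleles:
            counts["missing"] += 1      # e.g. ./.
        elif len(set(alleles)) > 1:
            counts["het"] += 1          # e.g. 0/1
        elif alleles[0] == "0":
            counts["ref"] += 1          # 0/0
        else:
            counts["hom_alt"] += 1      # e.g. 1/1
    return counts


def keep_site(vcf_line, n_samples=None):
    """Keep a site with no 0/0 and no heterozygous genotypes. If n_samples
    is given, also require that many homozygous-alternate genotypes,
    which excludes sites with missing data."""
    c = genotype_counts(vcf_line)
    keep = c["ref"] < 1 and c["het"] < 1
    if n_samples is not None:
        keep = keep and c["hom_alt"] >= n_samples
    return keep


if __name__ == "__main__":
    # Hypothetical two-sample records, for illustration only.
    with_missing = "chr1\t300\t.\tG\tA\t50\tPASS\t.\tGT\t1/1\t./."
    print(keep_site(with_missing))               # first two conditions only
    print(keep_site(with_missing, n_samples=2))  # with the sample-count check
```

The two calls differ on the record with a missing genotype: it passes the first two conditions but is dropped once the sample count is required.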

Cheers, Guy
