Question: Filtering for SNP genotype
3.7 years ago by
d.angra50
United Kingdom
d.angra50 wrote:

I am relatively new to Galaxy. So far I have succeeded in using Galaxy to discover SNPs with SAMtools on my datasets, with frequent help from the Galaxy team. Now that I have generated about 1 lakh (100,000) SNPs, I want to select only those with the genotype specification 0/1, i.e. the heterozygotes. Is there any tool that can help me select the desired SNPs?

Your help will be greatly appreciated.

modified 3.5 years ago • written 3.7 years ago by d.angra50

Hello Mark,

Thank you very much. I am trying this now.

 

Viva

written 3.5 years ago by d.angra50
3.6 years ago by
Mark Crowe100
QFAB, Brisbane
Mark Crowe100 wrote:

Hopefully someone else might come up with a more elegant solution, but a quick hack to do this might simply be to filter the VCF files for "0/1" at the beginning of column 10. In theory, you should be able to do this using the Filter tool with a filter condition of:

c10.split(":")[0]=="0/1"

This splits column 10 on the : character and keeps only the lines that have 0/1 as the first string in that split (the first item has index zero, hence the [0]).
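For anyone who wants to check the same condition outside Galaxy, here is a minimal sketch in plain Python. It assumes a single-sample VCF where the genotype is the first colon-separated subfield of column 10. The function name `is_het_01` and the sample records are made up for illustration and are not part of Galaxy or SAMtools.

```python
def is_het_01(line):
    """Return True if a VCF data line has genotype 0/1 in column 10."""
    if line.startswith("#"):
        return False  # header lines carry no genotype
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 10:
        return False  # no sample column present
    # c10 in the Galaxy Filter tool corresponds to fields[9] (zero-based)
    return fields[9].split(":")[0] == "0/1"


# Made-up records, for illustration only
records = [
    "##fileformat=VCFv4.1",
    "chr1\t100\t.\tA\tG\t50\t.\tDP=20\tGT:PL\t0/1:60,0,55",
    "chr1\t200\t.\tC\tT\t80\t.\tDP=18\tGT:PL\t1/1:90,30,0",
]
het = [r for r in records if is_het_01(r)]
```

Only the record at position 100 is kept here. The explicit length check is one way to avoid an index error on short lines.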

But in testing, I've found some VCF files that this doesn't work for (possibly a bug in the filter or split function). If you hit that, you could try an even cruder approach: use the Select tool to search for

\t0/1

This just looks for the string 0/1 immediately after a tab character (i.e. at the beginning of a column field), and seems to work reasonably well.
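To see what the cruder pattern does and does not catch, here is a small sketch using Python's `re` module on made-up VCF lines. One caveat worth knowing: the pattern is not tied to column 10, so in a multi-sample VCF it matches a line if any sample column starts with 0/1.

```python
import re

# Same pattern as above: the text 0/1 right after a tab
HET = re.compile(r"\t0/1")

# Made-up records, for illustration only
lines = [
    "chr1\t100\t.\tA\tG\t50\t.\tDP=20\tGT:PL\t0/1:60,0,55",
    "chr1\t200\t.\tC\tT\t80\t.\tDP=18\tGT:PL\t1/1:90,30,0",
    # two samples: column 10 is 1/1, column 11 is 0/1
    "chr1\t300\t.\tG\tA\t70\t.\tDP=25\tGT:PL\t1/1:90,30,0\t0/1:60,0,55",
]

# Positions (column 2) of the lines the pattern selects
matched = [ln.split("\t")[1] for ln in lines if HET.search(ln)]
```

Positions 100 and 300 are selected; the third line matches because of its second sample, even though column 10 is 1/1. For single-sample files this difference does not arise.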

written 3.6 years ago by Mark Crowe100