Question: Filtering a tabular dataset derived from a VCF file
0
3.1 years ago by
nbhassou10
United States
nbhassou10 wrote:

Hello,

I converted a VCF file to tab-delimited format so that I could filter on the 'type' information, which is one of a series of entries separated by semicolons. I now want to filter for lines that contain 'TYPE=snp'. Some lines list several types, such as 'TYPE=snp, del' or another combination of variant types. I want every line that includes snp, including the ones that also list other types. How can I filter in this way? So far I have only figured out how to filter for the exact expression 'TYPE=snp'.

Thanks

vcf string filter • 1.2k views
modified 3.1 years ago by judyh0 • written 3.1 years ago by nbhassou10
1
3.1 years ago by
Bjoern Gruening5.1k
Germany
Bjoern Gruening5.1k wrote:

Hi,

You can use a specialised tool such as "VCFfilter: filter VCF data in a variety of attributes".

Hope this helps,

Bjoern

written 3.1 years ago by Bjoern Gruening5.1k
1
3.1 years ago by
nbhassou10
United States
nbhassou10 wrote:

Hi judyh, 

I first had to convert the VCF file to tab-delimited format. Then I filtered with the regular Filter tool. To find lines with mnp, for example, I used the Select option instead of the Filter option. Sadly, I couldn't figure out the VCFfilter tool.

written 3.1 years ago by nbhassou10
0
3.1 years ago by
judyh0
United States
judyh0 wrote:

Hello, nbhassou, congratulations on getting the TYPE=snp lines filtered. How did you manage it? I keep getting error messages.

Obviously you are already using the specialized tools like "VCFfilter: filter VCF data in a variety of attributes". Since you know how to get TYPE=snp, can you get the lines with additional types using the AND and OR options? Maybe snp OR mnp OR del and so forth? Just an idea. Don't know if it will work. I can't even get the SNPs, so I'm no expert.

written 3.1 years ago by judyh0
0
3.1 years ago by
judyh0
United States
judyh0 wrote:

Hello, nbhassou,

Thanks for the idea. I was so exasperated from days of hunting through the tools that I converted the VCF file to tab-delimited format and downloaded it to my computer to do the job in Excel. I'll have another try at doing it with the VCF filter tomorrow, when I have more patience. I'll post here if I can figure it out. It is worth knowing how to do it.

It is unfortunate that the Coursera teacher can't be bothered to teach his class properly. I expect the people on Biostars get tired of teaching his class for him every time the class project comes around. The situation is far from ideal, but edX.org does not yet offer this type of course, so here we all are.  

written 3.1 years ago by judyh0
0
3.1 years ago by
nbhassou10
United States
nbhassou10 wrote:

Hey, yeah, the VCF filter isn't very intuitive. I feel your pain: I wanted to bring the file into Excel to figure out how to sort it, but I finally figured it out. You should definitely provide feedback for the course when evals come around. I felt a little in the dark too. Hopefully the project I submit is correct enough.

written 3.1 years ago by nbhassou10
0
3.1 years ago by
judyh0
United States
judyh0 wrote:

Okay, use Filter and Sort > Select, and then type TYPE=snp (or whichever variant type you want) in the box.
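
One caveat, assuming the Select box matches the typed text as a pattern against each line: 'TYPE=snp' finds 'TYPE=snp' and 'TYPE=snp,del', but it would miss a line where snp is not listed first, such as 'TYPE=del,snp'. For anyone working on a downloaded copy, here is a minimal Python sketch of the same selection that checks every value in the TYPE entry. The function names and the sample lines are made up for illustration; they are not part of any Galaxy tool.

```python
import re

def variant_types(line):
    """Return the values of the TYPE entry found on a line, or [] if there is none.

    The line is split on tabs and semicolons, so this works whether the
    semicolon-separated entries sit in one column or in several.
    """
    for token in re.split(r"[;\t]", line.rstrip("\n")):
        key, _, value = token.partition("=")
        if key.strip() == "TYPE":
            # 'snp, del' and 'snp,del' both become ['snp', 'del']
            return [v.strip() for v in value.split(",")]
    return []

def select_by_type(lines, wanted="snp"):
    """Keep the lines whose TYPE entry lists the wanted variant type."""
    return [line for line in lines if wanted in variant_types(line)]

# Made-up example lines in VCF-like layout (illustration only).
sample = [
    "chr1\t100\t.\tA\tG\t50\t.\tDP=10;TYPE=snp",
    "chr1\t200\t.\tAT\tA,GT\t50\t.\tDP=12;TYPE=del,snp",
    "chr1\t300\t.\tA\tAT\t50\t.\tDP=8;TYPE=ins",
    "chr1\t400\t.\tAC\tGT\t50\t.\tDP=9;TYPE=mnp",
]

for line in select_by_type(sample, "snp"):
    print(line)
```

Passing "mnp" or "del" as the second argument selects those types instead.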

written 3.1 years ago by judyh0