Question: VCFfilter: combining 2 expressions using OR or AND.
1
mohamed_refaat.1992 (10), Egypt, wrote 3.4 years ago:

Updated question

Hi,

I'm trying to filter a VCF file using VCFfilter. I want to know how to combine two filters. To make it clearer, I want to use -f "TYPE=del" and -f "TYPE=ins", but I don't know how to combine them correctly.

Can anyone help me with this issue?
Thanks in advance.

Mohamed

modified 18 months ago by Jennifer Hillman Jackson (25k) • written 3.4 years ago by mohamed_refaat.1992 (10)
3
sanchezgil.juanjose (30) wrote 20 months ago:

Hi, all. I know this is an old question, but I wanted to leave the answer here so that new people who run into the problem can solve it. As the VCFfilter documentation says:

-o, --or              use logical OR instead of AND to combine filters

so stating an OR inside the filter is unnecessary. That is, by default it combines all the filters provided with AND, and if an OR is desired you only have to set the option -o/--or. In this case your error was not only the lowercase 'and' (BTW, it's also unneeded) but also that you were filtering for variants of type 'del' AND 'ins' at the same time, which is impossible, and thus you get an error.

To get all indels, the correct command should be something like:

-o -f 'TYPE = del' -f 'TYPE = ins'

...or...

--or -f 'TYPE = del' -f 'TYPE = ins'
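
To make the difference between the two modes concrete, here is a toy sketch in plain shell and awk. It does not call vcffilter: the three-record VCF, the function names (make_toy_vcf, type_and, type_or) and the simple substring match on the INFO column are all assumptions made for illustration. It only mimics the logic, where requiring both tests on one record selects nothing, while accepting either test selects both indels.

```shell
# Toy VCF (hypothetical data): one deletion, one insertion, one SNP.
make_toy_vcf() {
  printf '##fileformat=VCFv4.2\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf 'chr1\t100\t.\tAT\tA\t50\t.\tTYPE=del\n'
  printf 'chr1\t200\t.\tG\tGC\t40\t.\tTYPE=ins\n'
  printf 'chr1\t300\t.\tC\tT\t60\t.\tTYPE=snp\n'
}

# AND: a record must pass both tests; print POS of each match.
type_and() {
  awk -F'\t' '/^#/ { next } $8 ~ /TYPE=del/ && $8 ~ /TYPE=ins/ { print $2 }'
}

# OR: a record must pass at least one test; print POS of each match.
type_or() {
  awk -F'\t' '/^#/ { next } $8 ~ /TYPE=del/ || $8 ~ /TYPE=ins/ { print $2 }'
}

make_toy_vcf | type_and   # prints nothing
make_toy_vcf | type_or    # prints 100 and 200
```

With real data the vcffilter form above (-o/--or with two -f options) remains the direct route; the sketch is only meant to show why combining the two TYPE tests with AND leaves nothing to select in this toy file.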

Hope it helps :)

1
Guy Reeves (1.0k), Germany, wrote 3.0 years ago:

Hi Tod,

I guess you have worked this out.

I had similar questions about the syntax and found this post by the author of the tool. While it is for the command-line version, it shows the syntax and how, using the 'VCFfixup' tool (also in Galaxy), you can manipulate VCF files within Galaxy.

To be honest, I have not yet figured out how to use the '-o' option.

https://www.biostars.org/p/51439/

'

You can do exactly this with vcffilter in vcflib!

Here's how to select all variants with depth greater than 10, mapping quality greater than 30, and QD greater than 20:

vcffilter -f "DP > 10 & MQ > 30 & QD > 20" file.vcf >filtered.vcf

Now, to select only variants with homozygotes, you can strip every genotype that's not homozygous, fix up the file's AC and AF fields using the genotypes with vcffixup, and then remove all the AC = 0 sites (again, using vcffilter).

cat filtered.vcf | vcffilter -g "GT = 1/1" | vcffixup - | vcffilter -f "AC > 0" >results.vcf

The expression language is clunky (you have to put spaces in between the tokens, and parenthetical expressions also have to have spaces). There is also no != symbol, but as a workaround you can do ! ( expression ).

For instance, to pick up non-homozygous genotypes, you'd use:

vcffilter -g "! ( GT = 1/1 )"

I'd like to fix some of these things (and also add regex matching for strings) but thus far it more than does the job for quick filtering operations, allowing me to do virtually any kind of filtering from the command line without having to drop into writing a custom script.

These are the supported operations: > < = | & !, and symbols: ( ). Strings are interpreted literally. There is some type checking using the VCF header, so you have to have a valid VCF file. The output is a valid VCF file, so you can stream the filter results into another filtering operation.

written 3.0 years ago by Erik Garrison (1.6k)

Note that this will work for any values in the INFO field or per-sample fields.

written 3.0 years ago by Erik Garrison (1.6k)

'

modified 3.0 years ago • written 3.0 years ago by Guy Reeves (1.0k)
0
Jennifer Hillman Jackson (25k), United States, wrote 3.3 years ago:

Hello,

The expression would be entered as:

expression_1 AND expression_2

Here AND is the operator that combines the filters. The full description of this option is in the help text for the tool, about halfway down.

Hopefully this helps, but please let us know if you continue to have problems. Jen, Galaxy team


Could you please give an example? This does not help at all. What you stated doesn't work on Galaxy. In fact, I tried every way I could think of to use the OR operator.
For example:
--or -f "QUAL = 30" AND -f "QUAL > 30" (error)
--or -f "QUAL = 30" AND "QUAL > 30" (error)
--or -f "QUAL = 30" OR -f "QUAL > 30" (error)
--or -f "QUAL = 30" OR "QUAL > 30" (error)
 

and I'm trying to do a simple thing: filter variants on QUAL >= 30.

I'm not sure what you meant by expression_1 AND expression_2.

This doesn't help at all. Please give an example, or at least tell me how to solve my problem.

 

modified 3.3 years ago • written 3.3 years ago by rahilsethi (0)
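
For the QUAL >= 30 case in the reply above, one route that sidesteps the vcffilter expression syntax altogether is plain awk on the QUAL column (column 6 of a VCF record). This is a swapped-in technique, not a vcffilter or Galaxy feature, and the function name filter_min_qual and the file names in the usage comment are hypothetical. A minimal sketch, assuming a tab-separated, uncompressed VCF:

```shell
# Keep header lines, plus every record whose QUAL (column 6) is >= the
# threshold given as the first argument. Records with a missing QUAL (".")
# are dropped. The function name is hypothetical.
filter_min_qual() {
  awk -F'\t' -v min="$1" '
    /^#/ { print; next }                  # pass header lines through
    $6 != "." && $6 + 0 >= min + 0        # numeric comparison on QUAL
  '
}

# Usage on hypothetical files:
#   filter_min_qual 30 < input.vcf > qual30.vcf
```

Within vcffilter itself, the operator list quoted earlier in this thread includes >, = and |, so an expression along the lines of "QUAL > 30 | QUAL = 30" may be worth trying, though that is untested here.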

Hi, I'm pulling my hair out. I'm taking the out-of-sequence JHU Coursera course on Galaxy in the Genomic Science set. I'm generally plenty knowledgeable about mol bio and genetics/genomics. I am trying to filter public VCF files of genomic variants to isolate only those on a particular chromosome. I see the annotation on how to do this (-r), but the proper syntax is not described (use "in conjunction with" -f or -g)! "In conjunction with"? BTW, if one can filter based on QUAL, why not filter based on CHROM using a similar coding strategy? In any case, please, please, please let me know the proper syntax to use to filter VCF records based on chromosome. I tried to use -r "in conjunction with" a -f filter that would capture all records (i.e. a filter that does not filter) but failed many, many times. The "course" TAs are useless. Reminder: it is the JHU folks who insist on the Galaxy course before command-line instruction. And, of course, I may just be an idiot, but please give me the benefit of the doubt, and give me the precise syntax.

Thank you, Tod

written 3.1 years ago by tod.gulick (0)
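
For the chromosome question above, a workaround that does not depend on the -r syntax is to select on the CHROM column (column 1) with awk. Again this is a swapped-in technique rather than vcffilter itself; the function name filter_by_chrom and the file names are hypothetical, and the sketch assumes a tab-separated, uncompressed VCF whose chromosome names match the argument exactly (for example "chr2" versus "2").

```shell
# Keep header lines, plus records whose CHROM (column 1) equals the name
# given as the first argument. Appending "" forces a string comparison, so
# "2" and "02" are treated as different names. Function name is hypothetical.
filter_by_chrom() {
  awk -F'\t' -v chrom="$1" '
    /^#/ { print; next }                  # pass header lines through
    ($1 "") == (chrom "")                 # exact string match on CHROM
  '
}

# Usage on hypothetical files:
#   filter_by_chrom chr2 < input.vcf > chr2.vcf
```

Because the match is exact, chr2 does not also pull in chr21 or chr22, which a plain grep for "chr2" would.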

Hi Tod (and others that encounter the same issue),

Please see the help offered by sanchezgil.juanjose above. Although that answer was written for command-line usage, the tool form in Galaxy generally has help that links command-line options to tool form options and also provides syntax help. The syntax sometimes differs between the free-text input of the tool form and what would be used on the command line.

As an aside, the tool itself may have had a corner-case problem when the question was originally asked but has been updated several times since. Next time, a bug report can be submitted for errors. Full analysis context is often needed to accurately diagnose and provide troubleshooting help.

Thanks! Jen, Galaxy team

modified 18 months ago • written 18 months ago by Jennifer Hillman Jackson (25k)