Hi Tod,
I guess you have worked this out.
I had similar questions about the syntax and found this post by the author of the tool. It is written for the command-line version, but it shows the syntax and how the 'VCFfixup' tool (also in Galaxy) lets you manipulate VCF files within Galaxy.
To be honest, I have not yet figured out how to use the '-o' option.
https://www.biostars.org/p/51439/
'
You can do exactly this with vcffilter in vcflib!
Here's how to select all variants with depth greater than 10, mapping quality greater than 30, and QD greater than 20:
vcffilter -f "DP > 10 & MQ > 30 & QD > 20" file.vcf >filtered.vcf
Now, to select only variants with homozygotes, you can strip every genotype that's not homozygous, fix up the file's AC and AF fields using the genotypes with vcffixup, and then remove all the AC = 0 sites (again, using vcffilter).
cat filtered.vcf | vcffilter -g "GT = 1/1" | vcffixup - | vcffilter -f "AC > 0" >results.vcf
The expression language is clunky (you have to put spaces in between the tokens, and parenthetical expressions also have to have spaces). There is also no != symbol, but as a workaround you can do ! ( expression ).
For instance, to pick up non-homozygous genotypes, you'd use:
vcffilter -g "! ( GT = 1/1 )"
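The spacing rule and the negation workaround are easy to get wrong by hand, so a tiny wrapper can help. This is only a sketch: negate is a hypothetical helper, not part of vcflib, and the file names are placeholders.

```shell
#!/bin/sh
# Hypothetical helper (not part of vcflib): wrap a filter expression in the
# "! ( expression )" form, keeping the spaces the expression parser requires.
negate() {
    printf '! ( %s )' "$1"
}

expr=$(negate "GT = 1/1")
echo "$expr"    # prints: ! ( GT = 1/1 )

# Run the filter only if vcffilter is installed and the input file exists.
if command -v vcffilter >/dev/null 2>&1 && [ -f filtered.vcf ]; then
    vcffilter -g "$expr" filtered.vcf > non_hom.vcf
fi
```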
I'd like to fix some of these things (and also add regex matching for strings), but thus far it more than does the job for quick filtering operations, allowing me to do virtually any kind of filtering from the command line without having to drop into writing a custom script.
These are the supported operations: > < = | & !, and symbols: ( ). Strings are interpreted literally. There is some type checking using the VCF header, so you have to have a valid VCF file. The output is a valid VCF file, so you can stream the filter results into another filtering operation.
written 3.0 years ago by Erik Garrison
Note that this will work for any values in the INFO field or per-sample fields.
'
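Because each stage in the quoted post emits valid VCF, the homozygote one-liner can be split into named stages that stream into each other. A minimal sketch, assuming vcffilter and vcffixup from vcflib are on the PATH; the function name hom_alt_sites and the file names are hypothetical:

```shell
#!/bin/sh
# Hypothetical wrapper (not part of vcflib): reads VCF on stdin, writes VCF on stdout.
hom_alt_sites() {
    vcffilter -g "GT = 1/1" |   # keep only 1/1 genotypes
    vcffixup - |                # recompute AC and AF from the remaining genotypes
    vcffilter -f "AC > 0"       # drop sites left with no alternate alleles
}

# Run only when both tools are installed and the input file exists.
if command -v vcffilter >/dev/null 2>&1 &&
   command -v vcffixup >/dev/null 2>&1 &&
   [ -f filtered.vcf ]; then
    hom_alt_sites < filtered.vcf > results.vcf
else
    echo "vcffilter, vcffixup, or filtered.vcf not found; skipping" >&2
fi
```

Keeping the stages in one function makes it easy to pipe the result into a further vcffilter call, since the output is still a VCF stream.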