Could anyone tell me how to convert a VCF dataset into an Interval dataset on the main Galaxy server?
Thanks!!!
Hi,
In most cases, it is better to work with the data in VCF format. Tools in the group NGS: VCF Manipulation can compare VCF to BED and other data formats, if the goal is to look for overlaps or to link to annotation.
However, if you want to do this anyway, try the steps below. Note that this will only work for single-nucleotide polymorphisms (SNPs), not multiple-nucleotide polymorphisms, because the end coordinate is taken directly from the Pos field and so covers a single base. The tools referenced below are in the group Text Manipulation unless stated otherwise.
First, use the tool NGS: VCF Manipulation > VCFtoTab-delimited: Convert VCF data into TAB-delimited format.
From there you will need to remove the header (tool: Remove beginning of a file), create a start coordinate from the Pos field (tool: Compute, subtracting "1" from Pos), change the datatype to interval, and at the end assign the chromosome, start (the new column from Compute), and end (the original Pos column) columns.
Cut can be used to rearrange/reduce the columns if wanted. And if you need strand, all the data in VCF files are with respect to the "+" strand. You can add in a "+" column to all result lines with the tool Add column.
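For anyone who wants to check the coordinate arithmetic outside Galaxy, here is a minimal Python sketch of the same steps. It is not what the Galaxy tools run internally. The function name `vcf_tab_to_interval` is made up for illustration, and it assumes tab-delimited input with one header line and the chromosome and Pos values in the first two columns. As above, it is only valid for SNPs.

```python
def vcf_tab_to_interval(lines, header_lines=1, strand="+"):
    """Turn tab-delimited VCF rows into (chrom, start, end, strand) tuples.

    Hypothetical helper: assumes column 1 is the chromosome and column 2
    is the 1-based Pos field, and that every variant is a SNP.
    """
    intervals = []
    for line in list(lines)[header_lines:]:  # like "Remove beginning of a file"
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        fields = line.split("\t")
        chrom = fields[0]
        pos = int(fields[1])
        start = pos - 1  # like "Compute": subtract 1 from Pos
        end = pos        # the original Pos column is the end
        intervals.append((chrom, start, end, strand))  # like "Add column" for "+"
    return intervals


example = [
    "CHROM\tPOS\tID\tREF\tALT\n",
    "chr1\t100\t.\tA\tG\n",
    "chr2\t2500\t.\tC\tT\n",
]
for row in vcf_tab_to_interval(example):
    print("\t".join(str(value) for value in row))
```

Each SNP becomes an interval one base long, with the start one less than the VCF position.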
FAQs: https://galaxyproject.org/support/#getting-inputs-right
Galaxy Tutorials: https://galaxyproject.org/learn/
Thanks! Jen, Galaxy team
I am doing variant calling on my data and needed an interval file as input for several of the analyses, which is why I was stuck. I have now got it working by following your reply. Thanks a lot.