Question: Split VCF file
7 months ago, elainelewis1 wrote:

I would like help splitting a large vcf file containing WGS data into smaller files. Using Galaxy filters, I split it into smaller files by chromosome, but now they are tabular, not vcf files. Could you let me know how to either convert tabular back to vcf, or how to split the original vcf without changing file format?

format vcf split galaxy • 449 views
Modified 7 months ago by Jennifer Hillman Jackson • written 7 months ago by elainelewis1
7 months ago, Wolfgang Maier wrote:

I guess your problem comes from the header lines, which are present in every VCF file and start with a #. If your filter does not keep them, the result is no longer VCF, just generic tabular data. Possible solutions:

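1) Use the Select lines that match an expression tool with a regular expression that matches lines starting with either a # or one of your chromosome names, e.g., ^(#|chrI\s). The parentheses anchor both alternatives to the start of the line, and the whitespace after the name keeps chrI from also matching chrII or chrIII (this assumes the tool's regex flavor supports \s).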

2) Break your problem into simpler subtasks: filter your dataset once with a filter that keeps only the header lines (based on them starting with a #), then concatenate each of your tabular single-chromosome datasets onto this header-only dataset (header first), thereby regenerating valid VCF format.

3) Most direct (but less instructive): try to use the MiModD VCF Filter tool with appropriate Region Filters.
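
For anyone who wants to try the same thing outside Galaxy, options 1 and 2 can be sketched in plain Python. This is only a minimal sketch, not what any of the Galaxy tools do internally: the function names select_chromosome and split_by_chromosome and the toy records are made up for illustration, and it assumes an uncompressed, tab-delimited VCF small enough to hold in memory.

```python
import re


def select_chromosome(lines, chrom):
    """Option 1: keep header lines plus the records of one chromosome."""
    # Both alternatives are anchored at the line start; the tab after the
    # name stops "chrI" from also matching "chrII".
    pattern = re.compile(r"^(#|" + re.escape(chrom) + r"\t)")
    return [line for line in lines if pattern.match(line)]


def split_by_chromosome(lines):
    """Option 2: collect the header once, then prepend it to each chromosome."""
    header = []
    records = {}
    for line in lines:
        if line.startswith("#"):
            header.append(line)
        else:
            # The first tab-separated column of a VCF record is CHROM.
            chrom = line.split("\t", 1)[0]
            records.setdefault(chrom, []).append(line)
    return {chrom: header + recs for chrom, recs in records.items()}


# Toy input (hypothetical records, for illustration only).
example = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chrI\t100\t.\tA\tG\t50\tPASS\t.\n",
    "chrII\t200\t.\tC\tT\t60\tPASS\t.\n",
    "chrI\t300\t.\tG\tA\t70\tPASS\t.\n",
]

if __name__ == "__main__":
    for chrom, vcf_lines in split_by_chromosome(example).items():
        print(chrom, len(vcf_lines), "lines including header")
```

To write real files, each value of the returned dictionary could be passed to writelines() on an output file named after its chromosome. For compressed or very large files a dedicated VCF tool is probably the better choice.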
