Question: VCF Combine, missing chromosomes?!
7 weeks ago by
kelseyca wrote:


The VCF file output from VCF Combine only has data up to Chromosome 3. I checked the VCF output from every step I ran before VCF Combine and the full data (all the chromosomes) was there, so it seems like this tool in particular is eating my data. I ran VCFsort on all my VCF files and then tried to combine them again (a tip that was given on this forum around a year and a half ago), but the output is the same. What is going on?!

I also saw a suggestion to use the Combine Variants tool instead, but I'm not sure whether that tool achieves the same result or which reference file I'm supposed to input.
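One way to confirm which chromosomes a given VCF actually contains, before and after combining, is to count records per chromosome. The following is a minimal Python sketch that works on the lines of an uncompressed VCF; the function name and the sample records are made up for illustration and are not taken from the poster's data.

```python
from collections import Counter


def records_per_chrom(vcf_lines):
    """Count data records per chromosome (first column), skipping headers."""
    counts = Counter()
    for line in vcf_lines:
        # Header lines start with '#'; also ignore blank lines.
        if line.startswith("#") or not line.strip():
            continue
        chrom = line.split("\t", 1)[0]
        counts[chrom] += 1
    return counts


# Hypothetical records, for illustration only.
sample = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.\n",
    "chr1\t150\t.\tT\tC\t50\tPASS\t.\n",
    "chr3\t200\t.\tC\tT\t50\tPASS\t.\n",
]

for chrom, n in records_per_chrom(sample).items():
    print(chrom, n)
```

For a real file, passing an open file handle (for example `open("input.vcf")`) in place of `sample` should work the same way, since the function only iterates over lines.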


variants galaxy vcf combine • 127 views
modified 5 weeks ago by Jennifer Hillman Jackson • written 7 weeks ago by kelseyca

Hi - There might be another tool problem, or the tool is simply running out of resources and not failing correctly.

I would suggest putting all of the VCFs into a Dataset Collection and running the tool that way.

I do see a dbSNP reference used in earlier steps that appears to be incomplete (dataset 238), but I don't think that is contributing to the problems (yet). It might have an impact on data content that you will care about at some point though (unless it was intentionally restricted to certain chromosomes).

Avoid the GATK Combine Variants tool - it has been deprecated for some time now and could fail or produce unexpected results.

I'm looking more into the problem of truncated results from VCF Combine. Feedback once done.

Thanks again for reporting problems! Jen, Galaxy team

Prior related Q&A:

modified 6 weeks ago • written 6 weeks ago by Jennifer Hillman Jackson

Hi Jennifer,

Thank you for your prompt and thorough response!

I tried putting the VCFs into a data collection and running it that way but it failed again :( Do you have any other suggestions? I am stuck on how to merge these VCF files!

Best, Kelsey

written 6 weeks ago by kelseyca

A collection didn't make a difference for me either. I ran through a series of tests today and sort order seems to be the primary issue. VCFsort will not resolve it. More feedback tomorrow.

written 6 weeks ago by Jennifer Hillman Jackson
5 weeks ago by
United States
Jennifer Hillman Jackson wrote:


OK, I finally figured out the problem. The VCFs are required to have header lines of the form ##contig=<ID=chrN,length=NNNNNNN>, one line for each reference chromosome; otherwise the resulting combine will only consider the first chromosome that happens to be in the VCFs. Your uploaded VCFs did not include any ##contig header lines.
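For VCFs prepared outside Galaxy, the missing header lines can be checked for and added before upload. The following is a rough Python sketch, not a Galaxy tool: the function names are hypothetical, the parsing assumes simple ##contig lines with no quoted commas, and the chromosome lengths shown are placeholders. Real lengths need to come from the reference genome that was used for variant calling (for example, from its .fai index).

```python
def contig_ids(vcf_lines):
    """Return the IDs declared in ##contig header lines, in file order."""
    ids = []
    for line in vcf_lines:
        if not line.startswith("#"):
            break  # header is over once the first data record appears
        if line.startswith("##contig=<"):
            # Strip the '##contig=<' prefix and the trailing '>'.
            body = line.strip()[len("##contig=<"):-1]
            for field in body.split(","):
                if field.startswith("ID="):
                    ids.append(field[3:])
    return ids


def add_contig_headers(vcf_lines, lengths):
    """Insert one ##contig line per chromosome just before the #CHROM line.

    If the header already declares any contigs, the lines are returned
    unchanged. `lengths` maps chromosome name to length.
    """
    lines = list(vcf_lines)
    if contig_ids(lines):
        return lines
    contig_lines = [
        f"##contig=<ID={name},length={length}>\n"
        for name, length in lengths.items()
    ]
    out = []
    for line in lines:
        if line.startswith("#CHROM"):
            out.extend(contig_lines)
        out.append(line)
    return out


# Hypothetical input; the lengths below are placeholders, not real values.
example = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.\n",
    "chr2\t200\t.\tC\tT\t50\tPASS\t.\n",
]
fixed = add_contig_headers(example, {"chr1": 1000, "chr2": 2000})
print("".join(fixed))
```

The same set of ##contig lines would need to be present in every VCF that goes into the combine step.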

If you instead run the variant calling in Galaxy, using the latest Freebayes, there is an option to combine all VCFs at that same step when multiple BAMs are input (batch mode).

The VCFtools suite is getting a bit outdated and the wrappers have not been updated in some time (and probably won't be going forward). I would suggest avoiding these tools if at all possible; most if not all of their functions can be done with more current tools. Definitely avoid VCFfilter and use the SnpSift filter instead (the former doesn't work well with the newer VCF format). I would also suggest putting the data into a collection to better keep track of all the inputs/outputs throughout the analysis.

Thanks for your patience while I tested! Jen, Galaxy team

written 5 weeks ago by Jennifer Hillman Jackson