Hello,
The VCF file output from VCF Combine only has data up to chromosome 3. I checked the VCF output from every step I ran before VCF Combine, and the full data (all the chromosomes) was there, so it seems like this tool in particular is eating my data. I ran VCFsort on all my VCF files and then combined them again (a tip that was given on this forum around a year and a half ago), but the output is the same. What is going on?!
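One quick way to confirm exactly where an output stops is to count data records per chromosome in each VCF and compare the inputs against the combined file. The sketch below is a minimal, hypothetical helper, not part of Galaxy or VCF Combine. It assumes plain or gzip-compressed, tab-delimited VCF text where header lines start with `#` and CHROM is the first column:

```python
import gzip
from collections import Counter


def open_vcf(path):
    """Open a plain or gzip-compressed VCF as text (hypothetical helper)."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path)


def records_per_chrom(lines):
    """Count data records per CHROM value, skipping header and blank lines.

    The returned Counter keeps chromosomes in order of first appearance.
    """
    counts = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # '##' meta lines and the '#CHROM' column header
        chrom = line.split("\t", 1)[0]
        counts[chrom] += 1
    return counts


if __name__ == "__main__":
    # "combined.vcf" is a placeholder name for a downloaded Galaxy dataset
    with open_vcf("combined.vcf") as fh:
        for chrom, n in records_per_chrom(fh).items():
            print(chrom, n)
```

Running it on each input and on the combined output shows which chromosomes are missing and whether the input counts add up.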
I also saw it suggested to use the Combine Variants tool instead, but I'm not sure whether this tool achieves the same result or which reference file I'm supposed to input.
Thanks!
Hi - There might be another tool problem, or the tool is simply running out of resources and not failing correctly.
I would suggest putting all of the VCFs into a Dataset Collection and to try running the tool that way.
I do see a dbSNP reference used in earlier steps that appears to be incomplete (dataset 238), but I don't think that is contributing to the problems (yet). It might affect data content that you will care about at some point, though (unless it was intentionally restricted to certain chromosomes).
Avoid the GATK Combine Variants tool - it has been deprecated for some time now and could fail or produce unexpected results.
I'm looking further into the problem of truncated results from VCF Combine and will post feedback once done.
Thanks again for reporting problems! Jen, Galaxy team
Prior related Q&A: https://biostar.usegalaxy.org/p/29608/
Hi Jennifer,
Thank you for your prompt and thorough response!
I tried putting the VCFs into a data collection and running it that way but it failed again :( Do you have any other suggestions? I am stuck on how to merge these VCF files!
Best, Kelsey
A collection didn't make a difference for me either. I ran through a series of tests today and sort order seems to be the primary issue. VCFsort will not resolve it. More feedback tomorrow.
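Since sort order looks like the main suspect, it can help to check whether each input VCF groups its chromosomes in the same order as its `##contig` header lines, e.g. karyotypic `chr1, chr2, ..., chr10` versus lexicographic `chr1, chr10, chr11, ..., chr2`. The sketch below is a hypothetical diagnostic, not the method used in the tests mentioned above, and it does not claim which order VCF Combine expects. It assumes each file lists its contigs in `##contig=<ID=...>` header lines:

```python
import re

CONTIG_RE = re.compile(r"##contig=<ID=([^,>]+)")


def contig_order(lines):
    """Return contig IDs in the order the ##contig header lines list them."""
    order = []
    for line in lines:
        if not line.startswith("##"):
            break  # meta-information lines always precede the data
        m = CONTIG_RE.match(line)
        if m:
            order.append(m.group(1))
    return order


def chromosome_blocks(lines):
    """Return the CHROM value of each consecutive run of data records."""
    blocks = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        chrom = line.split("\t", 1)[0]
        if not blocks or blocks[-1] != chrom:
            blocks.append(chrom)
    return blocks


def follows_order(blocks, reference):
    """True if each chromosome forms one contiguous block and the blocks
    appear in the same relative order as `reference`."""
    if len(blocks) != len(set(blocks)):
        return False  # a chromosome reappears later: records are not grouped
    rank = {chrom: i for i, chrom in enumerate(reference)}
    if any(chrom not in rank for chrom in blocks):
        return False  # chromosome missing from the reference order
    positions = [rank[chrom] for chrom in blocks]
    return positions == sorted(positions)
```

Comparing `chromosome_blocks(...)` across all inputs shows whether they disagree on order even after VCFsort.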