I have inherited a project where the original VCF files were lost. I only have the Excel files that were derived from the VCF files for the normal and tumor samples, and I would like to compare the variants in them. Can I do this in Galaxy? I don't know of any variant manipulation packages that accept Excel files.
Converting the data back into VCF format is the way to make it usable with the analysis tools in Galaxy.
I don't know of any tool that converts Excel to VCF automatically. You are unlikely to find a script or command line that does this without some tuning, because the data can be rearranged in many custom ways during the VCF-to-Excel conversion.
What you can do:
- Start by exporting the data from Excel into tabular (tab-delimited text) format.
- Then manipulate the tabular data back into VCF format.
- Review the data manipulation tools in Galaxy. Look in the Text manipulation and Datamash tool groups.
- Or edit the file yourself with a text editor (Unix or otherwise).
- Search general bioinformatics websites for tips, shared scripts, etc.
- The Galaxy Main Tool Shed https://usegalaxy.org/toolshed does have some tools for working with Excel data, but none perform the conversion you want. Search with the keyword "excel" to find and review them. If any do seem useful, to you or to others reading, note that Tool Shed tools are for use in your own Galaxy, not at the public Galaxy Main site https://usegalaxy.org. https://galaxyproject.github.io/
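If you end up scripting the tabular-to-VCF step yourself, here is a minimal Python sketch of the idea, not a tested converter for your files. It assumes the Excel sheet was exported as tab-delimited text with a header row, and that it has one column each for chromosome, position, reference allele and alternate allele. The column names in `COLUMN_MAP` ("Chr", "Position", "Ref", "Alt") and the function name `tabular_to_vcf` are made up for illustration, so change them to match your export. It writes only the eight fixed VCF columns with "." placeholders, and it does not handle indels written with "-" or any sample/genotype columns, which would need extra work.

```python
import csv
import io

# Hypothetical column names: change the values to match the header of your export.
COLUMN_MAP = {"CHROM": "Chr", "POS": "Position", "REF": "Ref", "ALT": "Alt"}


def tabular_to_vcf(tsv_text, column_map=COLUMN_MAP):
    """Build minimal VCF text from tab-delimited text that has a header row."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    records = []
    for row in reader:
        chrom = row[column_map["CHROM"]].strip()
        pos = int(row[column_map["POS"]])  # VCF positions are 1-based integers
        ref = row[column_map["REF"]].strip().upper()
        alt = row[column_map["ALT"]].strip().upper()
        records.append((chrom, pos, ref, alt))

    # Sort by chromosome name (plain string order) and then by position.
    # Some tools expect the order of a reference genome instead.
    records.sort(key=lambda rec: (rec[0], rec[1]))

    lines = [
        "##fileformat=VCFv4.2",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    ]
    for chrom, pos, ref, alt in records:
        # ID, QUAL, FILTER and INFO are unknown here, so use the "." placeholder.
        lines.append("\t".join([chrom, str(pos), ".", ref, alt, ".", ".", "."]))
    return "\n".join(lines) + "\n"
```

To use it, read your exported file into a string, pass it to `tabular_to_vcf`, write the result to a file ending in `.vcf`, and upload that to Galaxy with the datatype set to vcf. Check the result against the VCF specification before running any analysis on it.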
FAQ that includes a link to the current VCF specification: https://galaxyproject.org/learn/datatypes/#vcf
Hope this works out! Jen, Galaxy team