Question: VCF gene name sorting
3.1 years ago by
United States
I have a VCF file and need to tie it to gene names. I did this with the VCF annotate tool, using the hg19 genome as the BED file. The gene name is now in the INFO field of the VCF file. Next I need to find the gene with the most polymorphisms. I don't know how to do this without first converting the file to a delimited (tabular) format, but when I try, the convert tool just runs the text of each cell together back to back, and the output is no longer in a table-like format. How can I get around this?




gene name vcf • 1.3k views
3.1 years ago by
United States
Are you converting the VCF to tabular format with the tool VCFtoTab-delimited? I just retested all options and it seems to be parsing out VCF fields distinctly without problems.

Maybe post a snippet of a few of the lines so we can see whether there is an obvious format problem? Or you can share a history and send us that link. Include a link to this post and note which datasets are the problem (just so we are looking at the same thing).
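If the tabular conversion keeps giving trouble, the per-gene count can also be done directly on the annotated VCF. Below is a minimal Python sketch, not a Galaxy tool. It assumes the gene name is stored in the INFO column as `GENE=<name>`; the key name `GENE`, the function name, and the example records are all made up for illustration, so substitute whatever key your annotation step actually wrote. Header lines starting with `#` are skipped, and each record is counted once per gene it is annotated with.

```python
from collections import Counter


def count_variants_per_gene(vcf_lines, info_key="GENE"):
    """Count VCF records per gene name stored under info_key in INFO.

    info_key="GENE" is an assumed key name, not one taken from the thread.
    """
    counts = Counter()
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip ## meta lines and the #CHROM header line
        fields = line.split("\t")
        if len(fields) < 8:
            continue  # not a complete VCF record (INFO is column 8)
        for entry in fields[7].split(";"):
            key, sep, value = entry.partition("=")
            if key == info_key and sep:
                # a record may list several comma-separated gene names
                for gene in set(value.split(",")):
                    if gene:
                        counts[gene] += 1
    return counts


# Made-up records, for illustration only
example = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t50\t.\tDP=10;GENE=GENE_A",
    "chr1\t200\t.\tC\tT\t60\t.\tDP=12;GENE=GENE_A",
    "chr2\t300\t.\tG\tA\t40\t.\tDP=8;GENE=GENE_B",
]
counts = count_variants_per_gene(example)
print(counts.most_common(1))  # [('GENE_A', 2)]
```

For a real file, pass an open file handle instead of the list, and use `counts.most_common()` to get all genes ranked by number of variant records.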

Thanks! Jen, Galaxy team

3.1 years ago by
United States
This is an example of what I am getting: 

fileformat=VCFv4.1 ##fileDate=20151018 ##source=freeBayes v0.9.20 ##reference=/galaxy/data/hg19/sam_index/hg19.fa ##phasing=none ##commandline="freebayes --bam localbam_0.bam --fasta-reference /galaxy/data/hg19/sam_index/hg19.fa --vcf /galaxy-repl/main/fil

This occurred after trying to join two tabular files, but I see the same thing when I try to convert the annotated VCF to tab-delimited format. The text is displayed in bold, in a larger font, and with no order.

