I have whole-genome sequence data from Saccharomyces cerevisiae strains and I'm looking for their variations from RefSeq. The reads have been processed to .vcf files using VarScan and also FreeBayes on the useGalaxy web interface. They view nicely in IGB. I'm stuck on how to get an Excel list of the collection of variants with annotations for SNPs (or indels, etc.). Any videos, tutorials, or comments appreciated. Thank you.
Annotations can be added with tools such as ANNOVAR, SnpEff, VCF-BEDintersect, and GEMINI.
A VCF dataset can be converted to tab-delimited format with the tool VCFtoTab-delimited, then downloaded and imported into Excel.
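For anyone who prefers to do the conversion outside Galaxy, here is a minimal Python sketch of the same idea: drop the header lines and spread the INFO key=value pairs into their own columns. This is not how VCFtoTab-delimited is implemented; the function names `vcf_to_rows` and `rows_to_tsv` are hypothetical, and it assumes a tab-separated VCF with the eight standard fixed columns.

```python
import csv
import io

# The seven fixed VCF columns kept as-is; INFO (column 8) is expanded.
FIXED = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER"]


def vcf_to_rows(vcf_lines):
    """Return (header, rows) with one column per INFO key.

    Assumption: any line starting with '#' is a header line and is skipped.
    """
    records = []
    info_keys = []  # INFO keys in order of first appearance
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        fixed = fields[:7]
        info = fields[7] if len(fields) > 7 else "."
        info_dict = {}
        if info != ".":
            for item in info.split(";"):
                key, _, value = item.partition("=")
                # Flag-type INFO entries have no value; record them as "True".
                info_dict[key] = value if value else "True"
                if key not in info_keys:
                    info_keys.append(key)
        records.append((fixed, info_dict))
    header = FIXED + info_keys
    rows = [fixed + [info.get(k, "") for k in info_keys] for fixed, info in records]
    return header, rows


def rows_to_tsv(header, rows):
    """Serialize the table as tab-delimited text that Excel can import."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    # Made-up one-record example, not taken from any real dataset.
    sample = [
        "##fileformat=VCFv4.2\n",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
        "chrI\t100\t.\tA\tG\t50\tPASS\tAC=2;AF=0.5\n",
    ]
    print(rows_to_tsv(*vcf_to_rows(sample)), end="")
```

Per-sample genotype columns are ignored in this sketch; if you need them, the Galaxy tool or a dedicated VCF library is the safer route.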
Galaxy tutorials: https://galaxyproject.org/learn/
Thanks! Jen, Galaxy team
VCFtoTab-delimited stopped working for me. Below are my input and the result of converting it to tabular. Here is my VCF file:
Chrom Pos ID Ref Alt Qual Filter Info Format data
##fileformat=VCFv4.3
##fileDate=20180725
##source=Naive Variant Caller version 0.0.4
##reference=file:///galaxy-repl/main/files/026/335/dataset_26335075.dat
##INFO=<ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes, for each ALT allele, in the same order as listed">
##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency, for each ALT allele, in the same order as listed">
##INFO=<ID=SB,Number=1,Type=Float,Description="Strand Bias">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=AC,Number=.,Type=Integer,Description="Allele count in genotypes, for each ALT allele, in the same order as listed">
##FORMAT=<ID=AF,Number=.,Type=Float,Description="Allele Frequency, for each ALT allele, in the same order as listed">
##FORMAT=<ID=SB,Number=1,Type=Float,Description="Strand Bias">
##FORMAT=<ID=NC,Number=.,Type=String,Description="Nucleotide and indel counts">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT __NONE__
#99-REM_selection 1 . C T,G,A,N,CAACCTCCCCTTCTACGAGCACAGC . . AC=206,118,41,15,1;AF=0.000918015838001,0.000525853732447,0.000182711890088,6.68458134467e-05,4.45638756311e-06;SB=1082.20289411 GT:AC:AF:SB:NC 0:206,118,41,15,1:0.000918015838001,0.000525853732447,0.000182711890088,6.68458134467e-05,4.45638756311e-06:1082.20289411:+A=41,+C=224015,+G=118,+N=15,+CAACCTCCCCTTCTACGAGCACAGC=1,+T=206,-C=1,
#99-REM_selection 2 . A C,G,T,N . . AC=374,31,16,16;AF=0.00166371586936,0.000137901582754,7.11750104538e-05,7.11750104538e-05;SB=598.295995562 GT:AC:AF:SB:NC 0:374,31,16,16:0.00166371586936,0.000137901582754,7.11750104538e-05,7.11750104538e-05:598.295995562:+A=224360,+C=374,+T=16,+G=31,+N=16,-A=1,
#99-REM_selection 3 . A C,G,T,AC . . AC=105,22,17,17;AF=0.000466969678102,9.78412658881e-05,7.56046145499e-05,7.56046145499e-05;SB=2119.74527861 GT:AC:AF:SB:NC 0:105,22,17,17:0.000466969678102,9.78412658881e-05,7.56046145499e-05,7.56046145499e-05:2119.74527861:+A=224692,+C=105,+AC=17,+G=22,+T=17,-A=1,
#99-REM_selection 4 . CCT ACT,CCCT,TCT,C,GCT . . AC=1725,149,135,6,4;AF=0.00766615558963,0.000662178077017,0.000599960002666,2.66648890074e-05,1.77765926716e-05;SB=129.198141555 GT:AC:AF:SB:NC 0:1725,149,135,6,4:0.00766615558963,0.000662178077017,0.000599960002666,2.66648890074e-05,1.77765926716e-05:129.198141555:+A=1725,+d2=6,+G=4,+CC=149,+C=222995,+T=135,-C=1,
#99-REM_selection 5 . CTC TTC,ATC,CC,CTTC,C . . AC=162,50,41,3,3;AF=0.00072027850769,0.000222308181386,0.000182292708736,1.33384908831e-05,1.33384908831e-05;SB=1378.24539435 GT:AC:AF:SB:NC 0:162,50,41,3,3:0.00072027850769,0.000222308181386,0.000182292708736,1.33384908831e-05,1.33384908831e-05:1378.24539435:+A=50,+d1=41,+d2=3,+C=224653,+T=162,+CT=3,-C=1,
#99-REM_selection 6 . TCCC TCC,CCCC,GCCC,ACCC,TC,NCCC,TCCCC,T . . AC=209,135,64,43,20,18,16,6;AF=0.00092855460923,0.000599784077732,0.000284342081295,0.00019104233587,8.88569004047e-05,7.99712103643e-05,7.10855203238e-05,2.66570701214e-05;SB=1069.38094795 GT:AC:AF:SB:NC 0:209,135,64,43,20,18,16,6:0.00092855460923,0.000599784077732,0.000284342081295,0.00019104233587,8.88569004047e-05,7.99712103643e-05,7.10855203238e-05,2.66570701214e-05:1069.38094795:+A=43,+d1=209,+d2=20,+d3=6,+G=64,+C=135,+N=18,+T=224569,+TC=16,-T=1,
#99-REM_selection 7 . CCCCT TCCCT,ACCCT,GCCCT,C . . AC=47,36,13,2;AF=0.000209214333408,0.000160249276653,5.78677943468e-05,8.90273759181e-06;SB=4678.16666231 GT:AC:AF:SB:NC 0:47,36,13,2:0.000209214333408,0.000160249276653,5.78677943468e-05,8.90273759181e-06:4678.16666231:+A=36,+C=224551,+d4=2,+T=47,+G=13,-C=1,
#99-REM_selection 8 . CCCTT TCCTT,ACCTT,GCCTT,NCCTT,CT,C . . AC=44,36,9,4,4,3;AF=0.000195671218987,0.000160094633717,4.00236584292e-05,1.77882926352e-05,1.77882926352e-05,1.33412194764e-05;SB=4994.82221787 GT:AC:AF:SB:NC 0:44,36,9,4,4,3:0.000195671218987,0.000160094633717,4.00236584292e-05,1.77882926352e-05,1.77882926352e-05,1.33412194764e-05:4994.82221787:+A=36,+C=224766,+d3=4,+d4=3,+G=9,+N=4,+T=44,-C=1,
And here is what I get after running the tool:
1 2 3 4 5 6 7 8 9 10
CHROM POS ID REF ALT QUAL FILTER AC AF SB
Any idea why it doesn't populate variants into the table?
Remove the first line (unless that is the Galaxy "view" column descriptions, and is not actually in your original file):
Chrom Pos ID Ref Alt Qual Filter Info Format data
Then remove the leading # characters from the data lines. They cause the tool to skip over these lines, because Galaxy interprets them as comment lines. The ## comment notation should only be used on header lines, and you have those formatted OK.
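If the file is too large to edit by hand, the clean-up can be scripted. The following is a rough Python sketch, not a Galaxy tool: `is_masked_record` and `strip_hash_from_records` are hypothetical helper names, and it assumes that any line starting with a single # other than the #CHROM column header is a data line whose sequence name happens to begin with #.

```python
def is_masked_record(line):
    """True for a data line that looks like a comment because CHROM starts with '#'.

    Assumption: real header lines start with '##' or with '#CHROM'.
    """
    return (
        line.startswith("#")
        and not line.startswith("##")
        and not line.startswith("#CHROM")
    )


def strip_hash_from_records(vcf_lines):
    """Return a copy of the lines with the leading '#' removed from data lines only."""
    cleaned = []
    for line in vcf_lines:
        # Drop exactly one leading '#'; header lines pass through unchanged.
        cleaned.append(line[1:] if is_masked_record(line) else line)
    return cleaned


if __name__ == "__main__":
    # Shortened example modeled on the file posted above.
    lines = [
        "##fileformat=VCFv4.3\n",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
        "#99-REM_selection\t1\t.\tC\tT\t.\t.\tAC=206\n",
    ]
    for line in strip_hash_from_records(lines):
        print(line, end="")
```

Note that this only patches the VCF. The sequence name in the reference would still begin with #, so renaming it there is the cleaner long-term fix.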
After reformatting, both the older and newer versions of the tool will work. Please see this test history for an example:
- Common datatypes explained https://galaxyproject.org/learn/datatypes/
Thanks! Jen, Galaxy team
Hi Jen, Thanks for identifying the problem. It was stupid to choose "#-99-REM_selection" as the name of my reference sequence! It works well after I changed the name. Thanks again. Amir
Great, glad that worked out!