Question: Getting an Excel list of variants from VarScan or FreeBayes .vcf files
dreines wrote (16 months ago):

I have whole-genome sequence data from Saccharomyces cerevisiae strains, and I'm looking for their variants relative to the RefSeq reference. The reads have been processed into .vcf files with both VarScan and FreeBayes on the useGalaxy web interface, and they display nicely in IGB. Where I'm stuck is how to get an Excel list of all the variants, with annotations for SNPs (or indels, etc.). Any videos, tutorials, or comments are appreciated. Thank you.

Tags: varscan, snp, vcf, excel, freebayes
Jennifer Hillman Jackson wrote (16 months ago):

Hello,

Annotations can be added with tools such as ANNOVAR, SnpEff, VCF-BEDintersect, and GEMINI.
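To illustrate what annotation adds before the export step, here is a hedged Python sketch that flattens the ANN INFO field written by SnpEff into one row per annotation, which is easier to filter in Excel. It assumes the 16 pipe-separated ANN subfields in their standard order; the helper split_ann and the example INFO string (including the gene and transcript names) are made up for illustration.

```python
# Subfield names of the ANN INFO field, in order (VCF annotation format used by SnpEff).
ANN_FIELDS = [
    "Allele", "Annotation", "Annotation_Impact", "Gene_Name", "Gene_ID",
    "Feature_Type", "Feature_ID", "Transcript_BioType", "Rank",
    "HGVS.c", "HGVS.p", "cDNA.pos/cDNA.length", "CDS.pos/CDS.length",
    "AA.pos/AA.length", "Distance", "ERRORS/WARNINGS/INFO",
]


def split_ann(info):
    """Return one dict per annotation found in a VCF INFO string (hypothetical helper)."""
    rows = []
    for item in info.split(";"):
        key, _, value = item.partition("=")
        if key != "ANN":
            continue
        # Several annotations for the same variant are separated by commas.
        for ann in value.split(","):
            rows.append(dict(zip(ANN_FIELDS, ann.split("|"))))
    return rows


# Made-up INFO value with two annotations for one T allele.
info = (
    "DP=30;ANN=T|missense_variant|MODERATE|GENE1|GENE1|transcript|TX1|"
    "protein_coding|1/1|c.100C>T|p.Arg34Cys|100/900|100/900|34/299||,"
    "T|upstream_gene_variant|MODIFIER|GENE2|GENE2|transcript|TX2|"
    "protein_coding||c.-200C>T|||||200|"
)
for row in split_ann(info):
    print(row["Allele"], row["Annotation"], row["Annotation_Impact"], row["Gene_Name"])
```

Writing each dict as one spreadsheet row keeps every transcript-level effect visible instead of packing them into a single cell.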

A VCF file can be converted to tab-delimited format with the VCFtoTab-delimited tool, then downloaded and imported into Excel.
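For readers working outside Galaxy, a minimal Python sketch of a similar conversion is below. It is not the VCFtoTab-delimited tool's code; it only assumes a tab-delimited VCF held in a string, copies the first seven columns, and turns each INFO key declared in a ##INFO header line into its own column. The function name vcf_to_tsv and the example records are hypothetical.

```python
import csv
import io


def vcf_to_tsv(vcf_text, out):
    """Write a header row plus one tab-delimited row per VCF record (sketch)."""
    info_ids = []
    records = []
    for line in vcf_text.splitlines():
        if line.startswith("##INFO=<ID="):
            # e.g. ##INFO=<ID=AC,Number=A,...> -> "AC"
            info_ids.append(line[len("##INFO=<ID="):].split(",", 1)[0])
        elif not line.strip() or line.startswith("#"):
            # Other meta lines and the #CHROM line are skipped. A record whose
            # CHROM value itself starts with "#" is skipped here as well.
            continue
        else:
            records.append(line.split("\t"))

    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER"] + info_ids)
    for fields in records:
        info = {}
        for item in fields[7].split(";"):
            key, _, value = item.partition("=")
            info[key] = value
        writer.writerow(fields[:7] + [info.get(key, "") for key in info_ids])


# Made-up example: one normal record and one whose CHROM starts with "#".
vcf = (
    "##fileformat=VCFv4.3\n"
    '##INFO=<ID=AC,Number=A,Type=Integer,Description="Allele count">\n'
    '##INFO=<ID=AF,Number=A,Type=Float,Description="Allele frequency">\n'
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "chrI\t100\t.\tC\tT\t.\t.\tAC=2;AF=0.5\n"
    "#chrII\t200\t.\tA\tG\t.\t.\tAC=1;AF=0.25\n"
)
buf = io.StringIO()
vcf_to_tsv(vcf, buf)
print(buf.getvalue())
```

The resulting text can be saved as a .tsv file and opened in Excel. Note that the "#chrII" record never reaches the output: a line starting with "#" is indistinguishable from a header line, so it is silently dropped.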

Galaxy tutorials: https://galaxyproject.org/learn/

Thanks! Jen, Galaxy team

Hello,

VCFtoTab-delimited has stopped working for me: converting my VCF file to tabular gives a table with no variants. Here is my VCF file:

Chrom   Pos ID  Ref Alt Qual    Filter  Info    Format  data
##fileformat=VCFv4.3
##fileDate=20180725
##source=Naive Variant Caller version 0.0.4
##reference=file:///galaxy-repl/main/files/026/335/dataset_26335075.dat
##INFO=<ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes, for each ALT allele, in the same order as listed">
##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency, for each ALT allele, in the same order as listed">
##INFO=<ID=SB,Number=1,Type=Float,Description="Strand Bias">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=AC,Number=.,Type=Integer,Description="Allele count in genotypes, for each ALT allele, in the same order as listed">
##FORMAT=<ID=AF,Number=.,Type=Float,Description="Allele Frequency, for each ALT allele, in the same order as listed">
##FORMAT=<ID=SB,Number=1,Type=Float,Description="Strand Bias">
##FORMAT=<ID=NC,Number=.,Type=String,Description="Nucleotide and indel counts">
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  __NONE__
#99-REM_selection   1   .   C   T,G,A,N,CAACCTCCCCTTCTACGAGCACAGC   .   .   AC=206,118,41,15,1;AF=0.000918015838001,0.000525853732447,0.000182711890088,6.68458134467e-05,4.45638756311e-06;SB=1082.20289411    GT:AC:AF:SB:NC  0:206,118,41,15,1:0.000918015838001,0.000525853732447,0.000182711890088,6.68458134467e-05,4.45638756311e-06:1082.20289411:+A=41,+C=224015,+G=118,+N=15,+CAACCTCCCCTTCTACGAGCACAGC=1,+T=206,-C=1,
#99-REM_selection   2   .   A   C,G,T,N .   .   AC=374,31,16,16;AF=0.00166371586936,0.000137901582754,7.11750104538e-05,7.11750104538e-05;SB=598.295995562  GT:AC:AF:SB:NC  0:374,31,16,16:0.00166371586936,0.000137901582754,7.11750104538e-05,7.11750104538e-05:598.295995562:+A=224360,+C=374,+T=16,+G=31,+N=16,-A=1,
#99-REM_selection   3   .   A   C,G,T,AC    .   .   AC=105,22,17,17;AF=0.000466969678102,9.78412658881e-05,7.56046145499e-05,7.56046145499e-05;SB=2119.74527861 GT:AC:AF:SB:NC  0:105,22,17,17:0.000466969678102,9.78412658881e-05,7.56046145499e-05,7.56046145499e-05:2119.74527861:+A=224692,+C=105,+AC=17,+G=22,+T=17,-A=1,
#99-REM_selection   4   .   CCT ACT,CCCT,TCT,C,GCT  .   .   AC=1725,149,135,6,4;AF=0.00766615558963,0.000662178077017,0.000599960002666,2.66648890074e-05,1.77765926716e-05;SB=129.198141555    GT:AC:AF:SB:NC  0:1725,149,135,6,4:0.00766615558963,0.000662178077017,0.000599960002666,2.66648890074e-05,1.77765926716e-05:129.198141555:+A=1725,+d2=6,+G=4,+CC=149,+C=222995,+T=135,-C=1,
#99-REM_selection   5   .   CTC TTC,ATC,CC,CTTC,C   .   .   AC=162,50,41,3,3;AF=0.00072027850769,0.000222308181386,0.000182292708736,1.33384908831e-05,1.33384908831e-05;SB=1378.24539435   GT:AC:AF:SB:NC  0:162,50,41,3,3:0.00072027850769,0.000222308181386,0.000182292708736,1.33384908831e-05,1.33384908831e-05:1378.24539435:+A=50,+d1=41,+d2=3,+C=224653,+T=162,+CT=3,-C=1,
#99-REM_selection   6   .   TCCC    TCC,CCCC,GCCC,ACCC,TC,NCCC,TCCCC,T  .   .   AC=209,135,64,43,20,18,16,6;AF=0.00092855460923,0.000599784077732,0.000284342081295,0.00019104233587,8.88569004047e-05,7.99712103643e-05,7.10855203238e-05,2.66570701214e-05;SB=1069.38094795   GT:AC:AF:SB:NC  0:209,135,64,43,20,18,16,6:0.00092855460923,0.000599784077732,0.000284342081295,0.00019104233587,8.88569004047e-05,7.99712103643e-05,7.10855203238e-05,2.66570701214e-05:1069.38094795:+A=43,+d1=209,+d2=20,+d3=6,+G=64,+C=135,+N=18,+T=224569,+TC=16,-T=1,
#99-REM_selection   7   .   CCCCT   TCCCT,ACCCT,GCCCT,C .   .   AC=47,36,13,2;AF=0.000209214333408,0.000160249276653,5.78677943468e-05,8.90273759181e-06;SB=4678.16666231   GT:AC:AF:SB:NC  0:47,36,13,2:0.000209214333408,0.000160249276653,5.78677943468e-05,8.90273759181e-06:4678.16666231:+A=36,+C=224551,+d4=2,+T=47,+G=13,-C=1,
#99-REM_selection   8   .   CCCTT   TCCTT,ACCTT,GCCTT,NCCTT,CT,C    .   .   AC=44,36,9,4,4,3;AF=0.000195671218987,0.000160094633717,4.00236584292e-05,1.77882926352e-05,1.77882926352e-05,1.33412194764e-05;SB=4994.82221787    GT:AC:AF:SB:NC  0:44,36,9,4,4,3:0.000195671218987,0.000160094633717,4.00236584292e-05,1.77882926352e-05,1.77882926352e-05,1.33412194764e-05:4994.82221787:+A=36,+C=224766,+d3=4,+d4=3,+G=9,+N=4,+T=44,-C=1,

And here is what I get after running the tool:

1      2    3   4    5    6     7       8   9   10
CHROM  POS  ID  REF  ALT  QUAL  FILTER  AC  AF  SB

Any idea why the variants are not written into the table?

Thanks! Amir

Reply by Amir.Taheri.Ghahfarokhi, written 4 months ago

Hi Amir,

Remove the first line (unless it is just the column labels shown in Galaxy's dataset view and is not actually in your original file):

Chrom   Pos ID  Ref Alt Qual    Filter  Info    Format  data

Then remove the leading # characters from the data lines. They cause the tool to skip those lines, because Galaxy interprets lines starting with # as comments. The # or ## prefix should only appear on header lines, and yours are formatted correctly.
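A minimal Python sketch of that cleanup, assuming a tab-delimited VCF held in a string whose meta lines start with "##" and whose column header starts with "#CHROM"; every other line beginning with "#" is treated as a data record and the leading "#" characters are stripped from its CHROM field. The helper fix_hash_chrom is hypothetical, and the new chromosome name would also have to match the reference used by any downstream tool.

```python
def fix_hash_chrom(vcf_text):
    """Strip leading '#' characters from CHROM values on data lines (sketch)."""
    fixed = []
    for line in vcf_text.splitlines():
        if line.startswith("##") or line.startswith("#CHROM"):
            fixed.append(line)  # genuine header line, left unchanged
        elif line.startswith("#"):
            # Data record whose CHROM starts with '#': strip only that field's prefix.
            chrom, sep, rest = line.partition("\t")
            fixed.append(chrom.lstrip("#") + sep + rest)
        else:
            fixed.append(line)
    return "\n".join(fixed) + "\n"


example = (
    "##fileformat=VCFv4.3\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "#99-REM_selection\t1\t.\tC\tT\t.\t.\tAC=206\n"
)
print(fix_hash_chrom(example))
```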

After reformatting, both the older and the newer versions of the tool will work. Please see this test history for an example:

FAQs: https://galaxyproject.org/support/

Thanks! Jen, Galaxy team

Reply by Jennifer Hillman Jackson, written 4 months ago (modified 4 months ago)

Hi Jen, thanks for identifying the problem. It was a bad idea to choose "#99-REM_selection" as the name of my reference sequence! It works well now that I've changed the name. Thanks again. Amir

Reply by Amir.Taheri.Ghahfarokhi, written 4 months ago
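Renaming the reference itself avoids the problem at its source. A minimal Python sketch, assuming a plain-text FASTA whose sequence ID runs from ">" up to the first space; the helper rename_fasta_ids and the new name REM_selection_99 are hypothetical. After renaming, the reads would need to be re-mapped and the variants re-called, since the BAM header and the VCF carry the old sequence name.

```python
def rename_fasta_ids(fasta_text, old_id, new_id):
    """Replace a sequence ID on matching FASTA header lines (hypothetical helper)."""
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # The ID ends at the first space; any description after it is kept.
            seq_id, sep, description = line[1:].partition(" ")
            if seq_id == old_id:
                line = ">" + new_id + sep + description
        out.append(line)
    return "\n".join(out) + "\n"


fasta = ">#99-REM_selection\nCCTCCCCTT\n"
print(rename_fasta_ids(fasta, "#99-REM_selection", "REM_selection_99"))
```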

Great, glad that worked out!

Reply by Jennifer Hillman Jackson, written 4 months ago
Powered by Biostar version 16.09