Question: How to filter SNP and Indel data
I have mapped my sequence against a wild-type sequence using Bowtie2 and obtained around 8000 SNPs and 700 indels using VarScan. How do I further shortlist the SNPs and indels, and based on which criteria?

The output file looks like this:

1   CP009494.1  .   A   C   .   PASS    ADP=16;WT=0;HET=1;HOM=0;NC=0    GT:GQ:SDP:DP:RD:AD:FREQ:PVAL:RBQ:ABQ:RDF:RDR:ADF:ADR    0/1:0:16:16:2:11:68.75%:9.8E-1:32:32:2:0:11:0
2   CP009494.1  .   A   G   .   PASS    ADP=20;WT=0;HET=1;HOM=0;NC=0    GT:GQ:SDP:DP:RD:AD:FREQ:PVAL:RBQ:ABQ:RDF:RDR:ADF:ADR    0/1:0:21:20:6:7:35%:9.8E-1:30:32:6:0:7:0
3   CP009494.1  .   A   G   .   PASS    ADP=21;WT=0;HET=1;HOM=0;NC=0    GT:GQ:SDP:DP:RD:AD:FREQ:PVAL:RBQ:ABQ:RDF:RDR:ADF:ADR    0/1:0:21:21:10:8:36.36%:9.8E-1:30:32:10:0:8:0

Which criteria should I use to shortlist the SNPs to a smaller number? Should it be based on read counts?

Bowtie2 only produces the alignments; the list you have is a set of potential variants (a mixture of sites with low-quality evidence for a true variant and sites with high-quality evidence). You now need to use a program to call variants: freebayes and mpileup are common and available on Galaxy. They can assess information from multiple individuals simultaneously to estimate the probability that you have a 'true' variant. The input is a BAM file or multiple BAM files. There is a nice tutorial which goes through mpileup and the concepts. The tutorial also mentions GATK, but I would suggest you ignore this as it is a lot of hassle in Galaxy. Personally I use freebayes. Cheers
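On the "which criteria" part of the question: the per-sample fields in the VarScan output already carry the usual evidence to shortlist on, namely DP (read depth), AD (reads supporting the variant), FREQ (variant allele frequency), PVAL (VarScan's p-value), and ADF/ADR (variant-supporting reads on the forward and reverse strand). Below is a minimal Python sketch of such a filter. The function names and every threshold are hypothetical and chosen only for illustration; sensible cut-offs depend on your coverage and on the ploidy of your organism.

```python
def parse_sample(line):
    """Map FORMAT keys to sample values for one single-sample VCF record.

    Uses the last two whitespace-separated fields (FORMAT and the sample
    column), so it does not depend on the layout of the leading columns.
    """
    fields = line.split()
    keys = fields[-2].split(":")
    values = fields[-1].split(":")
    return dict(zip(keys, values))


def passes(sample, min_depth=10, min_alt_reads=4, min_freq=0.25, max_pval=0.01):
    """Return True if a record meets all (illustrative) thresholds."""
    depth = int(sample["DP"])
    alt_reads = int(sample["AD"])
    freq = float(sample["FREQ"].rstrip("%")) / 100  # "68.75%" -> 0.6875
    pval = float(sample["PVAL"])
    # Require variant-supporting reads on both strands to limit strand bias.
    both_strands = int(sample["ADF"]) > 0 and int(sample["ADR"]) > 0
    return (
        depth >= min_depth
        and alt_reads >= min_alt_reads
        and freq >= min_freq
        and pval <= max_pval
        and both_strands
    )


def filter_vcf(lines, **thresholds):
    """Keep the records that pass; skip header lines and blank lines."""
    kept = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        if passes(parse_sample(line), **thresholds):
            kept.append(line)
    return kept


# First record from the question.
record = (
    "1 CP009494.1 . A C . PASS ADP=16;WT=0;HET=1;HOM=0;NC=0 "
    "GT:GQ:SDP:DP:RD:AD:FREQ:PVAL:RBQ:ABQ:RDF:RDR:ADF:ADR "
    "0/1:0:16:16:2:11:68.75%:9.8E-1:32:32:2:0:11:0"
)
print(passes(parse_sample(record)))  # False
```

All three records shown in the question have ADR=0 and a PVAL of 9.8E-1, so a filter of this kind would drop them regardless of the depth and frequency cut-offs.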


