We used genome editing nucleases to alter a specific site in the mouse genome, which led to indel formation at that site. I then PCR-amplified the site of interest and performed Illumina deep sequencing of the PCR products. My aim is to calculate the frequency of indels from the BAM files generated by that deep sequencing.
To identify the different indels, I performed indel calling using mpileup on usegalaxy.org. As a result, a VCF file was generated containing the list of SNPs and indels. I have the following two questions:
1. How to calculate the frequency and percentage of indels?: To get the frequency of indels, I used the value of 'IMF' provided in the INFO column of the VCF file. In the VCF header, IMF is defined as "Maximum fraction of reads supporting an indel". I multiplied IMF by 100 to get the maximum percentage of reads supporting an indel. Is my assumption correct that IMF indicates the fraction of reads carrying that indel in the PCR products used for deep sequencing? If so, multiplying IMF by 100 should give the percentage of that indel in my PCR products.
2. How to calculate the total percentage of indels in the PCR products?: My PCR products contain several different types of indels at the same target site, due to the use of genome editing nucleases. If I sum the IMF values of all the indels and then multiply the sum by 100, do I get the total percentage of indels in my PCR product? Is this assumption correct?
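To make the two calculations concrete, here is a minimal Python sketch of what I am doing. The function names and the example records are made up for illustration, and it assumes each indel record carries the `INDEL` flag and an `IMF` value in the INFO column. Whether the per-indel percentage and the summed total are valid measures is exactly what I am asking.

```python
def parse_info(info):
    """Turn a VCF INFO string such as 'INDEL;IDV=25;IMF=0.25' into a dict."""
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value  # flags such as INDEL get an empty string
    return fields


def indel_percentages(vcf_lines):
    """Return (pos, ref, alt, percent) per indel record, with percent = IMF * 100."""
    results = []
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        cols = line.rstrip("\n").split("\t")
        info = parse_info(cols[7])  # INFO is the 8th VCF column
        if "INDEL" not in info or "IMF" not in info:
            continue  # skip SNPs and records without IMF
        results.append((int(cols[1]), cols[3], cols[4], float(info["IMF"]) * 100))
    return results


# Made-up records, not real data from my experiment
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tACT\tA\t50\t.\tINDEL;IDV=25;IMF=0.25;DP=100",
    "chr1\t103\t.\tA\tAG\t40\t.\tINDEL;IDV=50;IMF=0.5;DP=100",
    "chr1\t105\t.\tG\tT\t30\t.\tDP=100",
]

per_indel = indel_percentages(example)            # question 1: IMF * 100 per indel
total_percent = sum(p for _, _, _, p in per_indel)  # question 2: summed over all indels

for pos, ref, alt, pct in per_indel:
    print(pos, ref, alt, pct)
print("total", total_percent)
```

With a real file, the list `example` would be replaced by the lines read from the VCF downloaded from Galaxy.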
Thanks for your help.