I am new to analysing raw sequencing data - apologies in advance if I seem not to understand what I'm dealing with.
I have 6 human genome datasets (2 biological replicates, 3 time-points, 5x coverage each) and need a "simple" measure of mutational frequency in each sample (number of variants per Mb). I understand that most tools require a minimum coverage per base, so I won't be able to analyse data this shallow with, for example, samtools mpileup.
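To put a rough number on the coverage worry, here is a minimal sketch of how I estimated the fraction of bases that would pass a depth filter. It assumes per-base depth is Poisson-distributed around the mean coverage, which is an idealisation (real depth is usually more dispersed), and the threshold of 8 reads is just an illustrative value, not a default I took from any tool.

```python
import math


def fraction_with_depth_at_least(mean_depth, min_depth):
    """Fraction of bases expected to have depth >= min_depth.

    Assumption: per-base depth ~ Poisson(mean_depth).
    """
    # P(X < min_depth) is the sum of the Poisson probabilities for 0..min_depth-1
    below = sum(
        math.exp(-mean_depth) * mean_depth ** k / math.factorial(k)
        for k in range(min_depth)
    )
    return 1.0 - below


if __name__ == "__main__":
    # Illustrative threshold only: what share of a 5x genome reaches depth 8?
    frac = fraction_with_depth_at_least(mean_depth=5, min_depth=8)
    print(f"Expected fraction of bases with depth >= 8 at 5x: {frac:.3f}")
```

Under that assumption only a small minority of positions would clear such a filter, which is why I doubt that a standard per-site calling approach fits this data.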
The hypothesis I am testing is that timepoint2 will have a larger number of variants than timepoint1 (the H0 being that there is no difference). The total number of variants in each sample will include all the technical errors from sequencing, PCR etc. plus the biological differences. Basically, I am assuming that all samples will have similar numbers of baseline variants; all I care about is the difference between the totals. I do not need information about specific base mutations or genes.
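To make the comparison I have in mind concrete, here is a minimal sketch. It assumes I already have, for each sample, a total variant count and the number of bases that were actually assessed; how to obtain those two numbers is exactly what I am asking about. The function names are my own, the counts in the usage example are made-up placeholders rather than my data, and it assumes the same replicate was sampled at each time-point so the differences can be paired.

```python
def variants_per_mb(n_variants, assessed_bases):
    """Variant count normalised to variants per 1 Mb of assessed sequence."""
    if assessed_bases <= 0:
        raise ValueError("assessed_bases must be positive")
    return n_variants / (assessed_bases / 1_000_000)


def paired_rate_differences(rates_t1, rates_t2):
    """Per-replicate difference (timepoint2 - timepoint1) in variants per Mb.

    Assumption: rates_t1[i] and rates_t2[i] come from the same replicate,
    so the shared baseline of technical errors cancels in the subtraction.
    """
    if len(rates_t1) != len(rates_t2):
        raise ValueError("need one rate per replicate at each time-point")
    return [t2 - t1 for t1, t2 in zip(rates_t1, rates_t2)]


if __name__ == "__main__":
    # Placeholder numbers for two replicates, NOT real results.
    assessed = 2_800_000_000
    t1 = [variants_per_mb(n, assessed) for n in (4_200_000, 4_100_000)]
    t2 = [variants_per_mb(n, assessed) for n in (4_500_000, 4_450_000)]
    print(paired_rate_differences(t1, t2))
```

With only two replicates per time-point I realise any formal test on these differences will have very little power, so suggestions on that front are welcome too.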
I have generated BAM files with duplicates marked (each file is 10-13 GB). Can anyone help or give suggestions on how to proceed? Many thanks in advance.