I'm examining my TopHat output and would like some opinions on my align_summary.txt.
For example, here is the summary for one paired-end RNA-seq alignment. The quality scores are good, and I trimmed the reads to a fixed length of 50 nt.
Mapped: 174579844 (95.1% of input)
of these: 36018754 (20.6%) have multiple alignments (48483 have >20)
Mapped: 174631890 (95.1% of input)
of these: 40873773 (23.4%) have multiple alignments (75889 have >20)
95.1% overall read alignment rate.
Aligned pairs: 168870512
of these: 24343928 (14.4%) have multiple alignments
and: 14517921 ( 8.6%) are discordant alignments
84.1% concordant pair alignment rate.
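As a sanity check, the percentages in the summary can be recomputed from the raw counts. This is only a rough sketch: the variable names are mine, and I am assuming the first "Mapped" block is the left reads and the second is the right reads.

```python
# Counts copied from the align_summary.txt above.
# Assumption: first "Mapped" block = left reads, second = right reads.
left_mapped, left_multi = 174579844, 36018754
right_mapped, right_multi = 174631890, 40873773
aligned_pairs, pairs_multi, pairs_discordant = 168870512, 24343928, 14517921


def pct(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


left_multi_pct = pct(left_multi, left_mapped)
right_multi_pct = pct(right_multi, right_mapped)
pairs_multi_pct = pct(pairs_multi, aligned_pairs)
pairs_discordant_pct = pct(pairs_discordant, aligned_pairs)

# Share of mapped left reads that align to exactly one location.
left_unique_pct = pct(left_mapped - left_multi, left_mapped)

print("left multi-mapped:  %.1f%%" % left_multi_pct)
print("right multi-mapped: %.1f%%" % right_multi_pct)
print("pairs multi-mapped: %.1f%%" % pairs_multi_pct)
print("pairs discordant:   %.1f%%" % pairs_discordant_pct)
print("left unique:        %.1f%%" % left_unique_pct)
```

The recomputed values match the reported 20.6%, 23.4%, 14.4% and 8.6%, so roughly four out of five mapped reads align to a single location.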
I'm happy with a 95% alignment rate, but I'm not sure where the multiple alignments come from, or whether their rate is acceptable.
The raw data come from RNA-seq, so I would not expect the reads to align to repeated sequences. I suppose the multiple alignments come from mismatches.
I can always discard these reads, but I'd like to know whether it's OK to keep them.
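If I did discard them, this is roughly how I would do it. It is a minimal sketch, assuming the alignments have been converted to SAM text and that each record carries the standard NH:i tag (number of reported alignments for the read). The function names and the two example records are made up for illustration.

```python
def n_hits(sam_line):
    """Return the NH:i value of a SAM alignment line, or None if the tag is absent."""
    fields = sam_line.rstrip("\n").split("\t")
    # The first 11 columns are mandatory; optional tags start at index 11.
    for tag in fields[11:]:
        if tag.startswith("NH:i:"):
            return int(tag[5:])
    return None


def keep_unique(sam_lines):
    """Yield header lines and alignment lines whose NH tag equals 1."""
    for line in sam_lines:
        if line.startswith("@") or n_hits(line) == 1:
            yield line


# Hypothetical records for illustration only (not from my data).
example = [
    "@HD\tVN:1.0\tSO:coordinate",
    "read1\t0\tchr1\t100\t50\t5M\t*\t0\t0\tACGTA\tIIIII\tNH:i:1",
    "read2\t0\tchr1\t200\t3\t5M\t*\t0\t0\tACGTA\tIIIII\tNH:i:2",
]

kept = list(keep_unique(example))
for line in kept:
    print(line)
```

Only the header and read1 survive the filter here; read2 is dropped because it reports two alignments.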
All comments are welcome. Thanks.