Hi. I am new to RNA-seq and I have a couple of quick questions. My input was paired-end, non-stranded FASTQ files. Below is an example of the summary stats from one of my samples. Are these results acceptable and within the normally expected range?
I read that trimming before alignment can affect the alignment and introduce bias into the downstream estimation of read counts, so I did not trim before running HISAT2. Instead, I was planning on filtering the BAM files based on MAPQ scores to get rid of low-quality alignments. Is this an appropriate thing to do?
Should I also filter out unpaired reads and reads that are not uniquely mapped? It was my understanding that most unpaired reads are produced by quality-score trimming, but I did not do any trimming. Is this an expected amount of unpaired reads? Also, I understand how to filter out unpaired reads using SAMtools, but I don't know how to filter out multi-mapped reads, or even whether I should. I can't seem to find anywhere what the pros and cons are of keeping or getting rid of unpaired reads and/or multi-mapped reads. Thanks!!
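To make the filtering question concrete, here is a rough sketch of the kind of per-record filter being asked about: keep only properly paired, primary, uniquely mapped alignments above a MAPQ cutoff. It is only an illustration under some assumptions, not a recommended pipeline: the function name `keep_alignment`, the cutoff of 30, and the three toy SAM records are made up for the example, and it assumes the aligner writes an `NH:i:` tag giving the number of reported alignments (HISAT2 is generally understood to do so, but check your own BAM).

```python
# Standard SAM flag bits (from the SAM specification).
PROPER_PAIR = 0x2
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def keep_alignment(sam_line, min_mapq=30):
    """Return True if a SAM record is a properly paired, primary,
    uniquely mapped alignment with MAPQ >= min_mapq.

    min_mapq=30 is an arbitrary illustrative cutoff, not a recommendation.
    """
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])   # column 2: bitwise FLAG
    mapq = int(fields[4])   # column 5: mapping quality

    # Optional tags start at column 12 and look like TAG:TYPE:VALUE.
    tags = {}
    for item in fields[11:]:
        tag, _type, value = item.split(":", 2)
        tags[tag] = value

    if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY):
        return False
    if not flag & PROPER_PAIR:
        return False
    # Assumption: NH holds the number of reported alignments for the read.
    if "NH" in tags and int(tags["NH"]) > 1:
        return False
    return mapq >= min_mapq


# Toy records, invented purely for illustration (not from the real data).
unique = "r1\t99\tchr1\t100\t60\t50M\t=\t300\t250\t*\t*\tNH:i:1"
multi = "r2\t99\tchr1\t500\t1\t50M\t=\t700\t250\t*\t*\tNH:i:3"
mate_unmapped = "r3\t73\tchr2\t900\t60\t50M\t=\t900\t0\t*\t*\tNH:i:1"

for record in (unique, multi, mate_unmapped):
    print(record.split("\t")[0], keep_alignment(record))
```

The flag and MAPQ part of this check should correspond roughly to `samtools view -f 2 -F 0x904 -q <cutoff>`; the `NH`-based uniqueness test is the extra step, and whether dropping multi-mapped reads is desirable at all is exactly the open question.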
HISAT2 summary stats:
Total pairs: 32249562
Aligned concordantly or discordantly 0 time: 3651029 (11.32%)
Aligned concordantly 1 time: 26084216 (80.88%)
Aligned concordantly >1 times: 1290968 (4.00%)
Aligned discordantly 1 time: 1223349 (3.79%)
Total unpaired reads: 7302058
Aligned 0 time: 3915231 (53.62%)
Aligned 1 time: 2996094 (41.03%)
Aligned >1 times: 390733 (5.35%)
Overall alignment rate: 93.93%
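As a sanity check on how these numbers fit together, the short sketch below redoes the arithmetic from the summary. The counts are copied from the stats above; the variable names and the interpretation in the comments are assumptions drawn from the arithmetic, not from HISAT2 documentation. In particular, the "unpaired" total equals exactly twice the number of pairs that aligned 0 times as a pair, which suggests these are mates that HISAT2 then tried to align individually rather than reads orphaned by trimming.

```python
# Counts copied from the HISAT2 summary above.
total_pairs = 32249562
pairs_aligned_0 = 3651029      # aligned concordantly or discordantly 0 times
concordant_1 = 26084216
concordant_multi = 1290968
discordant_1 = 1223349

unpaired_total = 7302058
unpaired_aligned_0 = 3915231
unpaired_aligned_1 = 2996094
unpaired_aligned_multi = 390733

# The four pair categories account for every input pair.
pair_sum = pairs_aligned_0 + concordant_1 + concordant_multi + discordant_1

# Each pair that failed to align as a pair contributes two single mates.
mates_from_failed_pairs = 2 * pairs_aligned_0

# Overall rate: every read except the mates that aligned nowhere.
total_reads = 2 * total_pairs
overall_rate = 1 - unpaired_aligned_0 / total_reads

print("pairs accounted for:", pair_sum == total_pairs)
print("unpaired = 2 x failed pairs:", mates_from_failed_pairs == unpaired_total)
print(f"overall alignment rate: {overall_rate:.2%}")
```

Read this way, the 53.62% "Aligned 0 time" figure is a fraction of the 7302058 leftover mates, not of all reads, which is why the overall alignment rate can still be about 94%.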