Question: Low alignment for paired end reads using HISAT2
10 months ago by
King's College London
emilyread6


I am new to processing and analysing RNAseq data. I recently completed a paired-end alignment using mm10 as the reference genome. I set "Specify strand-specific information" to "FR Unstranded" and "Disable spliced alignment" to "no-spliced-alignment", and left most of the other settings at their defaults. The resulting alignment rate was relatively poor:

79081475 reads; of these:
  79081475 (100.00%) were paired; of these:
    47291164 (59.80%) aligned concordantly 0 times
    26796680 (33.88%) aligned concordantly exactly 1 time
    4993631 (6.31%) aligned concordantly >1 times
    ----
    47291164 pairs aligned concordantly 0 times; of these:
      1181324 (2.50%) aligned discordantly 1 time
    ----
    46109840 pairs aligned 0 times concordantly or discordantly; of these:
      92219680 mates make up the pairs; of these:
        55187078 (59.84%) aligned 0 times
        30703132 (33.29%) aligned exactly 1 time
        6329470 (6.86%) aligned >1 times
65.11% overall alignment rate
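
For readers checking the numbers, the overall alignment rate in the summary can be recomputed from the counts. This is a minimal sketch in Python, assuming the overall rate is the share of all mates (two per pair) that aligned in any way, that is, every mate except those reported as "aligned 0 times". The function name is only illustrative.

```python
# Counts copied from the HISAT2 summary in the post.
total_pairs = 79081475
unaligned_mates = 55187078  # mates reported as "aligned 0 times"


def overall_alignment_rate(total_pairs, unaligned_mates):
    """Percentage of mates that aligned, assuming two mates per pair."""
    total_mates = 2 * total_pairs
    return 100.0 * (total_mates - unaligned_mates) / total_mates


rate = overall_alignment_rate(total_pairs, unaligned_mates)
print(f"{rate:.2f}% overall alignment rate")
```

Under that assumption the result agrees with the 65.11% reported, so about a third of all mates did not align at all, neither as part of a pair nor on their own.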

I tried trimming before alignment because the FastQC report for the reverse reads showed poor per-base sequence content; however, the alignment results were similar.

I would be very grateful for any help on how to improve the alignment. Thank you very much for your time.



I forgot to mention that I used HISAT2 for the alignment.

10 months ago by
United States
Jennifer Hillman Jackson


If the data is RNA, is there a reason why spliced alignments are not considered when aligning to mm10? Maybe try with spliced alignments retained and compare?
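
Outside Galaxy, the same comparison could be set up on the command line. The following is a rough sketch in Python that only builds and prints the two HISAT2 commands, one with spliced alignment left on (the HISAT2 default) and one with `--no-spliced-alignment`. The index name and file names are hypothetical placeholders, and the helper function is an illustration, not part of HISAT2 or Galaxy.

```python
import shlex


def build_hisat2_command(index, reads_1, reads_2, out_sam, spliced=True):
    """Build a HISAT2 paired-end command as a list of arguments."""
    cmd = ["hisat2", "-x", index, "-1", reads_1, "-2", reads_2, "-S", out_sam]
    if not spliced:
        # Turns off spliced alignment; omitted for the default (spliced) run.
        cmd.append("--no-spliced-alignment")
    return cmd


# Hypothetical index and file names, for illustration only.
spliced_cmd = build_hisat2_command(
    "mm10_index", "sample_R1.fastq.gz", "sample_R2.fastq.gz", "spliced.sam"
)
unspliced_cmd = build_hisat2_command(
    "mm10_index", "sample_R1.fastq.gz", "sample_R2.fastq.gz", "unspliced.sam",
    spliced=False,
)

print(shlex.join(spliced_cmd))
print(shlex.join(unspliced_cmd))
```

Running both commands on the same reads and comparing the two alignment summaries would show how much of the low rate comes from reads that span exon junctions.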

Low mapping rates can be due to a few different reasons: sequence QA is needed, low sequence quality overall, a mismatched fastq datatype assignment (related to quality score scaling), mismatched data source versus the reference genome used for alignment, data mixups (paired-end data not really from the same pair, or possibly swapped on the tool input form), incorrect mapping tool choice/options for the data, and other reasons.
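
One of the causes listed above, paired-end files that are not really from the same pair, can be checked quickly by comparing read names in the two files. This is a minimal sketch in Python, assuming standard four-line FASTQ records and read names that match between mates apart from an optional trailing `/1` or `/2`. The function names are illustrative only.

```python
import itertools


def read_ids(fastq_lines):
    """Yield read IDs from FASTQ lines, assuming four lines per record."""
    non_blank = (line for line in fastq_lines if line.strip())
    for i, line in enumerate(non_blank):
        if i % 4 == 0:
            # Drop the leading '@' and anything after the first whitespace.
            name = line.strip()[1:].split()[0]
            # Drop a trailing mate suffix so both mates compare equal.
            if name.endswith(("/1", "/2")):
                name = name[:-2]
            yield name


def mates_in_sync(forward_lines, reverse_lines, max_records=10000):
    """Return True if the first max_records read IDs match in order."""
    pairs = itertools.zip_longest(read_ids(forward_lines), read_ids(reverse_lines))
    for fwd, rev in itertools.islice(pairs, max_records):
        if fwd != rev:
            return False
    return True


# Tiny in-memory example: the second record differs between the two files.
fwd = ["@read1/1", "ACGT", "+", "IIII", "@read2/1", "TTGA", "+", "IIII"]
rev = ["@read1/2", "ACGT", "+", "IIII", "@read3/2", "TCAA", "+", "IIII"]
print(mates_in_sync(fwd, rev))  # False
```

With real data, open file handles can be passed in place of the lists, for example `gzip.open(path, "rt")` for compressed FASTQ. A `False` result suggests the two inputs are out of order or come from different samples.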

The most common usage issues and workflow examples are covered in the Galaxy hub:

If after reviewing you cannot determine the problem, and are working at or can reproduce the mapping run there, a shared history link can be sent to Please leave inputs and outputs undeleted (including any produced to test the data) and include a link to this post in the comments. Also note the dataset numbers with the problematic mapping rates.

Thanks, Jen, Galaxy team

