Low alignment for paired end reads using HISAT2


13 months ago by

King's College London

emilyread6 • 0 wrote:

Hello,

I am new to processing and analysing RNA-seq data. I recently completed a paired-end alignment using mm10 as the reference genome. I set "Specify strand-specific information" to "FR Unstranded" and enabled "Disable spliced alignment" (no-spliced-alignment). I left most of the other settings at their defaults. The resulting alignment rate was relatively poor:

79081475 reads; of these:
  79081475 (100.00%) were paired; of these:
    47291164 (59.80%) aligned concordantly 0 times
    26796680 (33.88%) aligned concordantly exactly 1 time
    4993631 (6.31%) aligned concordantly >1 times
    ----
    47291164 pairs aligned concordantly 0 times; of these:
      1181324 (2.50%) aligned discordantly 1 time
    ----
    46109840 pairs aligned 0 times concordantly or discordantly; of these:
      92219680 mates make up the pairs; of these:
        55187078 (59.84%) aligned 0 times
        30703132 (33.29%) aligned exactly 1 time
        6329470 (6.86%) aligned >1 times
65.11% overall alignment rate
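For anyone reading the summary above, the overall rate is counted per mate (read), not per pair. A minimal sketch of the arithmetic, assuming each mate of a concordantly or discordantly aligned pair counts once, which is consistent with the figures reported:

```python
# Counts copied from the HISAT2 summary in this post.
total_pairs = 79081475
concordant_once = 26796680
concordant_multi = 4993631
discordant_once = 1181324
mates_once = 30703132    # single mates from pairs that did not align as a pair
mates_multi = 6329470

# Every pair contributes two mates to the denominator.
total_mates = 2 * total_pairs

# Aligned pairs contribute two aligned mates each; the remaining
# mates are counted individually.
aligned_mates = (
    2 * (concordant_once + concordant_multi + discordant_once)
    + mates_once
    + mates_multi
)

overall_rate = 100.0 * aligned_mates / total_mates
print(f"{overall_rate:.2f}% overall alignment rate")
```

The mates left over (total minus aligned) match the "aligned 0 times" count in the summary, so roughly a third of all reads did not align anywhere.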

I tried trimming before alignment because the FastQC report for the reverse reads showed poor per-base sequence content; however, the alignment results were similar.

I would be very grateful for any help on how to improve the alignment. Thank you very much for your time.

Best,

Emily

rna-seq alignment • 1.4k views


I forgot to mention that I used HISAT2 for the alignment.

written 13 months ago by emilyread6 • 0

13 months ago by

Jennifer Hillman Jackson ♦ 25k

United States

Jennifer Hillman Jackson ♦ 25k wrote:

Hello,

If the data is RNA, is there a reason why spliced alignments were not considered when aligning to mm10? Maybe try again with spliced alignment enabled and compare the results?
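For anyone running HISAT2 on the command line rather than through the Galaxy form, the comparison could be sketched roughly as below. This only builds and prints the two commands. The index name and file names are placeholders (assumptions, not taken from this post), and the helper function is hypothetical. `--no-spliced-alignment` is the HISAT2 option that disables spliced alignment.

```python
import shlex


def hisat2_command(index, reads_1, reads_2, out_sam, spliced=True):
    """Build a HISAT2 paired-end command line (hypothetical helper)."""
    cmd = ["hisat2", "-x", index, "-1", reads_1, "-2", reads_2, "-S", out_sam]
    if not spliced:
        # Spliced alignment is on by default; this flag turns it off.
        cmd.append("--no-spliced-alignment")
    return cmd


# Placeholder index and FASTQ names; substitute your own.
for spliced in (True, False):
    label = "spliced" if spliced else "unspliced"
    cmd = hisat2_command(
        "mm10", "reads_R1.fastq", "reads_R2.fastq", f"{label}.sam", spliced
    )
    print(shlex.join(cmd))
```

Comparing the alignment summaries of the two runs should show whether reads spanning exon junctions account for much of the unaligned fraction.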

Low mapping rates can have several causes: the reads still need quality assessment and trimming (QA), the overall sequence quality is low, the fastq datatype is misassigned (which affects quality score scaling), the data source does not match the reference genome used for alignment, the data were mixed up (paired-end files that are not really from the same pair, or files swapped on the tool input form), or the mapping tool or its options do not suit the data. There can be other reasons as well.
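Two of the causes above (mixed-up pairs and quality score scaling) can be screened quickly outside of Galaxy. The following is a rough sketch, not a validated tool. It assumes plain four-line FASTQ records, with both files in the same read order. The function names are hypothetical.

```python
import gzip
from itertools import islice


def open_text(path):
    """Open a plain or gzip-compressed FASTQ file as text."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def fastq_records(handle):
    """Yield (header, sequence, quality) from four-line FASTQ records."""
    while True:
        header = handle.readline()
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header.rstrip("\n"), seq, qual


def read_name(header):
    """Strip '@', any description, and a trailing /1 or /2 mate suffix."""
    name = header[1:].split()[0]
    if name.endswith("/1") or name.endswith("/2"):
        name = name[:-2]
    return name


def check_pairs(path_1, path_2, limit=100000):
    """Return (records checked, name mismatches, lowest quality ASCII code)."""
    checked = 0
    mismatches = 0
    min_qual = 255
    with open_text(path_1) as h1, open_text(path_2) as h2:
        pairs = zip(fastq_records(h1), fastq_records(h2))
        for (hd1, _, q1), (hd2, _, q2) in islice(pairs, limit):
            checked += 1
            if read_name(hd1) != read_name(hd2):
                mismatches += 1
            min_qual = min([min_qual, *map(ord, q1 + q2)])
    return checked, mismatches, min_qual
```

Calling `check_pairs("forward.fastq", "reverse.fastq")` (placeholder file names) returns three numbers. Any name mismatches suggest the two files are not a true pair or are out of order. A lowest quality code below 59 can only occur with Phred+33 scaling (fastqsanger in Galaxy), while a minimum of 64 or higher hints at the older Phred+64 scaling.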

The most common usage issues and workflow examples are covered in the Galaxy hub:

https://galaxyproject.org/support/#troubleshooting Check the format/content of all inputs: fastq, custom genome (when used), reference annotation
https://galaxyproject.org/learn/ Start with NGS logistics, then review RNA-seq tutorials

If after reviewing you cannot determine the problem, and are working at https://usegalaxy.org or can reproduce the mapping run there, a shared history link can be sent to galaxy-bugs@lists.galaxyproject.org. Please leave inputs and outputs undeleted (including any produced to test the data) and include a link to this post in the comments. Also note the dataset numbers with the problematic mapping rates.

Thanks, Jen, Galaxy team

modified 13 months ago • written 13 months ago by Jennifer Hillman Jackson ♦ 25k
