Question: Low alignment for paired end reads using HISAT2
13 months ago by
emilyread60
King's College London
emilyread60 wrote:

Hello,

I am new to processing and analysing RNA-seq data. I recently completed a paired-end alignment using mm10 as the reference genome. I set "Specify strand-specific information" to "FR Unstranded" and "Disable spliced alignment" to "no-spliced-alignment". I left most of the other settings at their defaults. The resulting alignment rate was relatively poor:

79081475 reads; of these:
  79081475 (100.00%) were paired; of these:
    47291164 (59.80%) aligned concordantly 0 times
    26796680 (33.88%) aligned concordantly exactly 1 time
    4993631 (6.31%) aligned concordantly >1 times
    ----
    47291164 pairs aligned concordantly 0 times; of these:
      1181324 (2.50%) aligned discordantly 1 time
    ----
    46109840 pairs aligned 0 times concordantly or discordantly; of these:
      92219680 mates make up the pairs; of these:
        55187078 (59.84%) aligned 0 times
        30703132 (33.29%) aligned exactly 1 time
        6329470 (6.86%) aligned >1 times
65.11% overall alignment rate
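For reference, the overall rate in this summary can be reproduced from the individual counts. The sketch below is only an illustration of the arithmetic: the variable and function names are made up for this example, and the formula assumes that the overall rate counts every aligned mate (two per aligned pair, plus mates that aligned on their own) against all mates.

```python
# Counts copied from the HISAT2 summary in this post.
TOTAL_PAIRS = 79_081_475
CONCORDANT_ONCE = 26_796_680
CONCORDANT_MULTI = 4_993_631
DISCORDANT_ONCE = 1_181_324
MATES_ONCE = 30_703_132      # mates from unaligned pairs that aligned exactly once
MATES_MULTI = 6_329_470      # mates from unaligned pairs that aligned more than once


def overall_alignment_rate(total_pairs, conc_once, conc_multi, disc_once,
                           mates_once, mates_multi):
    """Percentage of all mates that aligned at least once (assumed formula)."""
    aligned_mates = 2 * (conc_once + conc_multi + disc_once) + mates_once + mates_multi
    return 100.0 * aligned_mates / (2 * total_pairs)


def concordant_pair_rate(total_pairs, conc_once, conc_multi):
    """Percentage of pairs that aligned concordantly at least once."""
    return 100.0 * (conc_once + conc_multi) / total_pairs


rate = overall_alignment_rate(TOTAL_PAIRS, CONCORDANT_ONCE, CONCORDANT_MULTI,
                              DISCORDANT_ONCE, MATES_ONCE, MATES_MULTI)
concordant = concordant_pair_rate(TOTAL_PAIRS, CONCORDANT_ONCE, CONCORDANT_MULTI)

print(f"{rate:.2f}% overall alignment rate")
print(f"{concordant:.1f}% of pairs aligned concordantly")
```

Under that assumption the overall figure matches the reported 65.11%, while only about 40% of pairs aligned concordantly, so the concordant rate is the more telling number here.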

I tried trimming before alignment because the FastQC report for the reverse reads showed poor per-base sequence content; however, the alignment results were similar.

I would be very grateful for any help on how to improve the alignment. Thank you very much for your time.

Best,

Emily


I forgot to mention I used HISAT2 for the alignment

Comment written 13 months ago by emilyread60
13 months ago by
United States
Jennifer Hillman Jackson wrote:

Hello,

If the data is RNA, is there a reason why spliced alignments are not considered when aligning to mm10? Maybe try again with spliced alignments retained and compare the mapping rates?
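Outside of Galaxy, the same comparison could be set up on the command line. The following is a minimal sketch that only builds the two HISAT2 commands so they can be inspected or passed to a job runner. The index basename "mm10" and the file names are placeholders, and the helper function is hypothetical, not part of HISAT2. The options used (-p, -x, -1, -2, -S, --no-spliced-alignment) are standard HISAT2 options.

```python
import shlex


def build_hisat2_command(index_basename, reads_1, reads_2, out_sam,
                         spliced=True, threads=4):
    """Return a HISAT2 paired-end command as an argument list.

    All paths are placeholders chosen by the caller; nothing is executed here.
    """
    cmd = [
        "hisat2",
        "-p", str(threads),     # number of alignment threads
        "-x", index_basename,   # basename of the HISAT2 index
        "-1", reads_1,          # forward (mate 1) reads
        "-2", reads_2,          # reverse (mate 2) reads
        "-S", out_sam,          # SAM output file
    ]
    if not spliced:
        # Matches the "Disable spliced alignment" setting used in the question.
        cmd.append("--no-spliced-alignment")
    return cmd


# Hypothetical file names, for illustration only.
spliced_cmd = build_hisat2_command(
    "mm10", "sample_R1.fastq", "sample_R2.fastq", "spliced.sam", spliced=True)
unspliced_cmd = build_hisat2_command(
    "mm10", "sample_R1.fastq", "sample_R2.fastq", "unspliced.sam", spliced=False)

print(shlex.join(spliced_cmd))
print(shlex.join(unspliced_cmd))
```

Running both commands on the same inputs and comparing the two alignment summaries would show how much of the low rate comes from reads that span splice junctions.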

Low mapping rates can be due to a few different reasons: sequence QA is needed, low sequence quality overall, a mismatched fastq datatype assignment (related to quality score scaling), mismatched data source versus the reference genome used for alignment, data mixups (paired-end data not really from the same pair, or possibly swapped on the tool input form), incorrect mapping tool choice/options for the data, and other reasons.
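One of the causes above, paired-end files that are out of sync or swapped, can be checked quickly outside of Galaxy. The sketch below is a rough illustration, not a Galaxy tool: it assumes uncompressed FASTQ with exactly four lines per record, and that mate names match once an optional trailing "/1" or "/2" is removed. The function names are made up for this example.

```python
from itertools import islice


def read_ids(fastq_lines):
    """Yield read names from FASTQ lines, assuming four lines per record."""
    for i, line in enumerate(fastq_lines):
        if i % 4 != 0:
            continue  # only the first line of each record holds the name
        parts = line.split()
        name = parts[0] if parts else ""
        if name.startswith("@"):
            name = name[1:]
        if name.endswith(("/1", "/2")):
            name = name[:-2]  # drop the mate suffix used by older pipelines
        yield name


def count_mismatched_pairs(lines_1, lines_2, limit=10_000):
    """Count records (among the first `limit`) whose names differ between files.

    Comparison stops at the shorter input, so differing file lengths
    need a separate check.
    """
    pairs = zip(read_ids(lines_1), read_ids(lines_2))
    return sum(1 for a, b in islice(pairs, limit) if a != b)


# Tiny in-memory example; real use would pass open file handles instead.
forward = ["@read1/1\n", "ACGT\n", "+\n", "IIII\n",
           "@read2/1\n", "TTGA\n", "+\n", "IIII\n"]
reverse = ["@read1/2\n", "TGCA\n", "+\n", "IIII\n",
           "@read2/2\n", "CCAT\n", "+\n", "IIII\n"]

print(count_mismatched_pairs(forward, reverse))
```

A non-zero count on real data would suggest the two files do not hold the same pairs in the same order, which by itself can produce many pairs that fail to align concordantly.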

The most common usage issues and workflow examples are covered in the Galaxy hub:

If after reviewing you cannot determine the problem, and are working at https://usegalaxy.org or can reproduce the mapping run there, a shared history link can be sent to galaxy-bugs@lists.galaxyproject.org. Please leave inputs and outputs undeleted (including any produced to test the data) and include a link to this post in the comments. Also note the dataset numbers with the problematic mapping rates.

Thanks, Jen, Galaxy team
