Question: variable, low alignment rates with HISAT2
4 days ago, stephanie.major wrote:

Hello,

I have some single-end mouse RNA-seq data that I would like to analyze in Galaxy. I originally used TopHat for alignment and got pretty low alignment rates (30-40%) for all of my files. After reading on this forum and elsewhere, I saw that HISAT2 should be used instead, and I thought it might improve my percentages since it is a more sensitive tool. However, after running my files through HISAT2, some results improved, some stayed about the same, and some got worse. My overall alignment rates now range from as low as 18% to around 80% at best, which still seems fairly low compared to what I expected and what I have seen in others' projects.

I am using the indexed reference genome (mm10). My files are FASTQ with Illumina 1.8+ quality scaling.

Some additional notes:

Trimming does not change the alignment rates much; for some samples the rate is even lower than without trimming.

I started with the default parameters, then read the HISAT2 manual and changed a few parameters that I thought might improve the alignment rate, but they did not help.

I BLASTed some of the unaligned sequences from the samples with low alignment rates. They either returned no significant hits or were mouse rRNA sequences, which leads me to believe that the polyA enrichment did not work as well as it should have.

The FASTQ files are concatenated datasets, since each sample was run on 4 lanes. I also tried aligning one sample's four lane files separately with HISAT2 instead of as a concatenated set, but each file also aligned very poorly (about 4%), rather than just one of the four being poor.
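Since the inputs are per-lane files that were concatenated, one cheap thing to rule out is a malformed or truncated FASTQ. The following is only a minimal sketch in Python, not a Galaxy tool: the function names `open_fastq` and `check_fastq` are made up for illustration, and it assumes the common 4-line record layout (no wrapped sequence lines).

```python
import gzip
import io
from itertools import zip_longest


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file as text (hypothetical helper)."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def check_fastq(handle, max_problems=20):
    """Stream a FASTQ text handle and return (n_complete_records, problems).

    Assumes 4 lines per record: @header, sequence, +, quality.
    Only the first `max_problems` issues are kept.
    """
    problems = []

    def note(message):
        if len(problems) < max_problems:
            problems.append(message)

    n_records = 0
    stripped = (line.rstrip("\r\n") for line in handle)
    # Group the stream into chunks of 4 lines; a short final chunk is padded with None.
    for record in zip_longest(*[stripped] * 4):
        if None in record:
            note(f"truncated record after {n_records} complete records")
            break
        header, seq, plus, qual = record
        n_records += 1
        if not header.startswith("@"):
            note(f"record {n_records}: header does not start with '@'")
        if not plus.startswith("+"):
            note(f"record {n_records}: separator line does not start with '+'")
        if len(seq) != len(qual):
            note(f"record {n_records}: sequence/quality length mismatch")
    return n_records, problems


if __name__ == "__main__":
    # Tiny in-memory example; replace with open_fastq("your_file.fastq.gz").
    demo = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nGG\n+\nII\n")
    print(check_fastq(demo))
```

Running this on each lane file and on the concatenated file, then comparing the record counts, would show whether the concatenation itself lost or corrupted anything. It says nothing about library quality, which is a separate question.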

I wanted to ask if anyone has suggestions on how to work with this data. My sequencing core suggested demultiplexing the files and generating the FASTQ files manually with bcl2fastq (the ones I am currently using were generated on BaseSpace), so I am working on that, but I would like some other suggestions in case that is not successful.

Thank you in advance!!

Tags: rna-seq, alignment, hisat2
2 days ago, Jennifer Hillman Jackson (United States) wrote:

Hello,

The very low mapping rates do sound as if they are related to the fastq content.

Have you run FastQC? This Galaxy tutorial explains how to run it and interpret the results: https://galaxyproject.org/learn/ > NGS logistics. Any problems could then be reported back to those doing the library prep/sequencing so they can troubleshoot or optimize their methods.

Thanks! Jen, Galaxy team
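
As a rough complement to the FastQC suggestion above, and given that the BLAST hits pointed to rRNA, it can help to see which exact read sequences dominate a file. This is a minimal Python sketch under stated assumptions, not a replacement for FastQC: the function name `top_sequences` and its defaults are illustrative, and it assumes 4-line FASTQ records.

```python
import io
from collections import Counter
from itertools import islice


def top_sequences(handle, max_reads=100_000, top_n=10):
    """Return the most frequent read sequences among the first `max_reads` records.

    Each result is (sequence, count, fraction_of_sampled_reads).
    Assumes 4 lines per FASTQ record, so sequences sit on lines 2, 6, 10, ...
    """
    seq_lines = islice(handle, 1, None, 4)  # every 4th line, starting at the 2nd
    counts = Counter(line.strip() for line in islice(seq_lines, max_reads))
    total = sum(counts.values())
    if total == 0:
        return []
    return [(seq, n, n / total) for seq, n in counts.most_common(top_n)]


if __name__ == "__main__":
    # Tiny in-memory example; in practice pass an open FASTQ file handle.
    demo = io.StringIO("@a\nACGT\n+\nIIII\n@b\nACGT\n+\nIIII\n@c\nGG\n+\nII\n")
    for seq, n, frac in top_sequences(demo):
        print(seq, n, round(frac, 3))
```

If a handful of sequences account for a large share of the sampled reads, BLASTing just those few is a quick way to check whether rRNA or adapter content explains the low mapping rates.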
