Question: HISAT2 Alignment Rate Drops When the Cufflinks Option Is Enabled
nick.andrews15 wrote (7 months ago):

I am running trimmed, QC-filtered paired-end RNA-seq data through HISAT2 > StringTie > Cuffdiff for differential expression (DE) analysis, and I am seeing odd behavior in the HISAT2 alignment percentages. When the "Report alignments tailored specifically for Cufflinks" option is checked in Galaxy (--dta-cufflinks on the command line), the mapped percentage for the same dataset falls substantially. Does anyone have an idea what could be causing this?

For example, with the Cufflinks option:

    36198885 reads; of these:
      36198885 (100.00%) were paired; of these:
        17307719 (47.81%) aligned concordantly 0 times
        15647243 (43.23%) aligned concordantly exactly 1 time
        3243923 (8.96%) aligned concordantly >1 times

The same dataset without it:

    36198885 reads; of these:
      36198885 (100.00%) were paired; of these:
        3167919 (8.75%) aligned concordantly 0 times
        27036679 (74.69%) aligned concordantly exactly 1 time
        5994287 (16.56%) aligned concordantly >1 times
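To put the two runs side by side, the concordant alignment rate can be recomputed from the raw counts in each summary. The sketch below is a hypothetical helper (the names `count_for` and `concordant_rate` are illustrative, not part of HISAT2). It assumes the summary lines shown above; HISAT2's full summary also reports discordant and single-mate alignments and an overall alignment rate, which the excerpt does not include, so this measures only the concordant fraction.

```python
import re

def count_for(summary, label):
    """Return the read count printed in front of `label` in a HISAT2 summary."""
    match = re.search(r"(\d+) \([\d.]+%\) " + re.escape(label), summary)
    if match is None:
        raise ValueError(f"line not found: {label!r}")
    return int(match.group(1))

def concordant_rate(summary):
    """Percentage of pairs that aligned concordantly (exactly once or >1 times)."""
    pairs = count_for(summary, "were paired")
    zero = count_for(summary, "aligned concordantly 0 times")
    once = count_for(summary, "aligned concordantly exactly 1 time")
    multi = count_for(summary, "aligned concordantly >1 times")
    # The three concordant categories should partition the paired reads.
    if zero + once + multi != pairs:
        raise ValueError("concordant counts do not sum to the number of pairs")
    return 100.0 * (once + multi) / pairs

WITH_CUFFLINKS = """36198885 reads; of these:
  36198885 (100.00%) were paired; of these:
    17307719 (47.81%) aligned concordantly 0 times
    15647243 (43.23%) aligned concordantly exactly 1 time
    3243923 (8.96%) aligned concordantly >1 times"""

WITHOUT_CUFFLINKS = """36198885 reads; of these:
  36198885 (100.00%) were paired; of these:
    3167919 (8.75%) aligned concordantly 0 times
    27036679 (74.69%) aligned concordantly exactly 1 time
    5994287 (16.56%) aligned concordantly >1 times"""

print(f"with --dta-cufflinks:    {concordant_rate(WITH_CUFFLINKS):.2f}%")     # 52.19%
print(f"without --dta-cufflinks: {concordant_rate(WITHOUT_CUFFLINKS):.2f}%")  # 91.25%
```

On the numbers posted above, this gives a concordant rate of about 52% with the option versus about 91% without it, on identical input.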

Tags: rna-seq, cufflinks, hisat2
Mo Heydarian (United States) replied (7 months ago):

Hello,

This is an interesting problem. The HISAT2 manual mentions a reduced mapping rate with the --dta option, but not how large a drop to expect. I recommend reporting this issue on the HISAT2 GitHub repository, where the tool's authors can comment on the dramatically reduced mapping percentage you see with the --dta-cufflinks option. The HISAT2 GitHub issues page is here: https://github.com/infphilo/hisat2/issues
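For context, the HISAT2 manual describes --dta as requiring longer anchor lengths for de novo splice-site discovery, which yields fewer alignments with short anchors. One hedged way to check whether the lost reads are short-anchor spliced reads is to compare the anchor lengths of spliced alignments in the SAM output of the two runs. The sketch below is a hypothetical diagnostic, not part of HISAT2: it treats the aligned (M/=/X) bases of each exonic block between introns (N operations) as the anchor, and the `threshold` argument is an arbitrary choice, not HISAT2's internal cutoff.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def min_splice_anchor(cigar):
    """Shortest exonic block (M/=/X bases) in a spliced CIGAR; None if unspliced."""
    blocks = [0]
    for length, op in CIGAR_OP.findall(cigar):
        if op == "N":          # reference skip = intron; start a new exonic block
            blocks.append(0)
        elif op in "M=X":      # bases aligned to the reference
            blocks[-1] += int(length)
    return min(blocks) if len(blocks) > 1 else None

def short_anchor_fraction(sam_lines, threshold):
    """Fraction of mapped, spliced alignments whose shortest anchor < threshold."""
    spliced = short = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue                      # header line
        fields = line.rstrip("\n").split("\t")
        flag, cigar = int(fields[1]), fields[5]
        if flag & 0x4:
            continue                      # unmapped record
        anchor = min_splice_anchor(cigar)
        if anchor is None:
            continue                      # not spliced
        spliced += 1
        if anchor < threshold:
            short += 1
    return short / spliced if spliced else 0.0

# Toy SAM records (hypothetical): one long-anchor and one short-anchor
# spliced read, one unspliced read, and one unmapped read.
spliced_example = [
    "@HD\tVN:1.0",
    "r1\t0\tchr1\t100\t60\t30M500N70M\t*\t0\t0\t*\t*",
    "r2\t0\tchr1\t200\t60\t5M800N95M\t*\t0\t0\t*\t*",
    "r3\t0\tchr1\t300\t60\t100M\t*\t0\t0\t*\t*",
    "r4\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
]
print(short_anchor_fraction(spliced_example, 10))  # 0.5
```

Running the same function over the SAM files from the runs with and without --dta-cufflinks (file handles can be passed directly as `sam_lines`) would show whether short-anchor spliced alignments account for the difference.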

Thanks for reporting your observation and for using Galaxy!

Cheers, Mo Heydarian

nick.andrews15 replied (7 months ago):

Mo, interestingly, switching back to HISAT (version 1) and using the --dta option for Cufflinks yielded an alignment rate of roughly 85%. I don't know what is causing this, but I will use HISAT for the time being.

Giovanni Widmer (Tufts University, US) replied (7 weeks ago):

Hi, I see something similar. I align 100-nt single-end RNA-Seq reads to the built-in Sus scrofa genome, with about 7,000,000 reads per sample, and only around 50% of the reads align. What puzzles me is that when I BLAST randomly selected unaligned reads against GenBank, they all align almost perfectly to pig sequences: at most I see 1-2 mismatches or an indel. This makes me wonder whether HISAT struggles with reads that span a splice site.

I don't think the input format is the issue: the alignment rate is only slightly better with FASTQ input than with the same reads in FASTA format. I also ran the reads through Trimmomatic, and close to 100% passed. When I analyze the same files with TopHat, the aligned percentage is higher, about 75%.
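The spot check described above (BLASTing randomly selected unaligned reads) can be made reproducible by sampling the unaligned records directly from the SAM output. HISAT2 writes unaligned reads to the SAM file with FLAG bit 0x4 set unless --no-unal is given. The sketch below is a hypothetical helper (`sample_unaligned` is an illustrative name) that uses reservoir sampling to draw up to `k` such reads uniformly in one pass and formats them as FASTA records ready for BLAST.

```python
import random

def sample_unaligned(sam_lines, k, seed=0):
    """Reservoir-sample up to k unaligned SAM records (FLAG bit 0x4) as FASTA."""
    rng = random.Random(seed)
    reservoir = []
    seen = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue                       # header line
        fields = line.rstrip("\n").split("\t")
        if not (int(fields[1]) & 0x4):
            continue                       # skip aligned records
        seen += 1
        record = f">{fields[0]}\n{fields[9]}"   # QNAME and SEQ
        if len(reservoir) < k:
            reservoir.append(record)
        else:
            j = rng.randrange(seen)        # uniform in [0, seen)
            if j < k:                      # keep the new record with probability k/seen
                reservoir[j] = record
    return reservoir

# Toy SAM records (hypothetical): two unaligned reads and one aligned read.
example_sam = [
    "@HD\tVN:1.0",
    "read1\t4\t*\t0\t0\t*\t*\t0\t0\tACGTACGT\tIIIIIIII",
    "read2\t0\tchr1\t10\t60\t8M\t*\t0\t0\tTTTTGGGG\tIIIIIIII",
    "read3\t4\t*\t0\t0\t*\t*\t0\t0\tGGGGCCCC\tIIIIIIII",
]
print("\n".join(sample_unaligned(example_sam, k=5)))
```

Fixing the seed makes the sample repeatable, so the same reads can be rechecked after changing alignment settings.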

Thanks,

Giovanni Widmer


Powered by Biostar version 16.09