Question: Alignment rate differs using hg19 and hg38
2.2 years ago by
fate.gh10 wrote:

I have some RNA-seq FASTQ files and I'm using HISAT to align them, with the built-in reference genomes in Galaxy. When I align them to hg19, the alignment rate is much higher than when I align them to hg38. What causes such a big difference?

Here is an example:

* Alignment to hg19:

format    bam
database  hg19

37328991 reads; of these:
37328991 (100.00%) were unpaired; of these:
3767939 (10.09%) aligned 0 times
16947652 (45.40%) aligned exactly 1 time
16613400 (44.51%) aligned >1 times
89.91% overall alignment rate
[bam_sort_core] merging from 24

* Alignment to hg38:

format    bam
database  hg38

37328991 reads; of these:
37328991 (100.00%) were unpaired; of these:
11718927 (31.39%) aligned 0 times
15954103 (42.74%) aligned exactly 1 time
9655961 (25.87%) aligned >1 times
68.61% overall alignment rate
[bam_sort_core] merging from 24
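
To see where the difference comes from, the two summaries above can be compared category by category. What follows is a minimal Python sketch, not part of the original post: `parse_summary` is a hypothetical helper, and the summary text is copied from the two runs above. From these numbers, hg38 has 7,950,988 more unaligned reads, of which 6,957,439 (about 87.5%) left the "aligned >1 times" category and 993,549 left the "aligned exactly 1 time" category.

```python
import re

def parse_summary(text):
    """Extract read counts from a HISAT-style alignment summary (hypothetical helper)."""
    patterns = {
        "total": r"(\d+) reads; of these:",
        "unaligned": r"(\d+) \([\d.]+%\) aligned 0 times",
        "unique": r"(\d+) \([\d.]+%\) aligned exactly 1 time",
        "multi": r"(\d+) \([\d.]+%\) aligned >1 times",
    }
    return {key: int(re.search(pat, text).group(1)) for key, pat in patterns.items()}

# Summary text copied from the two runs in the question.
HG19 = """37328991 reads; of these:
37328991 (100.00%) were unpaired; of these:
3767939 (10.09%) aligned 0 times
16947652 (45.40%) aligned exactly 1 time
16613400 (44.51%) aligned >1 times
89.91% overall alignment rate"""

HG38 = """37328991 reads; of these:
37328991 (100.00%) were unpaired; of these:
11718927 (31.39%) aligned 0 times
15954103 (42.74%) aligned exactly 1 time
9655961 (25.87%) aligned >1 times
68.61% overall alignment rate"""

hg19 = parse_summary(HG19)
hg38 = parse_summary(HG38)

# Change in each category when moving from hg19 to hg38.
delta = {key: hg38[key] - hg19[key] for key in hg19}

for key in ("unaligned", "unique", "multi"):
    points = 100 * delta[key] / hg19["total"]
    print(f"{key:>9}: {delta[key]:+,d} reads ({points:+.2f} percentage points)")
```

The comparison only describes which category the reads moved out of; it does not explain why they no longer align.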

What should I do?

Will it cause a problem if I want to obtain read counts for DE analysis using htseq?

hisat hg19 hg38 alignment rate
2.2 years ago by
United States
Jennifer Hillman Jackson (25k) wrote:


Are these the exact same inputs and parameters?

That seems odd but is possible; it depends on the read content. hg38 is much more polished than hg19.

Perhaps run htseq-count on both and compare? It might provide a clue about where the extra reads that are unaligned in hg38 were previously mapped in hg19. To see what those sequences contain, you could isolate them and run FastQC on them as a QA check.
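
One way to isolate those reads is sketched below in plain Python. It is an illustration under assumptions, not a tested Galaxy workflow: it assumes both BAM files have first been converted to SAM text (for example with `samtools view -h`), that unaligned reads were kept in the output, and that read names fit in memory. The function names and the toy records are hypothetical. The sketch collects reads that aligned to hg19 but carry the SAM "unmapped" flag (0x4) in the hg38 output, and emits them as FASTQ records that could be passed to FastQC.

```python
def split_by_alignment(sam_lines):
    """Return (aligned, unaligned) sets of read names from SAM text lines."""
    aligned, unaligned = set(), set()
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        # SAM flag bit 0x4 means the read is unmapped.
        (unaligned if flag & 0x4 else aligned).add(name)
    return aligned, unaligned

def lost_reads_fastq(hg19_sam, hg38_sam):
    """Yield FASTQ records for reads aligned to hg19 but unaligned to hg38."""
    hg19_aligned, _ = split_by_alignment(hg19_sam)
    for line in hg38_sam:
        if line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if int(fields[1]) & 0x4 and fields[0] in hg19_aligned:
            # Columns 10 and 11 of a SAM record hold the sequence and qualities.
            yield f"@{fields[0]}\n{fields[9]}\n+\n{fields[10]}\n"

# Toy records for illustration only (not real data from this thread).
HG19_TOY = [
    "@HD\tVN:1.0\tSO:unsorted",
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t0\tchr2\t200\t1\t4M\t*\t0\t0\tGGCC\tIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tTTTT\tIIII",
]
HG38_TOY = [
    "@HD\tVN:1.0\tSO:unsorted",
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tGGCC\tIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tTTTT\tIIII",
]

records = list(lost_reads_fastq(HG19_TOY, HG38_TOY))
print("".join(records), end="")
```

With real files, the two arguments would be open file handles on the SAM text rather than lists. In the toy input only `r2` is reported, because `r3` was already unaligned against hg19.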

Jen, Galaxy team

Yes, these are the exact same inputs and parameters. I ran htseq-count and compared the results, but since the GTF files are different (different Ensembl versions to match hg19 and hg38), the results are not the same.
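
Even with different GTF files, the two htseq-count outputs can still be lined up in a limited way. The sketch below is a hedged illustration, not something from the thread: it assumes the usual two-column, tab-separated htseq-count output, in which the trailing rows whose names start with `__` (such as `__no_feature`, `__not_aligned`, and `__alignment_not_unique`) are summary counters rather than genes. It also assumes that a gene ID present in both annotations refers to the same gene, which should be checked for the Ensembl versions in use. The function names and toy rows are hypothetical.

```python
def parse_htseq(lines):
    """Split htseq-count rows into per-gene counts and '__' summary counters."""
    genes, special = {}, {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        name, count = line.split("\t")
        (special if name.startswith("__") else genes)[name] = int(count)
    return genes, special

def compare_counts(a_lines, b_lines):
    """Compare two htseq-count outputs on shared gene IDs and summary counters."""
    genes_a, special_a = parse_htseq(a_lines)
    genes_b, special_b = parse_htseq(b_lines)
    shared = sorted(genes_a.keys() & genes_b.keys())
    counters = sorted(special_a.keys() | special_b.keys())
    return {
        "shared": {g: (genes_a[g], genes_b[g]) for g in shared},
        "only_a": sorted(genes_a.keys() - genes_b.keys()),
        "only_b": sorted(genes_b.keys() - genes_a.keys()),
        "special": {c: (special_a.get(c, 0), special_b.get(c, 0)) for c in counters},
    }

# Toy rows for illustration only (placeholder gene IDs, not real results).
HG19_COUNTS = ["GENE_A\t10", "GENE_B\t5", "__no_feature\t3", "__not_aligned\t2"]
HG38_COUNTS = ["GENE_A\t9", "GENE_C\t4", "__no_feature\t1", "__not_aligned\t7"]

report = compare_counts(HG19_COUNTS, HG38_COUNTS)
for counter, (in_hg19, in_hg38) in report["special"].items():
    print(f"{counter}: hg19={in_hg19} hg38={in_hg38}")
print("shared genes:", report["shared"])
```

Looking at the `__` counters side by side shows whether the reads lost from hg38 mostly came from the non-unique or no-feature categories in hg19, in which case they would not have contributed to gene counts in either run.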

