Question: htseq-count results in all reads with no feature - chromosome naming mismatch issue between genome and annotation
I am sorry that a similar question has been asked before, but I have read through endless posts and can't find a solution.

I have mapped my reads using Tophat and 97% of the reads mapped. But when I then run htseq-count, all of my reads come up as no feature. Please see the attached pictures.

I am aware that this is probably a problem with the GTF file and BAM file columns not matching, but I have tried every GTF file for hg38 I can find and keep getting the same issue. I also tried STAR instead of Tophat and got the same issue.

Please provide advice as this is urgent.

Hi Sarah,

I can't see your graphics. These need to be hosted on a public site and then linked into questions here. Any public link will work.

That said, I agree the problem is likely with the GTF (one of the most common reasons for failures). If you mapped against the hg38 genome indexed at Galaxy Main, or if UCSC was your genome source when indexing on your own server, try using the UCSC version of the annotation from iGenomes. Download the tar archive, unpack it locally, then upload just the genes.gtf to Galaxy (use FTP).
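If you want to confirm the mismatch yourself before swapping files, here is a minimal sketch of the check in Python. It assumes you can get the BAM header as text (for example with `samtools view -H`) and the GTF as plain text. The function names and the toy inputs are mine, for illustration only, and are not part of any Galaxy tool.

```python
def sam_header_names(header_text):
    """Collect reference sequence names from the @SQ lines of a SAM/BAM header."""
    names = set()
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        for field in line.split("\t")[1:]:
            if field.startswith("SN:"):
                names.add(field[3:])
    return names


def gtf_seqnames(gtf_text):
    """Collect the first column (seqname) of every non-comment GTF line."""
    names = set()
    for line in gtf_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        names.add(line.split("\t", 1)[0])
    return names


# Toy inputs (hypothetical): a UCSC-style header and an Ensembl-style GTF.
header = (
    "@HD\tVN:1.0\tSO:coordinate\n"
    "@SQ\tSN:chr1\tLN:248956422\n"
    "@SQ\tSN:chrM\tLN:16569\n"
)
gtf = (
    "#!toy annotation\n"
    '1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "geneA";\n'
    'MT\tsrc\texon\t300\t400\t.\t+\t.\tgene_id "geneB";\n'
)

bam_names = sam_header_names(header)
ann_names = gtf_seqnames(gtf)
shared = bam_names & ann_names

print("BAM only:", sorted(bam_names - ann_names))
print("GTF only:", sorted(ann_names - bam_names))
print("shared:", sorted(shared))
```

If the shared set is empty, no read can overlap any annotated feature, which would explain every read being counted as no feature.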

Give that a try first. If the job still fails:

  • Double check that you actually mapped against hg38 with Tophat ("rerun" or "job details" will have the original inputs recorded)
  • Make sure there are hits in the resulting BAM datasets
  • Try using HISAT2 instead, since both Tophat and Tophat2 are considered deprecated
  • Review your steps versus those in this tutorial (uses featureCounts, but htseq-count can be used instead):

More resources:

If you cannot find the problem after doing the above, and you can reproduce this at Galaxy Main, a bug report can be sent in from one of the error datasets. Please leave the inputs/outputs undeleted and include the link to this Galaxy Biostars post in the comments. How-to:

Thanks! Jen, Galaxy team

Ok, I can see your graphics now. I am reviewing them, but they might not be enough. A bug report for this type of issue is usually better/faster.

Well, that was fast anyway! Ok, the chromosome identifiers in your GTF are not formatted the same way as the identifiers in UCSC's hg38. Try using the iGenomes GTF - it will be a match.
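For readers who cannot switch annotation files, an alternative (not what I recommended above, so treat it as a rough sketch) is to rewrite the seqname column of the GTF. The example below assumes a human GTF with Ensembl-style names ("1", "X", "MT") and a UCSC-style genome ("chr1", "chrX", "chrM"). It only covers the primary chromosomes and the mitochondrion; unplaced scaffolds and alternate contigs follow a different naming pattern in UCSC and are simply dropped here. The function names are hypothetical.

```python
def to_ucsc_name(seqname):
    """Map an Ensembl-style primary chromosome name to UCSC style.

    Returns None for any name this simple rule does not cover.
    """
    primary = {str(n) for n in range(1, 23)} | {"X", "Y"}
    if seqname in primary:
        return "chr" + seqname
    if seqname == "MT":
        return "chrM"
    return None


def convert_gtf_lines(lines):
    """Yield GTF lines with the seqname rewritten; drop lines on unmapped contigs."""
    for line in lines:
        if line.startswith("#"):
            yield line  # keep header/comment lines unchanged
            continue
        seqname, sep, rest = line.partition("\t")
        new_name = to_ucsc_name(seqname)
        if new_name is not None and sep:
            yield new_name + sep + rest


# Toy input (hypothetical records).
lines = [
    "#!toy annotation",
    '1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "geneA";',
    'MT\tsrc\texon\t300\t400\t.\t+\t.\tgene_id "geneB";',
    'KI270728.1\tsrc\texon\t10\t50\t.\t+\t.\tgene_id "geneC";',
]
converted = list(convert_gtf_lines(lines))
for line in converted:
    print(line)
```

Renaming only helps when both files describe the same assembly (here hg38/GRCh38). It does not fix coordinates from a different genome build.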

