14 months ago by
I am trying to get raw counts of mapped reads, so I am running htseq-count on the Tophat accepted hits. My annotation file is Drosophila_melanogaster.BDGP5.77.gtf, and I used dm3 as the reference genome for Tophat. The Tophat accepted hits file contains roughly 10,000,000 reads, and the results from htseq-count are:

1	2
__no_feature	8075680
__ambiguous	0
__too_low_aQual	0
__not_aligned	0
__alignment_not_unique	20067133

No reads were assigned to any gene. I kept the default settings for htseq-count and checked that the 3rd column of the GFF contains the exon feature type and that the gene ID is in the expected attribute column.

Does that mean that I am not using the right GFF file? I used this one because it was named in a previous post about which GFF to use with dm3. Or is something else wrong?

Thank you

14 months ago by
United States
There is probably a genome mismatch problem between dm3 (source UCSC) and the reference GFF. Perhaps try the iGenomes version instead?

A genome mismatch can be as simple as non-matching chromosome identifiers (different sources use different identifiers), or the GFF could be from the wrong build. In your case, the GFF used has Ensembl identifiers while the dm3 reference genome has UCSC identifiers. The iGenomes web page has a GTF for dm3 with the matching identifiers.
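One quick way to confirm an identifier mismatch is to compare the sequence names in the first column of the GTF with the names in the `@SQ` lines of the alignment header. The following is only a minimal sketch: the function names are made up for illustration, and the toy lines are illustrative stand-ins for an Ensembl-style GTF and a UCSC-style dm3 header, not data from the original post.

```python
def gtf_seqnames(lines):
    """Collect the distinct sequence names (column 1) from GTF lines."""
    names = set()
    for line in lines:
        # Skip comment lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        names.add(line.split("\t", 1)[0])
    return names


def sam_header_seqnames(header_lines):
    """Collect reference names from the SN: field of SAM @SQ header lines."""
    names = set()
    for line in header_lines:
        if not line.startswith("@SQ"):
            continue
        for field in line.rstrip("\n").split("\t")[1:]:
            if field.startswith("SN:"):
                names.add(field[3:])
    return names


def compare_seqnames(gtf_names, ref_names):
    """Report which names are shared and which appear on one side only."""
    return {
        "shared": sorted(gtf_names & ref_names),
        "gtf_only": sorted(gtf_names - ref_names),
        "ref_only": sorted(ref_names - gtf_names),
    }


# Illustrative toy input (assumed for this sketch, not taken from the post).
gtf_example = [
    '2L\tprotein_coding\texon\t7529\t8116\t.\t+\t.\tgene_id "FBgn0031208";',
    'X\tprotein_coding\texon\t1000\t2000\t.\t-\t.\tgene_id "FBgn0000001";',
]
sam_header_example = [
    "@HD\tVN:1.0\tSO:coordinate",
    "@SQ\tSN:chr2L\tLN:23011544",
    "@SQ\tSN:chrX\tLN:22422827",
]

report = compare_seqnames(
    gtf_seqnames(gtf_example), sam_header_seqnames(sam_header_example)
)
print(report)
```

If "shared" comes back empty, as it does for this toy input, htseq-count has no features on any of the sequences the reads were mapped to, which is consistent with every uniquely aligned read ending up in __no_feature.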

To load an iGenomes GTF reference annotation file: Download the tar.gz from the external site locally, extract the contents, then upload the genes.gtf file to Galaxy for use.
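For anyone who prefers to script the extraction step, here is a minimal sketch using Python's standard `tarfile` module. The helper name `extract_gtf` and the example paths in the comment are hypothetical; the sketch searches the archive for any member ending in `genes.gtf` rather than assuming a particular directory layout inside the tar.gz.

```python
import os
import tarfile


def extract_gtf(archive_path, out_dir, suffix="genes.gtf"):
    """Extract every regular file whose name ends with `suffix`.

    Returns the list of paths written under `out_dir`.
    """
    extracted = []
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar.getmembers():
            if member.isfile() and member.name.endswith(suffix):
                tar.extract(member, path=out_dir)
                extracted.append(os.path.join(out_dir, member.name))
    return extracted


# Example call (hypothetical file names):
# paths = extract_gtf("downloaded_igenomes_archive.tar.gz", "igenomes_dm3")
# The resulting genes.gtf is the file to upload to Galaxy.
```

Only extract archives from a source you trust, since this sketch does not validate member paths.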

None of this will solve Tophat alignment issues, should they exist, but it will allow htseq-count to function as expected.

Thanks, Jen,

Great, thank you very much for your quick answer. I found one gene.GTF in


which seems to give me the best result, meaning fewer reads in __no_feature. Thank you again. Best wishes, Nicolas

