Question: htseq-count reports all reads as __no_feature (chromosome naming mismatch between genome and annotation)
sf533 wrote (5 weeks ago):

Hi,

I am sorry that a similar question has been asked before, but I have read through endless posts and can't find a solution.

I mapped my reads with Tophat, and 97% of them mapped. But when I then run htseq-count, all of my reads are counted as __no_feature. Please see the attached pictures.
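For reference, htseq-count ends its count table with summary counters (__no_feature, __ambiguous, __too_low_aQual, __not_aligned, __alignment_not_unique). Below is a minimal sketch that measures how much of the input landed in __no_feature. It assumes the default two-column, tab-separated output for a single sample; the function name summarize_htseq_counts is hypothetical.

```python
def summarize_htseq_counts(text: str) -> dict:
    """Split an htseq-count table into gene counts and '__' summary counters."""
    genes, special = {}, {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Assumes two tab-separated columns: feature name, count.
        name, count = line.rsplit("\t", 1)
        target = special if name.startswith("__") else genes
        target[name] = int(count)
    assigned = sum(genes.values())
    no_feature = special.get("__no_feature", 0)
    total = assigned + sum(special.values())
    return {
        "assigned": assigned,
        "no_feature": no_feature,
        "no_feature_fraction": no_feature / total if total else 0.0,
    }


if __name__ == "__main__":
    with open("counts.txt") as fh:  # hypothetical file name
        print(summarize_htseq_counts(fh.read()))
```

If "assigned" is 0 while the BAM clearly has mapped reads, the annotation (not the alignment) is the usual suspect.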

(admin fix for links)

I am aware that this is probably caused by the chromosome column of the GTF file not matching the chromosome names in the BAM file, but I have tried every hg38 GTF file I can find and keep getting the same issue. I also tried STAR instead of Tophat and got the same result.
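One quick way to confirm a naming mismatch is to compare the reference names in the BAM header (the SN: tags on @SQ lines, e.g. from `samtools view -H`) with the first column (seqname) of the GTF. This is a minimal stdlib-only sketch working on plain text; the function names are hypothetical. If the two sets share no names, htseq-count cannot place any read on a feature.

```python
def bam_reference_names(sam_header: str) -> set:
    """Collect SN: values from the @SQ lines of a SAM/BAM header in text form."""
    names = set()
    for line in sam_header.splitlines():
        if line.startswith("@SQ"):
            for field in line.split("\t")[1:]:
                if field.startswith("SN:"):
                    names.add(field[3:])
    return names


def gtf_seqnames(gtf_text: str) -> set:
    """Collect column 1 (seqname) from every non-comment GTF line."""
    names = set()
    for line in gtf_text.splitlines():
        if line and not line.startswith("#"):
            names.add(line.split("\t", 1)[0])
    return names


def shared_seqnames(sam_header: str, gtf_text: str) -> set:
    """Chromosome names present in both files; empty means nothing can be counted."""
    return bam_reference_names(sam_header) & gtf_seqnames(gtf_text)
```

With a UCSC-style header (chr1, chrM) and an Ensembl-style GTF (1, MT), shared_seqnames returns an empty set.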

Please provide advice as this is urgent.

Thank you,

Sarah

Jennifer Hillman Jackson wrote (5 weeks ago):

Hi Sarah,

I can't see your graphics. They need to be hosted on a public site and then linked into the question here. I use https://imgur.com/, but any public link will work.

That said, I agree the problem is likely with the GTF (one of the most common reasons for failures). If you mapped against the hg38 genome indexed at Galaxy Main (https://usegalaxy.org), or if UCSC was your genome source when indexing on your own server, try the UCSC version of the annotation from iGenomes: https://support.illumina.com/sequencing/sequencing_software/igenome.html. Download the tar archive, unpack it locally, then upload just the genes.gtf file to Galaxy (use FTP).

Give that a try first. If the job still fails:

  • Double-check that you actually mapped against hg38 with Tophat (the "rerun" or "job details" view records the original inputs)
  • Make sure there are hits in the resulting BAM datasets
  • Try HISAT2 instead, since both Tophat and Tophat2 are considered deprecated
  • Review your steps against those in this tutorial (it uses featureCounts, but htseq-count can be used instead): https://galaxyproject.org/tutorials/nt_rnaseq/
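To check that a BAM actually has hits, and on which reference names they sit, you can tally mapped records per chromosome from its text form (e.g. `samtools view` output). This is a minimal sketch under that assumption; the function name is hypothetical. It skips unmapped (FLAG 0x4), secondary (0x100) and supplementary (0x800) records so each read is counted at most once.

```python
from collections import Counter

UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def mapped_per_reference(sam_text: str) -> Counter:
    """Count primary mapped records per RNAME (column 3) in SAM text."""
    counts = Counter()
    for line in sam_text.splitlines():
        if not line or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.split("\t")
        flag = int(fields[1])
        if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY):
            continue
        counts[fields[2]] += 1
    return counts
```

The keys of the result are the chromosome names your reads use, which is exactly what the GTF's first column has to match.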

More resources:

If you cannot find the problem after doing the above and you can reproduce it at Galaxy Main, you can send a bug report from one of the error datasets. Please leave the inputs and outputs undeleted, and include a link to this Galaxy Biostars post in the comments. How-to: https://galaxyproject.org/issues/

Thanks! Jen, Galaxy team


OK, I can see your graphics now. I am reviewing them, but they might not be enough; for this type of issue, a bug report is usually better and faster.


Well, that was fast anyway! OK, the chromosome identifiers in your GTF are not formatted the same way as UCSC's hg38 identifiers. Try the iGenomes GTF; its identifiers will match.

https://galaxyproject.org/support/chrom-identifiers/
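Using the iGenomes GTF is the suggested fix. If you have to keep another GTF, one alternative is to rename its seqnames to UCSC style. This is a minimal sketch that assumes an Ensembl-style GRCh38 GTF, where primary chromosomes are named 1-22, X, Y and MT. The mapping and function name are illustrative assumptions. Other contigs (e.g. KI270728.1) use different UCSC names and are left untouched and reported, so features on them will still go uncounted.

```python
# Assumed Ensembl -> UCSC names for primary hg38 chromosomes only.
PRIMARY = [str(i) for i in range(1, 23)] + ["X", "Y"]
ENSEMBL_TO_UCSC = {name: "chr" + name for name in PRIMARY}
ENSEMBL_TO_UCSC["MT"] = "chrM"


def rename_gtf_seqnames(gtf_text: str, mapping: dict = ENSEMBL_TO_UCSC):
    """Rewrite GTF column 1 via `mapping`; return (new_text, unmapped_names)."""
    out, untouched = [], set()
    for line in gtf_text.splitlines():
        if not line or line.startswith("#"):
            out.append(line)  # keep comments and blank lines as-is
            continue
        seqname, rest = line.split("\t", 1)
        if seqname in mapping:
            out.append(mapping[seqname] + "\t" + rest)
        else:
            out.append(line)
            untouched.add(seqname)
    return "".join(l + "\n" for l in out), untouched
```

Only column 1 is rewritten, so gene_id and other attributes that htseq-count reads stay the same.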
