Question: htseq-count results in all reads with no feature - chromosome naming mismatch issue between genome and annotation
sf533 wrote (8 months ago):

Hi,

I am sorry that a similar question has been asked before, but I have read through endless posts and can't find a solution.

I have mapped my reads using Tophat, and 97% of the reads mapped. But when I then run htseq-count, all my reads come up as no feature. Please see the attached pictures.

(admin fix for links)

I am aware that this is probably a problem with the chromosome name columns of the GTF file and BAM file not matching, but I have tried every GTF file for hg38 I can find and keep getting the same issue. I also tried STAR instead of Tophat and got the same issue.

Please provide advice as this is urgent.

Thank you,

Sarah

Jennifer Hillman Jackson wrote (8 months ago):

Hi Sarah,

I can't see your graphics. These need to be hosted on a public site and then linked into questions here. I use https://imgur.com/ but any public link will work.

That said, I agree the problem is likely with the GTF (one of the most common reasons for failures). If you mapped against the hg38 genome indexed at Galaxy Main (https://usegalaxy.org), or if UCSC was your genome source when indexing on your own server, try using the UCSC version of the annotation from iGenomes: https://support.illumina.com/sequencing/sequencing_software/igenome.html. Download the tar archive, unpack it locally, then upload just the genes.gtf to Galaxy (use FTP).

Give that a try first. If the job still fails:

  • Double check you actually mapped against hg38 with Tophat ("rerun" or "job details" will have the original inputs recorded)
  • Make sure there are hits in the resulting BAM datasets
  • Try using HISAT2 instead since both Tophat and Tophat2 are considered deprecated
  • Review your steps versus those in this tutorial (it uses FeatureCounts, but htseq-count can be used instead): https://galaxyproject.org/tutorials/nt_rnaseq/
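
For the "make sure there are hits" check, here is a minimal sketch, assuming the BAM has been downloaded and indexed and that the text output of `samtools idxstats` has been saved (tab-separated columns: reference name, length, mapped read-segments, unmapped read-segments). The function name and the toy numbers are made up for illustration; this is not part of any Galaxy tool.

```python
def mapped_per_chrom(idxstats_text):
    """Return {reference name: mapped count} from `samtools idxstats` text."""
    counts = {}
    for line in idxstats_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        name, mapped = fields[0], fields[2]
        if name == "*":
            continue  # the "*" row holds reads with no reference assigned
        counts[name] = int(mapped)
    return counts


# Toy idxstats output (hypothetical read counts, for illustration only)
example_idxstats = (
    "chr1\t248956422\t1500\t3\n"
    "chrM\t16569\t200\t0\n"
    "*\t0\t0\t45\n"
)

per_chrom = mapped_per_chrom(example_idxstats)
total_mapped = sum(per_chrom.values())
print(per_chrom, total_mapped)
```

A total of zero would mean there is nothing for htseq-count to assign, whatever the GTF looks like. The reference names listed here are also the ones the GTF has to match.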

More resources:

If you cannot find the problem after doing the above, and you can reproduce this at Galaxy Main, a bug report can be sent in from one of the error datasets. Please leave the inputs/outputs undeleted and include the link to this Galaxy Biostars post in the comments. How-to: https://galaxyproject.org/issues/

Thanks! Jen, Galaxy team


Ok, I can see your graphics now. I am reviewing but it might not be enough - a bug report for this type of issue is usually better/faster.

(reply written 8 months ago by Jennifer Hillman Jackson)

Well, that was fast anyway! Ok, the chromosome identifiers in your GTF are not formatted the same way as those in UCSC's hg38. Try using the iGenomes GTF, which will match.

https://galaxyproject.org/support/chrom-identifiers/
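
To confirm this kind of mismatch yourself, a minimal sketch follows. It assumes you have the BAM header as text (for example from `samtools view -H`) and the GTF contents as text. The function names and the toy inputs are made up for illustration and do not come from the datasets in this thread.

```python
def bam_header_chroms(header_text):
    """Collect sequence names (SN: fields) from the @SQ lines of a SAM/BAM header."""
    names = set()
    for line in header_text.splitlines():
        if line.startswith("@SQ"):
            for field in line.split("\t")[1:]:
                if field.startswith("SN:"):
                    names.add(field[3:])
    return names


def gtf_chroms(gtf_text):
    """Collect the distinct values of column 1 (seqname) from GTF text."""
    names = set()
    for line in gtf_text.splitlines():
        if line.strip() and not line.startswith("#"):
            names.add(line.split("\t", 1)[0])
    return names


def compare_chroms(header_text, gtf_text):
    """Report which chromosome names are shared and which appear on one side only."""
    bam = bam_header_chroms(header_text)
    gtf = gtf_chroms(gtf_text)
    return {
        "shared": sorted(bam & gtf),
        "bam_only": sorted(bam - gtf),
        "gtf_only": sorted(gtf - bam),
    }


# Toy inputs (hypothetical): UCSC-style names in the BAM, bare names in the GTF
example_header = "@HD\tVN:1.0\tSO:coordinate\n@SQ\tSN:chr1\tLN:248956422\n"
example_gtf = '#!comment line\n1\tsource\texon\t100\t200\t.\t+\t.\tgene_id "g1";\n'

print(compare_chroms(example_header, example_gtf))
```

If "shared" comes back empty, no read can overlap any annotated feature, which is consistent with every read being reported as no feature.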

(reply written 8 months ago by Jennifer Hillman Jackson)