Question: htseq-count assigns no reads to any gene
12 months ago, nicolas.loncle wrote:

Hi

I am trying to get raw counts of mapped reads, so I am running htseq-count on the Tophat accepted hits. My annotation file is Drosophila_melanogaster.BDGP5.77.gtf, and I used dm3 as the reference genome for Tophat. The Tophat accepted hits file contains roughly 10,000,000 reads, and the result from htseq-count is:

    __no_feature            8075680
    __ambiguous             0
    __too_low_aQual         0
    __not_aligned           0
    __alignment_not_unique  20067133

No reads are assigned to any gene. I kept the default settings for htseq-count, and I checked that the third column of the GFF contains the exon feature type and that the gene ID is in the correct column too.

Does that mean I am not using the right GFF file? I chose this one because it was named in a previous post about which GFF to use with dm3. Or is there something else going on?

Thank you

12 months ago, Jennifer Hillman Jackson (United States) wrote:

Hello,

There is probably a genome mismatch problem between dm3 (source UCSC) and the reference GFF. Perhaps try the iGenomes version instead? http://support.illumina.com/sequencing/sequencing_software/igenome.html

A genome mismatch can be as simple as non-matching chromosome identifiers (different sources use different identifiers; for example, UCSC names a chromosome arm chr2L where Ensembl names it 2L), or the GFF could be from the wrong build. In your case, the GFF has Ensembl identifiers while the dm3 reference genome has UCSC identifiers. The iGenomes web page has a GTF for dm3 with the matching identifiers.
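One way to confirm this kind of identifier mismatch is to compare the sequence names in column 1 of the GTF with the chromosome names of the reference. The Python below is only a minimal sketch, not part of htseq-count or Galaxy: the function names are hypothetical, the sample annotation lines and gene IDs are made up for illustration, and in practice the reference names would come from the BAM header or the genome's chromosome list.

```python
def gtf_seqnames(lines):
    """Collect the distinct values of column 1 (seqname) from GTF lines."""
    names = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        names.add(line.split("\t", 1)[0])
    return names


def compare_seqnames(gtf_names, reference_names):
    """Report which sequence names are shared and which appear on one side only."""
    gtf_names = set(gtf_names)
    reference_names = set(reference_names)
    return {
        "shared": gtf_names & reference_names,
        "only_in_gtf": gtf_names - reference_names,
        "only_in_reference": reference_names - gtf_names,
    }


# Made-up annotation lines in Ensembl style (no "chr" prefix); illustration only.
sample_gtf = [
    '2L\tprotein_coding\texon\t100\t200\t.\t+\t.\tgene_id "GENE_A";',
    'X\tprotein_coding\texon\t300\t400\t.\t-\t.\tgene_id "GENE_B";',
]

# Made-up reference names in UCSC style (with "chr" prefix); illustration only.
ucsc_style_reference = {"chr2L", "chrX"}

report = compare_seqnames(gtf_seqnames(sample_gtf), ucsc_style_reference)
for key in ("shared", "only_in_gtf", "only_in_reference"):
    print(key, sorted(report[key]))
```

If "shared" comes back empty, as it does for the sample data here, no aligned read can overlap an annotated feature, which is consistent with every uniquely mapped read being counted as __no_feature.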

To load an iGenomes GTF reference annotation file: Download the tar.gz from the external site locally, extract the contents, then upload the genes.gtf file to Galaxy for use.
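For anyone who prefers to script the extraction step, here is a rough sketch using Python's standard tarfile module. The helper names and the archive file name in the usage comments are hypothetical, and the internal layout of the archive is not assumed: the first function simply lists every regular file whose name ends in genes.gtf, since an archive may contain more than one copy.

```python
import shutil
import tarfile


def find_gtf_members(archive_path, suffix="genes.gtf"):
    """Return the names of regular files in a .tar.gz whose name ends with suffix."""
    with tarfile.open(archive_path, "r:gz") as tar:
        return [m.name for m in tar.getmembers()
                if m.isfile() and m.name.endswith(suffix)]


def extract_member(archive_path, member_name, out_path):
    """Copy a single archive member to out_path without unpacking everything."""
    with tarfile.open(archive_path, "r:gz") as tar:
        src = tar.extractfile(member_name)
        if src is None:
            raise ValueError(f"{member_name} is not a regular file in the archive")
        with src, open(out_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
    return out_path


# Example usage (the archive name below is a placeholder, not a real file name):
#   candidates = find_gtf_members("downloaded_igenomes_archive.tar.gz")
#   print(candidates)  # inspect the list and pick the copy you want
#   extract_member("downloaded_igenomes_archive.tar.gz", candidates[0], "genes.gtf")
```

The extracted genes.gtf can then be uploaded to Galaxy as described above.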

None of this will solve Tophat alignment issues, should they exist, but it will allow htseq_count to function as expected.

Thanks, Jen,


Great, thank you very much for your quick answer. I found one genes.gtf in archive-2011-01-27-18-13-06 which seems to give me the best result, meaning fewer reads counted as __no_feature. Thank you again.

Best wishes,
Nicolas

Reply written 12 months ago by nicolas.loncle