Question: htseq-count does not recognize "gff" annotation dataset: assign GTF and consider an alternate annotation source
drumarsohail wrote (9 weeks ago):

Hi all

I have been following the Galaxy RNA-Seq tutorial for Drosophila melanogaster.

I have run the BWA-MEM, TopHat2, and filter and sort steps.

I imported an SFT file from UCSC and filtered it with the condition c7 != ".".
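For readers unfamiliar with Galaxy's Filter syntax: `c7` means the 7th tab-separated column (1-based), which in GTF is the strand, so `c7 != "."` keeps only features with a known strand. A minimal Python sketch of the same filter, assuming plain tab-separated input; passing `#` comment lines through unchanged is a choice of this sketch, not necessarily what the Galaxy tool does:

```python
def filter_known_strand(lines):
    """Keep rows whose 7th column (Galaxy's c7) is not ".".

    Lines starting with "#" are kept as-is (an assumption of this sketch).
    """
    kept = []
    for line in lines:
        if line.startswith("#"):
            kept.append(line)
            continue
        fields = line.rstrip("\n").split("\t")
        # Galaxy columns are 1-based, so c7 is fields[6] here
        if len(fields) >= 7 and fields[6] != ".":
            kept.append(line)
    return kept
```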

Now I want to run htseq-count with the TopHat accepted_hits BAM against the "Filter on data" file.

But htseq-count doesn't offer the "Filter on data" file under the "GFF file" option.

See image below

Jennifer Hillman Jackson (United States) wrote (9 weeks ago):


I renamed the subject to reflect the question better (to link it to prior Q&A).

If the annotation comes from the tool Get Data > UCSC Main with the Table Browser output format set to gtf, then that annotation will technically work. Just be aware that counts will be summarized by transcript, not by gene. Why? UCSC data extracted this way has gene_id and transcript_id both set to the same value: the transcript identifier. This is how the UCSC Table Browser works.
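To check whether a GTF file has this property before counting, you can compare the two attributes on every line. This is a minimal sketch, assuming standard GTF attribute syntax (`key "value";` pairs in column 9); it is not part of Galaxy or htseq-count. htseq-count groups features by the attribute given with `--idattr` (default `gene_id`), so when the function returns `True`, "gene" counts are really per-transcript counts.

```python
import re

# Matches key "value" pairs in the GTF attribute column (column 9)
ATTR_RE = re.compile(r'(\S+)\s+"([^"]*)"')


def parse_attributes(attr_field):
    """Return the attribute column as a dict, e.g. {"gene_id": "..."}."""
    return dict(ATTR_RE.findall(attr_field))


def gene_ids_are_transcript_ids(gtf_lines):
    """True if every feature line has gene_id == transcript_id.

    Returns False if no feature lines are found.
    """
    seen = False
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # skip malformed lines
        attrs = parse_attributes(fields[8])
        seen = True
        if attrs.get("gene_id") != attrs.get("transcript_id"):
            return False
    return seen
```

The identifiers in any test input (such as `NM_001` or `FBgn0001`) are made-up examples, not real records.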

Data imported directly into Galaxy from the UCSC Table Browser by this method would not be assigned the gff datatype.

Perhaps the data was loaded using another method? You can either accept transcript-level summaries for your counts and reassign the datatype to gtf (provided the file matches the GTF specification and the genome build/version used for mapping), or choose another annotation source that has true gene-level gene_id values (strongly recommended).

Wherever you obtain the annotation, check it against the genome used for mapping. You may also need to assign the correct database (genome build) to the BAM inputs, because of a small bug that was fixed last week.
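A common mismatch is chromosome naming: UCSC uses names like `chr2L`, while Ensembl/FlyBase annotation uses `2L`, and features on unmatched names will not be counted. A minimal sketch for spotting this, assuming you have the annotation lines and the BAM header text (for example from `samtools view -H`, where each `@SQ` line carries the name in its `SN:` field). The function names are hypothetical helpers, not part of Galaxy or htseq-count.

```python
def reference_names_from_header(header_text):
    """Collect sequence names from @SQ lines of a SAM/BAM header."""
    names = []
    for line in header_text.splitlines():
        if line.startswith("@SQ"):
            for field in line.split("\t")[1:]:
                if field.startswith("SN:"):
                    names.append(field[3:])
    return names


def seqname_mismatch(gtf_lines, bam_reference_names):
    """Return (names only in the annotation, names only in the BAM)."""
    gtf_names = set()
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        # Column 1 of GTF is the sequence (chromosome) name
        gtf_names.add(line.split("\t", 1)[0])
    bam_names = set(bam_reference_names)
    return sorted(gtf_names - bam_names), sorted(bam_names - gtf_names)
```

If both returned lists are empty, the sequence names agree; this says nothing about whether coordinates come from the same genome version.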

The FAQs here explain this in more detail:

Plus, you can review prior Q&A in the right sidebar (or search for the term "htseq") to see how others have resolved annotation, database, and datatype conflicts in the context of their RNA-seq analysis.

TopHat is considered deprecated, and HISAT2 is its replacement. Please see the Galaxy RNA-seq tutorials for example workflows:

Thanks! Jen, Galaxy team

drumarsohail wrote (9 weeks ago):

Hi Jen

Thanks, it helped. I hadn't set the UCSC output file format to GTF, so Galaxy couldn't recognize it. It was a really nicely written tutorial. Thanks again.
