Question: How to get intronic + exonic reads using HTSeq within Galaxy
7 days ago
mirjam.podgorica wrote:


I'm using HTSeq for counting. The feature type is set to "exon" by default, so I assume I'm getting only exonic reads. How do I set the feature type so that I get both exonic and intronic reads?

I tried entering "-type" as the feature type, but I got a higher number of no-feature counts.

With the default parameter set to exon, this is what I get:
  __no_feature 5821188
  __ambiguous 419467
  __too_low_aQual 0
  __not_aligned 0
  __alignment_not_unique 14069035

When I enter "-type" as the parameter, I get this:
  __no_feature 11616316
  __ambiguous 0
  __too_low_aQual 0
  __not_aligned 0
  __alignment_not_unique 14069035

The number of no-feature reads has increased, and I'm not sure how to interpret that.

I would like to be sure that, when using HTSeq, I'm counting reads for both exons and introns.

Best, Mirjam

rna-seq galaxy
1 day ago
Jennifer Hillman Jackson wrote:


How HTseq-count works:

  • The "type" parameter is a term used to select which features in your GTF reference annotation are assigned counts.
  • That "type" term must be an exact match for the feature term in the 3rd column of your GTF reference annotation.
  • Only one term can be used in any particular job (not multiple).
  • The term must be a single word, i.e. no spaces. Avoid underscores, dashes, periods, and numbers.
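
Because the "type" term has to match the 3rd column exactly, it can help to check which feature terms your GTF actually contains before running the job. Here is a minimal sketch in plain Python. The function name `count_feature_types` and the example annotation lines are made up for illustration and are not part of HTSeq or Galaxy.

```python
from collections import Counter


def count_feature_types(gtf_lines):
    """Tally the terms found in the 3rd (feature) column of GTF lines."""
    counts = Counter()
    for line in gtf_lines:
        # Skip blank lines and comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # A GTF annotation line has 9 tab-separated fields.
        if len(fields) >= 9:
            counts[fields[2]] += 1
    return counts


# Made-up annotation lines, for illustration only.
ATTRS = 'gene_id "geneA"; transcript_id "txA1";'
example_gtf = [
    "#!made-up example annotation",
    "\t".join(["chr1", "demo", "transcript", "100", "900", ".", "+", ".", ATTRS]),
    "\t".join(["chr1", "demo", "exon", "100", "300", ".", "+", ".", ATTRS]),
    "\t".join(["chr1", "demo", "exon", "700", "900", ".", "+", ".", ATTRS]),
]

feature_counts = count_feature_types(example_gtf)
for feature, n in sorted(feature_counts.items()):
    print(feature, n)
```

For a real dataset you would pass an open file handle instead of the example list. Any term that does not show up in this tally will not match any annotation line when entered as the "type".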

To count both exons and introns together in a single job, you'll need to adjust your data to fit those rules. Something like this:

  • Both features need to be present in your GTF dataset, either as distinct lines or combined into a larger feature. Features such as "gene" and "transcript" are commonly used to represent entire gene/transcript bounds. Note that "gene" often includes the promoter region and the 3'/5' UTRs, so a GTF annotation whose features are labeled "transcript" would be a better choice if you can find one.
  • Change the 3rd column to be some term that can represent both exons and introns (if represented in distinct annotation lines).
  • Enter that exact same term as the "type" on the HTseq-count form.
  • Be aware that downstream tools (DESeq2, etc.) will use the 9th field to summarize counts, so each annotation line needs to contain the "gene_id" and "transcript_id" attributes.
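
The relabeling step above could be sketched roughly as follows, assuming your GTF already has separate "exon" and "intron" lines. The function name `relabel_features` and the combined term "exonintron" are hypothetical choices for illustration (any single-word term that follows the rules above should work), and the example lines are made up.

```python
def relabel_features(gtf_lines, source_types=("exon", "intron"),
                     new_type="exonintron"):
    """Return GTF lines with the 3rd column of selected features renamed.

    Only lines whose 9th field carries both gene_id and transcript_id
    are renamed; every other line is passed through unchanged.
    """
    relabeled_lines = []
    for line in gtf_lines:
        text = line.rstrip("\n")
        fields = text.split("\t")
        is_annotation = len(fields) >= 9 and not text.startswith("#")
        if (is_annotation
                and fields[2] in source_types
                and "gene_id" in fields[8]
                and "transcript_id" in fields[8]):
            fields[2] = new_type
            text = "\t".join(fields)
        relabeled_lines.append(text)
    return relabeled_lines


# Made-up annotation lines, for illustration only.
ATTRS = 'gene_id "geneA"; transcript_id "txA1";'
example_gtf = [
    "#!made-up example annotation",
    "\t".join(["chr1", "demo", "exon", "100", "300", ".", "+", ".", ATTRS]),
    "\t".join(["chr1", "demo", "intron", "301", "699", ".", "+", ".", ATTRS]),
    "\t".join(["chr1", "demo", "exon", "700", "900", ".", "+", ".", ATTRS]),
    "\t".join(["chr1", "demo", "CDS", "150", "300", ".", "+", "0", ATTRS]),
]

relabeled = relabel_features(example_gtf)
for line in relabeled:
    print(line)
```

After uploading the rewritten GTF to Galaxy, the same term ("exonintron" in this sketch) would be entered as the "type" on the HTseq-count form. If your annotation already has "transcript" lines, entering "transcript" as the type avoids the rewrite entirely.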

Thanks! Jen, Galaxy team

