How to get intronic+exonic reads using HTSeq within Galaxy

Question: How to get intronic+exonic reads using HTSeq within Galaxy


Hi,

I'm using HTSeq for counting, and the feature type is set to "exon" by default, so I assume I'm getting only exonic reads. How do I set the feature type so that I get both exonic and intronic reads?

I tried entering "-type" as the feature type, but I got a higher number of no-feature counts.

With the default parameter set to exon, this is what I get: __no_feature 5821188 __ambiguous 419467 __too_low_aQual 0 __not_aligned 0 __alignment_not_unique 14069035

When I enter "-type" as the parameter, I get this: __no_feature 11616316 __ambiguous 0 __too_low_aQual 0 __not_aligned 0 __alignment_not_unique 14069035

The number of no-feature reads has increased, and I'm not sure how to interpret that.

I would like to be sure that HTSeq is counting reads for both exons and introns.

Best, Mirjam


modified 1 day ago by Jennifer Hillman Jackson ♦ 25k • written 7 days ago by mirjam.podgorica • 0


Jennifer Hillman Jackson ♦ 25k wrote:

Hello,

How htseq-count works:

The "type" parameter is a term that selects which features in your GTF reference annotation are assigned counts.
The "type" term must exactly match the feature term in the 3rd column of your GTF reference annotation.
Only one term can be used in any single job (not multiple).
The term must be a single word, e.g. no spaces. Avoid using underscores, dashes, periods, and numbers.
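
Since the "type" term has to match the 3rd column exactly, it can help to list which terms your GTF actually contains before running the job. The following is only a minimal Python sketch, not part of htseq-count or Galaxy: the function name `count_feature_types` and the toy annotation lines are made up for illustration, and it assumes a standard tab-separated, 9-column GTF.

```python
from collections import Counter


def count_feature_types(gtf_lines):
    """Tally the values found in column 3 (feature type) of GTF lines.

    Hypothetical helper for illustration; assumes tab-separated GTF
    with 9 columns. Comment lines, blank lines, and short lines are skipped.
    """
    counts = Counter()
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue
        counts[fields[2]] += 1
    return counts


# Toy annotation (invented data, not from a real genome build).
example_gtf = [
    "# toy annotation\n",
    'chr1\ttoy\tgene\t100\t900\t.\t+\t.\tgene_id "g1";\n',
    'chr1\ttoy\ttranscript\t100\t900\t.\t+\t.\tgene_id "g1"; transcript_id "t1";\n',
    'chr1\ttoy\texon\t100\t300\t.\t+\t.\tgene_id "g1"; transcript_id "t1";\n',
    'chr1\ttoy\texon\t600\t900\t.\t+\t.\tgene_id "g1"; transcript_id "t1";\n',
]

for feature_type, n in sorted(count_feature_types(example_gtf).items()):
    print(feature_type, n)
```

With a real file you could pass an open file handle instead of the list. Any term that does not show up in this tally (for example "-type") would match no annotation line, which is presumably why every read ends up under __no_feature in that case.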

To count both exons and introns together in a single job, you'll need to adjust your data to fit those rules. Something like this:

Both features need to be present in your GTF dataset, either as distinct lines or combined into a larger feature. Features such as "gene" and "transcript" are commonly used to represent entire gene/transcript bounds. Note that "gene" often includes the promoter region and 3'/5' UTR, so if you can find GTF annotation with features labeled "transcript", that would be a better choice.
If exons and introns are represented in distinct annotation lines, change the 3rd column to a single term that can represent both.
Enter that exact same term as the "type" on the htseq-count form.
Be aware that downstream tools (DESeq2, etc.) will use the 9th field to summarize counts, so each annotation line needs to contain the "gene_id" and "transcript_id" attributes.
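
The relabeling step above could be sketched in Python roughly as follows. This is an illustration under stated assumptions rather than a Galaxy tool: the function `relabel_features`, the combined term "exonintron", and the toy lines are all hypothetical, and the sketch assumes your GTF already contains separate "exon" and "intron" lines (many annotation files have no intron lines at all).

```python
def relabel_features(gtf_lines, old_types, new_type):
    """Yield GTF lines with column 3 replaced by new_type when it is in old_types.

    Hypothetical helper for illustration; comment lines and lines with
    fewer than 9 tab-separated columns are passed through unchanged.
    """
    for line in gtf_lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 9 and fields[2] in old_types:
            fields[2] = new_type
            yield "\t".join(fields) + "\n"
        else:
            yield line


# Toy annotation (invented data); "exonintron" is an assumed one-word term.
toy_gtf = [
    "# toy annotation\n",
    'chr1\ttoy\tgene\t100\t900\t.\t+\t.\tgene_id "g1";\n',
    'chr1\ttoy\texon\t100\t300\t.\t+\t.\tgene_id "g1"; transcript_id "t1";\n',
    'chr1\ttoy\tintron\t301\t599\t.\t+\t.\tgene_id "g1"; transcript_id "t1";\n',
    'chr1\ttoy\texon\t600\t900\t.\t+\t.\tgene_id "g1"; transcript_id "t1";\n',
]

relabeled = list(relabel_features(toy_gtf, {"exon", "intron"}, "exonintron"))
for line in relabeled:
    print(line, end="")
```

The relabeled lines keep their 9th column untouched, so the "gene_id" and "transcript_id" attributes remain available to downstream tools. You would then enter "exonintron" (or whatever term you chose) as the "type" on the form.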

Thanks! Jen, Galaxy team
