Question: htseq read counts zero
fate.gh wrote (2.2 years ago):

Hi,

I imported some FASTQ files (Homo sapiens) from http://www.ebi.ac.uk/ena/data/ into Galaxy. To obtain read counts, I aligned them to hg19 using HISAT (default parameters). Since my reference genome was hg19, I then used the GTF file (Version 19 (July 2013 freeze, GRCh37) - Ensembl 74, 75) from GENCODE to obtain read counts with htseq-count.

The total number of counts assigned to features is 10347508, which seems OK. However, a number of reads were not assigned to any feature:

__no_feature 2362227
__ambiguous 788874
__too_low_aQual 1001993
__not_aligned 2517255
__alignment_not_unique 3866370
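For a sense of scale, the tallies above can be turned into fractions. This is only a rough sketch: the variable names are illustrative, and the total here is simply the sum of the numbers reported in this post, which may not equal the number of sequenced reads (for example, if multi-mapped reads are tallied more than once).

```python
# Counts copied from the htseq output quoted in this post.
assigned = 10347508  # total counts assigned to features

# Special counters reported at the end of the htseq output.
special = {
    "__no_feature": 2362227,
    "__ambiguous": 788874,
    "__too_low_aQual": 1001993,
    "__not_aligned": 2517255,
    "__alignment_not_unique": 3866370,
}

# Assumption: the grand total is the sum of all reported tallies.
total = assigned + sum(special.values())

assigned_fraction = assigned / total
special_fractions = {name: count / total for name, count in special.items()}

print(f"total tallied: {total}")
print(f"assigned to features: {assigned_fraction:.1%}")
for name, fraction in special_fractions.items():
    print(f"{name}: {fraction:.1%}")
```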

Do you think the result is reasonable?

Something confusing is that, of the 57820 genes in total, the counts for genes 1 through 18356 are mostly non-zero, while the counts for genes 18356 through 57820 are mostly zero (only a few are non-zero).

Why is that?

Do you think I have to change my GTF file? Which version?

Or do you think I should consider only the first 18356 genes for DE analysis?

Thanks

Jennifer Hillman Jackson wrote (2.2 years ago):

Hello,

Check for a mismatch between the chromosome names in the inputs. This prior Q&A explains: https://biostar.usegalaxy.org/p/18171/

A reference GTF file for hg19 with chromosome identifiers that match the natively indexed hg19 can be obtained from UCSC or iGenomes. https://galaxyproject.org/support/chrom-identifiers/
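One way to check for such a mismatch outside of Galaxy is to compare the sequence names in the first column of the GTF with the `@SQ` names in the SAM/BAM header. The sketch below is a minimal illustration using only the Python standard library; the function names and the toy input lines are made up for this example and are not taken from the files in this thread.

```python
def gtf_seqnames(gtf_lines):
    """Collect the sequence names found in column 1 of GTF lines."""
    names = set()
    for line in gtf_lines:
        # Skip blank lines and '#' comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        names.add(line.split("\t", 1)[0])
    return names


def sam_header_seqnames(header_lines):
    """Collect the SN: values from the @SQ lines of a SAM header."""
    names = set()
    for line in header_lines:
        if line.startswith("@SQ"):
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SN:"):
                    names.add(field[3:])
    return names


def compare_seqnames(gtf_names, bam_names):
    """Report which names are shared and which appear on one side only."""
    return {
        "shared": sorted(gtf_names & bam_names),
        "gtf_only": sorted(gtf_names - bam_names),
        "bam_only": sorted(bam_names - gtf_names),
    }


# Toy inputs (hypothetical): annotation named "1"/"MT", alignments named "chr1"/"chrM".
toy_gtf = [
    "#!toy annotation\n",
    '1\tsrc\tgene\t100\t900\t.\t+\t.\tgene_id "G1";\n',
    'MT\tsrc\tgene\t50\t400\t.\t+\t.\tgene_id "G2";\n',
]
toy_header = [
    "@HD\tVN:1.0\tSO:coordinate\n",
    "@SQ\tSN:chr1\tLN:249250621\n",
    "@SQ\tSN:chrM\tLN:16571\n",
]

report = compare_seqnames(gtf_seqnames(toy_gtf), sam_header_seqnames(toy_header))
print(report)
```

If "shared" is empty or much smaller than expected, features on the unmatched sequences cannot receive any counts, which is worth ruling out before changing anything else.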

Best, Jen, Galaxy team

