Question: Reads are 0 in htseq-count
gravatar for mmfiruleva
8 months ago by
mmfiruleva0 wrote:


I have a trouble using htseq-count, I used TopHat to aling my data, but when I tried to count the number of reads all the genes present 0 read.

I checked the chromosome names in ./gtf file: cut -f 1 -d " " ~/HG19/genes.gtf | cut -f 1 | sort | uniq So all of them are started with "chr". The same result was accepted by bowtie2-inspect --names ~/HG19/genome. (in .gtf file, I've seen the numbers like "chrHSCHR9_2_CTG3" and don't know what it is).

1) programs/tophat2 -p 4 -r 50 -G ~/HG19/genes.gtf -o tst_output HG19/genome tst_fastq Result: accepted_hits.bam deletions.bed junctions.bed align_summary.txt insertions.bed logs unmapped.bam

samtools view -H ./accepted_hits.bam:

@HD VN:1.0 SO:coordinate @SQ SN:chr1 LN:249250621 @SQ SN:chr10 LN:135534747 @SQ SN:chr11 LN:135006516 @SQ SN:chr12 LN:133851895 @SQ SN:chr13 LN:115169878 @SQ SN:chr14 LN:107349540 @SQ SN:chr15 LN:102531392 @SQ SN:chr16 LN:90354753 @SQ SN:chr17 LN:81195210 @SQ SN:chr18 LN:78077248 @SQ SN:chr19 LN:59128983 @SQ SN:chr2 LN:243199373 @SQ SN:chr20 LN:63025520 @SQ SN:chr21 LN:48129895 @SQ SN:chr22 LN:51304566 @SQ SN:chr3 LN:198022430 @SQ SN:chr4 LN:191154276 @SQ SN:chr5 LN:180915260 @SQ SN:chr6 LN:171115067 @SQ SN:chr7 LN:159138663 @SQ SN:chr8 LN:146364022 @SQ SN:chr9 LN:141213431 @SQ SN:chrM LN:16571 @SQ SN:chrX LN:155270560 @SQ SN:chrY LN:59373566 @PG ID:TopHat VN:2.1.1 CL:programs/tophat -p 4 -r 50 -G ~/HG19/genes.gtf -o tst_output HG19/genome tst.fastq

less align_summary.txt:

Reads: Input : 99 Mapped : 76 (76.8% of input) of these: 1 ( 1.3%) have multiple alignments (0 have >20) 76.8% overall read mapping rate.

2) samtools view -h tst_output/accepted_hits.bam | htseq-count - ~/HG19/genes.gtf > tst_count.txt tail tst_count.txt:

ENST00000610276 0 ENST00000610277 0 ENST00000610278 0 ENST00000610279 0 ENST00000610280 0 __no_feature 40 __ambiguous 27 __too_low_aQual 0 __not_aligned 0 __alignment_not_unique 15

I have no idea what should I do, so could you please help me.

PS: after samtools sort -n -o temp-sorted.bam accepted_hits.bam I have the same result ("SO:queryname" in temp-sorted.bam file).

Thanks, Maria.

rna-seq tophat htseq-count • 274 views
ADD COMMENTlink modified 8 months ago by Jennifer Hillman Jackson25k • written 8 months ago by mmfiruleva0
gravatar for Jennifer Hillman Jackson
8 months ago by
United States
Jennifer Hillman Jackson25k wrote:


There could be a few things going on so this advice is for using the tools within Galaxy, but should also apply for line-command usage. Or, you can ask about line-command advice at a more general bioinformatics forum like if this help does not resolve your problems. This forum is focused on Galaxy usage and the community is focused on those topics.

Most common reasons for this type of counting problem:

  1. Check to make sure the target reference genome and reference annotation are from the exact same genome build. The chromosome identifiers must be an exact match. My guess is that this is the root of your problem.
  2. The reference annotation must contain the gene_id attribute to count by gene. This is the default tool setting. If counting up some other attribute in the GTF/GFF3, that would need to be specified (line-command or on the Galaxy tool form).
  3. Tophat/Tophat2 are considered deprecated. Using HISAT2 is a better choice.
  4. A queryname sorted BAM/SAM is required as line-command input. If using the Galaxy wrapped version of the tool, use the HISAT2 BAM output directly (it will be coordinate-sorted) and check the box on the HTseq-count tool form to sort the BAM/SAM by queryname during job execution.

Detailed help:

Thanks! Jen, Galaxy team

ADD COMMENTlink written 8 months ago by Jennifer Hillman Jackson25k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 169 users visited in the last hour