Hello!!
I am running HTSeq using my SAM files (obtained from Bowtie) and the GTF file I obtained from Ensembl for Phytophthora infestans. Everything runs fine, except that I get zero counts in my HTSeq output. I have read in other posts that this may be because my GTF file uses different identifiers than my SAM output, but I really don't know how to fix it, and it is somewhat urgent! I am analyzing small RNA-seq data from four different samples, so I need the count table in order to see whether small RNAs are differentially expressed.
Thanks!!!!
I'll keep an eye on the thread!
Best, Juliana
Post the first 5-10 lines of the SAM file and the first 5-10 lines of the GTF file.
Thanks!
First seven lines of the SAM file:
@HD VN:1.0 SO:unsorted
@SQ SN:NW_003302556.1 LN:4850
@SQ SN:NW_003302557.1 LN:5561
@SQ SN:NW_003302558.1 LN:4798
@SQ SN:NW_003302559.1 LN:40370
@SQ SN:NW_003302560.1 LN:4805
@SQ SN:NW_003302561.1 LN:5155
First seven lines of the GTF file:
supercont1.1 broads exon 10097 10114 . - . transcript_id "transcript:PITG_00002T0"; gene_id "gene:PITG_00002";
supercont1.1 broads exon 10171 10433 . - . transcript_id "transcript:PITG_00002T0"; gene_id "gene:PITG_00002";
supercont1.1 broads exon 10474 10522 . - . transcript_id "transcript:PITG_00002T0"; gene_id "gene:PITG_00002";
supercont1.1 broads CDS 10100 10114 . - 0 transcript_id "transcript:PITG_00002T0"; gene_id "gene:PITG_00002";
supercont1.1 broads CDS 10171 10433 . - 2 transcript_id "transcript:PITG_00002T0"; gene_id "gene:PITG_00002";
supercont1.1 broads CDS 10474 10522 . - 0 transcript_id "transcript:PITG_00002T0"; gene_id "gene:PITG_00002";
supercont1.1 broads exon 38775 39071 . + . transcript_id "transcript:PITG_00003T0"; gene_id "gene:PITG_00003";
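Devon's guess looks right. Your SAM/BAM file was aligned against reference sequences named 'NW_003302556.1' etc., while your GTF file uses sequence names like 'supercont1.1'. HTSeq matches reads to features by sequence name, so when the names do not correspond it never finds a read overlapping a feature on 'supercont1.1' and every count comes out as zero. The usual way out is to realign against the genome FASTA that the GTF was built for, or to rename the sequences in one of the two files so that they match.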
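If you want to confirm the mismatch before rerunning anything, here is a minimal Python sketch that compares the reference names in a SAM header with the sequence names in the first column of a GTF. The function names are my own and are not part of HTSeq. It assumes whitespace-separated fields and uses a few lines pasted from this thread as input; for real files you would pass open file handles instead.

```python
def sam_reference_names(sam_lines):
    """Collect reference names from the SN: field of @SQ header lines."""
    names = set()
    for line in sam_lines:
        if not line.startswith("@"):
            break  # header lines come first; stop at the first alignment
        if line.startswith("@SQ"):
            for field in line.split()[1:]:
                if field.startswith("SN:"):
                    names.add(field[3:])
    return names


def gtf_sequence_names(gtf_lines):
    """Collect sequence names from the first column of GTF records."""
    names = set()
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        names.add(line.split()[0])
    return names


# Lines copied from the posts above (illustration only).
sam_header = [
    "@HD VN:1.0 SO:unsorted",
    "@SQ SN:NW_003302556.1 LN:4850",
    "@SQ SN:NW_003302557.1 LN:5561",
]
gtf_records = [
    'supercont1.1 broads exon 10097 10114 . - . '
    'transcript_id "transcript:PITG_00002T0"; gene_id "gene:PITG_00002";',
]

sam_names = sam_reference_names(sam_header)
gtf_names = gtf_sequence_names(gtf_records)
shared = sam_names & gtf_names

print("names in SAM header:", sorted(sam_names))
print("names in GTF:", sorted(gtf_names))
print("names in both:", sorted(shared))
```

If "names in both" comes back empty, as it does for these lines, the counter has nothing to match on and zero counts are expected.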
Thank you very much! That was indeed the problem.