I am new to bioinformatics and am looking at the expression of bacterial genes under different conditions. The genes I am interested in are actually bacteriophage genes (a bacteriophage is a virus that inserts its genes into the bacterial genome). I am downloading SRA data from published RNA-Seq experiments and converting them to FASTQ files.
I know there are different ways to do things, and I would appreciate some advice about the best way to move forward.
I started by checking the quality of the FASTQ files. Next, I used BWA. I downloaded a reference genome in FASTA format and ran NormalizeFasta (80 bp line length, truncate at first whitespace), but I did not verify that the chromosome identifiers match (is this necessary for bacterial genomes?). I mapped the single-end FASTQ reads against this normalized reference genome following a Galaxy tutorial (single-end, SAM/BAM specifications), which produced BAM files. I ran flagstat on these and got over 90% of reads mapped for all 6 datasets.

I then ran htseq using a GTF file for the same reference genome, which I downloaded from Ensembl (I also tried generating my own GTF file with prokka and gffread). For htseq, I selected the Union overlap mode, the --nonunique all option, non-stranded, CDS as the feature type, and gene_ID as the ID attribute (both are present in the GTF file). Unfortunately, the htseq table shows 0 for all counts.
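
For reference, the identifier check I skipped could be sketched roughly as below. This is only a minimal sketch, assuming uncompressed, plain-text FASTA and GTF files. The function names (fasta_ids, gtf_summary, compare) are hypothetical names of my own and do not come from any tool or tutorial. It compares the sequence names in the FASTA headers with column 1 of the GTF and tallies the feature types in column 3.

```python
from collections import Counter


def fasta_ids(lines):
    """Return the set of sequence names found in FASTA header lines.

    Each name is cut at the first whitespace, mirroring the
    "truncate at first whitespace" NormalizeFasta setting.
    """
    ids = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.add(fields[0])
    return ids


def gtf_summary(lines):
    """Return (sequence names, feature type counts) from GTF lines.

    Column 1 holds the sequence name and column 3 the feature type.
    Blank lines, comment lines and malformed lines are skipped.
    """
    seqnames = set()
    features = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue
        seqnames.add(cols[0])
        features[cols[2]] += 1
    return seqnames, features


def compare(fasta_lines, gtf_lines):
    """Summarise which sequence names the FASTA and GTF have in common."""
    fa = fasta_ids(fasta_lines)
    gtf, features = gtf_summary(gtf_lines)
    return {
        "shared": fa & gtf,
        "only_in_fasta": fa - gtf,
        "only_in_gtf": gtf - fa,
        "features": dict(features),
    }
```

Calling compare(open("reference.fasta"), open("annotation.gtf")) (both file names are placeholders) would return the shared and unshared names. As far as I understand, an empty "shared" set would mean the BAM and the GTF use different sequence names, which is one possible cause of all-zero counts. A "features" tally without any CDS entries would point to the feature type setting instead.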
Again, I would appreciate any advice.