I am doing RNA-seq analysis for several mouse samples and I encounter problems during differential expression analysis. First, I used Galaxy tools to clean,filter, and trim my reads and Tophat for alignment. I selected the built-in genome mm10 for alignment and the mapping efficient is above 85%. Then, I used cufflinks without using reference annotation and I continued to do cuffmerge with the ensembl Mus_musculus.GRCm38.79.gtf annotation, followed by Cuffdiff. The results showed zero in values, fold changes, test_stat for every gene. I downloaded the BAM file and analyzed it with HTSeq-count using the Mus_musculus.GRCm38.79.gtf, and I found that all the reads had no feature or ambiguous alignment. I was wondering if I have used the wrong mapping genome or annotation file. Specifically, I have the following questions:
1. Is the built-in mouse genome in galaxy obtained from UCSC?
2. Shall I use the UCSC gff file for Cuffmerge annotation? How can I download gff file from UCSC?
3. If I want to use ensembl annotation, should I map my reads to the ensembl genome?
4. In the RNA-seq tutorial for Galaxy, reference annotation is not used during Cufflinks but used for Cuffmerge. Is this the correct method for DE analysis in my case?
5. I split the reads into 8-10 smaller FASTQ files before alignment using Tophat. Afterwards, I merged all the resulting BAM files to a single BAM file and converted it to SAM. However, this SAM file is not able to be processed with HTSeq-count. But the smaller BAM files directly resulting Tophat work well with HTSeq-count. What is the problem?
Thank you very much for your attentions and I look forward to the answers!!