I want to do differential gene expression analysis on some Nile Tilapia RNA-Seq data using the Cufflinks-Cuffdiff method. I aligned the sequence reads to the UCSC Nile Tilapia genome with >80% mapping efficiency. I downloaded the annotation GTF file from Ensembl and converted it using the "Make Ensembl GTF compatible with Cufflinks" workflow. Then I ran Cufflinks on the paired-end mapped data with both the original and the converted GTF file. However, the expression values were 0 for every gene and transcript in both cases. I noticed that the chromosome names in the Tilapia GTF file are strange: they are "GL831133.1" instead of 1, 2, 3. I also got zero counts from htseq-count with both the original and the converted GTF file. Does anyone know a good reference annotation file for Tilapia? Thanks!
This workflow does not automatically convert identifiers into the correct format for every genome. I can't see your pics, but the chromosome names must match exactly between the reference genome and the reference annotation. https://wiki.galaxyproject.org/Support#Reference_genomes
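One way to confirm a naming mismatch is to compare the sequence names in the genome FASTA with the first column of the GTF. The following is only a minimal sketch, not a Galaxy tool: the function names are made up for illustration, and it assumes plain-text (uncompressed) FASTA and tab-separated GTF input.

```python
def fasta_names(lines):
    """Collect sequence names from FASTA headers (text after '>' up to the first whitespace)."""
    names = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names


def gtf_names(lines):
    """Collect the values of column 1 (seqname) from GTF records, skipping comments and blanks."""
    names = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        names.add(line.split("\t", 1)[0])
    return names


def compare_names(fasta_lines, gtf_lines):
    """Report which sequence names are shared and which appear on one side only."""
    genome = fasta_names(fasta_lines)
    annotation = gtf_names(gtf_lines)
    return {
        "shared": genome & annotation,
        "genome_only": genome - annotation,
        "annotation_only": annotation - genome,
    }
```

Both arguments can be open file handles, for example `compare_names(open("genome.fa"), open("annotation.gtf"))` with your own file names. If "shared" comes back empty, no annotated feature can overlap any aligned read, which would be consistent with zero values from both Cufflinks and htseq-count.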
You could always download the reference genome from Ensembl, so that its sequence names match the Ensembl GTF, and use that as a Custom Reference genome/build. Here is how: https://wiki.galaxyproject.org/Support#Custom_reference_genome
I am not aware of a Tilapia reference annotation that has all of the key attributes (tss_id, gene_id, gene_name), but others are welcome to comment.
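To see which of these attributes a candidate GTF actually carries, you can tally them across its records. This is a rough sketch under the assumption that attributes in column 9 follow the usual `key "value";` form; the function name and the regular expression are illustrative and not part of Cufflinks or Galaxy.

```python
import re

# Matches attribute pairs written as: key "value"
ATTR_RE = re.compile(r'(\w+) "([^"]*)"')
KEY_ATTRIBUTES = ("tss_id", "gene_id", "gene_name")


def attribute_coverage(gtf_lines, keys=KEY_ATTRIBUTES):
    """Return (number of records, {attribute: number of records carrying it})."""
    counts = {key: 0 for key in keys}
    total = 0
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete 9-column GTF record
        total += 1
        present = {name for name, _ in ATTR_RE.findall(fields[8])}
        for key in keys:
            if key in present:
                counts[key] += 1
    return total, counts
```

An attribute whose count is 0, or well below the record total, is missing from some or all records of that file.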
Jen, Galaxy team