I am analyzing zebrafish RNA-seq data. I am able to upload the FASTQ files, trim them with Trimmomatic, and align them using Bowtie and TopHat. After that I can select the mapped files (BAM) for RNA analysis, but I cannot select the GTF/GFF3 file of the annotated genome from the history for Cuffdiff. I also realized that this step asks for a GTF generated by Cufflinks; if I use that, it only provides chromosome locations, not gene names. Am I missing a step, or does this need to be done in a different way? Thanks...
Hello,
We are not indexing genomes with annotation for RNA-STAR at this time.
For STAR, Tophat, or HISAT2, and most other tools, the reference GTF will need to be loaded into Galaxy by you. It must be an exact match with the reference genome/build used for mapping and all other steps in the same analysis.
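As a rough illustration of what "exact match" means here, the sketch below compares the sequence identifiers in a reference FASTA with those in the first column of a GTF. The function names (`fasta_ids`, `gtf_seqnames`, `gtf_ids_missing_from_genome`) are made up for this example and are not part of Galaxy or any of the tools mentioned; it only checks identifier names, not the build version or coordinates.

```python
def fasta_ids(lines):
    """Collect sequence identifiers from FASTA header lines.

    The identifier is taken as the text after '>' up to the first whitespace.
    """
    ids = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.add(fields[0])
    return ids


def gtf_seqnames(lines):
    """Collect the values of the first (seqname) column of a GTF."""
    names = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comment and blank lines
        names.add(line.split("\t", 1)[0])
    return names


def gtf_ids_missing_from_genome(fasta_lines, gtf_lines):
    """Return GTF sequence names that do not appear in the FASTA headers."""
    return gtf_seqnames(gtf_lines) - fasta_ids(fasta_lines)


if __name__ == "__main__":
    # Hypothetical toy data: the genome uses "chr1" while the GTF uses "1".
    fasta = [">chr1 example", "ACGT", ">chr2", "GGCC"]
    gtf = ['1\tsrc\texon\t1\t4\t.\t+\t.\tgene_id "g1";',
           'chr2\tsrc\texon\t1\t4\t.\t+\t.\tgene_id "g2";']
    print(gtf_ids_missing_from_genome(fasta, gtf))
```

With real files you would pass open file handles instead of lists. Any name reported here (for example "1" versus "chr1") is the kind of mismatch the FAQ below deals with.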
This FAQ is written for those who have already run into problems, but it also explains how to load/format custom genomes and lists some options for obtaining a matching reference annotation dataset (for a custom genome or a built-in database index on the server you are working at): https://galaxyproject.org/support/chrom-identifiers.
iGenomes https://support.illumina.com/sequencing/sequencing_software/igenome.html is a good GTF source that includes all of the attributes utilized by Cuffdiff (including gene names). Some of this is covered in the FAQ above with more details.
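If you want to confirm that a GTF actually carries gene names before using it, a minimal check could look like the sketch below. It assumes gene names are stored as a `gene_name "..."` attribute in the ninth column, which is the usual GTF convention; the function name `gene_name_coverage` is hypothetical.

```python
import re

# Matches an attribute such as: gene_name "sox2";
GENE_NAME = re.compile(r'gene_name "([^"]+)"')


def gene_name_coverage(lines):
    """Return (records_with_gene_name, total_records) for GTF lines."""
    named = 0
    total = 0
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comment and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        total += 1
        if GENE_NAME.search(fields[8]):
            named += 1
    return named, total


if __name__ == "__main__":
    # Hypothetical records: only the first one has a gene_name attribute.
    example = [
        'chr1\tsrc\texon\t1\t100\t.\t+\t.\tgene_id "g1"; gene_name "abc1";',
        'chr1\tsrc\texon\t200\t300\t.\t+\t.\tgene_id "g2";',
    ]
    named, total = gene_name_coverage(example)
    print(f"{named} of {total} records have a gene_name attribute")
```

A GTF where few or no records have the attribute will not give you gene names in the downstream output.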
The general workflow is to map, run Cufflinks on each BAM result, run Cuffmerge with all of the Cufflinks GTFs plus a public annotation source (such as those from iGenomes) as input, then use the Cuffmerge GTF with Cuffdiff. That same public GTF can also be used with any of the earlier tools if you want those to utilize known splice junctions. See the Cufflinks manual: http://cole-trapnell-lab.github.io/cufflinks/manual/
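For anyone who prefers to see the order of steps written out, here is a sketch of the equivalent command-line calls outside of Galaxy, assembled as argument lists without running anything. The helper name `cuff_workflow_commands`, the output directory layout, and the file names are assumptions made for illustration; within Galaxy you would choose the same inputs in each tool form instead.

```python
def cuff_workflow_commands(samples, reference_gtf, out_dir="cuff_out"):
    """Build the Cufflinks -> Cuffmerge -> Cuffdiff commands as argv lists.

    samples: dict mapping a condition label to a list of BAM paths.
    Returns (commands, assemblies). The caller is expected to write the
    `assemblies` paths, one per line, to the manifest given to Cuffmerge.
    """
    commands = []
    assemblies = []

    # Step 1: one Cufflinks run per BAM; each writes transcripts.gtf.
    for condition, bams in samples.items():
        for i, bam in enumerate(bams, start=1):
            sample_dir = f"{out_dir}/cufflinks_{condition}_{i}"
            commands.append(["cufflinks", "-o", sample_dir, bam])
            assemblies.append(f"{sample_dir}/transcripts.gtf")

    # Step 2: Cuffmerge over all assemblies plus the public reference GTF.
    manifest = f"{out_dir}/assemblies.txt"
    merged_dir = f"{out_dir}/merged"
    commands.append(["cuffmerge", "-g", reference_gtf, "-o", merged_dir, manifest])

    # Step 3: Cuffdiff with the merged GTF; replicates are comma-separated.
    cuffdiff = ["cuffdiff", "-o", f"{out_dir}/cuffdiff",
                "-L", ",".join(samples), f"{merged_dir}/merged.gtf"]
    cuffdiff += [",".join(bams) for bams in samples.values()]
    commands.append(cuffdiff)

    return commands, assemblies


if __name__ == "__main__":
    # Hypothetical sample sheet and annotation file name.
    cmds, _ = cuff_workflow_commands(
        {"control": ["c1.bam", "c2.bam"], "treated": ["t1.bam", "t2.bam"]},
        "genes.gtf",
    )
    for cmd in cmds:
        print(" ".join(cmd))
```

The point to notice is that the reference annotation enters at the Cuffmerge step, and Cuffdiff then receives the merged GTF rather than an individual Cufflinks GTF.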
The analysis tools you are using are a bit older (Tophat, all Cuff tools) and some are deprecated. If interested in the latest tools/methods for RNA-seq analysis, please see the Galaxy tutorials here: https://galaxyproject.org/learn/
Thanks! Jen, Galaxy team