Question: Cufflinks on RNA-seq
What is the difference between a reference annotation file and a reference genome file in Cufflinks? And how do I obtain a reference annotation file for Cufflinks?

  • Reference annotation = GTF, GFF, BED, Tabular or other = Describes what, and usually where, genomic features are located on a specific genome build. Features can be representative of transcripts, genes, transcription start sites, SNPs, and much (much) more.

  • Reference genome = Fasta = Describes the nucleotide content of a specific genome build (ATCGN). There are protein versions of this type of data, but then the answer becomes more complicated. When doing RNA-seq analysis with these tools, your reference genome/transcriptome/exome will be in nucleotide format.
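
To make the distinction concrete, here is a minimal Python sketch with a tiny, made-up genome (Fasta) and a matching made-up annotation (GTF). The sequences, feature names, and helper functions (`parse_fasta`, `parse_gtf`, `feature_sequence`) are all hypothetical and only for illustration; they are not part of Cufflinks or Galaxy. It shows that the annotation only holds coordinates, while the genome holds the actual bases those coordinates point to.

```python
# Hand-written toy data: NOT taken from any real genome build.
FASTA_TEXT = """>chr1
ACGTNNACGTACGT
>chr2
GGGTTTAAACCC
"""

# GTF has 9 tab-separated columns:
# seqname, source, feature, start, end, score, strand, frame, attributes
GTF_TEXT = (
    'chr1\texample\texon\t3\t9\t.\t+\t.\tgene_id "geneA"; transcript_id "geneA.1";\n'
    'chr2\texample\texon\t1\t6\t.\t-\t.\tgene_id "geneB"; transcript_id "geneB.1";\n'
)


def parse_fasta(text):
    """Return {sequence_name: sequence} from Fasta-formatted text."""
    parts = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The identifier is the first word after '>'
            name = line[1:].split()[0]
            parts[name] = []
        elif name is not None:
            parts[name].append(line)
    return {n: "".join(chunks) for n, chunks in parts.items()}


def parse_gtf(text):
    """Return a list of feature dicts from GTF-formatted text."""
    features = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        features.append({
            "seqname": cols[0],
            "feature": cols[2],
            "start": int(cols[3]),
            "end": int(cols[4]),
            "strand": cols[6],
            "attributes": cols[8],
        })
    return features


def feature_sequence(feature, genome):
    """Look up the forward-strand bases a feature covers.

    GTF coordinates are 1-based and inclusive. This sketch does not
    reverse-complement features on the '-' strand.
    """
    seq = genome[feature["seqname"]]
    return seq[feature["start"] - 1 : feature["end"]]


genome = parse_fasta(FASTA_TEXT)
annotation = parse_gtf(GTF_TEXT)

for feat in annotation:
    print(feat["seqname"], feat["feature"], feat["start"], feat["end"],
          feature_sequence(feat, genome))
```

Note that the lookup only works because the first GTF column (`chr1`, `chr2`) uses exactly the same identifiers as the Fasta header lines.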

This prior Q&A explains the difference in the context of a specific question including potential sources. Consider the mm10 database (mouse) as an example to be replaced with your target genome.

This troubleshooting help for data mismatch problems between reference annotation and reference genome inputs is another good resource, and it might be helpful later on. Choosing input data that is all based on the same reference genome build is very important.
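
One common form of mismatch is that the chromosome identifiers differ between the two inputs (for example `chr1` in one file and `1` in the other). Below is a rough Python sketch of a check for that. The function names (`fasta_names`, `gtf_names`, `compare_names`) and the sample lines are assumptions made up for this example, not something provided by Cufflinks or Galaxy, and matching identifiers alone do not prove that both files come from the same build.

```python
def fasta_names(lines):
    """Collect sequence identifiers from Fasta header lines ('>name ...')."""
    names = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names


def gtf_names(lines):
    """Collect the first column (seqname) from GTF/GFF lines."""
    names = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        names.add(line.split("\t", 1)[0].strip())
    return names


def compare_names(fasta_lines, gtf_lines):
    """Report which identifiers are shared and which appear in only one input."""
    genome = fasta_names(fasta_lines)
    annot = gtf_names(gtf_lines)
    return {
        "shared": sorted(genome & annot),
        "annotation_only": sorted(annot - genome),
        "genome_only": sorted(genome - annot),
    }


# Made-up example lines: the annotation uses "1" where the genome uses "chr1".
example_fasta = [">chr1", "ACGTACGT", ">chr2", "GGGTTTAA"]
example_gtf = [
    '1\texample\texon\t1\t4\t.\t+\t.\tgene_id "geneA";',
    'chr2\texample\texon\t2\t6\t.\t-\t.\tgene_id "geneB";',
]

report = compare_names(example_fasta, example_gtf)
print(report)
```

With real data, the same function can be given open file handles instead of lists, for example `compare_names(open("genome.fa"), open("genes.gtf"))`, where both file names are placeholders. Any entries under `annotation_only` are features that a tool cannot place on the supplied genome.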

Let us know if anything was not covered or is unclear. Jen, Galaxy team

