What is the difference between a reference annotation file and a reference genome file in Cufflinks? And how do I obtain a reference annotation file for Cufflinks?
Hello,
Reference annotation = GTF, GFF, BED, Tabular or other = Describes what, and usually where, genomic features are located on a specific genome build. Features can be representative of transcripts, genes, transcription start sites, SNPs, and much (much) more.
Reference genome = Fasta = Describes the nucleotide content of a specific genome build (ATCGN). There are protein versions of this type of data, but then the answer becomes more complicated. When doing RNA-seq analysis with these tools, your reference genome/transcriptome/exome will be in nucleotide format.
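To make the distinction concrete, here is a minimal Python sketch, not anything Cufflinks or Galaxy runs internally. The tiny FASTA and GTF snippets are made-up illustrations (not real mm10 data), and the helper names `read_fasta` and `read_gtf` are hypothetical. It shows that the genome file holds the sequence itself, while the annotation file only holds coordinates that point into that sequence.

```python
# Made-up miniature inputs for illustration only (not real genome data).
fasta_text = """>chr1
ACGTNNACGT
>chr2
GGCCTTAA
"""

# One GTF line: 9 tab-separated columns (seqname, source, feature,
# start, end, score, strand, frame, attributes).
gtf_text = (
    'chr1\texample\texon\t2\t5\t.\t+\t.\t'
    'gene_id "geneA"; transcript_id "geneA.1";\n'
)


def read_fasta(text):
    """Return {sequence name: nucleotide string} from FASTA-formatted text."""
    chunks = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The identifier is the first word after ">".
            name = line[1:].split()[0]
            chunks[name] = []
        elif name is not None:
            chunks[name].append(line)
    return {key: "".join(parts) for key, parts in chunks.items()}


def read_gtf(text):
    """Return a list of feature dicts from GTF-formatted text."""
    features = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        features.append({
            "seqname": cols[0],
            "feature": cols[2],
            "start": int(cols[3]),
            "end": int(cols[4]),
            "strand": cols[6],
            "attributes": cols[8],
        })
    return features


genome = read_fasta(fasta_text)       # the reference genome: sequence content
annotation = read_gtf(gtf_text)       # the reference annotation: feature locations

exon = annotation[0]
# GTF coordinates are 1-based and inclusive, so shift the start by one.
exon_seq = genome[exon["seqname"]][exon["start"] - 1 : exon["end"]]
print(exon["feature"], exon["seqname"], exon["start"], exon["end"], exon_seq)
```

Note that the lookup `genome[exon["seqname"]]` only works because both files use the same chromosome name, which is why both inputs need to come from the same genome build.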
This prior Q&A explains the difference in the context of a specific question and includes potential sources for the data. Treat the mm10 database (mouse) as an example to be replaced with your target genome. https://biostar.usegalaxy.org/p/22862
This troubleshooting help for data mismatch problems between reference annotation and reference genome inputs is another good resource and might be helpful later on. Choosing input data that is all based on the same reference genome build is very important. https://galaxyproject.org/support/chrom-identifiers/
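As a rough illustration of the kind of mismatch that help page covers, the following Python sketch compares the sequence identifiers in a FASTA file with those in the first column of a GTF file. This is only an assumed, simplified check, not the method Galaxy or Cufflinks uses. The function names and the tiny inline inputs are hypothetical.

```python
def fasta_identifiers(fasta_text):
    """Collect sequence names (first word after '>') from FASTA text."""
    names = set()
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names


def annotation_identifiers(gtf_text):
    """Collect the first-column sequence names from GTF text."""
    names = set()
    for line in gtf_text.splitlines():
        if line.strip() and not line.startswith("#"):
            names.add(line.split("\t")[0])
    return names


def missing_from_genome(fasta_text, gtf_text):
    """Return annotation identifiers that have no match in the genome."""
    return sorted(annotation_identifiers(gtf_text) - fasta_identifiers(fasta_text))


# Made-up inputs: the genome names the mitochondrial sequence "chrM",
# while the annotation names it "MT".
example_fasta = ">chr1\nACGT\n>chrM\nGGCC\n"
example_gtf = (
    'chr1\texample\texon\t1\t4\t.\t+\t.\tgene_id "geneA";\n'
    'MT\texample\texon\t1\t4\t.\t+\t.\tgene_id "geneB";\n'
)

unmatched = missing_from_genome(example_fasta, example_gtf)
print("Identifiers in annotation but not in genome:", unmatched)
```

An empty result suggests the identifiers line up. It does not prove that both files come from the same build, since two builds can share chromosome names while differing in coordinates.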
Let us know if you have questions that were not covered or are unclear.
Jen, Galaxy team