3.1 years ago by
United States
Hello,
The specific transcripts and genes included in the reference annotation influence how these tools cluster the data. Content differs between annotation sources depending on the rules used to build the transcripts, the rules used to cluster them into genes, and the attributes present in the file itself. Two things matter most:
1) Attributes: at minimum tss_id and p_id should be present, and ideally gene_name as well.
2) Gene clustering: transcripts should actually be grouped into genes. If transcript_id and gene_id hold the same value on every line, then there is no gene clustering.
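A quick way to check a candidate GTF against the two points above is to count how often each attribute appears and how often gene_id matches transcript_id. This is only a rough sketch, not part of any tool: the function name summarize_gtf is made up for illustration, and it assumes a standard tab-delimited GTF whose ninth column holds attributes in the usual key "value"; form.

```python
import re

# Matches GTF attribute pairs of the form: key "value";
ATTR_RE = re.compile(r'(\w+) "([^"]*)"')

def summarize_gtf(lines):
    """Count attribute usage across GTF records that carry a transcript_id.

    Hypothetical helper for illustration; 'lines' is any iterable of
    GTF text lines (an open file handle works).
    """
    summary = {
        "records": 0,
        "tss_id": 0,
        "p_id": 0,
        "gene_name": 0,
        "gene_id_equals_transcript_id": 0,
    }
    for line in lines:
        # Skip comments and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a well-formed GTF record
        attrs = dict(ATTR_RE.findall(fields[8]))
        if "transcript_id" not in attrs:
            continue
        summary["records"] += 1
        for key in ("tss_id", "p_id", "gene_name"):
            if key in attrs:
                summary[key] += 1
        # Identical IDs suggest transcripts were not clustered into genes.
        if attrs.get("gene_id") == attrs["transcript_id"]:
            summary["gene_id_equals_transcript_id"] += 1
    return summary
```

To use it, open the file and pass the handle, for example: with open("annotation.gtf") as handle: print(summarize_gtf(handle)), where annotation.gtf stands in for your own file. If gene_id_equals_transcript_id is equal to records, the file most likely has no gene-level clustering. If tss_id or p_id come back as 0, those attributes are missing entirely.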
The best annotation file available for your target genome is one that contains most or all of these attributes. If there are several such files available, then review how the transcripts and genes are constructed and decide which is the best match for your experiment. Some sources have stricter rules than others and some contain predictions (which may or may not be desirable).
iGenomes GTF datasets are an example of annotation files with all of the attributes Cuffdiff can utilize.
Best, Jen, Galaxy team