Hello,
A reference annotation dataset is probably best obtained from a data provider, if one can be identified. You can also run the pipeline without annotation: the results would then reflect only the content of your NGS sequence inputs, and the annotation-dependent features of certain tools (such as Cuffdiff) would not be used. More about the annotation features used by these tools can be found at the Cufflinks web site: http://cole-trapnell-lab.github.io/cufflinks/
This genome is hosted in the UCSC Archaeal Genome Browser (http://archaea.ucsc.edu). The availability and type of annotation vary by strain. Also review the "Resources" tab: one of the research groups listed there may have the annotation data you want in GTF or GFF3 format. There are almost certainly other options. Reviewing publications is probably a good place to start, to see what others performing similar analyses are using.
If you do decide to use a reference annotation dataset, be sure to run your analysis against exactly the same reference genome that the annotation is based on. This may mean creating a new Custom Genome. The sequence identifiers, content, and lengths must match exactly across all inputs, meaning they were created from the same build and use the same nomenclature.
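If it helps, here is a rough sanity check you could run before uploading anything. It is only a sketch in plain Python, not part of Galaxy or Cufflinks, and the function names are made up for illustration. It assumes a standard FASTA genome and a tab-delimited GTF/GFF3 file (sequence identifier in column 1, feature end in column 5). It reports annotation identifiers that are missing from the genome and features that run past the end of a sequence. Passing this check does not prove the two files come from the same build, but failing it means they will not work together.

```python
def fasta_lengths(lines):
    """Return {sequence_id: length} for an iterable of FASTA lines."""
    lengths = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The identifier is the first whitespace-delimited token after '>'.
            tokens = line[1:].split()
            current = tokens[0] if tokens else ""
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)
    return lengths


def annotation_extents(lines):
    """Return {sequence_id: largest feature end} for GTF/GFF3 lines."""
    extents = {}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments, directives, and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # not a feature line
        seqid, end = fields[0], int(fields[4])
        extents[seqid] = max(extents.get(seqid, 0), end)
    return extents


def check_compatibility(fasta_lines, annotation_lines):
    """List mismatches between a genome and an annotation (empty if none found)."""
    lengths = fasta_lengths(fasta_lines)
    extents = annotation_extents(annotation_lines)
    problems = []
    for seqid, end in sorted(extents.items()):
        if seqid not in lengths:
            problems.append(f"{seqid}: in annotation but not in genome")
        elif end > lengths[seqid]:
            problems.append(
                f"{seqid}: feature ends at {end}, "
                f"but sequence length is {lengths[seqid]}"
            )
    return problems
```

To use it on real files, open both and pass the file handles, for example `check_compatibility(open("genome.fa"), open("genes.gtf"))` (file names here are placeholders), then print whatever it returns.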
Best, Jen, Galaxy team