Dear galaxy users,
This might be a very basic question to most of you, but I was hoping I
could get a better understanding of this concept by asking you all.
How exactly can we accomplish annotation of our reads? The combination
of TopHat and Cufflinks does annotate genes, right? I am a bit
confused regarding this topic. Any help will be much appreciated.
The tools TopHat/Cufflinks will map and assemble transcripts from
sequencing reads. This mapping gives each component (short read,
transcript, gene boundary) genomic coordinates with respect to the
target reference genome.
Annotation is also mapped to the reference genome by genomic
coordinates. It can be derived from different sources; a look at a
genome browser project that focuses on annotation will help you to
understand the concept. Good choices can be found under the Galaxy
tool group "Get Data".
One way to merge the two (assign "annotation" to a
"sequence/transcript/gene"), is to identify overlapping coordinate
regions on the reference genome between the two. Please see the tools in
the group "Operate on Genomic Intervals"; the associated Galaxy wiki
is a good resource for this type of analysis.
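
To make the coordinate-overlap idea concrete, here is a minimal Python
sketch. It is not how the Galaxy interval tools are implemented; it only
illustrates the concept. The record layout, the names (Interval,
overlaps, annotate) and the example coordinates are all hypothetical,
and it assumes BED-style half-open coordinates (0-based start, exclusive
end).

```python
from typing import Dict, List, NamedTuple


class Interval(NamedTuple):
    """A named region on a reference sequence (hypothetical layout)."""
    chrom: str
    start: int  # 0-based, inclusive (BED-style; an assumption)
    end: int    # exclusive
    name: str


def overlaps(a: Interval, b: Interval) -> bool:
    """True if both intervals are on the same chromosome and share a base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def annotate(transcripts: List[Interval],
             features: List[Interval]) -> Dict[str, List[str]]:
    """Map each transcript name to the names of overlapping features.

    This is a simple all-against-all comparison, fine for a small
    illustration but too slow for whole-genome data sets.
    """
    return {
        t.name: [f.name for f in features if overlaps(t, f)]
        for t in transcripts
    }


# Made-up assembled transcripts (e.g. what an assembler might report).
transcripts = [
    Interval("chr1", 100, 500, "TCONS_1"),
    Interval("chr1", 900, 1200, "TCONS_2"),
    Interval("chr2", 100, 500, "TCONS_3"),
]

# Made-up annotation features on the same reference genome.
features = [
    Interval("chr1", 400, 800, "geneA"),
    Interval("chr1", 1200, 1500, "geneB"),  # starts where TCONS_2 ends
    Interval("chr2", 50, 150, "geneC"),
]

labels = annotate(transcripts, features)
for transcript_name, feature_names in labels.items():
    print(transcript_name, feature_names)
```

In this sketch TCONS_2 receives no annotation because, with half-open
coordinates, an interval ending at 1200 and one starting at 1200 share no
base. Both data sets must use the same reference genome build and the
same coordinate convention for the comparison to be meaningful.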
Another way to obtain annotation is to run annotation algorithms
directly on the sequence data itself. This is a large and varied
analysis space. The public main Galaxy server has some tools for this
type of analysis, and more are offered if you decide to run a
local/cloud instance with repositories from the Tool Shed.
For annotation, it is best to first decide what you are looking for, then
perform some searches both within the web tools you prefer and with a
search tool such as Galaxy, use that research to determine the best
platform on which to run the tool, and finally sort out the technical
details. For general
'how-to-use' help with Galaxy, plus some basic scientific operations,
these are good places to get oriented/started: