There is no single tool to do this operation (although there has been
some discussion about including one in the Tool Shed), but the same
information can be obtained by using a combination of existing tools.
First, start by converting both starting datasets to interval format.
- for TopHat output, "NGS: SAM Tools -> Convert SAM to interval"
- for the GFF file (convert to tabular if necessary), subtract "1"
from the start position's value using the tool "Text Manipulation ->
- cut columns chrom, new start, stop, strand, name, and score from
this result file using "Text Manipulation -> Cut"
- set the data type to "interval" using the "Edit attributes" form
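If it helps to see why the "1" is subtracted: GFF start positions are 1-based, while interval format uses a 0-based start. Here is a minimal Python sketch of the same GFF conversion done outside Galaxy. The function name `gff_to_interval` is made up for illustration, and using the GFF attributes column (column 9) as the "name" is an assumption; pick whichever column identifies your features.

```python
def gff_to_interval(gff_lines):
    """Convert tab-separated GFF lines to interval-style rows.

    Output column order follows the steps above:
    chrom, new start, stop, strand, name, score.
    """
    rows = []
    for line in gff_lines:
        line = line.rstrip("\n")
        # Skip blank lines and GFF header/comment lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 9:
            continue  # not a complete GFF record
        chrom, _source, _feature, start, end, score, strand, _frame, attributes = fields[:9]
        # GFF starts are 1-based; interval starts are 0-based, so subtract 1.
        # Assumption: the attributes column is used as the feature name.
        rows.append((chrom, int(start) - 1, int(end), strand, attributes, score))
    return rows
```

For example, a GFF record starting at 100 and ending at 200 becomes an interval with start 99 and end 200.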
Next, use a tool in the group "Operate on Genomic Intervals" to
compare these intervals for overlap. The tool "Cluster" with the option "Find"
is most likely the one you will want to use.
As a final step, summarize the data by feature using the tool "Join,
Subtract and Group -> Group".
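To make the overlap-then-summarize idea concrete, here is a rough Python sketch that counts how many read intervals overlap each feature. It is not what the Galaxy tools do internally, only an illustration of the end result. The function name `count_overlaps` and the tuple layouts are assumptions, and all coordinates are taken to be 0-based, half-open intervals.

```python
def count_overlaps(features, reads):
    """Count reads overlapping each feature.

    features: iterable of (chrom, start, end, name)
    reads:    iterable of (chrom, start, end)
    Returns a dict mapping feature name -> number of overlapping reads.
    """
    reads = list(reads)
    counts = {}
    for f_chrom, f_start, f_end, f_name in features:
        counts.setdefault(f_name, 0)
        for r_chrom, r_start, r_end in reads:
            # Two half-open intervals overlap when each one starts
            # before the other one ends.
            if r_chrom == f_chrom and r_start < f_end and f_start < r_end:
                counts[f_name] += 1
    return counts
```

This simple nested loop compares every read with every feature, so it is only practical for small datasets; for full-size data the Galaxy interval tools are the better route.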
Hopefully this helps,