Hello,
The original reference GTF contains known transcripts/genes (known).
The results from StringTie include novel transcripts/genes (and presumably at least some knowns, if there are any for your genome) unless you restrict it to report only the knowns from a reference GTF. So there can be three kinds of results:
- only the knowns represented by your reads
- those knowns + "guided" novel
- those knowns + unguided novel
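As a rough illustration of those three kinds of results, the sketch below builds a StringTie command line for each case. It assumes StringTie is run from the command line rather than through the Galaxy tool form, and the function name, mode labels, and file names are hypothetical. The flags are StringTie's own: `-G` (reference annotation used as a guide), `-e` (only estimate abundances of the reference transcripts, so nothing novel is reported) and `-o` (output GTF).

```python
from typing import List, Optional

# Hypothetical labels for the three kinds of results.
MODES = ("known_only", "guided", "unguided")


def stringtie_assemble_cmd(bam: str, out_gtf: str, mode: str,
                           ref_gtf: Optional[str] = None) -> List[str]:
    """Build a StringTie assembly command for one of three result kinds."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    cmd = ["stringtie", bam, "-o", out_gtf]
    if mode == "unguided":
        return cmd  # no reference annotation: assembled from the reads alone
    if ref_gtf is None:
        raise ValueError(f"mode {mode!r} needs a reference GTF")
    cmd += ["-G", ref_gtf]  # knowns guide the assembly
    if mode == "known_only":
        cmd.append("-e")  # report only reference transcripts, no novel ones
    return cmd
```

If `stringtie` is on your `PATH`, `subprocess.run(cmd, check=True)` would execute a command built this way.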
StringTie merge combines and reformats GTF data. Its output can have two kinds of content:
- If given just the original reference GTF to fix up the formatting (often a required first step), the content does not change at all (original known).
- If given StringTie output and the fixed-up reference GTF together, the content reflects discovery, if any, from your read data (original known + novel).
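A similar sketch for the merge step, again assuming command-line use. `--merge`, `-G` and `-o` are StringTie options; the helper name and file names are hypothetical, and modeling the "reference GTF only" case by passing the reference as the sole input is an assumption about how that step is set up.

```python
from typing import List, Optional


def stringtie_merge_cmd(out_gtf: str, input_gtfs: List[str],
                        ref_gtf: Optional[str] = None) -> List[str]:
    """Build a `stringtie --merge` command line (hypothetical helper)."""
    if not input_gtfs:
        raise ValueError("at least one input GTF is required")
    cmd = ["stringtie", "--merge", "-o", out_gtf]
    if ref_gtf is not None:
        cmd += ["-G", ref_gtf]  # include the fixed-up reference annotation
    return cmd + list(input_gtfs)


# Original known only: the reference GTF is the sole input (an assumption).
fix_up = stringtie_merge_cmd("ref_fixed.gtf", ["ref.gtf"])
# Original known + novel: per-sample StringTie GTFs merged with the reference.
combined = stringtie_merge_cmd("merged.gtf", ["sample1.gtf", "sample2.gtf"],
                               ref_gtf="ref_fixed.gtf")
```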
Whether and when to use the known annotation depends on your goals. The tool manual (http://ccb.jhu.edu/software/stringtie/index.shtml?t=manual#de) explains this in much more detail, but in short:
- If you do not care about novel isoforms/transcripts/genes (discovery), then do not include/create/merge/consider novel data in the analysis.
- If you do care about novel data, be sure to allow it to be created and not filtered out. Knowns can be used as a guide or not, depending on whether you want them to influence how the data assemble.
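If a merged GTF already contains novel data and you want to see what is known versus novel, one crude check is to compare transcript IDs against the reference GTF. This sketch assumes known transcripts keep their reference `transcript_id` in the merged GTF while novel ones get new IDs (for example `MSTRG.*` from `stringtie --merge`). It does not compare transcript structures, which a dedicated tool such as gffcompare does. The function names are hypothetical.

```python
import re
from typing import Iterable, Set, Tuple

# Matches the transcript_id attribute value in GTF column 9.
_TRANSCRIPT_ID = re.compile(r'(?:^|;)\s*transcript_id "([^"]+)"')


def transcript_ids(gtf_lines: Iterable[str]) -> Set[str]:
    """Collect every transcript_id found in the records of a GTF."""
    ids: Set[str] = set()
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        match = _TRANSCRIPT_ID.search(fields[8])
        if match:
            ids.add(match.group(1))
    return ids


def split_known_novel(merged_lines: Iterable[str],
                      ref_lines: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Split merged transcript IDs into (known, novel) by reference membership.

    Assumption: known transcripts keep their reference transcript_id.
    """
    merged = transcript_ids(merged_lines)
    reference = transcript_ids(ref_lines)
    return merged & reference, merged - reference
```

Open file handles can be passed directly, e.g. `with open("merged.gtf") as m, open("ref.gtf") as r: known, novel = split_known_novel(m, r)` (file names hypothetical).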
Thanks, Jen, Galaxy team