StringTie and StringTie merge - when to apply the Guide gff (reference annotation file)?


21 days ago by

sf.rocha • 0 wrote:

I'm new to StringTie and have been trying to follow the "Finding and quantifying new transcripts" tutorial at https://galaxyproject.org/tutorials/nt_rnaseq/. It is not clear to me why the tutorial recommends using a guide GFF file only when merging the StringTie output from the individual samples. How does that differ from always using the GFF file (both when running StringTie and StringTie merge), or from using it only in the initial StringTie run on each sample?

I would appreciate it if someone could give me some insight into this. Thank you.

stringtie merge stringtie ref_seq • 80 views


modified 20 days ago by Jennifer Hillman Jackson ♦ 25k • written 21 days ago by sf.rocha • 0

20 days ago by

Jennifer Hillman Jackson ♦ 25k

United States

Jennifer Hillman Jackson ♦ 25k wrote:

Hello,

The original reference GTF contains only known transcripts/genes (the "knowns").
The results from StringTie include novel transcripts/genes (and presumably at least some knowns, if any exist for your genome), unless you restrict it to report only knowns from a reference GTF. So there can be three kinds of results:
- only the knowns represented by your reads
- those knowns + "guided" novel
- those knowns + unguided novel
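As a rough illustration of those three kinds of results, here is a minimal Python sketch that only builds the command lines and does not run anything. It assumes the standard StringTie command-line options `-G` (reference annotation used as a guide), `-e` (report only transcripts from the reference) and `-o` (output GTF). The function name and the file names are hypothetical, and the Galaxy wrapper exposes the same choices as form settings rather than flags.

```python
from typing import List, Optional


def stringtie_cmd(bam: str, out_gtf: str,
                  guide_gtf: Optional[str] = None,
                  known_only: bool = False) -> List[str]:
    """Build (but do not run) a StringTie assembly command.

    Hypothetical helper for illustration; bam, out_gtf and guide_gtf
    are placeholder file names.
    """
    if known_only and guide_gtf is None:
        # -e restricts output to reference transcripts, so it needs -G.
        raise ValueError("known_only requires a guide GTF")
    cmd = ["stringtie", bam, "-o", out_gtf]
    if guide_gtf is not None:
        cmd += ["-G", guide_gtf]   # knowns guide the assembly
    if known_only:
        cmd.append("-e")           # drop anything not in the guide
    return cmd


# 1. Only the knowns represented by the reads
knowns_only = stringtie_cmd("sample1.bam", "s1.gtf", "ref.gtf", known_only=True)
# 2. Knowns + "guided" novel
guided = stringtie_cmd("sample1.bam", "s1.gtf", "ref.gtf")
# 3. Knowns + unguided novel (no reference given at assembly time)
unguided = stringtie_cmd("sample1.bam", "s1.gtf")
```

Each list can be passed to `subprocess.run` if StringTie is installed locally; the only difference between the three modes is whether `-G` and `-e` are present.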
StringTie merge combines and reformats GTF data. Its output can have two kinds of content:
- If given just the original reference GTF to fix up the formatting (often a required first step), the content does not change at all (original knowns).
- If given StringTie output and the fixed-up reference GTF together, the content reflects discovery, if any, from your read data (original knowns + novel).
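The merge step with both kinds of input might look like the sketch below on the command line. It assumes StringTie's `--merge` mode, which takes the per-sample GTF files as positional inputs and optionally a reference annotation via `-G`. The helper name and file names are hypothetical, and in Galaxy the same inputs are selected in the StringTie merge tool form. The sketch only builds the command.

```python
from typing import List, Optional, Sequence


def stringtie_merge_cmd(assemblies: Sequence[str], out_gtf: str,
                        guide_gtf: Optional[str] = None) -> List[str]:
    """Build (but do not run) a StringTie merge command.

    Hypothetical helper for illustration; all file names are placeholders.
    """
    if not assemblies:
        raise ValueError("at least one input GTF is required")
    cmd = ["stringtie", "--merge", "-o", out_gtf]
    if guide_gtf is not None:
        cmd += ["-G", guide_gtf]   # include the known annotation in the merge
    return cmd + list(assemblies)


# Per-sample assemblies merged together with the reference:
# the result holds original knowns + any novel transcripts.
merged = stringtie_merge_cmd(["s1.gtf", "s2.gtf"], "merged.gtf", "ref.gtf")

# Reference GTF alone as the only input: the content stays the knowns.
reformatted = stringtie_merge_cmd(["ref.gtf"], "ref_fixed.gtf")
```

Supplying the reference only at this stage keeps the per-sample assemblies unguided while still carrying the known annotation into the merged transcript set.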

Whether and when to use known annotation depends on your goals. The tool manual (http://ccb.jhu.edu/software/stringtie/index.shtml?t=manual#de) explains this in much more detail, but in short:

If you do not care about novel isoforms/transcripts/genes (discovery), then do not include/create/merge/consider novel data in the analysis.
If you do care about novel data, be sure to allow it to be created and not filtered out. Knowns can be used as a guide, or not, depending on whether you want them to influence how the data assemble.

Thanks, Jen, Galaxy team

written 20 days ago by Jennifer Hillman Jackson ♦ 25k
