Question: StringTie and StringTie merge - when to apply the Guide gff (reference annotation file)?
sf.rocha wrote (21 days ago):

I'm new to StringTie and I've been trying to follow the "Finding and quantifying new transcripts" tutorial at https://galaxyproject.org/tutorials/nt_rnaseq/. It is not clear to me why it is recommended to use a guide gff file only when merging the StringTie data from all individual samples. What is the difference between doing this, always using the gff file (both when running StringTie and StringTie Merge), and using it only in the initial StringTie run of each sample?

I would appreciate it if someone could give me some insight into this. Thank you.

Jennifer Hillman Jackson (United States) wrote (20 days ago):

Hello,

  • The original reference GTF contains only known transcripts/genes.

  • StringTie output includes novel transcripts/genes (and presumably at least some known ones, if there are any for your genome) unless you restrict it to report only the knowns from a reference GTF. So there can be three kinds of results: only the knowns represented by your reads, those knowns plus "guided" novel transcripts, or those knowns plus unguided novel transcripts.

  • StringTie merge combines and reformats GTF data. Its output can have two kinds of content:

    • If given just the original reference GTF to fix up the formatting (often a required first step), the content does not change at all (original known).
    • If given StringTie output and the fixed-up reference GTF together, the content reflects discovery, if any, from your read data (original known + novel).
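
To make the options above more concrete, here is a minimal Python sketch that only builds StringTie command lines (it does not run anything). The flags -G (reference annotation), -e (report only transcripts matching the reference, requires -G), -o and --merge come from the StringTie manual; the function names and file names are made up for this example, and in Galaxy the same choices are made through the tool form rather than on the command line.

```python
from typing import List, Optional


def stringtie_assemble_cmd(bam: str, out_gtf: str,
                           guide: Optional[str] = None,
                           known_only: bool = False) -> List[str]:
    """Build the per-sample StringTie command (hypothetical helper).

    guide=None                   -> unguided assembly
    guide=<gtf>                  -> guided assembly (knowns + novel)
    guide=<gtf>, known_only=True -> only knowns from the reference (-e)
    """
    cmd = ["stringtie", bam, "-o", out_gtf]
    if guide is not None:
        cmd += ["-G", guide]
    if known_only:
        if guide is None:
            # -e restricts output to reference transcripts, so it needs -G
            raise ValueError("known-only mode (-e) needs a reference annotation (-G)")
        cmd.append("-e")
    return cmd


def stringtie_merge_cmd(sample_gtfs: List[str], out_gtf: str,
                        guide: Optional[str] = None) -> List[str]:
    """Build the StringTie merge command (hypothetical helper).

    With a guide, the merged GTF holds the reference transcripts plus
    whatever novel transcripts the per-sample assemblies contributed.
    """
    cmd = ["stringtie", "--merge", "-o", out_gtf]
    if guide is not None:
        cmd += ["-G", guide]
    return cmd + list(sample_gtfs)


if __name__ == "__main__":
    # Placeholder file names, for illustration only
    print(" ".join(stringtie_assemble_cmd("sample1.bam", "sample1.gtf")))
    print(" ".join(stringtie_assemble_cmd("sample1.bam", "sample1.gtf", guide="ref.gtf")))
    print(" ".join(stringtie_merge_cmd(["sample1.gtf", "sample2.gtf"], "merged.gtf", guide="ref.gtf")))
```

Keeping the reference annotation as an optional argument at both steps mirrors the question: the guide can be supplied at assembly, at merge, at both, or at neither.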

Whether and when to use the known annotation depends on your goals. The tool manual (http://ccb.jhu.edu/software/stringtie/index.shtml?t=manual#de) explains this in much more detail, but in short:

  • If you do not care about novel isoforms/transcripts/genes (discovery), then do not include/create/merge/consider novel data in the analysis.
  • If you do care about novel data, be sure to allow it to be created and not filtered out. Knowns can be used as a guide, or not, depending on whether you want them to influence how the data assemble.
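
The two rules of thumb above can be written down as a small decision helper. This is only a sketch of the reasoning in this answer, not anything StringTie or Galaxy provides; the function name and the dictionary keys are invented for illustration.

```python
def choose_annotation_usage(want_novel: bool, guide_assembly: bool = True) -> dict:
    """Map an analysis goal to how the reference annotation is used.

    want_novel:     True if novel isoforms/transcripts/genes matter to you.
    guide_assembly: only relevant when want_novel is True; whether the
                    knowns should influence how the reads assemble.
    """
    if not want_novel:
        # No discovery: report only knowns, so nothing novel is created or merged
        return {
            "use_guide_in_assembly": True,
            "restrict_to_known": True,
            "merge_novel_with_reference": False,
        }
    # Discovery: let novel transcripts be created and keep them in the merge
    return {
        "use_guide_in_assembly": guide_assembly,
        "restrict_to_known": False,
        "merge_novel_with_reference": True,
    }


if __name__ == "__main__":
    print(choose_annotation_usage(want_novel=False))
    print(choose_annotation_usage(want_novel=True, guide_assembly=False))
```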

Thanks, Jen, Galaxy team
