Hi JiWen,

I had thought about this before; here is the answer Cole Trapnell posted on the SEQanswers website:

"I can shed some light on this. We have an upcoming protocol paper that describes our recommended workflow for TopHat and Cufflinks and discusses some of these issues.
As turnersd outlined, there are three strategies:
1) merge bams and assemble in a single run of Cufflinks
2) assemble each bam and cuffcompare them to get a combined.gtf
3) assemble each bam and cuffmerge them to get a merged.gtf

All three options work a little differently depending on whether you're also trying to integrate reference transcripts from UCSC or another annotation source.

#1 is quite different from #2 and #3, so I'll discuss its pros and cons first. The advantage here is simplicity of workflow: it's one Cufflinks run, so there's no need to worry about the details of the other programs. As turnersd mentions, you might also think this maximizes the accuracy of the resulting assembly, and that might be the case, but it also might not (for technical reasons that I don't want to get into right now). The disadvantage of this approach is that your computer might not be powerful enough to run it. More data and more isoforms mean substantially more memory and running time. I haven't actually tried this on something like the human body map, but I would be very impressed and surprised if Cufflinks could deal with all of that on a machine owned by mere mortals.

#2 and #3 are very similar: both are designed to gracefully merge full-length and partial transcript assemblies without ever merging transfrags that disagree on splicing structure. Consider two transfrags, A and B, each with a couple of exons. If A and B overlap and don't disagree on splicing structure, we can (and according to Cufflinks' assembly philosophy, we should) merge them. The difference between Cuffcompare and Cuffmerge is that Cuffcompare will only merge them if A is "contained" in B, or vice versa; that is, only if one of the transfrags is essentially redundant. Otherwise, they both get included. Cuffmerge, on the other hand, will merge them if they overlap, agree on splicing, and are in the same orientation. As turnersd noted, this is done by converting the transfrags into SAM alignments and running Cufflinks on them.

The other thing that distinguishes these two options is how they deal with a reference annotation. You can read on our website how the Cufflinks Reference Annotation Based Transcript assembler (RABT) works. Cuffcompare doesn't do any RABT assembly; it just includes the reference annotation in the combined.gtf and discards partial transfrags that are contained in and compatible with the reference. Cuffmerge actually runs RABT when you provide a reference, and this happens during the step where transfrags are converted into SAM alignments and assembled. We do this to improve quantification accuracy and reduce errors downstream. I should also say that Cuffmerge runs Cuffcompare in order to annotate the merged assembly with certain helpful features for use later on.

So we recommend #3 for a number of reasons: it is the closest in spirit to #1 while still being reasonably fast. For reasons that I don't want to get into here (pretty arcane details about the Cufflinks assembler), I also feel that option #3 is actually the most accurate in most experimental settings."
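
To make the Cuffcompare vs. Cuffmerge difference Cole describes a bit more concrete, here is a toy Python sketch. It is only my own simplified model, not how either program is implemented. The assumptions: a transfrag is a strand plus sorted, closed exon intervals; "agreeing on splicing" is approximated as having identical introns wherever the two transfrags' spans overlap; and the function names (compatible, contained, cuffcompare_like, cuffmerge_like) are made up for illustration. Real Cuffmerge re-assembles the transfrags with Cufflinks via SAM alignments rather than simply taking the union of exons as this sketch does.

```python
from dataclasses import dataclass
from typing import List, Tuple

Interval = Tuple[int, int]  # closed genomic interval (start, end)


@dataclass
class Transfrag:
    """Toy transfrag (hypothetical model): a strand plus sorted, non-overlapping exons."""
    strand: str
    exons: List[Interval]

    @property
    def start(self) -> int:
        return self.exons[0][0]

    @property
    def end(self) -> int:
        return self.exons[-1][1]

    def introns(self) -> List[Interval]:
        # An intron is recorded as (end of one exon, start of the next exon).
        return [(a[1], b[0]) for a, b in zip(self.exons, self.exons[1:])]


def introns_touching(t: Transfrag, lo: int, hi: int) -> List[Interval]:
    # Introns whose interior (positions s+1..e-1) intersects [lo, hi].
    return [(s, e) for s, e in t.introns() if s < hi and e > lo]


def compatible(a: Transfrag, b: Transfrag) -> bool:
    """Simplified rule: same strand, overlapping spans, identical introns in the overlap."""
    if a.strand != b.strand or a.start > b.end or b.start > a.end:
        return False
    lo, hi = max(a.start, b.start), min(a.end, b.end)
    return introns_touching(a, lo, hi) == introns_touching(b, lo, hi)


def contained(a: Transfrag, b: Transfrag) -> bool:
    """a is redundant: it lies inside b's span and agrees with b's splicing."""
    return compatible(a, b) and b.start <= a.start and a.end <= b.end


def union_exons(a: Transfrag, b: Transfrag) -> Transfrag:
    # Stand-in for re-assembly: merge overlapping exon intervals from both transfrags.
    merged: List[Interval] = []
    for s, e in sorted(a.exons + b.exons):
        if merged and s <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return Transfrag(a.strand, merged)


def cuffcompare_like(a: Transfrag, b: Transfrag) -> List[Transfrag]:
    # Collapse only when one transfrag is contained in the other.
    if contained(a, b):
        return [b]
    if contained(b, a):
        return [a]
    return [a, b]


def cuffmerge_like(a: Transfrag, b: Transfrag) -> List[Transfrag]:
    # Collapse whenever the two overlap, agree on splicing and share a strand.
    if compatible(a, b):
        return [union_exons(a, b)]
    return [a, b]


if __name__ == "__main__":
    A = Transfrag("+", [(100, 200), (300, 400)])
    B = Transfrag("+", [(350, 500), (600, 700)])  # overlaps the 3' end of A
    print(cuffcompare_like(A, B))  # neither contains the other: both kept
    print(cuffmerge_like(A, B))    # joined into one three-exon transfrag
```

With these two transfrags, the Cuffcompare-style rule keeps both because neither is contained in the other, while the Cuffmerge-style rule joins them into exons (100, 200), (300, 500), (600, 700). A transfrag whose exon runs into the other's intron is never merged under either rule.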
Hope this helps.
Wei Liao
Research Scientist,
Brentwood Biomedical Research Institute
16111 Plummer St.
Bldg 7, Rm D-122
North Hills, CA 91343
818-891-7711 ext 7645