Question: Clustering With Cuffcompare Or Cuffdiff Results
Zhang Xiaoyu • 10 wrote:
Dear Sir or Madam,
I am planning to cluster several libraries based on the output of
cuffcompare or cuffdiff, since these let me construct a matrix whose
columns represent the libraries and whose rows hold the counts of
transcripts or genes. I want to construct this matrix because it is
the required input format of many RNA-seq clustering packages, e.g.
baySeq and HTSCluster. However, the answer to the question "I want
to find differentially expressed genes. Can I use Cufflinks in
conjunction with count-based differential expression packages?" in the
Cufflinks FAQ advises against converting FPKM values to count
data.
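To make the intended matrix concrete, here is a minimal sketch. It assumes the per-library expression values have already been parsed into (feature, library, value) records. The function name `build_expression_matrix`, the record layout, the example values, and the choice to fill missing entries with 0.0 are illustrative assumptions, not part of cuffcompare or cuffdiff.

```python
from collections import defaultdict


def build_expression_matrix(records):
    """Pivot (feature_id, library, value) records into a features x libraries matrix.

    Hypothetical helper: the record layout is an assumption, not a Cufflinks format.
    """
    # Column order: libraries sorted by name.
    libraries = sorted({library for _, library, _ in records})

    # Collect the values of each feature, keyed by library.
    by_feature = defaultdict(dict)
    for feature_id, library, value in records:
        by_feature[feature_id][library] = value

    # Row order: features sorted by identifier.
    features = sorted(by_feature)

    # Assumption: a feature missing from a library is filled with 0.0.
    matrix = [
        [by_feature[feature].get(library, 0.0) for library in libraries]
        for feature in features
    ]
    return features, libraries, matrix


# Made-up example values, for illustration only.
records = [
    ("geneA", "lib1", 10.0), ("geneA", "lib2", 12.5),
    ("geneB", "lib1", 0.0), ("geneB", "lib2", 3.0),
]
features, libraries, matrix = build_expression_matrix(records)
```

Each row of `matrix` is then one gene or transcript and each column one library, which is the layout described above.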
Now my questions are:
1. It seems better to run everything up to cuffdiff, but does
cuffdiff allow comparison of multiple samples? I read somewhere
that even with multiple samples it still compares them pairwise.
Since clustering needs some quantitative data to drive the merging,
will cuffdiff give me quantitative measures, rather than only the
test statistic and p-value, which are too qualitative to use?
2. If I really need to get count data from the FPKM values, how do I
obtain the "effective length" mentioned there? Would it be better to
treat each assembled transcript, rather than each gene, as an object
in the clustering? What does "you'd be throwing away Cufflinks'
uncertainty" mean, even when using isoforms as objects? How should I
include the uncertainty in my clustering?
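For reference, the back-calculation asked about in point 2 follows from the definition FPKM = 10^9 × C / (N × L), where C is the number of fragments assigned to the feature, N is the total number of mapped fragments in the library, and L is the effective length in bases. Below is a minimal sketch, assuming N and L are already known from elsewhere. The function name `fpkm_to_fragments` and the example numbers are hypothetical, and the result is only a point estimate.

```python
def fpkm_to_fragments(fpkm, effective_length_bp, total_mapped_fragments):
    """Invert FPKM = 1e9 * C / (N * L) to get an approximate fragment count C.

    Hypothetical helper: both the effective length and the library size
    must be supplied by the caller; nothing here reads Cufflinks output.
    """
    if effective_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("effective length and library size must be positive")
    return fpkm * effective_length_bp * total_mapped_fragments / 1e9


# Made-up example: FPKM of 50, effective length 2000 bp, 20 million fragments.
approx_count = fpkm_to_fragments(50.0, 2000.0, 20_000_000)
```

A value recovered this way is a single number with no confidence interval, so it says nothing about how confidently fragments were assigned among isoforms, which appears to be the kind of uncertainty the quoted FAQ sentence has in mind.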
Best,
Sherry
modified 6.8 years ago by Jeremy Goecks • 2.2k • written 6.8 years ago by Zhang Xiaoyu • 10