Question: Analyzing Rna-Seq Replicates Using Cuff(Links/Compare/Diff)
David K Crossman • 130 wrote:
Hello!
I have 10 human RNA-Seq samples consisting of 3 groups
(2 replicates per group). I have already run each of them through
TopHat and Cufflinks on the Penn State Galaxy instance. I am now at a
head-scratching moment. I want to use CuffCompare next (in the end I
will want to run CuffDiff so that I can determine differential
gene/isoform expression between these 3 groups) but am unsure of the
best way to do this. After reading several Galaxy posts, I've come
across a couple of ideas:
1. Run CuffCompare on two Cufflinks output files. When that is
finished, take the CuffCompare output file and run it through
CuffCompare again with the third Cufflinks output file. When that is
finished, take the new CuffCompare output file and run it through
CuffCompare again with the fourth Cufflinks output file, and so on.
In a nutshell, I would be repeatedly merging Cufflinks outputs in
CuffCompare. Once all 10 have been put through CuffCompare, I can run
CuffDiff and set up 3 groups in CuffDiff with their appropriate BAM
files from TopHat.
2. Add all 10 Cufflinks output files in CuffCompare using the
"add new GTF input file" option.
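For anyone reading along, the second option and the follow-up CuffDiff step can be sketched as command lines. This is only a minimal sketch under assumptions: the file names and sample counts are placeholders, the helper functions `cuffcompare_cmd` and `cuffdiff_cmd` are hypothetical, and the flags are limited to the ones visible in the Galaxy log (`-o`, `-r`, `-R`). It also assumes the usual convention that CuffDiff takes one argument per group, with that group's replicate BAM files joined by commas, and that CuffCompare writes its merged annotation as `<prefix>.combined.gtf`.

```python
def cuffcompare_cmd(gtf_files, reference_gtf=None, out_prefix="cc_output"):
    """Build one cuffcompare call that receives every Cufflinks GTF at once."""
    cmd = ["cuffcompare", "-o", out_prefix]
    if reference_gtf is not None:
        # -r supplies the reference annotation; -R restricts it to
        # reference transcripts overlapped by the input transcripts.
        cmd += ["-r", reference_gtf, "-R"]
    cmd += list(gtf_files)
    return cmd


def cuffdiff_cmd(merged_gtf, bam_groups):
    """Build a cuffdiff call: one argument per group, replicates comma-joined."""
    cmd = ["cuffdiff", merged_gtf]
    cmd += [",".join(bams) for bams in bam_groups]
    return cmd


if __name__ == "__main__":
    # Placeholder file names; substitute the real Cufflinks/TopHat outputs.
    gtfs = [f"sample{i}_transcripts.gtf" for i in range(1, 7)]
    groups = [
        ["groupA_rep1.bam", "groupA_rep2.bam"],
        ["groupB_rep1.bam", "groupB_rep2.bam"],
        ["groupC_rep1.bam", "groupC_rep2.bam"],
    ]
    print(" ".join(cuffcompare_cmd(gtfs, reference_gtf="reference.gtf")))
    # Assumes cuffcompare's merged annotation is <prefix>.combined.gtf.
    print(" ".join(cuffdiff_cmd("cc_output.combined.gtf", groups)))
```

The printed lines are meant to be inspected before running anything; on Galaxy the same grouping would be set up through the tool form rather than typed by hand.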
I chose option 2 because it looked the simplest and,
from the posts I read, it sounded like a fully functional
option. I was also using a reference annotation file (that
file has worked in the past on non-replicate analyses).
However, I came across an error:
Error running cuffcompare. You are using Cufflinks v1.0.3, which is the most recent release.
No fasta index found for ./input1. Rebuilding, please wait..
Error: sequence lines in a FASTA record must have the same length!
cuffcompare v1.0.3 (2403)
cuffcompare -o cc_output -r /galaxy/main_database/files/002/678/dataset_2678888.dat -R -s ./input1 ./input2 ./input3 ./input4 ./input5 ./input6
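The last error line refers to a FASTA indexing requirement: within one record, every sequence line except the final one has to be the same length. The log also shows `-s` followed directly by `./input1`, which suggests (this is only a reading of the log, not a confirmed diagnosis) that the first GTF input was being treated as the sequence file. As a rough sketch of the check itself, the hypothetical helper below reports which records in a FASTA file break the rule; the function name and the choice to skip blank lines are assumptions made for illustration.

```python
def find_ragged_fasta_records(lines):
    """Return names of FASTA records whose sequence lines vary in length.

    Every sequence line in a record must match the first line's length,
    except the last line, which may be shorter. Blank lines are skipped.
    """
    bad = []
    name = None
    lengths = []

    def flush():
        # Check the record collected so far, if there is one.
        if name is None or len(lengths) < 2:
            return
        width = lengths[0]
        body_ok = all(n == width for n in lengths[:-1])
        last_ok = lengths[-1] <= width
        if not (body_ok and last_ok):
            bad.append(name)

    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            continue
        if line.startswith(">"):
            flush()
            parts = line[1:].split()
            name = parts[0] if parts else ""
            lengths = []
        elif name is not None:
            lengths.append(len(line))
    flush()
    return bad


if __name__ == "__main__":
    example = [">chr1", "ACGT", "ACGT", "AC", ">chr2", "ACGT", "AC", "ACGT"]
    print(find_ragged_fasta_records(example))
```

To run it on a real file, pass an open file handle, for example `find_ragged_fasta_records(open("genome.fa"))`, where `genome.fa` is a placeholder path.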
Any suggestions as to why this is happening? Am I
trying something that isn't supported yet? Is there a better
way to analyze replicates? Any
suggestions/ideas/workflows/you name it would be greatly appreciated!
Thanks,
David