Question: Can anyone share a workflow for differential gene expression analysis with multiple conditions (>3) and triplicates?
3.3 years ago by
United States
Hey guys,

I just started using TopHat and Cufflinks on Galaxy today, but the only example I found was quite simple (a two-sample comparison) and I got a bit confused. I have Illumina RNA-seq data from 3 different niches, with 3 replicates within each niche. I did a de novo assembly using Trans-ABySS because there is no reference genome.


I named my samples as follows





I mapped the paired-end reads of each sample against the de novo assembly. I was not sure whether I should run the Cufflinks assembly on each sample separately or combine the data from the same niche first and then run it; in the end I combined the data within each niche.

Next, for Cuffmerge, I merged all the data across the different niches. I then ran Cuffquant using the Cuffmerge output, and when it asked for replicates 1, 2, and 3, I entered the datasets from the same niche as replicates 1, 2, and 3, respectively. I now have 3 CXB data files, one per niche (I think so).

I am still waiting for the Cuffdiff output, but I am quite uncertain about what I just did.

Any ideas are welcome.


modified 3.1 years ago by Jennifer Hillman Jackson (25k) • written 3.3 years ago by yonexhalaolv (0)
3.1 years ago by
United States
Running Cufflinks on each replicate separately would be a better choice. Then run Cuffmerge on all of the per-replicate assemblies to create the input GTF dataset for Cuffquant (and for Cuffdiff later in the workflow). On the Cuffdiff tool form, you can enter all three niches as distinct conditions (each with three replicates) or use the Cuffquant output, but be aware that only two conditions will be tested against each other in any single Cuffdiff run when using the Galaxy wrapper (unless the time-course method is employed).
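
For anyone who wants to see the same order of steps outside the Galaxy tool forms, here is a rough sketch in Python that only builds the equivalent command lines (it does not run anything). The niche names, BAM file names, and output directory names are placeholders I made up, not your actual datasets, and the options shown are just the basic ones (`-o` for the output directory, `-L` for the Cuffdiff condition labels), so treat it as an illustration of the layout rather than a tested pipeline.

```python
from itertools import combinations

# Hypothetical layout: three niches with three replicates each.
# All names below are placeholders, not the poster's real datasets.
niches = ["nicheA", "nicheB", "nicheC"]
replicates = [1, 2, 3]


def bam(niche, rep):
    """Placeholder name for the mapped reads of one replicate."""
    return f"{niche}_rep{rep}.bam"


commands = []

# 1. Cufflinks on each replicate separately (nine runs).
for niche in niches:
    for rep in replicates:
        commands.append(
            ["cufflinks", "-o", f"cufflinks_{niche}_rep{rep}", bam(niche, rep)]
        )

# 2. Cuffmerge on all nine assemblies. Cuffmerge reads a text file that
#    lists one GTF path per line; this is what that file would contain.
gtf_list = [
    f"cufflinks_{niche}_rep{rep}/transcripts.gtf"
    for niche in niches
    for rep in replicates
]
assemblies_txt = "\n".join(gtf_list)
commands.append(["cuffmerge", "-o", "merged_asm", "assemblies.txt"])
merged_gtf = "merged_asm/merged.gtf"

# 3. Cuffquant on each replicate against the merged GTF (nine runs).
for niche in niches:
    for rep in replicates:
        commands.append(
            ["cuffquant", "-o", f"cuffquant_{niche}_rep{rep}", merged_gtf, bam(niche, rep)]
        )

# 4. Cuffdiff: one comma-separated group of replicate CXB files per condition.
groups = [
    ",".join(f"cuffquant_{niche}_rep{rep}/abundances.cxb" for rep in replicates)
    for niche in niches
]
commands.append(
    ["cuffdiff", "-o", "cuffdiff_out", "-L", ",".join(niches), merged_gtf, *groups]
)

# The two-condition comparisons that exist among three niches.
pairs = list(combinations(niches, 2))

for cmd in commands:
    print(" ".join(cmd))
```

The point of the layout is that replicates stay separate until Cuffdiff, where they are grouped by niche; `pairs` just lists the two-condition comparisons you would be interested in.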

More about this protocol is available here:

Best, Jen, Galaxy team

written 3.1 years ago by Jennifer Hillman Jackson (25k)
