Question: Cuffdiff: selecting replicates in a different order gives me different numbers of differentially expressed genes. Why?
15 months ago, cecelia.kelly wrote:

I have two groups with four biological replicates per group. I compared group 1 to group 2 using Cuffdiff and got 544 significantly differentially expressed genes. My lab mate ran the exact same differential expression analysis with the exact same settings and got ~1500 differentially expressed genes. The only difference between our Cuffdiff runs was the order in which the BAM files were selected under the "Condition" fields. To confirm that the order was causing the difference, my lab mate reran Cuffdiff using the exact same order of BAM files as me and got 544 significantly differentially expressed genes. We also compared log2 fold changes: they are virtually identical when the file order is identical, but differ when the order differs, even for genes that are significant in both results.

My question is: why does the order of the BAM files (which should all be biological replicates) cause such a big difference in the number of differentially expressed genes? Is there something we can do to minimize these differences?

Here is a screenshot of what I mean by "BAM file order". The order in that picture is JD10, S14, JD11, S15 for the first condition, and JD8, S10, S9, JD9 for the second condition. If I switch it to S14, JD10, JD11, S15 and S10, JD8, S9, JD9, for example, I get completely different results.
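For anyone who wants to quantify this kind of disagreement between two runs, here is a minimal Python sketch. It assumes each run's results are available as Cuffdiff's tab-separated gene-level table (usually gene_exp.diff) with the standard `gene_id`, `log2(fold_change)`, and `significant` columns. The function names and the gene IDs in the demo are made up for illustration, and the demo tables are not real results.

```python
import csv
import io
import math


def load_significant(handle):
    """Return {gene_id: log2 fold change} for rows flagged significant."""
    genes = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        if row["significant"] == "yes":
            genes[row["gene_id"]] = float(row["log2(fold_change)"])
    return genes


def compare_runs(run_a, run_b):
    """Summarise the overlap between two {gene_id: log2fc} dicts."""
    shared = set(run_a) & set(run_b)
    # Skip inf/nan fold changes so they do not dominate the summary.
    diffs = [
        abs(run_a[g] - run_b[g])
        for g in shared
        if math.isfinite(run_a[g]) and math.isfinite(run_b[g])
    ]
    return {
        "only_a": len(set(run_a) - shared),
        "only_b": len(set(run_b) - shared),
        "shared": len(shared),
        "max_abs_log2fc_diff": max(diffs, default=0.0),
    }


if __name__ == "__main__":
    # Hypothetical miniature tables standing in for two gene_exp.diff files.
    header = "gene_id\tlog2(fold_change)\tsignificant\n"
    run_a = io.StringIO(header + "GENE1\t1.5\tyes\nGENE2\t0.2\tno\n")
    run_b = io.StringIO(header + "GENE1\t1.1\tyes\nGENE2\t2.4\tyes\n")
    print(compare_runs(load_significant(run_a), load_significant(run_b)))
```

With real output, replace the `io.StringIO` objects with `open("path/to/gene_exp.diff")` for each run. A large `only_a` or `only_b` count, or a large fold-change difference among shared genes, shows how far the two file orders diverge.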

Tags: tophat, rnaseq, cuffdiff
15 months ago, Jennifer Hillman Jackson (United States) wrote:


Only the first two inputs are directly compared per Cuffdiff run in the differential expression output, so the differences you see come from the first two inputs being different between runs. This is how Cuffdiff works.

There are other workflow options. HTSeq-count paired with DESeq2 is one choice.

Reference Galaxy tutorials (RNA-seq is covered):


Hi Jennifer, Thanks for your response. Could you clarify what you mean by "the first two inputs"? Do you mean the first input of each group or the first two inputs of each group are directly compared? Also, if only the first two inputs are compared, what is the use of inputting additional replicates into a single run?

Reply written 15 months ago by cecelia.kelly