Question: Cuffdiff: selecting replicates in a different order gives me different numbers of differentially expressed genes. Why?
11 months ago by
cecelia.kelly0 wrote:

I have two groups with four biological replicates per group. I compared group 1 to group 2 using cuffdiff and got 544 significantly differentially expressed genes. My lab mate ran the same differential expression analysis with the same settings and got ~1500 differentially expressed genes. The only difference between our cuffdiff runs was the order in which the BAM files were selected under the "Condition" fields. To confirm that the order was causing the difference, my lab mate reran cuffdiff using the exact same order of BAM files as me and got 544 significantly differentially expressed genes. We also compared log2 fold changes: they are virtually identical when the order is identical, but differ when the order differs, even for genes that appear in both result sets.
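
For anyone wanting to reproduce this kind of comparison, here is a minimal sketch of how two runs could be compared. It assumes each run produced a tab-separated `gene_exp.diff` table with `gene_id`, `log2(fold_change)` and `significant` columns; the function names and the toy rows are hypothetical and only illustrate the idea.

```python
import csv

def significant_genes(lines):
    """Map gene_id -> log2 fold change for rows flagged significant == 'yes'.

    `lines` is any iterable of tab-separated text lines with a header row,
    for example an open gene_exp.diff file handle.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    return {
        row["gene_id"]: float(row["log2(fold_change)"])
        for row in reader
        if row["significant"] == "yes"
    }

def compare_runs(lines_a, lines_b):
    """Report genes significant in only one run, plus fold changes for shared genes."""
    a = significant_genes(lines_a)
    b = significant_genes(lines_b)
    shared = sorted(set(a) & set(b))
    return {
        "only_a": sorted(set(a) - set(b)),
        "only_b": sorted(set(b) - set(a)),
        "shared_log2fc": {gene: (a[gene], b[gene]) for gene in shared},
    }

# Toy data (made up, not taken from the runs described above).
HEADER = "gene_id\tlog2(fold_change)\tq_value\tsignificant"
RUN_A = [HEADER, "G1\t1.5\t0.01\tyes", "G2\t0.2\t0.40\tno", "G3\t-2.0\t0.02\tyes"]
RUN_B = [HEADER, "G1\t1.1\t0.03\tyes", "G2\t0.9\t0.04\tyes", "G3\t-0.3\t0.50\tno"]

result = compare_runs(RUN_A, RUN_B)
print(result)
```

With real output, the two lists would be replaced by open file handles, e.g. `open("run_a/gene_exp.diff")` (path hypothetical).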

My question is: why does the order of the BAM files (which should be biological replicates) cause such a big difference in the number of differentially expressed genes? Is there something we can do to minimize these differences?

Here is a screenshot of what I mean by "bam file order": the order in that picture is JD10, S14, JD11, S15 for the first condition and JD8, S10, S9, JD9 for the second condition. If I switch it to, for example, S14, JD10, JD11, S15 and S10, JD8, S9, JD9, I get completely different results.
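
Outside Galaxy, one way to make sure two people run exactly the same comparison is to build the command line from a fixed, sorted replicate list. The sketch below assumes the command-line `cuffdiff` form, where each condition is one comma-separated list of BAM files and `-L` and `-o` set the condition labels and output directory. The file names, labels and GTF path are hypothetical. Sorting does not explain why the order matters; it only keeps runs comparable.

```python
import shlex

def cuffdiff_command(gtf, conditions, out_dir="cuffdiff_out"):
    """Build a cuffdiff argument list with replicates sorted within each condition.

    conditions: dict mapping a condition label to a list of BAM paths.
    The order of conditions is kept; only the replicates inside each are sorted.
    """
    labels = list(conditions)
    cmd = ["cuffdiff", "-o", out_dir, "-L", ",".join(labels), gtf]
    for label in labels:
        cmd.append(",".join(sorted(conditions[label])))
    return cmd

# Hypothetical file names based on the sample names in the post.
order_1 = {
    "group1": ["JD10.bam", "S14.bam", "JD11.bam", "S15.bam"],
    "group2": ["JD8.bam", "S10.bam", "S9.bam", "JD9.bam"],
}
order_2 = {
    "group1": ["S14.bam", "JD10.bam", "JD11.bam", "S15.bam"],
    "group2": ["S10.bam", "JD8.bam", "S9.bam", "JD9.bam"],
}

cmd_1 = cuffdiff_command("genes.gtf", order_1)
cmd_2 = cuffdiff_command("genes.gtf", order_2)
print(shlex.join(cmd_1))
```

Both selection orders produce the same command, so the two runs can be compared directly.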

10 months ago by
Jennifer Hillman Jackson25k wrote:


Only the first two inputs are directly compared per-Cuffdiff run in the differential expression output. The differences in the output are based on the first two inputs being different between runs. This is how Cuffdiff works.

There are other workflow options. Htseq-count paired with DESeq2 is one choice.
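
As a rough illustration of the count-based route, the sketch below merges per-sample htseq-count outputs into one count matrix of the kind a count-based tool such as DESeq2 takes as input. It assumes the usual two-column, tab-separated htseq-count output with summary rows starting with `__`. The function names and toy counts are hypothetical, and columns are sorted by sample name so the matrix does not depend on input order.

```python
def read_htseq_counts(lines):
    """Parse htseq-count style lines into {gene_id: count}, skipping '__' summary rows."""
    counts = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("__"):
            continue
        gene, value = line.split("\t")
        counts[gene] = int(value)
    return counts

def count_matrix(samples):
    """Merge per-sample counts into (header, rows).

    samples: dict mapping a sample name to an iterable of htseq-count lines.
    Genes missing from a sample are filled with 0.
    """
    per_sample = {name: read_htseq_counts(lines) for name, lines in samples.items()}
    names = sorted(per_sample)
    genes = sorted(set().union(*per_sample.values()))
    rows = [[gene] + [per_sample[name].get(gene, 0) for name in names] for gene in genes]
    return ["gene_id"] + names, rows

# Toy counts (made up) for two of the samples named in the question.
toy = {
    "S14": ["geneA\t7", "geneB\t3", "__no_feature\t2"],
    "JD10": ["geneA\t10", "geneB\t0", "__no_feature\t5"],
}
header, rows = count_matrix(toy)
print(header)
for row in rows:
    print(row)
```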

Reference Galaxy tutorials (RNA-seq is covered):


Hi Jennifer, thanks for your response. Could you clarify what you mean by "the first two inputs"? Do you mean that the first input of each group is compared, or the first two inputs of each group? Also, if only the first two inputs are compared, what is the use of adding more replicates to a single run?

