I was hoping that someone might be able to give me some advice on adjusting or controlling for batch effects in RNA-Seq data.
I have performed principal component analysis (PCA) and hierarchical clustering on normalised expression data (from Cuffnorm) from an RNA-Seq experiment. During the experiment, the cDNA libraries were pooled into two groups for sequencing. In the PCA, the experimental replicates cluster into two groups that appear to correspond to the two pools of cDNA libraries rather than to the experimental treatments, which are split equally across both clusters. I therefore think there are batch effects in my dataset.
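To make the check I am describing concrete, here is a minimal sketch in Python (not my actual analysis). It assumes a genes x samples matrix of normalised expression values and runs on made-up data: the matrix, the `pool` and `treatment` labels and the helper names are all hypothetical. It computes PCA via SVD and asks whether PC1 tracks the sequencing pool or the treatment.

```python
import numpy as np

def pca_scores(expr, n_components=2):
    """PCA on a genes x samples matrix; returns sample scores and variance ratios."""
    logged = np.log2(expr + 1.0)            # log-transform to stabilise variance
    x = logged.T                            # samples as rows, genes as columns
    x = x - x.mean(axis=0, keepdims=True)   # centre each gene across samples
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    var_ratio = (s ** 2) / np.sum(s ** 2)
    return scores, var_ratio[:n_components]

def abs_corr(values, labels):
    """Absolute Pearson correlation between a PC and a 0/1 sample label."""
    return abs(np.corrcoef(values, labels)[0, 1])

# Synthetic example (assumption): 200 genes, 8 samples, two pools,
# with treatment balanced across the pools.
rng = np.random.default_rng(0)
n_genes = 200
pool = np.array([0, 0, 0, 0, 1, 1, 1, 1])
treatment = np.array([0, 1, 0, 1, 0, 1, 0, 1])

baseline = rng.normal(8.0, 1.0, size=(n_genes, 1))      # per-gene log2 level
batch_shift = rng.normal(0.0, 1.0, size=(n_genes, 1))   # per-gene pool effect
treat_shift = np.zeros((n_genes, 1))
treat_shift[:10] = 0.5                                  # small effect on 10 genes
noise = rng.normal(0.0, 0.2, size=(n_genes, pool.size))

log_expr = baseline + batch_shift * pool + treat_shift * treatment + noise
expr = 2.0 ** log_expr                                  # back to expression scale

scores, var_ratio = pca_scores(expr)
corr_pool = abs_corr(scores[:, 0], pool)
corr_trt = abs_corr(scores[:, 0], treatment)
print("PC1 variance ratio:", round(float(var_ratio[0]), 3))
print("|corr(PC1, pool)|:", round(float(corr_pool), 3))
print("|corr(PC1, treatment)|:", round(float(corr_trt), 3))
```

In this toy setup PC1 should follow the pool label far more closely than the treatment label, which is the pattern I believe I am seeing in my real data.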
I have already performed differential gene expression analysis using CuffDiff in Galaxy. I would now like to repeat this analysis after controlling for the batch effect, to see whether the previously observed differential expression was due to it. Is this possible and, if so, how could I go about doing it?
My previous analysis involved TopHat to map reads to the cow genome, Cufflinks to assemble transcripts, Cuffmerge to create a master transcriptome of all assembled transcripts for each treatment group, and then CuffDiff to analyse differential gene expression, with each treatment group compared to a control group. Is it possible to control for batch effects within this analysis pipeline?
Any advice on this would be greatly appreciated!