ChIP-seq analysis with multiple biological replicates for differential expression


4.1 years ago by

United States

sisiliuiuc • 20 wrote:

Hello, I am very new to sequence data analysis and have some questions about how to structure my analysis. I am trying to analyze the difference in the presence of an epigenetic mark between a control group and a treatment group. The animal model we use is the mouse. I have 3 biological replicates from each of the two groups, and I am trying to find the difference in the presence of this mark. I'm good up to the point of alignment: I've aligned all 6 samples to the mouse genome using Bowtie on Galaxy, but I'm stuck on how to call peaks. I know people usually use a control/input sample for ChIP-seq, where they skip the IP and just sequence to account for background noise, but we didn't do a non-IP control. Here are my thoughts on how to approach this and the options I came across. Please let me know which one is the most reasonable approach; it would be wonderful if there are references to papers or protocols.

Option 1: All 6 mice are age-matched and the same sex. Randomly pair each of the 3 control mice with one of the 3 treatment mice and call peaks using the control mouse as input. I would end up with 3 files of differential peaks, and then I would find the peaks present in all 3 files. I'm not sure which tool to use, maybe "Intersect" under "Operate on Genomic Intervals".

Option 2: Combine the mapped reads into 1 file for each group, so I will have two files, one control and one treatment. I'm not sure how to combine them in Galaxy; there are a few options to concatenate, join, or merge, so any suggestions on which to use would be helpful. Then I would call peaks on these two files, using control as input and treatment as target.

Any suggestions would be helpful. If there are Galaxy protocols, that would also be great. Thank you.

biological replicates galaxy chip-seq • 6.1k views



3.6 years ago by

Jennifer Hillman Jackson ♦ 25k

United States

Jennifer Hillman Jackson ♦ 25k wrote:

Hello,

This question was missed, apologies. But if you are still looking for help, I can offer some general advice.

Both options described may be problematic.

For option 1, using a different control for each tag (treatment sample) could complicate data interpretation, since the background for each is different. For option 2, the value of having multiple control/treatment replicates is diluted: if one of the control or treatment samples produces outlier results for some reason, it will be difficult to detect and trace back to the exact input. Instead, I would recommend running the jobs as a matrix and then comparing the results (a variation on option 1).

Matrix example:

Nomenclature:
c1 = control1, c2 = control2, c3 = control3
t1 = tag1 (treatment1), t2 = tag2, t3 = tag3
1. Map all datasets independently using Bowtie2. Label them (by name, tag, or annotation) to keep track of the data.
2. Filter the BAM results by genomic region if you know where the mark should be located on the reference genome. Use the tool SAMtools -> Slice BAM.
3. Execute peak calling "all vs all" (nine runs). Runs 1-3: c1 & t1, c1 & t2, c1 & t3. Then do the same for c2 and c3.
4. This produces 9 peak-calling results, which can then be compared for common peaks and used in downstream analysis.
5. To compare, the tools in the groups "Operate on Genomic Intervals" and "BEDTools" are the most useful to start with.
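
Outside Galaxy, the matrix and the "common peaks" comparison can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the peak lists are made-up toy data, each list is assumed to be sorted and non-overlapping with BED-style half-open coordinates, and the helper names (intersect, common_peaks) are hypothetical, not part of any Galaxy tool.

```python
from functools import reduce
from itertools import product

controls = ["c1", "c2", "c3"]
treatments = ["t1", "t2", "t3"]

# Every control paired with every treatment: 3 x 3 = 9 peak-calling runs.
runs = list(product(controls, treatments))


def intersect(a, b):
    """Overlap of two sorted, non-overlapping lists of (chrom, start, end)."""
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        chrom_a, start_a, end_a = a[i]
        chrom_b, start_b, end_b = b[j]
        if chrom_a == chrom_b:
            start, end = max(start_a, start_b), min(end_a, end_b)
            if start < end:
                out.append((chrom_a, start, end))
            # Advance whichever interval ends first.
            if end_a <= end_b:
                i += 1
            else:
                j += 1
        elif chrom_a < chrom_b:
            i += 1
        else:
            j += 1
    return out


def common_peaks(peak_sets):
    """Regions present in every peak set (strict intersection)."""
    return reduce(intersect, peak_sets)


# Hypothetical peak calls from three of the nine runs (toy data).
peaks_a = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 150)]
peaks_b = [("chr1", 150, 250), ("chr2", 100, 120)]
peaks_c = [("chr1", 120, 180), ("chr1", 550, 650), ("chr2", 90, 110)]

shared = common_peaks([peaks_a, peaks_b, peaks_c])
print(len(runs), "runs:", runs)
print("shared regions:", shared)
```

In practice the nine lists would come from the peak caller's BED output, and the same strict intersection is what chaining an intersect tool across all results would give.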

Hopefully this helps, if you have not already identified a published protocol to follow. Jen, Galaxy team


Hello Jennifer, I have the same kind of ChIP-seq data: 3 replicates for each of two treatments. I followed your suggestions and built the matrix of 9 peak-calling results. To report our results, should I use only the peaks that are common to all 9 peak sets, or is it fine if a peak appears in just a few of them? Thanks, Maryam

written 21 months ago by maryamforoozani16 • 0
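
One way to explore the "all 9 or just a few" question is to count how many peak sets cover each region and then look at the result under different thresholds. The sketch below is a minimal illustration, not a recommendation of a particular cutoff: the function name regions_with_support and the toy peaks are hypothetical, coordinates are assumed to be BED-style half-open, and peaks within a single set are assumed not to overlap each other.

```python
from collections import defaultdict


def regions_with_support(peak_sets, min_support):
    """Return (chrom, start, end, support) segments covered by >= min_support sets."""
    events = defaultdict(list)
    for peaks in peak_sets:
        for chrom, start, end in peaks:
            events[chrom].append((start, 1))   # a peak opens here
            events[chrom].append((end, -1))    # a peak closes here
    out = []
    for chrom in sorted(events):
        depth = 0      # number of peak sets covering the current segment
        prev = None    # left edge of the current segment
        for pos, delta in sorted(events[chrom]):
            if prev is not None and pos > prev and depth >= min_support:
                out.append((chrom, prev, pos, depth))
            depth += delta
            prev = pos
    return out


# Hypothetical peaks from three runs around the same locus (toy data).
run_peaks = [
    [("chr1", 100, 200)],
    [("chr1", 150, 250)],
    [("chr1", 180, 300)],
]

in_at_least_two = regions_with_support(run_peaks, min_support=2)
in_all_three = regions_with_support(run_peaks, min_support=3)
print(in_at_least_two)
print(in_all_three)
```

With nine real peak sets, setting min_support to 9 reproduces the strict "common to all" rule, while lower values show how many regions are lost by insisting on every run.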

3.5 years ago by

troublezhang • 0

United States

troublezhang • 0 wrote:

There are a couple of software tools that can analyze ChIP-seq data with biological replicates. Examples include PePr (http://www.ncbi.nlm.nih.gov/pubmed/24894502) and diffReps (http://www.ncbi.nlm.nih.gov/pubmed/23762400).

In particular, the PePr manuscript discusses the limitations of the option 1 and option 2 approaches that you mentioned and compares them to PePr's approach, which uses a negative binomial distribution to capture the variance between replicates and call consistent peaks. It reports that PePr performed better than either option. PePr could also work for differential binding without input samples.
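
To give a feel for what "capturing the variance between replicates" means, here is a rough sketch of the general negative binomial idea, where the variance of read counts in a window is modeled as mu + alpha * mu^2. It is not PePr's actual implementation: the function nb_dispersion, the method-of-moments estimate, and the window counts are all assumptions made for illustration.

```python
from statistics import mean, variance


def nb_dispersion(counts):
    """Method-of-moments dispersion alpha for var = mu + alpha * mu**2.

    Returns 0.0 when the replicates vary no more than a Poisson model predicts.
    """
    mu = mean(counts)
    if mu == 0:
        return 0.0
    var = variance(counts)  # sample variance (n - 1 denominator)
    return max(0.0, (var - mu) / mu ** 2)


# Hypothetical read counts in one genomic window across 3 replicates.
consistent = [48, 52, 50]   # replicates agree closely
noisy = [10, 50, 90]        # same mean, but replicates disagree

print(nb_dispersion(consistent))
print(nb_dispersion(noisy))
```

Both windows have the same mean count, so pooling the replicates (option 2) would treat them identically; a model with a dispersion term can tell that the second window is much less consistent.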

The downside is that neither PePr nor diffReps is available on Galaxy. You'll have to install and run them on your own server.


2.9 years ago by

hamed.metalgear • 0

hamed.metalgear • 0 wrote:

I came across a recent tool, MSPC (http://mspc.codeplex.com, or http://bioinformatics.oxfordjournals.org/content/31/17/2761), that addresses both of your options and, more importantly, discriminates between biological and technical replicates.


