Question: ChIP-seq analysis with multiple biological replicates for differential expression
sisiliuiuc (United States) wrote, 4.1 years ago:

Hello, I am very new to sequencing data analysis and have some questions about how to structure it. I am trying to analyze the difference in the presence of an epigenetic mark between a control group and a treatment group. Our animal model is the mouse, and I have 3 biological replicates in each of the two groups. I'm fine up to the alignment step: I've aligned all 6 samples to the mouse genome using Bowtie on Galaxy, but I'm stuck on how to call peaks. I know people usually include a control/input sample for ChIP-seq, where the IP step is skipped and the DNA is sequenced directly to account for background noise, but we did not sequence a non-IP control. Here are my thoughts on how to approach this and the options I came across. Please let me know which one is the most reasonable approach; references to papers or protocols would be wonderful.

Option 1: All 6 mice are age-matched and of the same sex. Randomly pair the 3 control mice with the 3 treatment mice and call peaks using each control mouse as the input for its paired treatment mouse. I would end up with 3 files of differential peaks, and then I would find the peaks present in all 3 files. I'm not sure which tool to use for this, maybe "Intersect" under "Operate on Genomic Intervals".
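Finding the peaks shared by all three files in option 1 is an interval intersection. As a rough illustration of what running an intersect tool successively over the files does, here is a plain-Python sketch. The peak lists are hypothetical in-memory `(chrom, start, end)` tuples with half-open coordinates as in BED; in Galaxy you would use the interval tools rather than this code.

```python
from typing import List, Tuple

Peak = Tuple[str, int, int]  # (chrom, start, end), half-open like BED


def intersect_two(a: List[Peak], b: List[Peak]) -> List[Peak]:
    """Return the overlapping portion of every pair of peaks from a and b."""
    out = []
    for chrom_a, start_a, end_a in a:
        for chrom_b, start_b, end_b in b:
            if chrom_a != chrom_b:
                continue
            start, end = max(start_a, start_b), min(end_a, end_b)
            if start < end:  # keep only non-empty overlaps
                out.append((chrom_a, start, end))
    return sorted(out)


def intersect_all(peak_sets: List[List[Peak]]) -> List[Peak]:
    """Intersect the first peak set with each following set in turn."""
    result = peak_sets[0]
    for other in peak_sets[1:]:
        result = intersect_two(result, other)
    return result


# Hypothetical peaks from the three control/treatment pairings of option 1.
run1 = [("chr1", 100, 500), ("chr2", 1000, 1400)]
run2 = [("chr1", 150, 450), ("chr2", 5000, 5300)]
run3 = [("chr1", 120, 480)]
print(intersect_all([run1, run2, run3]))  # [('chr1', 150, 450)]
```

The double loop is quadratic and only meant to be readable; dedicated interval tools work on sorted intervals and scale far better.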

Option 2: Combine the mapped reads into 1 file per group, so I will have two files, one control and one treatment. I'm not sure how to combine them in Galaxy; there are a few options (concatenate, join, or merge), and any suggestion about which to use would be helpful. Then I would call peaks on these two files, using the control as input and the treatment as target.

Any suggestions would be helpful; if there are Galaxy protocols, that would also be extremely useful. Thank you.

(Question last modified 2.9 years ago by hamed.metalgear.)
Jennifer Hillman Jackson (United States) wrote, 3.6 years ago:

Hello,

This question was missed, apologies. But if you are still looking for help, I can offer some general advice.

Both options described may be problematic.

For option 1, using a different control for each treatment sample ("tag") could complicate data interpretation, since the background for each is different. For option 2, the value of having multiple control and treatment replicates is lost: if one of the control or treatment samples produces outlier results for some reason, it will be difficult to detect this and trace it back to the exact input. Instead, I would recommend running the jobs as a matrix and then comparing the results (a variation on option 1).
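To make the point about option 2 concrete: once reads are pooled, only the summed count per region survives, so a single outlier replicate can produce an apparent difference that the per-replicate counts would expose. The sketch below uses made-up read counts for one hypothetical region and a deliberately simple, assumed outlier rule (flag a replicate whose count exceeds `factor` times its group's median). It is not a method from any of the tools mentioned in this thread.

```python
from statistics import median
from typing import List


def pooled(counts: List[int]) -> int:
    """Option 2: replicates merged into one file, so only the total is visible."""
    return sum(counts)


def flag_outliers(counts: List[int], factor: float = 3.0) -> List[int]:
    """Indices of replicates above factor * group median.

    Assumed, illustrative rule only; a median of 0 flags nothing.
    """
    m = median(counts)
    return [i for i, c in enumerate(counts) if m > 0 and c > factor * m]


# Made-up read counts in one region for the three mice in each group.
region = {"control": [20, 22, 19], "treatment": [21, 18, 140]}

for group, counts in region.items():
    print(group, "pooled:", pooled(counts), "outlier replicates:", flag_outliers(counts))
# control pooled: 61 outlier replicates: []
# treatment pooled: 179 outlier replicates: [2]
```

The pooled totals (61 vs 179) suggest a strong treatment effect, while the per-replicate view shows that it comes almost entirely from the third treatment mouse.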

Matrix example:

  1. Nomenclature: c1 = control1, c2 = control2, c3 = control3; t1 = tag1 (treatment1), t2 = tag2, t3 = tag3.
  2. Map all datasets independently using Bowtie2. Label them (by name, tag, or annotation) to keep track of the data.
  3. If you know where the mark should be located on the reference genome, filter the BAM results by genomic region with the tool SAMtools -> Slice BAM.
  4. Run peak calling "all vs all", nine runs in total. Runs 1-3 are c1 & t1, c1 & t2, c1 & t3; then do the same for c2 and c3.
  5. This produces 9 peak-calling results, which can then be compared for common peaks and used for downstream analysis.
  6. To compare them, the tools in the "Operate on Genomic Intervals" and "BEDTools" groups are the most useful to start with.
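The matrix steps above can be sketched in plain Python to show the bookkeeping: enumerate the 9 control/treatment pairings, then count in how many runs each candidate region has an overlapping peak. The `results` dictionary is hypothetical (in practice it would be filled from each run's peak file), and `min_runs` is an illustrative threshold: requiring all 9 runs is the strictest choice.

```python
from itertools import product
from typing import Dict, List, Tuple

Peak = Tuple[str, int, int]  # (chrom, start, end), half-open like BED
Run = Tuple[str, str]        # (control, treatment)

controls = ["c1", "c2", "c3"]
treatments = ["t1", "t2", "t3"]

# "All vs all": every control paired with every treatment, 3 x 3 = 9 runs.
runs: List[Run] = list(product(controls, treatments))


def overlaps(a: Peak, b: Peak) -> bool:
    """True if two half-open intervals on the same chromosome overlap."""
    return a[0] == b[0] and max(a[1], b[1]) < min(a[2], b[2])


def support(peak: Peak, results: Dict[Run, List[Peak]]) -> int:
    """Number of runs with at least one peak overlapping `peak`."""
    return sum(any(overlaps(peak, p) for p in peaks) for peaks in results.values())


def consensus(candidates: List[Peak], results: Dict[Run, List[Peak]],
              min_runs: int) -> List[Peak]:
    """Keep candidates supported by at least `min_runs` runs (hypothetical threshold)."""
    return [pk for pk in candidates if support(pk, results) >= min_runs]


# Hypothetical results: one peak found in every run, one only in runs using t2.
results: Dict[Run, List[Peak]] = {run: [("chr1", 100, 500)] for run in runs}
for c in controls:
    results[(c, "t2")].append(("chr5", 2000, 2300))

candidates = [("chr1", 120, 480), ("chr5", 2100, 2200)]
print(len(runs))                                   # 9
print(consensus(candidates, results, min_runs=9))  # [('chr1', 120, 480)]
print(consensus(candidates, results, min_runs=3))  # both candidates
```

Keeping the per-run results separate is what makes the matrix useful: a region supported only in runs that share one treatment sample (t2 here) points to that sample rather than to the treatment as a whole.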

Hopefully this helps if you have not already found a published protocol to follow. Jen, Galaxy team

 


Hello Jennifer, I have the same kind of ChIP-seq data: 3 replicates for each of two treatments. I did what you suggested and built the matrix of 9 peak-calling results. When reporting our results, should I use only the peaks common to all 9 peak sets, or is it fine to use peaks found in just some of them? Thanks, Maryam

(Reply by maryamforoozani, 21 months ago.)
troublezhang (United States) wrote, 3.5 years ago:

There are a couple of software tools that can analyze ChIP-seq data with biological replicates, for example PePr (http://www.ncbi.nlm.nih.gov/pubmed/24894502) and diffReps (http://www.ncbi.nlm.nih.gov/pubmed/23762400).

In particular, the PePr paper discusses the limitations of the option 1 and option 2 approaches you describe and compares them with PePr's approach, which uses a negative binomial distribution to capture the variance between replicates and call consistent peaks. In that comparison, PePr performed better than either option. PePr can also test for differential binding without input samples.
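As a rough illustration of why a negative binomial is used here: read counts for the same region across biological replicates usually vary more than a Poisson model (variance = mean) allows. The textbook negative binomial adds a dispersion parameter phi, with variance = mu + phi * mu**2. The sketch below estimates phi for one region by the method of moments from made-up counts. It shows only that parameterization and is not PePr's implementation, which fits its model in its own way.

```python
from statistics import mean, variance
from typing import List


def nb_dispersion(counts: List[int]) -> float:
    """Method-of-moments estimate of phi in Var = mu + phi * mu**2.

    Returns 0.0 when the sample variance does not exceed the mean,
    i.e. no overdispersion beyond what a Poisson model allows.
    """
    mu = mean(counts)
    var = variance(counts)  # sample variance, n - 1 denominator
    if mu == 0 or var <= mu:
        return 0.0
    return (var - mu) / mu ** 2


# Made-up counts for one region in three biological replicates.
print(nb_dispersion([10, 10, 10]))  # 0.0
print(nb_dispersion([20, 40, 60]))  # 0.225
```

With only three replicates a per-region estimate like this is very noisy, which is one reason replicate-aware tools typically model variance more carefully than this sketch does.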

The downside is that neither PePr nor diffReps is available on Galaxy, so you'll have to install and run them on your own server.

hamed.metalgear wrote, 2.9 years ago:

I came across a recent tool, MSPC (http://mspc.codeplex.com; paper: http://bioinformatics.oxfordjournals.org/content/31/17/2761), that addresses both of your options and, more importantly, distinguishes between biological and technical replicates.
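To my understanding, MSPC combines the evidence from overlapping peaks across replicates into one test using Fisher's combined probability test; treat that as an assumption and check the paper for the exact rules, including how technical and biological replicates are handled differently. The sketch below shows only Fisher's method for the p-values of one region's overlapping peaks: X = -2 * sum(ln p_i) follows a chi-squared distribution with 2k degrees of freedom when all k null hypotheses hold. For even degrees of freedom the chi-squared survival function has a closed form, so no SciPy is needed. The p-values are made up.

```python
import math
from typing import List


def chi2_sf_even_df(x: float, df: int) -> float:
    """P(X >= x) for a chi-squared variable with an even number of degrees of freedom.

    Closed form: exp(-x/2) * sum_{i=0}^{k-1} (x/2)**i / i!, where k = df / 2.
    """
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i  # builds (x/2)**i / i! incrementally
        total += term
    return math.exp(-half) * total


def fisher_combined_p(p_values: List[float]) -> float:
    """Fisher's method; every p-value must lie in (0, 1]."""
    x = -2.0 * sum(math.log(p) for p in p_values)
    return chi2_sf_even_df(x, 2 * len(p_values))


# Made-up p-values for one region's overlapping peaks in three replicates.
print(fisher_combined_p([0.01, 0.02, 0.03]))
```

With a single p-value the method returns that p-value unchanged, and several moderately small p-values combine into a much smaller one, which is the sense in which replicates "confirm" each other.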
