Question: normalisation between different ChIP-seq data
gravatar for k.witmer
18 months ago by
k.witmer10 wrote:

Hi there, I guess it is a recurrent problem, but I haven't found a satisfying answer yet. I have 4 ChIP-seq datasets looking at H3K9ac coverage, each from a different life cycle stage (or different conditions I guess, with the difference that there is not really a "non-treated" or wt sample). I would now like to monitor how the H3K9ac mark changes across the different stages (conditions). I know how to normalise within each sample, for this I have been using deeptools2 suite to normalise the IP/input using the bamcompare mode to report back log2 ratios for each of the 4 datasets. What I am currently stuck with is that I don't really know how to compare the samples to each other. I saw that there is the "diffbind" option, however it is not working at the moment ( Would the bigwig compare mode be something I could use here?

Below is a summary of what I have done, but I am unsure if this ok like this: What I have done so far is use MACS2 to call broad peaks for each sample (the only option available for my species, which I had to provide as a custom build genome). This worked out quite nicely, although of course, the peaks do not always overlap between all the samples. To get a way around this, I "merged" all the called peaks together in a Masterfile (using bed intersect). Logically, these "merged peaks" are broader than the initial peaks, so I then divided them up into smaller peaks again with a cut off bp length. The bp length used for cut off was the mean of the median peak length of the initially called peaks for all samples (let's call them uberpeaks). This all worked out quite fine, the uberpeaks overlap quite nicely with the 5UTR (or 3UTR) of a gene, although of course not always. This approach left me with two problems: 1. To assign the uberpeak back to a region and 2. To assign the uberpeak to a sample. For #2, I used bed intersect again, where I said if 70% of the uberpeak overlaps with the intial called peak of a samples, then return this peak for this sample. This worked out quite ok. When doing the same thing for #1, I lose quite a lot of the uberpeaks, as they do not always overlap well with a region. I hope I got the point across, please let me know if this is unclear. Any help greatly appreciated!

ADD COMMENTlink written 18 months ago by k.witmer10
gravatar for Devon Ryan
18 months ago by
Devon Ryan1.9k
Devon Ryan1.9k wrote:

What's the end goal? I suspect that you might like to use computeMatrix and plotHeatmap from deepTools to help generate some hypotheses before moving on to using something like diffbind.

As an aside, you'll have better luck at the moment using diffBind locally and just downloading the files from Galaxy. The wrapper in Galaxy is "problematic" and likely to get rewritten in the not too distant future (the diffBind authors have expressed some interest privately in doing so).

ADD COMMENTlink written 18 months ago by Devon Ryan1.9k

Hi, yes I have been using these tools to get a better idea of the distribution of the mark and how it correlates with already published transcriptional data. The end goal would be to get a list of genes that are enriched in H3K9ac for each condition/stage. Enrichment in terms of 5UTR, ORF, or 3UTR (the genome is quite compact and I have all the input files providing this information for each gene already).

Thanks for suggesting diffBind locally, however just to clarify, do you mean to download the package and run it in R?

Thanks again!

ADD REPLYlink written 18 months ago by k.witmer10

You'll have to install diffBind in R and download the BAM files and ├╝berpeaks from Galaxy.

ADD REPLYlink written 18 months ago by Devon Ryan1.9k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 172 users visited in the last hour