Question: A Question in ChIP-seq Normalization
4.5 years ago by
abrahamlincon10 wrote:

Excuse me. A few days ago I read the paper "Histone modification levels are predictive for gene expression" (PNAS (2010), 107, 2926-2931). It proposed a linear model predicting gene expression levels from a combination of different histone modifications, such as H3K4me3, H3K27me3, etc. The predictor variables were of the form log(Nj+aj), where Nj represents the number of tags of modification j in each promoter region (4001 bp surrounding the TSS), and aj is a pseudocount that keeps the logarithm defined when Nj is zero. The authors did not mention normalizing Nj. But another paper, "Computational inference of mRNA stability from histone modification and transcriptome profiles" (Nucleic Acids Res (2012), 40(14):6414-23), which also involves a linear model, used the normalized read coverage of each histone modification as the predictor variable. These authors said "the read coverage of each histone modification in the 15 regions (read count per bp) was calculated and normalized according to the sequencing library size". So the former paper does not mention normalization, while the latter explicitly normalizes. I'm wondering why there is such a difference, and under what conditions normalization should be performed.
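To make the two predictor forms concrete, here is a minimal Python sketch. The tag counts, the pseudocount of 1.0 and the library size of 20 million reads are hypothetical values chosen for illustration, and the function names are not taken from either paper. The last lines show one thing that can be said with certainty: within a single library, dividing by a constant library size only shifts a log-scale predictor by a constant, provided the pseudocount is scaled along with the count.

```python
import math

def log_pseudocount(tag_counts, pseudocount):
    """Unnormalized predictor: log(N_j + a_j) for each promoter."""
    return [math.log(n + pseudocount) for n in tag_counts]

def coverage_per_million(tag_counts, region_bp, library_size):
    """Read count per bp, rescaled to a library of one million reads."""
    return [n / region_bp / library_size * 1e6 for n in tag_counts]

# Hypothetical tag counts for one modification in three promoters.
counts = [0, 12, 250]
region_bp = 4001            # promoter window size mentioned in the question
library_size = 20_000_000   # assumed sequencing depth

raw = log_pseudocount(counts, pseudocount=1.0)
norm = coverage_per_million(counts, region_bp, library_size)

# Scaling (N + a) by a constant c gives log(c) + log(N + a),
# i.e. the same predictor shifted by a constant.
scale = 1e6 / (region_bp * library_size)
scaled_log = [math.log(scale * (n + 1.0)) for n in counts]
shifts = [s - r for s, r in zip(scaled_log, raw)]
```

In a linear model with an intercept, such a constant shift is absorbed by the intercept, so the choice mainly matters when counts from libraries of different depth are combined or compared.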

I hope someone can help. Thank you so much!

chip-seq • 1.1k views
modified 4.5 years ago by Bjoern Gruening5.1k • written 4.5 years ago by abrahamlincon10

My 2c: I mostly rely on edgeR which takes raw counts over regions of interest and uses normalisation factors as an offset in the model instead of adjusting the counts directly. Competing popular methods use the library size to adjust fragment counts, so the 'right' answer depends on the specific model and biological question.
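As a rough illustration of the difference between using an offset and rescaling the counts, here is a small Python sketch. It is not edgeR code. It only assumes a log-linear mean of the form log(mu) = offset + log(relative abundance), with the offset taken as the log of library size times normalisation factor. All numbers and function names are hypothetical.

```python
import math

def log_offsets(library_sizes, norm_factors):
    """Per-sample offset: log of the effective library size."""
    return [math.log(size * f) for size, f in zip(library_sizes, norm_factors)]

def expected_count(log_rel_abundance, offset):
    """Log-linear mean: log(mu) = offset + log(relative abundance)."""
    return math.exp(offset + log_rel_abundance)

def counts_per_million(counts, library_sizes, norm_factors):
    """Alternative approach: rescale the counts themselves."""
    return [c / (size * f) * 1e6
            for c, size, f in zip(counts, library_sizes, norm_factors)]

# Two hypothetical samples with the same relative abundance in a region
# but different sequencing depth.
library_sizes = [10_000_000, 20_000_000]
norm_factors = [1.0, 1.0]
raw_counts = [50, 100]

offsets = log_offsets(library_sizes, norm_factors)
cpm = counts_per_million(raw_counts, library_sizes, norm_factors)

# With the offset approach the raw counts stay untouched; the model
# explains the depth difference through the offset instead.
mu = [expected_count(math.log(5e-6), o) for o in offsets]
```

Both routes describe the same relative abundance here. The offset route keeps the raw counts, which count-based models need in order to estimate the variance properly.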

If you want advice from more reputable statisticians working on these interesting and important issues, they are more likely to be found on (e.g.) the Bioconductor list or perhaps seqanswers than on this Galaxy forum, so if you don't get a good answer here, perhaps try there.

modified 4.5 years ago • written 4.5 years ago by fubar1.1k
4.5 years ago by
Bjoern Gruening5.1k wrote:


I'm not sure about the first paper or why they did not normalize the data. But depending on your experimental setup, you should normalize your input data, for example for GC content or sequencing depth.

If you want to play around with normalization and visualisation, have a look at deepTools and the deepTools server.

Hope that helps,



written 4.5 years ago by Bjoern Gruening5.1k