Hello. A few days ago I read the paper "Histone modification levels are predictive for gene expression" (PNAS (2010), 107, 2926-2931). It proposes a linear model that predicts gene expression levels from a combination of histone modifications, such as H3K4me3 and H3K27me3. The predictor variables have the form log(Nj + aj), where Nj is the number of tags of modification j in each promoter region (4001 bp surrounding the TSS) and aj is a pseudocount that keeps the logarithm defined when Nj is zero. The authors do not mention normalizing Nj. I then read another paper, "Computational inference of mRNA stability from histone modification and transcriptome profiles" (Nucleic Acids Res (2012), 40(14):6414-23), which also uses a linear model, but there the authors use the normalized read coverage of each histone modification as the predictor variable. They write: "the read coverage of each histone modification in the 15 regions (read count per bp) was calculated and normalized according to the sequencing library size". So the first paper does not mention normalization, while the second normalizes explicitly. I am wondering why there is such a difference, and under what conditions normalization should be performed.
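To make the comparison concrete, here is a minimal Python sketch of the two predictor definitions as I understand them. It is not code from either paper. The function names, the pseudocount of 1, the per-million scaling, and the example counts and library sizes are all my own assumptions for illustration.

```python
import math

PROMOTER_LENGTH_BP = 4001  # promoter window size quoted from the first paper


def log_count_predictor(tag_count, pseudocount=1.0):
    """log(Nj + aj) on the raw tag count; the pseudocount value is an assumption."""
    return math.log(tag_count + pseudocount)


def normalized_coverage(tag_count, region_length_bp, library_size, per_reads=1e6):
    """Reads per bp, rescaled to a common library size (per_reads is an assumption)."""
    return tag_count / region_length_bp / library_size * per_reads


# Hypothetical example: the same promoter sequenced at two different depths.
shallow = {"tag_count": 50, "library_size": 10_000_000}
deep = {"tag_count": 100, "library_size": 20_000_000}

for name, sample in (("shallow", shallow), ("deep", deep)):
    raw = log_count_predictor(sample["tag_count"])
    norm = normalized_coverage(
        sample["tag_count"], PROMOTER_LENGTH_BP, sample["library_size"]
    )
    print(f"{name}: log(N + a) = {raw:.3f}, normalized coverage = {norm:.6f}")
```

In this made-up example the raw log count differs between the two libraries only because of sequencing depth, while the normalized coverage comes out essentially the same. Within a single library, dividing by the library size just rescales every count by the same constant.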
I hope someone can help. Thank you so much!