Hi all. A few days ago I read the paper "Histone modification levels are predictive for gene expression" (PNAS (2010), 107, 2926-2931). It proposed a linear model predicting gene expression levels from combinations of different histone modifications, such as H3K4me3, H3K27me3, etc. The predictor variables were of the form log(Nj + aj), where Nj is the number of tags of modification j in each promoter region (4001 bp surrounding the TSS), and aj is a pseudocount that keeps the logarithm defined when Nj is zero. The authors made no mention of normalizing Nj. But then I read another paper, "Computational inference of mRNA stability from histone modification and transcriptome profiles" (Nucleic Acids Res (2012), 40(14):6414-23), which also uses a linear model, and there the authors used the normalized read coverage of each histone modification as the predictor variable. They state that "the read coverage of each histone modification in the 15 regions (read count per bp) was calculated and normalized according to the sequencing library size". So the first paper does not normalize while the second one does. I'm wondering why there is such a difference, and under what conditions normalization should be performed.
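To make the difference between the two constructions concrete, here is a minimal numpy sketch (the counts, library sizes, and pseudocount are made-up illustrative values, not from either paper). It contrasts the PNAS-style predictor log(Nj + aj) on raw tag counts with a library-size-normalized version in the spirit of the NAR paper:

```python
import numpy as np

# Hypothetical tag counts for one histone modification across 5 promoters,
# measured in two samples with the same biology but different sequencing depth.
counts_a = np.array([0, 10, 50, 200, 1000], dtype=float)  # library ~1e6 reads
counts_b = counts_a * 3                                   # same signal, 3x deeper

pseudocount = 1.0  # the a_j in log(N_j + a_j); keeps the log defined at N_j = 0

# PNAS-style predictor: log of raw counts plus pseudocount, no normalization
x_raw_a = np.log(counts_a + pseudocount)
x_raw_b = np.log(counts_b + pseudocount)

# Library-size-normalized predictor: scale to reads per million first
lib_a, lib_b = 1e6, 3e6
x_norm_a = np.log(counts_a / lib_a * 1e6 + pseudocount)
x_norm_b = np.log(counts_b / lib_b * 1e6 + pseudocount)

# Without normalization, the deeper library shifts the predictor by ~log(3),
# so the same biology yields different predictor values across samples.
print(np.allclose(x_norm_a, x_norm_b))  # True
print(np.allclose(x_raw_a, x_raw_b))    # False
```

The point of the sketch: raw counts confound biology with sequencing depth, which matters whenever samples (or modifications) with different library sizes are compared within one model.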
I hope someone can help. Thank you so much!
My 2c: I mostly rely on edgeR, which takes raw counts over regions of interest and uses normalisation factors as an offset in the model rather than adjusting the counts directly. Other popular methods use the library size to adjust the fragment counts themselves, so the 'right' answer depends on the specific model and biological question.
If you want advice from statisticians working on these interesting and important issues, they are more likely to be found on (e.g.) the Bioconductor mailing list or perhaps SEQanswers than on this Galaxy forum - so if you don't get a good answer here, perhaps try there.