Question: Homer getDiffExpression.pl edgeR
0
gravatar for blumroy
2.8 years ago by
blumroy0
blumroy0 wrote:

To complement out cuffdiff analysis we thought to take a parallel route using Homer.

to do so we basically follow this protocol:

makeTagDirectory pairend.sam -format sam -fragLength 250 -removeSpikes 10000 5 -genome /data//projects/mm9/ref.fasta -checkGC -tbp 1 -sspe -flip 

 

Then, we run analyzeRepeats.pl to make read summaries across genes
analyzeRepeats.pl rna mm9 -noadj  -count genes -strand + -d MIa9_1 MIa9_2 > Scramble_vs_Target_analyzeRepeats.txt

We are running this step as recommended by the Homer tutorial without normalization so the following step could use the unnormalized data and perform edgeR analysis on it.

 

Next step we run: 
getDiffExpression.pl Scramble_vs_Target_analyzeRepeats.txt MIa9_1 MIa9_2 -repeats -edgeR > Scramble_vs_Target_analyzeRepeats_edgeR.txt


After testing this command and checking the perl script for getDiffExpression I can tell this:
This last step actually takes unnormalized data and transmits it to edgeR for analysis that aims to identify genes that significantly differ by expression. edgeR uses the original unnormalized data and creates a model for it from which it would derive finally log2FC, p-value, FDR.

Now, the output of getDiffExpression is a little unusual - it composed of two parts: 
The three very right-side columns which are the output from edgeR (Log2FC; p-value, FDR). These are calculated based on pseudeo values in some cases and a model created by edgeR,
BUT - the rest of the table (all the left side) is a matrix that presents now newly normalized data. 
However, unlike what some users I conversed with thought, this is NOT a normalized value that was generated by edgeR, but rather, this normalized data was generated by the Homer/getDiffExpression.pl script. 

This data is only presented in the table for the user to see, but it was NOT actually used as part of the edgeR analysis (unlike what some users may think). 

My issue is that we are currently not sure in what way this normalized data was normalized....

see this link for relevant indications 
http://homer.salk.edu/homer/ngs/analyzeRNA.html

look for explanatory indication such as the following which exactly states what I've described above: 
"This command will basically add 3 columns to the end of your original file per pairwise comparison.  Most important are the log2 fold change and the p-value and FDR values.  The raw read counts will be normalized in this file based on the total reads in each column so that the values are more easily to compare when looking through the results."

However, my attempts to re-run the previous step again using this time the normalization options in order to obtain the normalized values that appear in the output from getDiffExpression  did not prove to be successful...
I tried - 
analyzeRepeats.pl rna mm9 -noadj  -count genes -strand + -d MIa9_1 MIa9_2 -normMatrix 1e6

OR
analyzeRepeats.pl rna mm9 -noadj  -count genes -strand + -d MIa9_1 MIa9_2 -norm 1e7

but the normalized values I obtained were quite different from the ones that were obtained after running the getDiffExpression.pl script... 

We think that it would become important to know what normalization exactly was performed on the data presented in this output along with the edgeR values. 

Will be glad to hear other users ideas on the subject! 
Thanks a lot, 

Roy

 

I

 


 

ADD COMMENTlink modified 2.8 years ago by Jennifer Hillman Jackson25k • written 2.8 years ago by blumroy0
0
gravatar for Jennifer Hillman Jackson
2.8 years ago by
United States
Jennifer Hillman Jackson25k wrote:

Hello,

This is not a Galaxy usage question so I am going to point you to the Homer tool author:

Can't figure something out? Questions, comments, concerns, or other feedback: 
cbenner@salk.edu

Take care, Jen, Galaxy team

ADD COMMENTlink written 2.8 years ago by Jennifer Hillman Jackson25k

Okay, thanks. 

We unfortunately never heard from him back and thought some users may know more details.. 

Roy

ADD REPLYlink written 2.8 years ago by blumroy0

I don't know this one and you could get some replies here from others, but the main Biostar forum (or seqanswers) would be better for this type of question.

The suggestion is to post something like this similar question: https://www.biostars.org/p/173765

(It may be that the internal normalization is the same across Homer tools, but I cannot confirm that - as I have not examined the code or done a complete search).

ADD REPLYlink written 2.8 years ago by Jennifer Hillman Jackson25k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 169 users visited in the last hour