Question: p values don't match up
4.1 years ago by
United States
mariel.voutounou0 wrote:

A colleague of mine ran an RNA-seq analysis using Galaxy a couple of years ago.

I used the same datasets to run a new analysis, and even though the list of genes generated was identical, I found discrepancies in the number of genes detected as significant (his dataset had more than a thousand significant hits, while mine had only a handful).

When I looked closer at each dataset, I discovered that the values differed between the two sets, and that the p-values and q-values were recorded as "0" in the older dataset. I was wondering why this is the case.

For example (one gene, Npy, shown from each run):

OLD DATASET:
  gene:              Npy
  locus:             chr6:49822728-49829505
  sample_1:          WT NaV1.8 DRG
  sample_2:          V600E NaV1.8 DRG
  status:            OK
  value_1:           0.367901
  value_2:           692.945
  log2(fold_change): 10.8792
  test_stat:         -12.1069
  p_value:           0
  q_value:           0
  significant:       yes

NEW DATASET:
  gene:              Npy
  locus:             chr6:49822728-49829505
  sample_1:          WT
  sample_2:          V600E
  status:            OK
  value_1:           0.497657
  value_2:           878.635
  log2(fold_change): 10.7859
  test_stat:         15.1569
  p_value:           0.0268
  q_value:           0.642657
  significant:       no
4.1 years ago by
United States
Jennifer Hillman Jackson25k wrote:


This is most likely due to a change in the underlying tool versions, or possibly a parameter change as well. Most tools in the pipeline have been upgraded in the last few years: some have completely revamped algorithms, while others have mostly added new parameters. If you still have the original workflow or job, Galaxy will show a warning when you try to execute or "re-run" it for each tool that has an upgraded version available. Some older versions may still be present, or you may be required to upgrade if you wish to work on the Main public instance.

We do not keep all versions indefinitely on Main, but the Tool Shed is a great archive of prior tools and versions. These would be for use in a production local/cloud Galaxy.

To duplicate a run exactly, use the same tool versions, wrappers, and input data (including reference data). This can be done quite effectively on a CloudMan Galaxy with minimal set-up. With this option, you would also retain a saved copy of the analysis going forward, under your full control, for exact reproducibility later on. Please note that Amazon has a grant program to help with costs.
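One lightweight way to make a run reproducible, outside of anything Galaxy records for you, is to keep a manifest of the exact tool versions and checksums of the inputs. The sketch below (standard-library Python; the file name and version numbers are illustrative placeholders, not values from this analysis) shows the idea:

```python
import hashlib
import json

def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()

# Illustrative tool names/versions only; record whatever your pipeline
# actually used.
manifest = {
    "tools": {"tophat": "2.0.9", "cuffdiff": "2.1.1"},
    "inputs": {},
}

# Demo input file so the sketch is runnable as-is.
with open("reads_demo.fastq", "w") as f:
    f.write("@read1\nACGT\n+\nIIII\n")
manifest["inputs"]["reads_demo.fastq"] = sha256_of("reads_demo.fastq")

print(json.dumps(manifest, indent=2))
```

Re-running later against the same checksummed inputs with the same pinned versions is what makes a byte-for-byte comparison of results meaningful.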

Hope this helps! Jen, Galaxy team


Dear Jen,

Thank you for your reply. I have done a re-run of the analysis, as you recommended, and found the same results as my newest run, but I am still unsure about the parameters you are referring to.

Is it possible to know what these upgrades were? Is it the number of base pairs that are detected, or something else? How stringent are the detection parameters? What do you consider a hit? Etc.

What are the newer algorithms used to detect significance, and why is a gene that looks like it is up-regulated 800-fold, for example, shown as non-significant?

I believe it is absolutely imperative to understand what these parameters are so that I can better interpret my data.

I am looking forward to your reply,

Kind Regards, Mariel

Mariel Voutounou, Ph.D. Burke Medical Research Institute, Department of Neurology & Neuroscience, Weill Medical College of Cornell University, 785 Mamaroneck Ave, White Plains, NY10605 Tel:(914) 426-3682


