Question: -nan in Cuffdiff run
2.9 years ago by
Cory Dunn • 40 wrote:
Dear Galaxy Experts:
Today while running Cuffdiff, some of the FPKM values were reported as "-nan". Strangely, running with exactly the same Cuffdiff parameters and exactly the same TopHat output files worked fine in an earlier instance. What could be the reason for this problem?
Thanks for any assistance you might provide.
modified 2.9 years ago by Jennifer Hillman Jackson ♦ 25k • written 2.9 years ago by Cory Dunn • 40
I had this problem about a month ago. It affects only a small percentage of FPKM values, and always in the treatment sample (value 2), never in the control sample (value 1). I was not using Galaxy, but running Cuffdiff on a server. I posted a question on the SeqAnswers website but never received an answer. Another post on that website from this March (http://seqanswers.com/forums/showthread.php?t=58121&highlight=-nan) describes a similar problem and also has no answers.
I also had two analysis runs of the same data, and -nan occurs in one but not in the other. After a quick look, the only difference seemed to be the length of the sample names: the -nan values occurred when longer sample names were used. However, I have no idea whether this contributed to the problem.
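For anyone wanting to compare two such runs more systematically, here is a minimal Python sketch. It assumes the gene-level FPKM tracking output is tab-separated, has a `tracking_id` column, and names its per-sample FPKM columns with an `_FPKM` suffix (for example `q1_FPKM`); those column names and the helper function names are assumptions for illustration, so adjust them to match your files.

```python
import csv

def is_nan_text(value):
    """True for 'nan', '-nan', 'NaN', etc. as they appear in the text output."""
    return value.strip().lower().lstrip("+-") == "nan"

def nan_fpkm_ids(lines, fpkm_suffix="_FPKM"):
    """Map each FPKM column name to the set of tracking_ids whose value is nan.

    `lines` is any iterable of tab-separated text lines (an open file works).
    The `_FPKM` suffix and the `tracking_id` column are assumed names.
    """
    hits = {}
    for row in csv.DictReader(lines, delimiter="\t"):
        for column, value in row.items():
            # Skip malformed rows where csv fills in None keys or values.
            if not isinstance(column, str) or not isinstance(value, str):
                continue
            if column.endswith(fpkm_suffix) and is_nan_text(value):
                hits.setdefault(column, set()).add(row["tracking_id"])
    return hits

def only_in_first(run_a, run_b):
    """IDs that are nan somewhere in run_a but nowhere in run_b."""
    ids_a = set().union(*run_a.values())
    ids_b = set().union(*run_b.values())
    return ids_a - ids_b
```

To use it, open each run's tracking file with `open(path, newline="")`, pass the file objects to `nan_fpkm_ids`, and then call `only_in_first` on the two results to list the genes affected in one run only.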
If you figure out the problem, please post an answer. If I come up with an answer I will post here.
Thanks, Matthew, for sharing more details. I will test the name lengths to see if this is reproducible in Galaxy (although I agree, it is probably not the root cause). This may be an issue with the binary distribution compression, but that is still being investigated, as that particular problem (a slightly different unexpected result, yet involving the same calculation) was reportedly fixed in release 2.0.1 of Cuffdiff.
I found a few other reports of the problem as well. The issue does not seem to be linked to the tool's execution in Galaxy at this time.
If this is solved online elsewhere, please do let us know.
Thanks, Jen, Galaxy team
Update: Our team is reviewing these issues and will update this post when the root cause is determined.
Thanks to all who have submitted examples of the problem.
Jen, Galaxy team
Where are you using Galaxy? On a local instance? Are the new and old instances based on different Galaxy release versions? If so, which ones? Are the tool versions exactly the same? If not, what were the original and new versions?
For initial testing, trying out the run at http://usegalaxy.org might give some insight into what the root issue is. Since the inputs are exactly the same, one run should be enough. However, if any factors changed (tools, data), then run both to compare.
If you do that and the new result is the same on Main (but the second differs from your first result - also on Main if at all possible), you can send in an email to firstname.lastname@example.org with shared history link(s) (make certain all datasets are undeleted) along with a link to this biostars post.
If the job result is different on Main (the expected result matching your first local run), then make certain that your instance is up to date as a first-pass solution. Use the master branch (not dev).
To understand the different types of output Cuffdiff produces, review the tool's manual. There are too many options to list out here, many work together, and even if you shared them in text format, it would be near impossible to exactly replicate your job without shared histories with input datasets.
The overall idea is for the review to assess exactly what was done to reproduce the original expected result and the problematic result, then work from there to troubleshoot. The final feedback could potentially link the different job results to a Galaxy release version, tool version, workflow change, input change, and the like, but we will see.
Let us know how it goes. We'll watch for an email to the bugs list in case you go that route. Jen, Galaxy team
I am also having a problem with "nan" values. I am running Cuffdiff v2.2.1. The Cuffdiff parameters included geometric library normalization, pooled dispersion estimation, 0.05 FDR, a minimum alignment count of 10, multi-read correction, bias correction, and Cufflinks effective length correction. When I looked at the genes read group tracking file, I could see the -nan values in only one sample, so I removed it from the analysis and reran it. The nan values reappeared in a different sample. The FPKM values seem fine in the Cufflinks files for these individuals. There are also reasonable values listed for a gene in the "raw_frags", "internal_scaled_frags", and "external_scaled_frags" columns of the genes read group tracking file, but the "FPKM" column will have the -nan value. As I said before, this appears in only a single sample, but not the same sample across different runs.
I am running 65 samples on a local instance of Galaxy and just the bam files and gtf files are ~200 GB, so transferring and testing them on http://usegalaxy.org is not really an option.
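Since the data are too large to move, a check like the one described above can be scripted locally. The following is a rough Python sketch, not part of Cuffdiff or Galaxy. It counts nan FPKM entries per sample in a genes read group tracking file, and can optionally restrict the count to rows that still report fragments. It uses the "raw_frags" and "FPKM" columns mentioned above; the `condition` and `replicate` column names, and the function name, are assumptions, so check them against your file's header line.

```python
import csv
from collections import Counter

def nan_fpkm_per_replicate(lines, require_reads=False):
    """Count rows with a nan FPKM, keyed by (condition, replicate).

    `lines` is any iterable of tab-separated text lines (an open file works).
    With require_reads=True, only rows whose raw_frags value is above zero
    are counted, i.e. rows where fragments were assigned but FPKM is nan.
    """
    counts = Counter()
    for row in csv.DictReader(lines, delimiter="\t"):
        fpkm = (row.get("FPKM") or "").strip().lower().lstrip("+-")
        if fpkm != "nan":
            continue
        if require_reads:
            try:
                has_reads = float(row.get("raw_frags") or 0) > 0
            except ValueError:
                has_reads = False  # unparseable raw_frags: treat as no reads
            if not has_reads:
                continue
        # `condition` and `replicate` are assumed column names.
        counts[(row.get("condition"), row.get("replicate"))] += 1
    return counts
```

Running this on the tracking file from each run (opened with `open(path, newline="")`) should show quickly whether the nan values are confined to one sample per run and whether that sample changes between runs.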