Question: deepTools plotCorrelation image error
Asked 18 months ago by dorota.komar:

Hi guys,

I am trying to make some nice images out of my ChIP-seq data. I want to compare the regions bound by my TF with known histone mark occupancy sites. To do that, I wanted to use the deepTools tool plotCorrelation, but I get the following error. Any ideas about how to deal with it?

Thanks a lot in advance!

Fatal error: Exit code 1 ()
Warning. 24397 NaN values were found. They will be removed along with the corresponding bins in other samples for the computation and plotting
Traceback (most recent call last):
  File "/galaxy/main/deps/_conda/envs/mulled-v1-0fa290085c742a3ffee6a142d1fce47c178e1c289ff6f0c66023b506f3377842/bin/plotCorrelation", line 11, in <module>
    main(args)
  File "/galaxy/main/deps/_conda/envs/mulled-v1-0fa290085c742a3ffee6a142d1fce47c178e1c289ff6f0c66023b506f3377842/lib/python2.7/site-packages/deeptools/", line 209, in main
    plot_numbers=args.plotNumbers)
  File "/galaxy/main/deps/_conda/envs/mulled-v1-0fa290085c742a3ffee6a142d1fce47c178e1c289ff6f0c66023b506f3377842/lib/python2.7/site-packages/deeptools/", line 256, in plot_correlation
    y_var = sch.linkage(corr_matrix, method='complete')
  File "/galaxy/main/deps/_conda/envs/mulled-v1-0fa290085c742a3ffee6a142d1fce47c178e1c289ff6f0c66023b506f3377842/lib/python2.7/site-packages/scipy/cluster/", line 676, in linkage
    raise ValueError("The condensed distance matrix must contain only finite values.")
ValueError: The condensed distance matrix must contain only finite values.

Written 18 months ago by dorota.komar; modified 18 months ago by Devon Ryan

Can you share a history containing this and the input matrix with me (my account on … is …)? I can then more easily have a look at this.

Written 18 months ago by Devon Ryan

BTW, I suspect that the input matrix had 24397 (or so) entries and that each of these contained a NaN. That's the most likely cause, but it raises the question of how THAT happened.
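To see why this ends in the scipy error, here is a toy reconstruction of what the warning describes. It is a minimal sketch assuming a small hypothetical matrix (rows are bins, columns are samples) rather than the real multiBigwigSummary output, and `complete_bins` is a hypothetical helper. Following the warning text, every bin with a NaN in any sample is dropped; if every bin has a NaN, nothing survives, so the correlations are undefined and the clustering step (`sch.linkage` in the traceback) is handed non-finite values.

```python
import numpy as np


def complete_bins(matrix: np.ndarray) -> np.ndarray:
    """Return only the bins (rows) that have a finite value in every sample (column)."""
    return matrix[np.isfinite(matrix).all(axis=1)]


# Hypothetical toy matrix: 4 bins x 3 samples. Peak-only input leaves most
# bins without a value in at least one sample, so each row contains a NaN.
nan = np.nan
toy = np.array([
    [5.0, nan, nan],
    [nan, 3.0, nan],
    [nan, nan, 7.0],
    [2.0, nan, 4.0],
])

usable = complete_bins(toy)
print(usable.shape)  # (0, 3): no bin survives, so there is nothing to correlate
```

With fewer than two usable bins, every pairwise correlation is undefined, which is consistent with the "must contain only finite values" error raised by the clustering step.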

Written 18 months ago by Devon Ryan

Devon, thank you so much! That is amazing. I have just shared the history with you. I was thinking that maybe this happened because there isn't enough overlap? I also tried plotHeatmap, and even though the tool worked, the image I obtained doesn't really look OK. I have my ChIP-seq in two different tissue types. To enlarge the dataset to plot, I wanted to use the regions bound in both tissues, but I get a message that some regions overlap with each other and I need to remove the overlapping regions. Could you maybe also suggest how I can do that?
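Combining the peak sets from the two tissues and collapsing overlapping regions is the kind of job usually done with `bedtools merge` on a sorted BED file. As a minimal pure-Python sketch of the same idea, assuming simple three-column BED records (chrom, start, end, half-open coordinates) and using a hypothetical helper `merge_regions`, overlapping or touching regions on the same chromosome are joined into one:

```python
from typing import Iterable, List, Tuple

Region = Tuple[str, int, int]  # (chrom, start, end), BED-style half-open coordinates


def merge_regions(regions: Iterable[Region]) -> List[Region]:
    """Collapse overlapping or touching regions into single regions per chromosome."""
    merged: List[Region] = []
    for chrom, start, end in sorted(regions):  # sort by chrom, then start, then end
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Overlaps (or abuts) the previous region: extend it instead of adding a new one.
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


# Hypothetical peaks from the two tissues.
tissue_a = [("chr1", 100, 200), ("chr1", 500, 600)]
tissue_b = [("chr1", 150, 250), ("chr2", 10, 20)]

for chrom, start, end in merge_regions(tissue_a + tissue_b):
    print(f"{chrom}\t{start}\t{end}")
```

The merged list can then be written out as a single BED file of non-overlapping regions.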

Thank you once again for trying to help me.
Dorota

Written 18 months ago by dorota.komar

Oh, and I sent the history from the … account.

Written 18 months ago by dorota.komar
Answer by Devon Ryan, 18 months ago:

As suspected, every bin has at least one NaN in it, so there's no way to compute a reliable correlation.

The root of this problem is how you used multiBigwigSummary. You input only BED files containing peaks, when what you really should have input are bigWig files made with bamCoverage directly from the BAM files (alternatively, give the original BAM files to multiBamSummary). That will give you a better idea of how well your samples actually correlate. If you want to restrict this to regions called as peaks in one or more samples, use the "BED file" mode under "Choose computation mode".
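Outside Galaxy, the same workflow maps onto the deepTools command-line tools. The sketch below is a minimal illustration, assuming deepTools is installed locally and using hypothetical file names and a hypothetical helper `correlation_commands`; it only builds the commands (bamCoverage per BAM file, multiBigwigSummary in `bins` or `BED-file` mode, then plotCorrelation) so they can be checked against `--help` for your installed version before running.

```python
import shlex
from pathlib import Path
from typing import List, Optional


def correlation_commands(bams: List[str], peaks_bed: Optional[str] = None,
                         summary: str = "summary.npz",
                         plot: str = "correlation.png") -> List[List[str]]:
    """Build (but do not run) deepTools commands: BAM -> bigWig -> summary -> plot."""
    commands = []
    bigwigs = []
    for bam in bams:
        bw = str(Path(bam).with_suffix(".bw"))  # e.g. sample.bam -> sample.bw
        bigwigs.append(bw)
        commands.append(["bamCoverage", "--bam", bam, "--outFileName", bw])

    if peaks_bed is None:
        # Genome-wide bins.
        summary_cmd = ["multiBigwigSummary", "bins"]
    else:
        # Restrict to peak regions (the "BED file" computation mode in Galaxy).
        summary_cmd = ["multiBigwigSummary", "BED-file", "--BED", peaks_bed]
    summary_cmd += ["--bwfiles", *bigwigs, "--outFileName", summary]
    commands.append(summary_cmd)

    commands.append(["plotCorrelation", "--corData", summary,
                     "--corMethod", "spearman", "--whatToPlot", "heatmap",
                     "--plotFile", plot])
    return commands


# Hypothetical inputs; print the commands so they can be reviewed before running.
for cmd in correlation_commands(["tf.bam", "h3k4me3.bam"], peaks_bed="peaks.bed"):
    print(shlex.join(cmd))
```

Skipping bamCoverage and running `multiBamSummary bins --bamfiles ...` on the BAM files directly is the alternative mentioned above; either way the input is read coverage, not peak calls.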

The same thing will apply to making the heatmap. Use your original BAM files with bamCoverage to make bigWig files. Those can then be used to make more meaningful heatmaps.
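For the heatmap, deepTools first scores the bigWig files over a set of regions with computeMatrix and then draws the result with plotHeatmap. A minimal sketch, again assuming a local deepTools install, hypothetical file names and a hypothetical helper `heatmap_commands`, that builds the two commands for a window of 1 kb either side of each region's centre:

```python
import shlex
from typing import List


def heatmap_commands(bigwigs: List[str], regions_bed: str,
                     flank: int = 1000, matrix: str = "matrix.gz",
                     plot: str = "heatmap.png") -> List[List[str]]:
    """Build (but do not run) computeMatrix + plotHeatmap commands around region centres."""
    compute = ["computeMatrix", "reference-point",
               "--referencePoint", "center",
               "--scoreFileName", *bigwigs,
               "--regionsFileName", regions_bed,
               "--beforeRegionStartLength", str(flank),
               "--afterRegionStartLength", str(flank),
               "--outFileName", matrix]
    heatmap = ["plotHeatmap", "--matrixFile", matrix, "--outFileName", plot]
    return [compute, heatmap]


# Hypothetical bigWig files made with bamCoverage, plus a BED file of peaks.
for cmd in heatmap_commands(["tf.bw", "h3k27ac.bw"], "peaks.bed"):
    print(shlex.join(cmd))
```

Centring on the peak (`--referencePoint center`) is a choice for peak-like regions; the default reference point is the region start.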

As an aside, I'll be giving a training session on deepTools and MACS2 for ChIP-seq analysis at the Galaxy Community Conference in France at the end of June. Feel free to attend; there are some great training sessions.

Written 18 months ago by Devon Ryan

Thank you a lot for your answer. I tried the "BED file" option under "Choose computation mode" and selected all the files used for the correlation, since all of them were BED files. I was comparing 5 different files and got 5 different multiBigwigSummary outputs. Some of them did not work (I assume because of upper/lower-case differences in the mitochondrial and chloroplast chromosome names), so just to check, I tried it with the two outputs that worked. However, the plotCorrelation tool failed once again with the same message. I do not have BAM files for the histone mark occupancy, since I am using publicly available data where only the final peaks are shared.
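If the failures really come from chromosome names that differ only in case (for example `ChrM` in one file and `chrM` in another), a quick check of the BED files can confirm it before rerunning anything. A minimal sketch with hypothetical helper names and hypothetical example lines, assuming plain whitespace-separated BED text:

```python
from typing import Dict, Iterable, Set


def chrom_names(bed_lines: Iterable[str]) -> Set[str]:
    """Collect chromosome names from BED lines, skipping blank, comment and header lines."""
    names = set()
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        names.add(line.split()[0])
    return names


def case_only_mismatches(a: Set[str], b: Set[str]) -> Dict[str, str]:
    """Map names in `a` to names in `b` that differ only by upper/lower case."""
    b_by_lower = {name.lower(): name for name in b}
    return {name: b_by_lower[name.lower()]
            for name in a
            if name not in b and name.lower() in b_by_lower}


# Hypothetical BED snippets from two sources.
peaks_1 = ["ChrM\t10\t50\n", "Chr1\t100\t200\n"]
peaks_2 = ["chrM\t20\t60\n", "Chr1\t150\t250\n", "chrC\t5\t9\n"]
print(case_only_mismatches(chrom_names(peaks_1), chrom_names(peaks_2)))
```

Any pair reported here would need to be renamed consistently in one of the files before the regions can be compared.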

PS: I would love to attend the training, but it won't be possible due to money restrictions. Have you or the organizers considered podcasting it, even with a "pay to be able to see it" option? I am sure a lot of people whose last contact with programming was when they had to set up their Sunday laundry, and who are panic-stricken at the thought of typing commands into programs without a graphical interface (called "Galaxy users" in jargon), would love to have an opportunity to join :)

Written 18 months ago by dorota.komar

Perhaps they'll record them. I know that was attempted last year and there were "issues" of some sort with it. The training material from last year's session is online (under "shared data" -> "data libraries" and "pages"), so perhaps have a quick browse through that. I'm not sure how useful it will be without someone actually explaining the slides, but it wouldn't hurt.

Public datasets often provide wiggle or bedGraph files, so perhaps you can get those. Otherwise, there's really no meaningful way to produce the plots you want.
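If bedGraph files are available, they would still need to become bigWig files before multiBigwigSummary can use them. One common route, named here as an assumption rather than something from this thread, is UCSC's `bedGraphToBigWig`, which takes a bedGraph sorted by chromosome and then start position plus a chromosome-sizes file. A minimal sketch of the sorting step, with hypothetical helper names and file names:

```python
import shlex
from typing import List, Tuple

Record = Tuple[str, int, int, float]  # chrom, start, end, value


def parse_bedgraph(lines: List[str]) -> List[Record]:
    """Parse 4-column bedGraph lines, skipping blank, comment, 'track' and 'browser' lines."""
    records = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end, value = line.split()[:4]
        records.append((chrom, int(start), int(end), float(value)))
    return records


def sorted_bedgraph(records: List[Record]) -> List[str]:
    """Sort by chromosome name, then numerically by start, and format as bedGraph lines."""
    return [f"{c}\t{s}\t{e}\t{v:g}" for c, s, e, v in sorted(records)]


# Hypothetical unsorted input, e.g. downloaded alongside a public peak set.
raw = ["track type=bedGraph\n", "chr2\t0\t10\t1.5\n",
       "chr1\t50\t60\t2\n", "chr1\t0\t10\t3\n"]
print("\n".join(sorted_bedgraph(parse_bedgraph(raw))))

# Conversion command to run afterwards (file names are hypothetical).
print(shlex.join(["bedGraphToBigWig", "signal.sorted.bedGraph", "chrom.sizes", "signal.bw"]))
```

The resulting bigWig files can then go into multiBigwigSummary in place of the peak BED files.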

Written 18 months ago by Devon Ryan