Question: DiffBind: Generating a PCA in Galaxy
9 days ago, JennyP wrote:

Hi all, I'm trying to use DiffBind in Galaxy to make a PCA of multiple ChIP-seq sets. I have the .bam files and .narrowPeak files for four groups of samples, each with three replicates. I have tried to use the latest version of DiffBind available (Galaxy version, as well as a previous version (

When using the latest version, I can't figure out how to rearrange my files to input them in the correct order since it states that the input order for the .bam files "MUST match the input order of the peaks files." The input boxes automatically populate and I can't seem to rearrange the list of files into the correct order to match the peak files. Is there a way around this? Also, when I have selected the first two groups of samples and wish to add another two groups, the "Insert Group" button is unclickable and doesn't work, so I can't add more groups/files.

When I execute this analysis using only two groups, I get the beginnings of the PCA I want. So I thought I would try the earlier version to see if that would also work but allow me to add all four groups. It does indeed allow me to add all the groups (and in the correct order), but even when I check yes to "visualizing the analysis results" the only visual output is a heat map, and there's no PCA, which is what I really want.

If anyone has any tips on getting either of these versions of Galaxy to work and give me a PCA for my four sets of samples that would be very much appreciated!! Thank you so much, and sorry if I'm just missing something dumb!

7 days ago, Jennifer Hillman Jackson (United States) wrote:


DiffBind is designed to work with two primary groups. The newer, updated tool form was redesigned to make this clearer and avoid usage errors, and it also includes some bug fixes. The older form is organized by samples; the newer form is organized by groups.

The older tool version should be avoided -- use the latest: DiffBind differential binding analysis of ChIP-Seq peak data (Galaxy Version

If the sample dataset inputs (peak, bam, optional control bam) are not each in the proper order in your history, reorganize the data by grouping them into dataset collections. There can be up to 6 collections of input per DiffBind run: the first group would have 2 or 3 collections and the second group would have 2 or 3 collections. The two groups should have the same number of samples. The sample data should be placed in the collections in the same order for each of the 2 or 3 potential inputs. This means that if you include control bams with one of the groups you'll also need to include control bams with the other, although it is preferred to use controls at the upstream peak-calling step rather than at this step in an analysis workflow.
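
If it helps to sanity-check the ordering before building the collections, here is a minimal Python sketch that compares a list of peak files against a list of bam files and reorders the bams to match. The file names and the naming pattern (sample name followed by "_peaks.narrowPeak" or ".bam") are assumptions for illustration only; adjust them to however your own datasets are named. This runs outside Galaxy and is not part of the DiffBind tool.

```python
from pathlib import PurePath

# Assumed (hypothetical) naming pattern: "<sample>_peaks.narrowPeak" and "<sample>.bam".
KNOWN_SUFFIXES = ("_peaks.narrowPeak", ".narrowPeak", ".bam")


def sample_id(filename):
    """Strip the directory and a known suffix to recover the sample name."""
    name = PurePath(filename).name
    for suffix in KNOWN_SUFFIXES:
        if name.endswith(suffix):
            return name[: -len(suffix)]
    return name


def order_mismatches(peak_files, bam_files):
    """Return (position, peak, bam) for every position where the samples differ."""
    if len(peak_files) != len(bam_files):
        raise ValueError("peak and bam lists must have the same number of samples")
    return [
        (i, peak, bam)
        for i, (peak, bam) in enumerate(zip(peak_files, bam_files))
        if sample_id(peak) != sample_id(bam)
    ]


def reorder_to_match(peak_files, bam_files):
    """Return the bam files rearranged into the same sample order as the peak files."""
    bam_by_sample = {sample_id(bam): bam for bam in bam_files}
    return [bam_by_sample[sample_id(peak)] for peak in peak_files]


# Hypothetical example for one group with three replicates.
peaks = [
    "groupA_rep1_peaks.narrowPeak",
    "groupA_rep2_peaks.narrowPeak",
    "groupA_rep3_peaks.narrowPeak",
]
bams = ["groupA_rep2.bam", "groupA_rep1.bam", "groupA_rep3.bam"]

print(order_mismatches(peaks, bams))
print(reorder_to_match(peaks, bams))
```

The reordered list is the order in which the bam datasets would need to be added to their collection so that each position lines up with the matching peak file.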

You can use dataset collections at the start of an analysis and continue to use them throughout the workflow, or place data into collections at later steps as needed/wanted.

For details and discussion about experimental design, please see the Diffbind help, user guide, and support channel: tool form plus

For how to build/use/manipulate dataset collections, please see this Galaxy tutorial: Dataset collections - modern studies usually include many samples. Collections are designed to simplify complex, multi-sample analyses as shown in this tutorial.

Thanks! Jen, Galaxy team


Thank you so much!! I will do that. So there is no way to generate a PCA in Galaxy's Diffbind for three or four groups of samples? I'm not even super concerned with the differential binding itself, just more interested in ensuring that the samples of each group are clustering together rather than with other groups. Is there a recommended method for this in Galaxy?

written 7 days ago by JennyP

Try the analysis package deepTools. PCA plots can be generated comparing multiple samples/groups/conditions. The tools are well documented on the tool forms, but for a complete overview, tool-specific support from the authors, and help with using these tools in Galaxy, it is definitely worth reviewing the docs here:

I would also suggest reviewing the extended tool options and example workflows that domain-specific Galaxy servers offer. You can always set up your own Galaxy (cloud, docker) but these will give you an overview of analysis/tool options and how others are using them to answer specific scientific questions.
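
As a rough illustration of the deepTools suggestion above: outside Galaxy, a PCA across all samples is usually a two-step run, first multiBamSummary to build a read-count matrix from the bam files and then plotPCA on that matrix. In Galaxy the same two tools are run from their tool forms instead. The Python sketch below only assembles the two command lines and does not execute them. The bam names, labels, and output file names are hypothetical, and the options shown are the ones I believe the deepTools command line uses, so please check them against the deepTools documentation for your installed version.

```python
import shlex


def deeptools_pca_commands(bam_files, labels, matrix="readCounts.npz", plot="pca.png"):
    """Build the argument lists for a multiBamSummary + plotPCA run (nothing is executed)."""
    if len(bam_files) != len(labels):
        raise ValueError("need exactly one label per bam file")
    # Step 1: count reads in genome-wide bins for every bam file.
    summary = [
        "multiBamSummary", "bins",
        "--bamfiles", *bam_files,
        "--labels", *labels,
        "--outFileName", matrix,
    ]
    # Step 2: PCA on the count matrix produced by step 1.
    pca = ["plotPCA", "--corData", matrix, "--plotFile", plot]
    return summary, pca


# Hypothetical inputs: four groups with three replicates each.
groups = ["A", "B", "C", "D"]
labels = [f"group{g}_rep{r}" for g in groups for r in (1, 2, 3)]
bam_files = [f"{label}.bam" for label in labels]

summary_cmd, pca_cmd = deeptools_pca_commands(bam_files, labels)
print(shlex.join(summary_cmd))
print(shlex.join(pca_cmd))
```

Because every bam file goes into one matrix, this approach is not limited to two groups, which is what makes it suitable for checking whether the replicates of each of the four groups cluster together.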

written 7 days ago by Jennifer Hillman Jackson