Question: How to make multiple Tophats inside a workflow run sequentially rather than simultaneously?
2.3 years ago, tungl (U.S.A.) wrote:

In my workflow on Galaxy, I have 6 fastq samples to run through Tophat, and the output bam files are fed into Cuffdiff. So I placed 6 Tophat steps inside my workflow.

However, when I run the workflow, the 6 Tophat runs appear as sequential steps on the run form but are actually executed simultaneously. Since our local Galaxy has limited memory, these simultaneous Tophat runs fail with an insufficient-memory error.

I am wondering if there is a way to make the 6 Tophats truly run sequentially?

Ideally, is there a way to just let two Tophats run simultaneously, and after they complete, move to another two Tophats? Our memory is just big enough to run two Tophats at the same time.

I’d appreciate your advice and suggestions.

Thank you very much in advance!

rna-seq galaxy • 859 views
2.3 years ago, Devon Ryan (Germany) wrote:

Have you tried making memory a consumable resource and indicating that in job_conf.xml? That'd be preferable to modifying the workflow, I'd think.


Thanks for your suggestion!

What does the job_conf.xml file do? Is it managed by the Galaxy administrator?

I'm just wondering if there is a way, from the user's side, to easily specify that one Tophat should run after another. I thought the run form would arrange the steps this way, but it doesn't seem to.

When we run command-line Tophat in Unix, we can make the runs sequential or simultaneous as we want.

written 2.3 years ago by tungl

Correct, it's on the administrator side. I'm not sure there's a way to do this on the user side, since Galaxy should try to run as many independent jobs as it can in parallel.

written 2.3 years ago by Devon Ryan

Thanks!

So how does Galaxy determine how many Tophats can run in parallel?

Could you please give me a few details about what the job_conf.xml file specifies, so I can talk to our local Galaxy administrator about this?

Thanks a lot!

written 2.3 years ago by tungl

I've not needed to tweak the memory settings, but the general idea is to have Galaxy use a scheduler (we use slurm) and then specify the memory request in the job destination. For our slurm-based cluster, we'd use something like:

<destination id="slurm4threads10gigs" runner="slurm">
        <param id="embed_metadata_in_job">False</param>
        <param id="nativeSpecification">-p work -n 4 --mem 10000</param>
</destination>

There might be a way to do that with the local runner, but it's probably a bit simpler with a standard job scheduler since those are written specifically for this purpose.
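
For the local runner, one possibility is to cap concurrency rather than request memory. The following job_conf.xml is a minimal sketch, not a tested configuration: the destination id tophat_local and the tool id tophat2 are assumptions (check which id your Galaxy instance actually uses for Tophat), and the limit of 2 is taken from the question above. It relies on the destination_total_concurrent_jobs limit type shown in Galaxy's sample job configuration.

```xml
<?xml version="1.0"?>
<job_conf>
    <plugins>
        <!-- Local runner; workers is the number of worker threads for this runner -->
        <plugin id="local" type="runner" load="galaxy.jobs.runners.local:LocalJobRunner" workers="4"/>
    </plugins>
    <destinations default="local">
        <!-- Default destination for every other tool -->
        <destination id="local" runner="local"/>
        <!-- Hypothetical dedicated destination so Tophat can be limited separately -->
        <destination id="tophat_local" runner="local"/>
    </destinations>
    <tools>
        <!-- Assumed tool id; adjust to the id of the installed Tophat tool -->
        <tool id="tophat2" destination="tophat_local"/>
    </tools>
    <limits>
        <!-- At most 2 jobs may run on tophat_local at once; further jobs wait in the queue -->
        <limit type="destination_total_concurrent_jobs" id="tophat_local">2</limit>
    </limits>
</job_conf>
```

With a limit like this, the remaining Tophat jobs should stay queued until a slot frees up, so the workflow itself would not need to change.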

written 2.3 years ago by Devon Ryan

Thanks a lot for the information!

written 2.3 years ago by tungl