How to make multiple Tophats inside a workflow run sequentially rather than simultaneously?

tungl • 0 (U.S.A.) wrote 2.3 years ago:

In my workflow on Galaxy, I have 6 fastq samples to run through Tophat, and their output bam files are fed into Cuffdiff. So I placed 6 Tophat steps inside my workflow.

However, when I run the workflow, although the 6 Tophat runs appear as sequential steps on the run form, they are actually executed simultaneously. Since our local Galaxy has limited memory, these simultaneous Tophat runs fail with an insufficient-memory error.

I am wondering if there is a way to make the 6 Tophats truly run sequentially?

Ideally, is there a way to just let two Tophats run simultaneously, and after they complete, move to another two Tophats? Our memory is just big enough to run two Tophats at the same time.

I’d appreciate your advice and suggestions.

Thank you very much in advance!

rna-seq galaxy • 859 views


written 2.3 years ago by tungl • 0

Devon Ryan • 1.9k (Germany) wrote 2.3 years ago:

Have you tried making memory a consumable resource and indicating that in job_conf.xml? That'd be preferable to modifying the workflow, I'd think.

written 2.3 years ago by Devon Ryan • 1.9k

Thanks for your suggestion!

What does the job_conf.xml file do? Is it managed by the Galaxy administrator?

I'm just wondering if there is a way, from the user's side, to easily specify that one Tophat should run after another. I thought the run form would arrange the steps this way, but it doesn't seem to.

When we run command-line Tophat in Unix, we can make the runs sequential or simultaneous as we want.

written 2.3 years ago by tungl • 0

Correct, it's on the administrator side. I'm not sure there's a way to do this on the user side, since Galaxy should try to run as many independent jobs as it can in parallel.

written 2.3 years ago by Devon Ryan • 1.9k

Thanks!

So how does Galaxy determine how many Tophats can run in parallel?

Could you please give me a few details about what this job_conf.xml file specifies, so I can talk to our local Galaxy administrator about it?

Thanks a lot!

written 2.3 years ago by tungl • 0

I've not needed to tweak the memory settings, but the general idea is to have Galaxy submit jobs through a scheduler (we use Slurm) and then specify the resource request in a job destination. For our Slurm-based cluster, we'd use something like:

<destination id="slurm4threads10gigs" runner="slurm">
        <param id="embed_metadata_in_job">False</param>
        <param id="nativeSpecification">-p work -n 4 --mem 10000</param>
</destination>
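
For context, here is a rough sketch of where such a destination would sit in a complete job_conf.xml. This is an illustration under assumptions, not our actual file: the tool id `tophat2` and the choice of default destination are placeholders, so check the ids used by your own installation.

```xml
<job_conf>
    <plugins>
        <!-- Runner plugin that submits jobs to Slurm -->
        <plugin id="slurm" type="runner" load="galaxy.jobs.runners.slurm:SlurmJobRunner"/>
    </plugins>
    <destinations default="slurm4threads10gigs">
        <destination id="slurm4threads10gigs" runner="slurm">
            <param id="embed_metadata_in_job">False</param>
            <!-- Partition "work", 4 tasks, 10000 MB of memory per node -->
            <param id="nativeSpecification">-p work -n 4 --mem 10000</param>
        </destination>
    </destinations>
    <tools>
        <!-- Assumed tool id; check the id your Tophat installation actually uses -->
        <tool id="tophat2" destination="slurm4threads10gigs"/>
    </tools>
</job_conf>
```

As far as I know, the memory request only holds jobs back if Slurm itself is configured to treat memory as a consumable resource (for example, a `SelectTypeParameters` value such as `CR_Core_Memory` in slurm.conf). Otherwise Slurm may still start all six jobs at once.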

There might be a way to do that with the local runner, but it's probably a bit simpler with a standard job scheduler since those are written specifically for this purpose.
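
One option that might work with the local runner is a concurrency limit rather than a memory limit. The sketch below assumes a hypothetical destination named `local_tophat` and the tool id `tophat2`; it caps the number of jobs running on that destination at two, which matches the "two Tophats at a time" case in the question. It does not track memory at all, so treat it as a starting point to discuss with your administrator.

```xml
<job_conf>
    <plugins>
        <!-- workers = number of local job worker threads -->
        <plugin id="local" type="runner" load="galaxy.jobs.runners.local:LocalJobRunner" workers="4"/>
    </plugins>
    <destinations default="local">
        <destination id="local" runner="local"/>
        <!-- Hypothetical destination reserved for Tophat jobs -->
        <destination id="local_tophat" runner="local"/>
    </destinations>
    <tools>
        <!-- Assumed tool id; check the id your Tophat installation actually uses -->
        <tool id="tophat2" destination="local_tophat"/>
    </tools>
    <limits>
        <!-- At most 2 jobs at once on local_tophat; further jobs wait in the queue -->
        <limit type="destination_total_concurrent_jobs" id="local_tophat">2</limit>
    </limits>
</job_conf>
```

With a limit like this, the remaining Tophat jobs from the workflow should stay queued until one of the two running jobs finishes, and Cuffdiff would still wait for all six outputs as before.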

written 2.3 years ago by Devon Ryan • 1.9k

Thanks a lot for the information!

written 2.3 years ago by tungl • 0
