Question: variable slot configuration for GALAXY_SLOTS
wrf wrote (3.6 years ago, Germany):

I am trying to configure the job_conf for our system.

Our cluster is currently set up with SLURM, though this is not integrated with Galaxy yet, so Galaxy is just running on 16 CPUs on one of the nodes.

I would like to have all single-threaded processes run on the 16 CPUs of the Galaxy node and all multi-threaded processes (BLAST, TopHat, etc.) sent to the other nodes via SLURM.
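
For reference, the kind of job_conf.xml I am imagining looks roughly like this - the runner load strings, tool ids, and SLURM options are just my guesses, since nothing is wired up yet:

<job_conf>
    <plugins>
        <plugin id="local" type="runner" load="galaxy.jobs.runners.local:LocalJobRunner"/>
        <plugin id="slurm" type="runner" load="galaxy.jobs.runners.slurm:SlurmJobRunner"/>
    </plugins>
    <destinations default="local16">
        <!-- everything runs on the Galaxy node by default, using all 16 cores -->
        <destination id="local16" runner="local">
            <param id="local_slots">16</param>
        </destination>
        <!-- multi-threaded tools get sent to the cluster -->
        <destination id="cluster" runner="slurm">
            <param id="nativeSpecification">--ntasks=1 --cpus-per-task=8</param>
        </destination>
    </destinations>
    <tools>
        <tool id="ncbi_blastn_wrapper" destination="cluster"/>
        <tool id="tophat2" destination="cluster"/>
    </tools>
</job_conf>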

Is there some way to pass a variable number of threads to a process based on the resources available? For example, if there are 15 CPUs available, take those rather than wait a week for that extra 16th; or use at least 4 and another 12 if available?

thanks!

jmchilton wrote (3.6 years ago, United States):

There is not a switch to flip to enable something like this. It amounts to fairly sophisticated job scheduling, and Galaxy doesn't really do any job scheduling, let alone anything this dynamic. For sophisticated behavior I normally recommend using a DRM (such as SLURM), even if it is a separate SLURM server managing just that one node. However, looking through SLURM's docs, it doesn't seem there is a way to request ranges of core counts even with SLURM. I think Condor might be able to do this (http://www.mi.infn.it/condor/manual/2_6Submitting_Job.html), and if you could make the requests that way I think GALAXY_SLOTS would be set appropriately at runtime. If you are motivated enough to set up Condor on that one node, we could work through the details - but I am not sure it is worth the effort.
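
For background, GALAXY_SLOTS is just an environment variable that gets set in the job script at runtime; thread-aware tool wrappers read it in their command block, roughly like this (blastn shown only as an example - the fallback value after :- is whatever default the wrapper author picked):

<command>
    blastn -num_threads "\${GALAXY_SLOTS:-1}" -query "$query" -db "$db" -out "$output"
</command>

The \$ escape keeps Cheetah from expanding the variable at template time, so the shell picks it up when the job actually runs on whichever destination it landed on.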

Galaxy also supports dynamic job destinations - so one could potentially write Python code that would, say, poll a SLURM server, discover how many active jobs there are, and modify resource requests based on that. I am worried that such code would be subject to race conditions and other bugginess. Instead of polling SLURM, you could potentially implement this by looking at Galaxy's job state tables - there is some helper code available to assist with this (https://github.com/galaxyproject/galaxy/blob/dev/lib/galaxy/jobs/rule_helper.py) - but I worry it would have the same problems with race conditions and the like.
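
To make the dynamic destination idea concrete, a rule is a Python function dropped under lib/galaxy/jobs/rules/ and pointed at from a destination with runner="dynamic" in job_conf.xml. A minimal sketch, assuming the local runner honours a local_slots param and with the actual resource check left as a stub:

from galaxy.jobs import JobDestination


def free_local_cores():
    # Placeholder only - in a real rule this is where you would poll SLURM
    # or query Galaxy's job tables (rule_helper.py), and it is exactly the
    # part that is prone to the race conditions mentioned above.
    return 12


def variable_slots(app, tool):
    # "Use at least 4 cores, and up to 16 if they are free."
    slots = max(4, min(free_local_cores(), 16))
    return JobDestination(
        id="local_variable",
        runner="local",
        params={"local_slots": str(slots)},
    )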

Hope this helps and sorry I don't have a clearer, easier path to enable this behavior.
