Question: variable slot configuration for GALAXY_SLOTS
wrf wrote (3.6 years ago, Germany):

I am trying to configure the job_conf for our system.

Our cluster is currently set up with SLURM, though this is not integrated with Galaxy yet, so Galaxy is just running on 16 CPUs on one of the nodes.

I would like to have all single-threaded processes run on the 16 CPUs of the Galaxy node and all multi-threaded processes (BLAST, TopHat, etc.) sent to the other nodes via SLURM.
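
For reference, the kind of job_conf.xml I am imagining looks roughly like this - the runner load strings, tool ids, and SLURM options are just my guesses, since nothing is wired up yet:

<job_conf>
    <plugins>
        <plugin id="local" type="runner" load="galaxy.jobs.runners.local:LocalJobRunner"/>
        <plugin id="slurm" type="runner" load="galaxy.jobs.runners.slurm:SlurmJobRunner"/>
    </plugins>
    <destinations default="local16">
        <!-- everything runs on the Galaxy node by default, using all 16 cores -->
        <destination id="local16" runner="local">
            <param id="local_slots">16</param>
        </destination>
        <!-- multi-threaded tools get sent to the cluster -->
        <destination id="cluster" runner="slurm">
            <param id="nativeSpecification">--ntasks=1 --cpus-per-task=8</param>
        </destination>
    </destinations>
    <tools>
        <tool id="ncbi_blastn_wrapper" destination="cluster"/>
        <tool id="tophat2" destination="cluster"/>
    </tools>
</job_conf>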

Is there some way to pass a variable number of threads to a process based on the resources available? For example, if there are 15 CPUs available, take those rather than wait a week for that extra 16th; or use at least 4 and another 12 if available?

thanks!

jmchilton wrote (3.6 years ago, United States):

There is not a switch to flip to enable something like this. It amounts to fairly sophisticated job scheduling, and Galaxy doesn't really do any job scheduling, let alone anything this dynamic. For sophisticated behavior I normally recommend using a DRM (such as SLURM), even if it is a separate SLURM server managing just that one node. However, looking through SLURM's docs, it doesn't seem there is a way to request ranges of core counts even with SLURM. I think Condor might be able to do this (http://www.mi.infn.it/condor/manual/2_6Submitting_Job.html), and if you could make the requests that way I think GALAXY_SLOTS would be set appropriately at runtime. If you are motivated enough to set up Condor on that one node, we could work through the details - but I am not sure it is worth the effort.
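
For background, GALAXY_SLOTS is just an environment variable that gets set in the job script at runtime; thread-aware tool wrappers read it in their command block, roughly like this (blastn shown only as an example - the fallback value after :- is whatever default the wrapper author picked):

<command>
    blastn -num_threads "\${GALAXY_SLOTS:-1}" -query "$query" -db "$db" -out "$output"
</command>

The \$ escape keeps Cheetah from expanding the variable at template time, so the shell picks it up when the job actually runs on whichever destination it landed on.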

Galaxy also supports dynamic job destinations - so one could potentially write Python code that would, say, poll a SLURM server, discover how many active jobs there are, and modify resource requests based on that. I am worried that such code would be subject to race conditions and other bugginess. Instead of polling SLURM, you could potentially implement this by looking at Galaxy's job state tables - there is some helper code available to assist with this (https://github.com/galaxyproject/galaxy/blob/dev/lib/galaxy/jobs/rule_helper.py) - but I worry it would have the same problems with race conditions and the like.
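
To make the dynamic destination idea concrete, a rule is a Python function dropped under lib/galaxy/jobs/rules/ and pointed at from a destination with runner="dynamic" in job_conf.xml. A minimal sketch, assuming the local runner honours a local_slots param and with the actual resource check left as a stub:

from galaxy.jobs import JobDestination


def free_local_cores():
    # Placeholder only - in a real rule this is where you would poll SLURM
    # or query Galaxy's job tables (rule_helper.py), and it is exactly the
    # part that is prone to the race conditions mentioned above.
    return 12


def variable_slots(app, tool):
    # "Use at least 4 cores, and up to 16 if they are free."
    slots = max(4, min(free_local_cores(), 16))
    return JobDestination(
        id="local_variable",
        runner="local",
        params={"local_slots": str(slots)},
    )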

Hope this helps and sorry I don't have a clearer, easier path to enable this behavior.
