Dear Galaxy Community,
I have been trying to configure my job_conf.xml for use with a cluster. We are working with Son of Grid Engine (SGE) v8.1.9, with a default installation and configuration.
I have no problem submitting jobs from Galaxy to SGE. However, the ${GALAXY_SLOTS} variable used by various tools is never taken into account. Its value is always 1, no matter what is specified in the tool wrapper or in my job_conf.xml, and the number of slots reserved in SGE is also always 1. I have spent a lot of time trying to figure out where this problem comes from, without success.
You can take a look at my (for now) basic job_conf.xml below. I mostly followed the instructions I found here: http://galacticengineer.blogspot.de/2015/04/using-galaxyslots-for-multithreaded_22.html. They seemed consistent with other posts and websites I found on this topic.
<?xml version="1.0"?>
<!-- A sample job config that explicitly configures job running the way it is configured by default (if there is no explicit config). -->
<job_conf>
<plugins>
<plugin id="local" type="runner" load="galaxy.jobs.runners.local:LocalJobRunner" workers="4"/>
<plugin id="drmaa_default" type="runner" load="galaxy.jobs.runners.drmaa:DRMAAJobRunner" workers="10"/>
<!-- Override the $DRMAA_LIBRARY_PATH environment variable -->
<param id="drmaa_library_path">/opt/sge/lib/libdrmaa.so</param>
</plugins>
<handlers default="handlers">
<handler id="handler0" tags="handlers"/>
<handler id="handler1" tags="handlers"/>
<handler id="handler2" tags="handlers"/>
<handler id="handler3" tags="handlers"/>
</handlers>
<destinations default="sge_default">
<destination id="sge_default" runner="drmaa_default"/>
<param id="nativeSpecification">-R y -V -j n -pe smp 4</param>
<destination id="local" runner="local"/>
<param id="local_slots">4</param>
</destinations>
</job_conf>
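As a sanity check on the file itself, independent of Galaxy and SGE, here is a minimal Python sketch that shows which `<param>` elements an XML parser attaches to each `<destination>`. It uses only the standard library. The helper names `params_by_destination` and `orphan_params` are hypothetical, and the embedded string is a trimmed copy of the destinations section posted above. My understanding is that Galaxy reads a destination's parameters from `<param>` elements nested inside that `<destination>`, but treat that as an assumption to confirm against the Galaxy documentation.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the <destinations> section exactly as posted above.
JOB_CONF = """<?xml version="1.0"?>
<job_conf>
<destinations default="sge_default">
<destination id="sge_default" runner="drmaa_default"/>
<param id="nativeSpecification">-R y -V -j n -pe smp 4</param>
<destination id="local" runner="local"/>
<param id="local_slots">4</param>
</destinations>
</job_conf>
"""


def params_by_destination(xml_text):
    """Map each destination id to the params nested directly inside it."""
    root = ET.fromstring(xml_text)
    nested = {}
    for dest in root.iter("destination"):
        nested[dest.get("id")] = {
            p.get("id"): (p.text or "").strip() for p in dest.findall("param")
        }
    return nested


def orphan_params(xml_text):
    """List ids of <param> elements that are direct children of <destinations>."""
    root = ET.fromstring(xml_text)
    section = root.find("destinations")
    return [p.get("id") for p in section.findall("param")]


print(params_by_destination(JOB_CONF))
print(orphan_params(JOB_CONF))
```

If the first line printed shows an empty dictionary for a destination and the second line lists its parameters, those parameters are siblings of the self-closed `<destination .../>` element rather than its children. The same check could be adapted to the `<plugins>` section.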
I have verified that a parallel environment (-pe) called smp exists (with 999 slots). I don't know whether it is useful for debugging, but one characteristic of this parallel environment is that its allocation rule is $pe_slots. I also noticed that when I run qstat -j {jobnumber} for a job launched from Galaxy, the parallel environment is not present in the job description.
I have the same issue when I use the local runner with local_slots set to 4 (after restarting Galaxy, of course): GALAXY_SLOTS is set to 1 instead of 4 or the default value from the tool's XML wrapper.
One final remark: when I take the command line that Galaxy generated for my job and launch it manually from a terminal with the same cluster submission parameters (qsub -N test_pe_smp -R y -V -j n -pe smp 4 my_galaxy_job.txt), it works perfectly, and the parallel environment is present in the job description when I call qstat -j {jobnumber}.
I am sure I am doing something wrong somewhere, but I haven't managed to figure it out yet. By the way, is there a way to see the command line that Galaxy sends to SGE?
Thank you very much in advance!
David