Question: Issue with ${GALAXY_SLOTS} and SGE cluster
8 months ago by
david.roquis20 wrote:

Dear Galaxy Community,

I have been trying to configure my job_conf.xml for cluster use. We are working with Son of Grid Engine (SGE) v8.1.9, with a default installation and configuration.

I have no problem submitting jobs from Galaxy to SGE. However, the ${GALAXY_SLOTS} variable used by various tools is never taken into account. The value assigned to this parameter is always 1 (no matter what was specified in the tool wrapper or in my job_conf.xml), and the number of dedicated slots in SGE also always ends up being 1. I have spent a lot of time trying to figure out where this problem comes from, without success.

You can take a look at my (for now) basic job_conf below. I mostly followed the instructions I found here and they seemed to be consistent with other posts or websites I found about this topic.

<?xml version="1.0"?>
<!-- A sample job config that explicitly configures job running the way it is configured by default (if there is no explicit config). -->
<job_conf>
    <plugins>
        <plugin id="local" type="runner" load="" workers="4"/>
        <plugin id="drmaa_default" type="runner" load="" workers="10"/>
        <!-- Override the $DRMAA_LIBRARY_PATH environment variable -->
        <param id="drmaa_library_path">/opt/sge/lib/</param>
    </plugins>
    <handlers default="handlers">
        <handler id="handler0" tags="handlers"/>
        <handler id="handler1" tags="handlers"/>
        <handler id="handler2" tags="handlers"/>
        <handler id="handler3" tags="handlers"/>
    </handlers>
    <destinations default="sge_default">
        <destination id="sge_default" runner="drmaa_default"/>
        <param id="nativeSpecification">-R y -V -j n -pe smp 4</param>
        <destination id="local" runner="local"/>
        <param id="local_slots">4</param>
    </destinations>
</job_conf>

I have verified that I do have a parallel environment (-pe) called smp (with 999 slots). I don't know if it is of any use for debugging, but one characteristic of this parallel environment is that its allocation rule is $pe_slots. Something else I noticed: when I run qstat -j {jobnumber}, the parallel environment is not present in the job description for jobs launched from Galaxy.

I have the same issue when I use the local runner with local_slots set to 4 (after restarting Galaxy, of course): GALAXY_SLOTS is set to 1 instead of 4 or the default value in the tool XML wrapper.

One final remark. When I take the command line from Galaxy for my job and launch it manually from a terminal with the same cluster submission parameters (like this: qsub -N test_pe_smp -R y -V -j n -pe smp 4 my_galaxy_job.txt), it works perfectly, and the parallel environment is present in the job description when I call qstat -j {jobnumber}.

I am sure I am doing something wrong somewhere, but I haven't managed to figure it out yet. By the way, is there a way to see the command line that Galaxy sends to SGE?

Thank you very much in advance!


modified 8 months ago by Devon Ryan1.9k • written 8 months ago by david.roquis20
8 months ago by
Devon Ryan1.9k wrote:
<destinations default="sge_default">
    <destination id="sge_default" runner="drmaa_default">
        <param id="nativeSpecification">-R y -V -j n -pe smp 4</param>
        <param id="local_slots">4</param>
    </destination>
    <destination id="local" runner="local">
        <param id="local_slots">4</param>
    </destination>
</destinations>

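Each param has to be nested inside its destination element; in your file the destination tags are self-closed, so the params that follow them are not attached to any destination. You might not need the local_slots param for the sge_default destination, but you do need to correct your XML regardless.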
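For reference, here is a sketch of what the whole file might look like with the same nesting applied throughout. It is a hedged illustration rather than a tested configuration. The two load values are an assumption (the runner classes named in Galaxy's sample job configuration), since they appear empty in the question. Nesting drmaa_library_path inside the DRMAA plugin is likewise an extrapolation of the same fix, not something confirmed in this thread. All other values are copied from the question.

```xml
<?xml version="1.0"?>
<job_conf>
    <plugins>
        <!-- load values are assumed; the question shows them empty -->
        <plugin id="local" type="runner" load="galaxy.jobs.runners.local:LocalJobRunner" workers="4"/>
        <plugin id="drmaa_default" type="runner" load="galaxy.jobs.runners.drmaa:DRMAAJobRunner" workers="10">
            <!-- Assumption: nested here for the same reason as the destination params; path as given in the question -->
            <param id="drmaa_library_path">/opt/sge/lib/</param>
        </plugin>
    </plugins>
    <handlers default="handlers">
        <handler id="handler0" tags="handlers"/>
        <handler id="handler1" tags="handlers"/>
        <handler id="handler2" tags="handlers"/>
        <handler id="handler3" tags="handlers"/>
    </handlers>
    <destinations default="sge_default">
        <destination id="sge_default" runner="drmaa_default">
            <!-- Requests 4 slots in the smp parallel environment -->
            <param id="nativeSpecification">-R y -V -j n -pe smp 4</param>
            <!-- Possibly unnecessary for a DRMAA destination, as noted above -->
            <param id="local_slots">4</param>
        </destination>
        <destination id="local" runner="local">
            <param id="local_slots">4</param>
        </destination>
    </destinations>
</job_conf>
```

After restarting Galaxy with a file shaped like this, qstat -j {jobnumber} should list the smp parallel environment for Galaxy-submitted jobs if the nativeSpecification is being picked up.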

modified 8 months ago • written 8 months ago by Devon Ryan1.9k

It works! Thanks a lot for your very precise, quick and efficient answer! I need to be more careful when working with XML in the future.

written 8 months ago by david.roquis20
