GALAXY_SLOTS and slashes

17 months ago by

apozhitkov • 60 wrote:

Here is my experience trying to make Galaxy use the multi-core / multi-processor environment of our server. Our setup is a 16-CPU Azure machine running the latest release of Galaxy (17.05). My goal was to enable multithreading for mapping tools such as RNA-STAR and TopHat (and any others that support it).

I thought that must be easy. Here is what rg_rnaStar.xml tells us:

…
STAR
        --runThreadN \${GALAXY_SLOTS:-4}
…
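
For readers unfamiliar with the shell syntax above: ${GALAXY_SLOTS:-4} means "use GALAXY_SLOTS if it is set and non-empty, otherwise 4". A minimal Python sketch of that fallback rule follows; the function name resolve_threads is made up for illustration and is not part of Galaxy.

```python
import os


def resolve_threads(env=None, default="4"):
    """Mimic the shell expansion ${GALAXY_SLOTS:-4}.

    Returns GALAXY_SLOTS if it is set and non-empty, otherwise the default.
    (Hypothetical helper for illustration only; not Galaxy code.)
    """
    env = os.environ if env is None else env
    return env.get("GALAXY_SLOTS") or default


print(resolve_threads({}))                      # variable unset
print(resolve_threads({"GALAXY_SLOTS": ""}))    # variable set but empty
print(resolve_threads({"GALAXY_SLOTS": "16"}))  # variable set
```

So the 4 is only a fallback. If something in the job environment has already set GALAXY_SLOTS to another value, the tool uses that value instead.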

Where is GALAXY_SLOTS defined? galaxyproject.org explains: “In all cases you need to include a parameter on your job_conf.xml that specifies the number of processes to be used for a given destination. If correctly defined your [sic] should see your GALAXY_SLOTS variable contain the specified value.”

Looking for job_conf.xml. Not there. Found job_conf.xml.sample_basic and job_conf.xml.sample_advanced
Somewhere it was written that job_conf.xml is not necessary. Without that file, Galaxy will spawn jobs assuming a uniprocessor machine.
The job_conf.xml.sample_advanced looks complicated; it must be the right one!
cp job_conf.xml.sample_advanced job_conf.xml
Restart Galaxy.
Trying to start RNA-STAR. Nothing’s happening.
rm job_conf.xml, restart Galaxy.
Starting RNA-STAR, working fine but clearly only 1 CPU is used.
Let’s try cp job_conf.xml.sample_basic job_conf.xml, restart Galaxy
Starting RNA-STAR, working fine but clearly only 1 CPU is used.
More googling, discover more folks are asking the same question, how to set up the GALAXY_SLOTS variable? Lots of comments on SLURM, DRM. Irrelevant for me.
Someone named “Galactic engineer” made a post “Using GALAXY_SLOTS with multithreaded Galaxy tools”, which refers to “Running Galaxy Tools on a Cluster” on galaxyproject.org
All right, it seems simple: add a line <param id="local_slots">16</param> to job_conf.xml
vi job_conf.xml, line added, Galaxy restarted.
RNA-Star is still on one CPU. ARGGHHHHH!
grep -r -i --include=*.sh 'GALAXY_SLOTS' ./
It must be this one: …./lib/galaxy/jobs/runners/util/job_script/CLUSTER_SLOTS_STATEMENT.sh
Edit, put 16 to explicitly set GALAXY_SLOTS, restart Galaxy
RNA-Star is still on one CPU. ARGGHHHHH!
Revert …STATEMENT.sh to what it was.
Found another “statement”: …./.venv/lib/python2.7/site-packages/pulsar/managers/util/job_script/CLUSTER_SLOTS_STATEMENT.sh
Edit, restart, RNA-STAR still on one CPU ARGGHHHHH!
Revert …STATEMENT.sh to what it was. Lunch break. Frustration
Another look at the job_conf.xml:

<destinations>
   <destination id="local" runner="local"/>
    <param id="local_slots">16</param>
</destinations>

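What is that slash doing at the end after “local”? It closes the tag right there, so the destination is empty and the local_slots param sits outside it, where Galaxy apparently ignores it.

OK, how about removing the slash and adding a closing tag: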

<destinations>
   <destination id="local" runner="local">
    <param id="local_slots">16</param>
   </destination>
</destinations>
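
A quick way to catch this kind of mistake before restarting is to parse the file and check that the param really is nested inside the destination. The sketch below uses only Python's standard xml.etree.ElementTree on the two destinations fragments from this post. The helper name local_slots is made up for illustration, and this is not how Galaxy itself reads the file.

```python
import xml.etree.ElementTree as ET

# Self-closed destination: the param ends up as a sibling, not a child.
BROKEN = """
<destinations>
   <destination id="local" runner="local"/>
    <param id="local_slots">16</param>
</destinations>
"""

# Destination with an explicit closing tag: the param is nested inside it.
FIXED = """
<destinations>
   <destination id="local" runner="local">
    <param id="local_slots">16</param>
   </destination>
</destinations>
"""


def local_slots(xml_text, destination_id="local"):
    """Return the local_slots value nested inside the given destination, or None.

    Hypothetical helper for illustration only; not part of Galaxy.
    """
    root = ET.fromstring(xml_text)
    for dest in root.iter("destination"):
        if dest.get("id") == destination_id:
            param = dest.find("param[@id='local_slots']")
            if param is not None and param.text:
                return param.text.strip()
    return None


print("broken:", local_slots(BROKEN))  # broken: None
print("fixed: ", local_slots(FIXED))   # fixed:  16
```

Both fragments are well-formed XML, so a plain syntax check would not flag the first one. Only looking at the nesting shows the difference.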

Restart.
Run RNA-STAR. All processors are busy!
TopHat – all processors are busy.
Exhausted, but happy…

:-)

rna-seq multi-core multi-processor tophat • 603 views

Great post. I've had similar issues with SLURM on docker galaxy which are only partially resolved.

written 17 months ago by colindaven • 0
