Here is my experience trying to make Galaxy use all the cores of our multi-processor server. Our setup is a 16-CPU Azure machine running the latest release of Galaxy (17.05). My goal was to enable multithreading for mapping tools such as RNA-STAR and TopHat (and any others that support it).
I thought that would be easy. Here is what rg_rnaStar.xml tells us:
…
STAR
--runThreadN \${GALAXY_SLOTS:-4}
…
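For anyone not used to the shell syntax: ${GALAXY_SLOTS:-4} means "use GALAXY_SLOTS if it is set and non-empty, otherwise fall back to 4". Below is a minimal Python sketch of that same rule. The function name resolve_slots is hypothetical and is not part of Galaxy. It also hints at why a job can end up on a single thread rather than four: if Galaxy exports GALAXY_SLOTS=1 for the job (which, as far as I can tell, is what the local runner does when no slots are configured), the fallback of 4 never applies.

```python
import os


def resolve_slots(env, default=4):
    """Mimic the shell expansion ${GALAXY_SLOTS:-4}.

    env is any mapping of environment variables (hypothetical helper,
    not part of Galaxy). An unset or empty variable yields the default.
    """
    value = env.get("GALAXY_SLOTS")
    return int(value) if value else default


if __name__ == "__main__":
    # Show what a tool wrapper would see in the current environment.
    print(resolve_slots(dict(os.environ)))
```

So a tool that stays single-threaded is most likely seeing GALAXY_SLOTS=1 from Galaxy, not an unset variable.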
Where is GALAXY_SLOTS defined? galaxyproject.org explains: “In all cases you need to include a parameter on your job_conf.xml that specifies the number of processes to be used for a given destination. If correctly defined your [sic] should see your GALAXY_SLOTS variable contain the specified value.”
- Looking for job_conf.xml. Not there. Found job_conf.xml.sample_basic and job_conf.xml.sample_advanced
- Somewhere it was written that job_conf.xml is not necessary. Without that file, Galaxy will spawn jobs assuming a uniprocessor machine.
- The job_conf.xml.sample_advanced looks complicated; it must be the right one!
- cp job_conf.xml.sample_advanced job_conf.xml
- Restart Galaxy.
- Trying to start RNA-STAR. Nothing’s happening.
- rm job_conf.xml, restart Galaxy.
- Starting RNA-STAR, working fine but clearly only 1 CPU is used.
- Let’s try cp job_conf.xml.sample_basic job_conf.xml, restart Galaxy
- Starting RNA-STAR, working fine but clearly only 1 CPU is used.
- More googling. I discover that more folks are asking the same question: how to set the GALAXY_SLOTS variable? Lots of comments on SLURM and DRMs, irrelevant for me.
- Someone named “Galactic engineer” made a post “Using GALAXY_SLOTS with multithreaded Galaxy tools”, which refers to “Running Galaxy Tools on a Cluster” on galaxyproject.org
- All right, it seems simple: add a line <param id="local_slots">16</param> to job_conf.xml
- vi job_conf.xml, line added, Galaxy restarted.
- RNA-STAR is still on one CPU. ARGGHHHHH!
- grep -r -i --include=*.sh 'GALAXY_SLOTS' ./
- It must be this one: …./lib/galaxy/jobs/runners/util/job_script/CLUSTER_SLOTS_STATEMENT.sh
- Edit, put 16 to explicitly set GALAXY_SLOTS, restart Galaxy
- RNA-STAR is still on one CPU. ARGGHHHHH!
- Revert …STATEMENT.sh to what it was.
- Found another “statement”: …./.venv/lib/python2.7/site-packages/pulsar/managers/util/job_script/CLUSTER_SLOTS_STATEMENT.sh
- Edit, restart, RNA-STAR still on one CPU ARGGHHHHH!
- Revert …STATEMENT.sh to what it was. Lunch break. Frustration
- Another look at the job_conf.xml:
<destinations>
<destination id="local" runner="local"/>
<param id="local_slots">16</param>
</destinations>
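- What is that slash doing at the end after “local”? It closes the destination tag right there, so the param line ends up outside the destination instead of inside it.
- OK, how about removing the slash and adding a closing tag, so that the param sits inside the destination: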
<destinations>
<destination id="local" runner="local"> <param id="local_slots">16</param> </destination>
</destinations>
- Restart.
- Run RNA-STAR. All processors are busy!
- TopHat – all processors are busy.
- Exhausted, but happy…
:-)
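In hindsight, a quick check of job_conf.xml would have saved me the afternoon. Here is a minimal sketch using only Python's standard library. It reports which destinations have a local_slots param nested inside them, and how many local_slots params sit anywhere else (which, judging by the behaviour above, Galaxy ignores). The function name find_local_slots and the two inline XML strings are my own illustration, not anything shipped with Galaxy.

```python
import xml.etree.ElementTree as ET


def find_local_slots(xml_text):
    """Return (effective, misplaced) for a job_conf.xml string.

    effective: {destination id: value} for local_slots params that are
               direct children of a <destination> element.
    misplaced: number of local_slots params found anywhere else.
    """
    root = ET.fromstring(xml_text)
    effective = {}
    nested = 0
    for dest in root.iter("destination"):
        for param in dest.findall("param"):
            if param.get("id") == "local_slots":
                effective[dest.get("id")] = (param.text or "").strip()
                nested += 1
    total = sum(1 for p in root.iter("param") if p.get("id") == "local_slots")
    return effective, total - nested


# The version that did not work: the destination is self-closed,
# so the param is a sibling of it rather than a child.
BROKEN = """
<destinations>
  <destination id="local" runner="local"/>
  <param id="local_slots">16</param>
</destinations>
"""

# The version that worked: the param is nested inside the destination.
FIXED = """
<destinations>
  <destination id="local" runner="local">
    <param id="local_slots">16</param>
  </destination>
</destinations>
"""

if __name__ == "__main__":
    for name, text in (("broken", BROKEN), ("fixed", FIXED)):
        effective, misplaced = find_local_slots(text)
        print(name, effective, "misplaced:", misplaced)
```

To check a real file, read it and pass its contents to find_local_slots. A non-zero misplaced count means a param is sitting outside its destination.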
Great post. I've had similar issues with SLURM on docker galaxy which are only partially resolved.