Question: Slurm Drmaa configuration for Galaxy
pks71500 • 20 wrote:
Hello, I followed the links below to configure Slurm for Galaxy.
https://biostar.usegalaxy.org/p/19543/
http://gmod.827538.n3.nabble.com/Running-Galaxy-on-a-cluster-with-SLURM-td4051302.html
I can successfully submit a job through slurm-drmaa or python-drmaa, but not from Galaxy. Galaxy only shows "This job is waiting to run."
When I ran run.sh, I saw the following log messages.
galaxy.jobs.manager DEBUG 2018-09-19 12:50:33,518 [p:10604,w:0,m:0] [MainThread] Initializing job handler
galaxy.jobs INFO 2018-09-19 12:50:33,518 [p:10604,w:0,m:0] [MainThread] Handler 'main' will load specified runner plugins: slurm
galaxy.jobs.runners.state_handler_factory DEBUG 2018-09-19 12:50:33,520 [p:10604,w:0,m:0] [MainThread] Loaded 'failure' state handler from module galaxy.jobs.runners.state_handlers.resubmit
I #296c [ 0.00] * logging started at: 2018-09-19 12:50:33.52 Z
t #296c [ 0.00] -> fsd_exc_init
t #296c [ 0.00] <- fsd_exc_init
t #296c [ 0.00] -> drmaa_init(contact=(null))
t #296c [ 0.00] -> fsd_drmaa_session_new((null))
t #296c [ 0.00] -> fsd_job_set_new()
t #296c [ 0.00] <- fsd_job_set_new =0x4b87e30
t #296c [ 0.00] -> fsd_conf_read(filename=/etc/slurm_drmaa.conf, must_exist=false, content=(null))
t #296c [ 0.00] * content from file
t #296c [ 0.00] <- fsd_conf_read
t #296c [ 0.00] -> fsd_conf_read(filename=/root/.slurm_drmaa.conf, must_exist=false, content=(null))
t #296c [ 0.00] <- fsd_conf_read
t #296c [ 0.00] -> fsd_drmaa_session_apply_configuration
t #296c [ 0.00] <- fsd_drmaa_session_apply_configuration
t #296c [ 0.00] <- drmaa_init =0
When I launched a job, I saw the following log messages.
galaxy.tools.actions.upload DEBUG 2018-09-19 13:51:06,388 [p:11274,w:1,m:0] [uWSGIWorker1Core1] Checked uploads (621.864 ms)
galaxy.tools.actions.upload_common INFO 2018-09-19 13:51:06,500 [p:11274,w:1,m:0] [uWSGIWorker1Core1] tool upload1 created job id 6
galaxy.tools.actions.upload DEBUG 2018-09-19 13:51:06,633 [p:11274,w:1,m:0] [uWSGIWorker1Core1] Created upload job (244.750 ms)
galaxy.tools.execute DEBUG 2018-09-19 13:51:06,633 [p:11274,w:1,m:0] [uWSGIWorker1Core1] Tool [upload1] created job [6] (867.546 ms)
galaxy.tools.execute DEBUG 2018-09-19 13:51:06,657 [p:11274,w:1,m:0] [uWSGIWorker1Core1] Executed 1 job(s) for tool upload1 request: (907.967 ms)
Here is my job_conf.xml file.
<?xml version="1.0"?>
<!-- A sample job config that explicitly configures job running the way it is configured by default (if there is no explicit config). -->
<job_conf>
    <plugins workers="10">
        <plugin id="slurm" type="runner" load="galaxy.jobs.runners.slurm:SlurmJobRunner"/>
        <param id="drmaa_library_path">/usr/local/lib/libdrmaa.so</param>
    </plugins>
    <handlers default="handlers">
        <handler id="main" tags="handlers">
            <plugin id="slurm"/>
        </handler>
    </handlers>
    <destinations default="slurm">
        <destination id="slurm" runner="slurm">
            <param id="request_cpus">1</param>
            <param id="embed_metadata_in_job">False</param>
            <param id="nativeSpecification">-p standard </param>
            <env file="/srv/galaxy/.venv/bin/activate" />
        </destination>
    </destinations>
</job_conf>
Any help would be appreciated. Thanks.
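A note on the job_conf.xml above: the slurm plugin element is self-closing, so the drmaa_library_path param sits next to it as a sibling rather than inside it. Whether that is related to the job staying in "waiting to run" is an assumption, not something confirmed in this thread, but it is easy to check. The sketch below uses only the Python standard library; the function name stray_plugin_params and the NESTED variant are made up for illustration and are not part of Galaxy.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the <plugins> section from the job_conf.xml posted above.
POSTED = """<job_conf>
  <plugins workers="10">
    <plugin id="slurm" type="runner" load="galaxy.jobs.runners.slurm:SlurmJobRunner"/>
    <param id="drmaa_library_path">/usr/local/lib/libdrmaa.so</param>
  </plugins>
</job_conf>"""

# Hypothetical variant: the same <param>, nested inside the <plugin> element.
NESTED = """<job_conf>
  <plugins workers="10">
    <plugin id="slurm" type="runner" load="galaxy.jobs.runners.slurm:SlurmJobRunner">
      <param id="drmaa_library_path">/usr/local/lib/libdrmaa.so</param>
    </plugin>
  </plugins>
</job_conf>"""


def stray_plugin_params(xml_text):
    """Return ids of <param> elements that are direct children of <plugins>."""
    root = ET.fromstring(xml_text)
    stray = []
    for plugins in root.iter("plugins"):
        # findall() with a bare tag name matches direct children only,
        # so params nested inside a <plugin> are not reported.
        stray.extend(param.get("id") for param in plugins.findall("param"))
    return stray


print(stray_plugin_params(POSTED))  # ['drmaa_library_path']
print(stray_plugin_params(NESTED))  # []
```

To run the same check against a real file, read it with open(...).read() and pass the text to the function.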
I have updated the job_conf.xml file, and I can now see that Galaxy tries to submit a job. However, it fails with "Invalid user id".
I use PAM authentication, and the user can submit a job from the terminal. Can anyone help me pass the user ID correctly?
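One thing that may be worth ruling out for an "Invalid user id" error is whether the account name resolves to a UID and GID on the host in question. That this is the cause here is only an assumption. The sketch below is a minimal check using Python's standard pwd module; the helper name lookup_user and the account name in the example call are placeholders, not taken from this thread.

```python
import pwd


def lookup_user(name):
    """Return (uid, gid) for an account name, or None if it does not resolve."""
    try:
        entry = pwd.getpwnam(name)
    except KeyError:
        # getpwnam raises KeyError when no matching account is found.
        return None
    return entry.pw_uid, entry.pw_gid


# "some_galaxy_user" is a placeholder; substitute the account Galaxy submits as.
result = lookup_user("some_galaxy_user")
if result is None:
    print("account does not resolve on this host")
else:
    print("uid=%d gid=%d" % result)
```

Running it on both the Galaxy host and the Slurm controller and comparing the numbers shows whether the two machines agree on the account.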
Are you running the latest release 18.05? https://docs.galaxyproject.org/en/master/releases/18.05_announce.html
There was a fix for running jobs as the "real user" in a VM: https://github.com/galaxyproject/galaxy/pull/5881#issue-181255818
The most current doc is now published here: https://docs.galaxyproject.org/en/master/admin/cluster.html#submitting-jobs-as-the-real-user
We can follow up troubleshooting after you check that your config matches what is published. I am fairly certain that using a yaml config is needed (instead of the prior ini). We may ask you to share that and get the developers involved.
Thanks, Jennifer.
I have followed the link, and now Galaxy submits with a real user ID, but DRMAA fails with an unknown error.
I am running 18.05. In galaxy.yml, I added the following lines. I have created the folders for new_file_path and job_working_directory. I am not sure if there are any issues with these lines.
I would appreciate any advice you may have.