Question: Setting up slurm and Galaxy
9 months ago, jay wrote:

Hi,

Does anyone here know how to properly configure job_conf.xml to point to a remote slurm server?

Thanks for the help

9 months ago, Jennifer Hillman Jackson (United States) wrote:

Hello,

Please see:

Similar posts are in the far right list, or you can search prior Q&A by keywords for more.

Thanks! Jen, Galaxy team

9 months ago, jay wrote:

Thank you, Jennifer, for the response.

I got Pulsar working with the config below in job_conf.xml, where the URL points to Pulsar, which runs on a different server:

    <destination id="remote_cluster_1" runner="pulsar" tags="remote_cluster">
        <param id="url">http://10.22.22.222:8913/</param>
        <param id="submit_native_specification">-P bignodes -R y -pe threads 16</param>
        <param id="dependency_resolution">remote</param>
    </destination>

I got slurm running when the head node is on the same server as Galaxy:

    <destination id="slurm_galaxy_cpu12" runner="slurm">
        <param id="nativeSpecification">--ntasks=12</param>
    </destination>

How do I point to a remote slurm head node, like I did with Pulsar by using the tag <param id="url">http://10.22.22.222:8913/</param>?
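To make the contrast between the two destinations explicit, here is a minimal Python sketch. It is not something Galaxy itself runs, and the helper name destination_params is made up for illustration. It wraps the two fragments above in a <destinations> element, as they would sit in job_conf.xml, and lists each destination's runner and params. It only shows that the pulsar destination carries a url param while the slurm one does not; whether the slurm runner accepts any such param is exactly the open question.

```python
import xml.etree.ElementTree as ET

# The two <destination> elements quoted above, wrapped in <destinations>
# so the fragment parses on its own. The rest of job_conf.xml is omitted.
JOB_CONF_FRAGMENT = """
<destinations>
    <destination id="remote_cluster_1" runner="pulsar" tags="remote_cluster">
        <param id="url">http://10.22.22.222:8913/</param>
        <param id="submit_native_specification">-P bignodes -R y -pe threads 16</param>
        <param id="dependency_resolution">remote</param>
    </destination>
    <destination id="slurm_galaxy_cpu12" runner="slurm">
        <param id="nativeSpecification">--ntasks=12</param>
    </destination>
</destinations>
"""


def destination_params(xml_text):
    """Return {destination id: (runner, {param id: value})}.

    Hypothetical helper for this post, not a Galaxy API.
    """
    root = ET.fromstring(xml_text)
    result = {}
    for dest in root.iter("destination"):
        params = {p.get("id"): (p.text or "").strip() for p in dest.findall("param")}
        result[dest.get("id")] = (dest.get("runner"), params)
    return result


destinations = destination_params(JOB_CONF_FRAGMENT)
for dest_id, (runner, params) in destinations.items():
    # Report which destinations name a remote endpoint via a "url" param.
    print(f"{dest_id}: runner={runner}, has url param={'url' in params}")
```

Running it against a full job_conf.xml instead of the fragment would only require passing the file's text to destination_params.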

Reply from Jennifer Hillman Jackson, 9 months ago:

I don't personally know the correct syntax for this task (I have a general idea of what to do, but it is not verified).

I've asked our dev team to help with this follow-up question. They will get back when they can (our team is very busy with the pending Galaxy release, sorry!), or you can review the administrator training to see if this topic is covered (might be faster): https://github.com/galaxyproject/dagobah-training/tree/master

Other ways to search prior Q&A or ask a detailed question to the wider admin/dev community: https://galaxyproject.org/mailing-lists/

  • The Search function at https://galaxyproject.org. Example using the keyword "config": https://galaxyproject.org/search/?q=config#gsc.tab=0&gsc.q=config&gsc.page=1. The search is also available directly from the masthead on all pages.
  • The galaxyproject public Gitter channel. Posting there sometimes brings in more community members who will have advice. Post the last part of the question (the important part), then link to this Biostars post for context. https://gitter.im/galaxyproject/Lobby
  • The galaxy-dev@lists.galaxyproject.org mailing list is another choice and also reaches the wider Galaxy admin/development community, including those who maintain larger instances with complex configurations.

If you cross-post through another support venue we offer, always link to the places where you have already posted, again for context and to help keep open/closed issues organized. This forum is good for all Galaxy questions, yet it is still primarily used for end-user questions. The Gitter channel and galaxy-dev list are more direct routes to those working on server administration and development. The search returns all results (all Galaxy resources in a single query) and includes tabs that filter results by area.

Please let us know if you solve this before we get back to you.
8 weeks ago, sysadmin.caos wrote:

Hi,

I have compiled SLURM with DRMAA support. With Galaxy configured to use the "slurm" runner, I managed to run a job in my SLURM partition but, after it finishes, the Galaxy web interface keeps showing the job as "pending", as if SLURM never informs Galaxy that the job has finished.

Comparing the scripts generated under both configurations, I noticed one important difference. Here are the logs from each execution:

  • With "local" runner:

galaxy.tools.execute [uWSGIWorker1Core2] Tool [ucsc_table_direct1] created job [39]

galaxy.jobs.mapper Mapped job to destination id: local

galaxy.jobs.handler Dispatching to local runner

galaxy.jobs Working directory for job is: /home/caos/galaxy/galaxy-dist/database/jobs_directory/000/39

galaxy.jobs.command_factory [LocalRunner.work_thread-0] Built script [/home/caos/galaxy/galaxy-dist/database/jobs_directory/000/39/tool_script.sh] for tool command [python /home/caos/galaxy/galaxy-dist/tools/data_source/data_source.py /home/caos/galaxy/galaxy-dist/database/files/000/dataset_39.dat 0]

galaxy.jobs.runners [LocalRunner.work_thread-0] (39) command is: rm -rf working; mkdir -p working; cd working; /home/caos/galaxy/galaxy-dist/database/jobs_directory/000/39/tool_script.sh; return_code=$?; cd '/home/caos/galaxy/galaxy-dist/database/jobs_directory/000/39';

[ "$GALAXY_VIRTUAL_ENV" = "None" ] && GALAXY_VIRTUAL_ENV="$_GALAXY_VIRTUAL_ENV"; _galaxy_setup_environment True

python "/home/caos/galaxy/galaxy-dist/database/jobs_directory/000/39/set_metadata_z4SXNA.py" "/home/caos/galaxy/galaxy-dist/database/jobs_directory/000/39/registry.xml" "/home/caos/galaxy/galaxy-dist/database/jobs_directory/000/39/working/galaxy.json" "/home/caos/galaxy/galaxy-dist/database/jobs_directory/000/39/metadata_in_HistoryDatasetAssociation_39_3LiSXL,/home/caos/galaxy/galaxy-dist/database/jobs_directory/000/39/metadata_kwds_HistoryDatasetAssociation_39_UQkUtw,/home/caos/galaxy/galaxy-dist/database/jobs_directory/000/39/metadata_out_HistoryDatasetAssociation_39_9dvirJ,/home/caos/galaxy/galaxy-dist/database/jobs_directory/000/39/metadata_results_HistoryDatasetAssociation_39_xxwTZa,/home/caos/galaxy/galaxy-dist/database/files/000/dataset_39.dat,/home/caos/galaxy/galaxy-dist/database/jobs_directory/000/39/metadata_override_HistoryDatasetAssociation_39_6WkP25" 5242880; sh -c "exit $return_code"

galaxy.jobs.runners.local [LocalRunner.work_thread-0] (39) executing job script: /home/caos/galaxy/galaxy-dist/database/jobs_directory/000/39/galaxy_39.sh

galaxy.jobs.runners.local [LocalRunner.work_thread-0] execution finished: /home/caos/galaxy/galaxy-dist/database/jobs_directory/000/39/galaxy_39.sh

galaxy.model.metadata [LocalRunner.work_thread-0] loading metadata from file for: HistoryDatasetAssociation 39

galaxy.jobs [LocalRunner.work_thread-0] Collecting metrics for Job 39

galaxy.jobs [LocalRunner.work_thread-0] job 39 ended

  • With "slurm" runner:

galaxy.tools.execute [uWSGIWorker1Core1] Tool [ucsc_table_direct1] created job [40]

galaxy.jobs.mapper [JobHandlerQueue.monitor_thread] (40) Mapped job to destination id: slurm

galaxy.jobs.handler [JobHandlerQueue.monitor_thread] (40) Dispatching to slurm runner

galaxy.jobs [JobHandlerQueue.monitor_thread] (40) Working directory for job is: /home/caos/galaxy/galaxy-dist/database/jobs_directory/000/40

galaxy.tools.parameters.basic [SlurmRunner.work_thread-0] Url creation failed for "GALAXY_URL": 'thread._local' object has no attribute 'mapper'

galaxy.jobs.command_factory [SlurmRunner.work_thread-0] Built script [/home/caos/galaxy/galaxy-dist/database/jobs_directory/000/40/tool_script.sh] for tool command [python /home/caos/galaxy/galaxy-dist/tools/data_source/data_source.py /home/caos/galaxy/galaxy-dist/database/files/000/dataset_40.dat 0]

galaxy.jobs.runners command is: rm -rf working; mkdir -p working; cd working; /home/caos/galaxy/galaxy-dist/database/jobs_directory/000/40/tool_script.sh; return_code=$?; sh -c "exit $return_code"

galaxy.jobs.runners.drmaa submitting file /home/caos/galaxy/galaxy-dist/database/jobs_directory/000/40/galaxy_40.sh

galaxy.jobs.runners.drmaa native specification is: -p research.q -w my_execution_node -n 1

galaxy.jobs.runners.drmaa queued as 3749

galaxy.jobs Persisting job destination (destination id: slurm)

  • And from my slurmd.log file:

task_p_slurmd_batch_request: 3749

Launching batch job 3749 for UID 0

starting 1 tasks

task 0 (4096) exited with exit code 0.

job 3749 completed with slurm_rc = 0, job_rc = 0

sending REQUEST_COMPLETE_BATCH_SCRIPT, error:0 status 0

done with job
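As a cross-check that SLURM really did finish the job Galaxy submitted, the job id from the Galaxy log ("queued as 3749") can be matched against the completion line in slurmd.log. The following is only a rough Python sketch over the log lines quoted above; the function names are made up for illustration and the regular expressions assume the exact wording shown in these logs.

```python
import re

# Lines copied from the Galaxy log and slurmd.log excerpts above.
GALAXY_LOG = """\
galaxy.jobs.runners.drmaa native specification is: -p research.q -w my_execution_node -n 1
galaxy.jobs.runners.drmaa queued as 3749
"""

SLURMD_LOG = """\
task_p_slurmd_batch_request: 3749
Launching batch job 3749 for UID 0
starting 1 tasks
task 0 (4096) exited with exit code 0.
job 3749 completed with slurm_rc = 0, job_rc = 0
sending REQUEST_COMPLETE_BATCH_SCRIPT, error:0 status 0
done with job
"""


def queued_job_ids(galaxy_log):
    """Return the SLURM job ids that the DRMAA runner reported as queued."""
    return re.findall(r"queued as (\d+)", galaxy_log)


def completed_jobs(slurmd_log):
    """Return {job id: (slurm_rc, job_rc)} for jobs slurmd reports as completed."""
    pattern = r"job (\d+) completed with slurm_rc = (-?\d+), job_rc = (-?\d+)"
    return {
        job_id: (int(slurm_rc), int(job_rc))
        for job_id, slurm_rc, job_rc in re.findall(pattern, slurmd_log)
    }


finished = completed_jobs(SLURMD_LOG)
for job_id in queued_job_ids(GALAXY_LOG):
    if job_id in finished:
        print(f"job {job_id}: completed on the SLURM side, codes {finished[job_id]}")
    else:
        print(f"job {job_id}: no completion line found in slurmd.log")
```

If the queued id shows up as completed with zero return codes, the job itself ran fine and the stuck "pending" state is on the Galaxy side of the DRMAA hand-off.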

As you can see, there is an important difference in the lines starting with "command is: rm -rf working". After that line, the "local" runner generates some files that the "slurm" runner does not, so I suppose the problem starts here.
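To pin down exactly which steps differ, the two "command is:" strings can be split into individual shell steps and compared. This is a minimal Python sketch under a few assumptions: the commands are retyped from the logs above, the long argument list of the set_metadata call is abbreviated to "..." for brevity, and splitting on ";" and newlines is a naive approach that only works because none of these steps contain a quoted semicolon. The names JOBS_ROOT and steps are made up for illustration.

```python
import re

JOBS_ROOT = "/home/caos/galaxy/galaxy-dist/database/jobs_directory/000"

# Retyped from the "local" runner log (job 39). The set_metadata arguments
# are abbreviated to "..." here; see the full log line above.
LOCAL_COMMAND = (
    f"rm -rf working; mkdir -p working; cd working; {JOBS_ROOT}/39/tool_script.sh; "
    f"return_code=$?; cd '{JOBS_ROOT}/39';\n"
    '[ "$GALAXY_VIRTUAL_ENV" = "None" ] && GALAXY_VIRTUAL_ENV="$_GALAXY_VIRTUAL_ENV"; '
    "_galaxy_setup_environment True\n"
    f'python "{JOBS_ROOT}/39/set_metadata_z4SXNA.py" ...; sh -c "exit $return_code"'
)

# Retyped from the "slurm" runner log (job 40).
SLURM_COMMAND = (
    f"rm -rf working; mkdir -p working; cd working; {JOBS_ROOT}/40/tool_script.sh; "
    'return_code=$?; sh -c "exit $return_code"'
)


def steps(command, job_id):
    """Split a command into steps, replacing the job directory with JOB_DIR."""
    normalized = command.replace(f"{JOBS_ROOT}/{job_id}", "JOB_DIR")
    return [s.strip() for s in re.split(r"[;\n]", normalized) if s.strip()]


local_steps = steps(LOCAL_COMMAND, 39)
slurm_steps = steps(SLURM_COMMAND, 40)

# Steps present in the local command but missing from the slurm command.
only_local = [s for s in local_steps if s not in slurm_steps]
for step in only_local:
    print("only in local:", step)
```

With the commands as quoted, every slurm step also appears in the local command, and the extra local steps are the change back into the job directory, the virtualenv setup, and the set_metadata call.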

Could anybody help me?

Thanks!
