Hi,
Does anyone here know how to properly configure job_conf.xml to point to a remote SLURM server?
Thanks for the help
Hello,
Please see:
Similar posts are listed on the far right, or you can search prior Q&A by keyword for more.
Thanks! Jen, Galaxy team
Thank you Jennifer for the response
I got Pulsar working with the config below in job_conf.xml, where the URL points to Pulsar running on a different server.
<destination id="remote_cluster_1" runner="pulsar" tags="remote_cluster"> <param id="url">http://10.22.22.222:8913/</param> <param id="submit_native_specification">-P bignodes -R y -pe threads 16</param> <param id="dependency_resolution">remote</param> </destination>
I also got SLURM working when the head node is on the same server as Galaxy:

<destination id="slurm_galaxy_cpu12" runner="slurm">
    <param id="nativeSpecification">--ntasks=12</param>
</destination>
How can I point to a remote SLURM head node the way I did with Pulsar, i.e. with something like <param id="url">http://10.22.22.222:8913/</param>?
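Purely to illustrate what I am after (this url param on the slurm runner is something I made up by analogy with the pulsar runner; I don't know whether anything like it exists):

<destination id="remote_slurm_cpu12" runner="slurm">
    <!-- hypothetical param, by analogy with the pulsar runner's url -->
    <param id="url">http://my-remote-slurm-head:8913/</param>
    <param id="nativeSpecification">--ntasks=12</param>
</destination>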
I don't personally know the correct syntax for this task (I have a general idea of what to do, sketched below, but it is not verified).
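For what it is worth, the unverified general idea: the slurm runner submits through the local DRMAA library rather than over a network URL, so there is no url param to set. The two approaches I am aware of are (a) make the Galaxy server a submit host of the remote cluster (install the SLURM client tools and share the cluster's munge key and slurm.conf), after which a plain slurm destination works unchanged, or (b) run Pulsar on the remote head node, as in your working example, and let Pulsar do the submission. For (a), job_conf.xml would stay roughly like this (the library path is an assumption and none of this is verified):

<plugin id="slurm" type="runner" load="galaxy.jobs.runners.slurm:SlurmJobRunner">
    <!-- assumed path: this should be a slurm-drmaa library built against
         the remote cluster's SLURM client libraries -->
    <param id="drmaa_library_path">/usr/local/lib/libdrmaa.so</param>
</plugin>
<destination id="remote_slurm" runner="slurm">
    <!-- note there is no url param; the "remote" part lives entirely in the
         local SLURM client configuration pointing at the remote slurmctld -->
    <param id="nativeSpecification">--ntasks=12</param>
</destination>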
I've asked our dev team to help with this followup question. They will get back when they can (our team is very busy with the pending Galaxy release, sorry!) or you can review the administrator training to see if this topic is covered (might be faster): https://github.com/galaxyproject/dagobah-training/tree/master
Other ways to search prior Q&A or ask a detailed question to the wider admin/dev community: https://galaxyproject.org/mailing-lists/
If you cross-post through another support venue we offer, always link to where you have already posted, both for context and to help keep open/closed issues organized. This forum is good for all Galaxy questions but is still primarily used for end-user questions. The Gitter channel and the galaxy-dev list are more direct routes to the people working on server administration and development. The search returns results from all Galaxy resources in a single query and includes tabs that filter results by area.
Please let us know if you solve this before we get back to you.
Hi,
I have compiled SLURM with DRMAA support. With Galaxy configured to use the "slurm" runner, I managed to execute a job in my SLURM partition, but after it finishes, the Galaxy web interface keeps showing the job in the "pending" state, as if SLURM never informed Galaxy that the job finished.
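For reference, the relevant part of my job_conf.xml looks roughly like this (the library path is just where my build installed slurm-drmaa):

<plugin id="slurm" type="runner" load="galaxy.jobs.runners.slurm:SlurmJobRunner">
    <!-- the slurm-drmaa library I compiled; your path may differ -->
    <param id="drmaa_library_path">/usr/local/lib/libdrmaa.so</param>
</plugin>
<destination id="slurm" runner="slurm">
    <param id="nativeSpecification">-p research.q -w my_execution_node -n 1</param>
</destination>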
Checking the generated scripts in both configurations, I have noticed one important difference. I attach the logs after execution (first the local runner, then the slurm runner):
galaxy.tools.execute [uWSGIWorker1Core2] Tool [ucsc_table_direct1] created job [39]
galaxy.jobs.mapper Mapped job to destination id: local
galaxy.jobs.handler Dispatching to local runner
galaxy.jobs Working directory for job is: /home/caos/galaxy/galaxy-dist/database/jobs_directory/000/39
galaxy.jobs.command_factory [LocalRunner.work_thread-0] Built script [/home/caos/galaxy/galaxy-dist/database/jobs_directory/000/39/tool_script.sh] for tool command [python /home/caos/galaxy/galaxy-dist/tools/data_source/data_source.py /home/caos/galaxy/galaxy-dist/database/files/000/dataset_39.dat 0]
galaxy.jobs.runners [LocalRunner.work_thread-0] (39) command is: rm -rf working; mkdir -p working; cd working; /home/caos/galaxy/galaxy-dist/database/jobs_directory/000/39/tool_script.sh; return_code=$?; cd '/home/caos/galaxy/galaxy-dist/database/jobs_directory/000/39';
[ "$GALAXY_VIRTUAL_ENV" = "None" ] && GALAXY_VIRTUAL_ENV="$_GALAXY_VIRTUAL_ENV"; _galaxy_setup_environment True
python "/home/caos/galaxy/galaxy-dist/database/jobs_directory/000/39/set_metadata_z4SXNA.py" "/home/caos/galaxy/galaxy-dist/database/jobs_directory/000/39/registry.xml" "/home/caos/galaxy/galaxy-dist/database/jobs_directory/000/39/working/galaxy.json" "/home/caos/galaxy/galaxy-dist/database/jobs_directory/000/39 metadata_in_HistoryDatasetAssociation_39_3LiSXL,/home/caos/galaxy/galaxy-dist/database/jobs_directory/000/39/metadata_kwds_HistoryDatasetAssociation_39_UQkUtw,/home/caos/galaxy/galaxy-dist/database/jobs_directory/000/39/metadata_out_HistoryDatasetAssociation_39_9dvirJ,/home/caos/galaxy/galaxy-dist/database/jobs_directory/000/39/metadata_results_HistoryDatasetAssociation_39_xxwTZa,/home/caos/galaxy/galaxy-dist/database/files/000/dataset_39.dat,/home/caos/galaxy/galaxy-dist/database/jobs_directory/000/39/metadata_override_HistoryDatasetAssociation_39_6WkP25" 5242880; sh -c "exit $return_code"
galaxy.jobs.runners.local [LocalRunner.work_thread-0] (39) executing job script: /home/caos/galaxy/galaxy-dist/database/jobs_directory/000/39/galaxy_39.sh
galaxy.jobs.runners.local [LocalRunner.work_thread-0] execution finished: /home/caos/galaxy/galaxy-dist/database/jobs_directory/000/39/galaxy_39.sh
galaxy.model.metadata [LocalRunner.work_thread-0] loading metadata from file for: HistoryDatasetAssociation 39
galaxy.jobs [LocalRunner.work_thread-0] Collecting metrics for Job 39
galaxy.jobs [LocalRunner.work_thread-0] job 39 ended
galaxy.tools.execute [uWSGIWorker1Core1] Tool [ucsc_table_direct1] created job [40]
galaxy.jobs.mapper [JobHandlerQueue.monitor_thread] (40) Mapped job to destination id: slurm
galaxy.jobs.handler [JobHandlerQueue.monitor_thread] (40) Dispatching to slurm runner
galaxy.jobs [JobHandlerQueue.monitor_thread] (40) Working directory for job is: /home/caos/galaxy/galaxy-dist/database/jobs_directory/000/40
galaxy.tools.parameters.basic [SlurmRunner.work_thread-0] Url creation failed for "GALAXY_URL": 'thread._local' object has no attribute 'mapper'
galaxy.jobs.command_factory [SlurmRunner.work_thread-0] Built script [/home/caos/galaxy/galaxy-dist/database/jobs_directory/000/40/tool_script.sh] for tool command [python /home/caos/galaxy/galaxy-dist/tools/data_source/data_source.py /home/caos/galaxy/galaxy-dist/database/files/000/dataset_40.dat 0]
galaxy.jobs.runners command is: rm -rf working; mkdir -p working; cd working; /home/caos/galaxy/galaxy-dist/database/jobs_directory/000/40/tool_script.sh; return_code=$?; sh -c "exit $return_code"
galaxy.jobs.runners.drmaa submitting file /home/caos/galaxy/galaxy-dist/database/jobs_directory/000/40/galaxy_40.sh
galaxy.jobs.runners.drmaa native specification is: -p research.q -w my_execution_node -n 1
galaxy.jobs.runners.drmaa queued as 3749
galaxy.jobs Persisting job destination (destination id: slurm)
task_p_slurmd_batch_request: 3749
Launching batch job 3749 for UID 0
starting 1 tasks
task 0 (4096) exited with exit code 0.
job 3749 completed with slurm_rc = 0, job_rc = 0
sending REQUEST_COMPLETE_BATCH_SCRIPT, error:0 status 0
done with job
As you can see, there is an important difference in the lines starting with "command is: rm -rf working". After that line, the "local" runner generates some files (the set_metadata step) that the "slurm" runner does not generate, so I suppose the problem starts there.
Could anybody help me?
Thanks!