Question: Galaxy unable to communicate with DRM: code 2: Requested session "XXXXXX" does not exist
BWimS wrote:
After upgrading Univa Grid Engine from version 8.5.4 to 8.6.2, Galaxy seems unable to retrieve the status of the jobs it launched on the grid.
Jobs are started on the cluster, and I can see them run and finish there, but the job status in the Galaxy web UI never moves from 'waiting' (grey) to 'running' (yellow) to 'finished' (green). The Galaxy log shows repeated warnings like these:
galaxy.jobs.runners.drmaa WARNING 2018-10-02 10:24:56,780 (16943/23806467) unable to communicate with DRM: code 2: Requested session "25447" does not exist
galaxy.jobs.runners.drmaa WARNING 2018-10-02 10:24:57,823 (16942/23806466) unable to communicate with DRM: code 2: Requested session "25447" does not exist
galaxy.jobs.runners.drmaa WARNING 2018-10-02 10:24:57,846 (16943/23806467) unable to communicate with DRM: code 2: Requested session "25447" does not exist
galaxy.jobs.runners.drmaa WARNING 2018-10-02 10:24:58,870 (16942/23806466) unable to communicate with DRM: code 2: Requested session "25447" does not exist
galaxy.jobs.runners.drmaa WARNING 2018-10-02 10:24:58,890 (16943/23806467) unable to communicate with DRM: code 2: Requested session "25447" does not exist
galaxy.jobs.runners.drmaa WARNING 2018-10-02 10:24:59,916 (16942/23806466) unable to communicate with DRM: code 2: Requested session "25447" does not exist
galaxy.jobs.runners.drmaa WARNING 2018-10-02 10:24:59,948 (16943/23806467) unable to communicate with DRM: code 2: Requested session "25447" does not exist
galaxy.jobs.runners.drmaa WARNING 2018-10-02 10:25:00,976 (16942/23806466) unable to communicate with DRM: code 2: Requested session "25447" does not exist
galaxy.jobs.runners.drmaa WARNING 2018-10-02 10:25:00,997 (16943/23806467) unable to communicate with DRM: code 2: Requested session "25447" does not
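To check whether every affected job points at the same stale DRMAA session, the warnings can be tallied per job. Below is a minimal Python sketch, assuming the log format shown above; `summarize_drm_warnings` is a hypothetical helper, not part of Galaxy. Because the pattern stops at the session ID, it also matches the truncated last line.

```python
import re
from collections import Counter

# Matches the Galaxy DRMAA runner warning format shown in the log above, e.g.
# (16943/23806467) unable to communicate with DRM: code 2: Requested session "25447" ...
WARNING_RE = re.compile(
    r"\((?P<galaxy_id>\d+)/(?P<external_id>\d+)\) unable to communicate with DRM: "
    r'code \d+: Requested session "(?P<session>\d+)"'
)


def summarize_drm_warnings(lines):
    """Count DRM communication warnings per (galaxy job, external job, session)."""
    counts = Counter()
    for line in lines:
        match = WARNING_RE.search(line)
        if match:
            key = (match.group("galaxy_id"), match.group("external_id"), match.group("session"))
            counts[key] += 1
    return counts


if __name__ == "__main__":
    # Two sample lines copied from the log above; in practice, pass the lines of the Galaxy log file.
    sample = [
        'galaxy.jobs.runners.drmaa WARNING 2018-10-02 10:24:56,780 (16943/23806467) unable to communicate with DRM: code 2: Requested session "25447" does not exist',
        'galaxy.jobs.runners.drmaa WARNING 2018-10-02 10:24:57,823 (16942/23806466) unable to communicate with DRM: code 2: Requested session "25447" does not exist',
    ]
    for (galaxy_id, external_id, session), n in sorted(summarize_drm_warnings(sample).items()):
        print(f"galaxy job {galaxy_id} / external job {external_id}: session {session}, {n} warning(s)")
```

If every job reports the same session ID, that suggests the problem lies with the DRMAA session Galaxy holds rather than with individual jobs.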
This warning is logged by Galaxy's DRMAA job runner:
https://docs.galaxyproject.org/en/release_18.05/_modules/galaxy/jobs/runners/drmaa.html
The relevant line is:
log.warning("(%s/%s) unable to communicate with DRM: %s", galaxy_id_tag, external_job_id, e)
The exception raised in this case, DrmCommunicationException, does not carry any further details:
http://gridscheduler.sourceforge.net/javadocs/org/ggf/drmaa/DrmCommunicationException.html
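As far as I can tell from the linked source, the runner catches this exception, logs the warning, and puts the job back on its watch list, so the job is retried on every monitor pass while its Galaxy state never changes. That would explain both the roughly once-per-second repetition per job in the log and the UI staying at 'waiting'. The sketch below is a simplified, self-contained illustration of that pattern, not Galaxy's actual code: `DrmCommunicationError`, `poll_once`, and the state strings are hypothetical, and `get_status` stands in for the DRMAA job status call.

```python
import logging

log = logging.getLogger("drmaa_poll_sketch")


class DrmCommunicationError(Exception):
    """Hypothetical stand-in for the DRMAA DrmCommunicationException."""


# Hypothetical terminal states; jobs in these states are no longer watched.
TERMINAL_STATES = ("done", "failed")


def poll_once(watched, get_status):
    """One monitor pass over (galaxy_id_tag, external_job_id) pairs.

    Returns (still_watched, states): the jobs to check again on the next pass,
    and the state observed for each job whose status could be read.
    """
    still_watched = []
    states = {}
    for galaxy_id_tag, external_job_id in watched:
        try:
            state = get_status(external_job_id)
        except DrmCommunicationError as e:
            # Same message format as the quoted Galaxy line. No state is recorded,
            # so the job keeps its previous state and is retried next pass.
            log.warning("(%s/%s) unable to communicate with DRM: %s",
                        galaxy_id_tag, external_job_id, e)
            still_watched.append((galaxy_id_tag, external_job_id))
            continue
        states[external_job_id] = state
        if state not in TERMINAL_STATES:
            still_watched.append((galaxy_id_tag, external_job_id))
    return still_watched, states


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)

    def broken_status(external_job_id):
        # Simulates the failure seen after the upgrade.
        raise DrmCommunicationError('code 2: Requested session "25447" does not exist')

    watched = [("16942", "23806466"), ("16943", "23806467")]
    for _ in range(3):
        watched, states = poll_once(watched, broken_status)
    # Both jobs are still watched and no state was ever recorded for them.
    print(len(watched), states)
```

With a `get_status` that always raises, the watch list never shrinks and no state is ever recorded, which matches jobs finishing on the cluster while Galaxy keeps showing them as waiting.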
Does anyone have an idea on how to fix or troubleshoot this issue? Thank you.