Question: Error with Torque/drmaa
tuto345 (France) wrote, 3.3 years ago:

Dear all,

I followed the instructions at https://wiki.galaxyproject.org/Events/GCC2014/TrainingDay/AdminWalkthrough#Configure_Galaxy on a single machine with 40 cores running CentOS 7. I couldn't scramble the pbs_python egg (tried 4.3.5 and 4.4.0), so I installed the recommended pbs_drmaa (tried 1.0.17 and 1.0.18) and indicated its path in the job_conf.xml file. Torque (4.2.9) is correctly installed and the galaxy user can submit jobs from the command line, but jobs submitted through the Galaxy interface end immediately with a "state change: job finished, but failed" message in the Galaxy logs and Exit_status=2 in the pbs_server log.

Any idea what might be wrong?

Cristian

Logs for the Galaxy handler and pbs_server (server name changed):

Galaxy log:

galaxy.jobs DEBUG 2015-01-08 14:36:08,489 (21050) Working directory for job is: /home/galaxy/galaxy-dist/database/job_working_directory/021/21050
galaxy.jobs.handler DEBUG 2015-01-08 14:36:08,516 (21050) Dispatching to torque runner
galaxy.jobs DEBUG 2015-01-08 14:36:08,651 (21050) Persisting job destination (destination id: torque)
galaxy.jobs.handler INFO 2015-01-08 14:36:08,671 (21050) Job dispatched
galaxy.tools.deps DEBUG 2015-01-08 14:36:09,013 Building dependency shell command for dependency 'cufflinks'
galaxy.jobs.runners DEBUG 2015-01-08 14:36:09,239 (21050) command is: PACKAGE_BASE=/home/galaxy/galaxy-dist/dependency_dir/cufflinks/2.1.1/devteam/cufflinks/9aab29e159a7; export PACKAGE_BASE; . /home/galaxy/galaxy-dist/dependency_dir/cufflinks/2.1.1/devteam/cufflinks/9aab29e159a7/env.sh; cufflinks 2>&1 | head -n 1 > /home/galaxy/galaxy-dist/database/tmp/GALAXY_VERSION_STRING_21050 2>&1; python /home/galaxy/shed_tools/toolshed.g2.bx.psu.edu/repos/devteam/cufflinks/9aab29e159a7/cufflinks/cufflinks_wrapper.py              --input=/home/galaxy/galaxy-dist/database/files/031/dataset_31505.dat             --assembled-isoforms-output=/home/galaxy/galaxy-dist/database/files/032/dataset_32805.dat             --num-threads="${GALAXY_SLOTS:-4}"             -I 300000             -F 0.1             -j 0.15; return_code=$?; if [ -f /home/galaxy/galaxy-dist/database/job_working_directory/021/21050/global_model.txt ] ; then cp /home/galaxy/galaxy-dist/database/job_working_directory/021/21050/global_model.txt /home/galaxy/galaxy-dist/database/files/032/dataset_32806.dat ; fi..... <snipped a lot of text/>
galaxy.jobs.runners.drmaa DEBUG 2015-01-08 14:36:09,270 (21050) submitting file /home/galaxy/galaxy-dist/database/job_working_directory/021/21050/galaxy_21050.sh
galaxy.jobs.runners.drmaa INFO 2015-01-08 14:36:09,276 (21050) queued as 51.myserver.local.net
galaxy.jobs DEBUG 2015-01-08 14:36:09,307 (21050) Persisting job destination (destination id: torque)
galaxy.jobs.runners.drmaa DEBUG 2015-01-08 14:36:10,285 (21050/51.myserver.local.net) state change: job finished, but failed
galaxy.datatypes.metadata DEBUG 2015-01-08 14:36:10,633 Cleaning up external metadata files
galaxy.datatypes.metadata DEBUG 2015-01-08 14:36:10,650 Failed to cleanup MetadataTempFile temp files from /home/galaxy/galaxy-dist/database/job_working_directory/021/21050/metadata_out_HistoryDatasetAssociation_39981_LGNwud: No JSON object could be decoded

pbs_server log:

01/08/2015 14:36:09;0100;PBS_Server.27598;Job;51.myserver.local.net;enqueuing into batch, state 1 hop 1
01/08/2015 14:36:09;0008;PBS_Server.27598;Job;req_commit;job_id: 51.myserver.local.net
01/08/2015 14:36:09;0008;PBS_Server.29816;Job;51.myserver.local.net;Job Modified at request of root@myserver.local.net
01/08/2015 14:36:09;0008;PBS_Server.29816;Job;51.myserver.local.net;Job Run at request of root@myserver.local.net
01/08/2015 14:36:09;000d;PBS_Server.29816;Job;51.myserver.local.net;Not sending email: User does not want mail of this type.
01/08/2015 14:36:09;0010;PBS_Server.27662;Job;51.myserver.local.net;Exit_status=2 resources_used.cput=00:00:00 resources_used.mem=0kb resources_used.vmem=0kb resources_used.walltime=00:00:00
01/08/2015 14:36:09;000d;PBS_Server.27662;Job;51.myserver.local.net;Not sending email: User does not want mail of this type.
01/08/2015 14:36:09;0008;PBS_Server.27662;Job;51.myserver.local.net;on_job_exit valid pjob: 51.myserver.local.net (substate=50)
01/08/2015 14:36:44;0002;PBS_Server.28396;Svr;PBS_Server;Torque Server Version = 4.2.9, loglevel = 0
01/08/2015 14:37:17;0100;PBS_Server.28396;Job;51.myserver.local.net;dequeuing from batch, state COMPLETE

tuto345 wrote, 3.3 years ago:

Hi all,

Just for completeness: the problem comes from uwsgi or, more probably, supervisord. When I start the uwsgi server through supervisord, it does not run inside the virtualenv, even when I explicitly set the appropriate command line in the supervisord configuration file. The .bashrc does contain the virtualenv activation command, but somehow it is not taken into account.

So I went back to the traditional Galaxy web server and everything is working fine.

I found uwsgi more responsive and would like to switch to it in the future, so if anybody has an idea how to solve this problem, I'm all ears!
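
One way to narrow this down is to check which Python environment the server process actually ends up with. supervisord starts its programs directly rather than through an interactive shell, so an activation line in .bashrc would not be expected to take effect there. The following is only a minimal diagnostic sketch, not part of the original setup: the function names `in_virtualenv` and `environment_report` are made up for illustration, and it assumes the script is run with the same interpreter and environment that supervisord gives to uwsgi (for example by temporarily pointing the program's command at it).

```python
import os
import sys


def in_virtualenv():
    """Return True if this interpreter appears to run inside a virtualenv."""
    # Old-style virtualenv (common with Python 2) sets sys.real_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    # venv and newer virtualenv releases make sys.prefix differ from
    # sys.base_prefix; if base_prefix is missing, treat it as "not in one".
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix


def environment_report():
    """Collect the values that usually reveal a missing virtualenv."""
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "in_virtualenv": in_virtualenv(),
        # Only set when an activate script was sourced; informational only.
        "VIRTUAL_ENV": os.environ.get("VIRTUAL_ENV"),
        # First PATH entry: with an activated virtualenv this is its bin/ dir.
        "PATH_head": os.environ.get("PATH", "").split(os.pathsep)[0],
    }


if __name__ == "__main__":
    for key, value in sorted(environment_report().items()):
        print("%s = %s" % (key, value))
```

Comparing the output from a login shell of the galaxy user with the output under supervisord should show whether the two really differ in interpreter and PATH. If they do, the environment probably has to be set explicitly in the supervisord program section or through uwsgi's own virtualenv option, rather than relying on .bashrc.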

 

Cheers!

Cristian
