Question: Error with Torque/drmaa
3.9 years ago by
tuto345 wrote:

Dear all,

I followed the instructions on a single machine with 40 cores running CentOS 7. I couldn't scramble the pbs_python egg (tried 4.3.5 and 4.4.0), so I installed the recommended pbs_drmaa (tried 1.0.17 and 1.0.18) and indicated its path in the job_conf.xml file. While Torque (4.2.9) is correctly installed and the galaxy user can submit jobs, jobs submitted through the Galaxy interface end immediately with a "state change: job finished, but failed" message in the Galaxy logs and Exit_status=2 in the pbs_server log.
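For reference, wiring pbs_drmaa into Galaxy's job_conf.xml typically looks like the sketch below. The library path, handler id, and destination id here are assumptions, adjust them to the actual install; depending on the Galaxy version, the library path may instead need to be exported as $DRMAA_LIBRARY_PATH in the handler's environment:

```xml
<?xml version="1.0"?>
<job_conf>
    <plugins>
        <plugin id="drmaa" type="runner" load="galaxy.jobs.runners.drmaa:DRMAAJobRunner">
            <!-- path to the pbs_drmaa shared library; install prefix is an assumption -->
            <param id="drmaa_library_path">/usr/local/lib/libdrmaa.so</param>
        </plugin>
    </plugins>
    <handlers>
        <handler id="main"/>
    </handlers>
    <destinations default="torque">
        <!-- all jobs go to Torque through the drmaa runner -->
        <destination id="torque" runner="drmaa"/>
    </destinations>
</job_conf>
```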

Any idea what might be wrong?


Logs from the Galaxy handler and pbs_server (I changed the server name!):

Galaxy log:

DEBUG 2015-01-08 14:36:08,489 (21050) Working directory for job is: /home/galaxy/galaxy-dist/database/job_working_directory/021/21050
DEBUG 2015-01-08 14:36:08,516 (21050) Dispatching to torque runner
DEBUG 2015-01-08 14:36:08,651 (21050) Persisting job destination (destination id: torque)
INFO 2015-01-08 14:36:08,671 (21050) Job dispatched
DEBUG 2015-01-08 14:36:09,013 Building dependency shell command for dependency 'cufflinks'
DEBUG 2015-01-08 14:36:09,239 (21050) command is: PACKAGE_BASE=/home/galaxy/galaxy-dist/dependency_dir/cufflinks/2.1.1/devteam/cufflinks/9aab29e159a7; export PACKAGE_BASE; . /home/galaxy/galaxy-dist/dependency_dir/cufflinks/2.1.1/devteam/cufflinks/9aab29e159a7/; cufflinks 2>&1 | head -n 1 > /home/galaxy/galaxy-dist/database/tmp/GALAXY_VERSION_STRING_21050 2>&1; python /home/galaxy/shed_tools/ --input=/home/galaxy/galaxy-dist/database/files/031/dataset_31505.dat --assembled-isoforms-output=/home/galaxy/galaxy-dist/database/files/032/dataset_32805.dat --num-threads="${GALAXY_SLOTS:-4}" -I 300000 -F 0.1 -j 0.15; return_code=$?; if [ -f /home/galaxy/galaxy-dist/database/job_working_directory/021/21050/global_model.txt ] ; then cp /home/galaxy/galaxy-dist/database/job_working_directory/021/21050/global_model.txt /home/galaxy/galaxy-dist/database/files/032/dataset_32806.dat ; fi ... <snipped a lot of text>
DEBUG 2015-01-08 14:36:09,270 (21050) submitting file /home/galaxy/galaxy-dist/database/job_working_directory/021/21050/
INFO 2015-01-08 14:36:09,276 (21050) queued as
DEBUG 2015-01-08 14:36:09,307 (21050) Persisting job destination (destination id: torque)
DEBUG 2015-01-08 14:36:10,285 (21050/ state change: job finished, but failed
galaxy.datatypes.metadata DEBUG 2015-01-08 14:36:10,633 Cleaning up external metadata files
galaxy.datatypes.metadata DEBUG 2015-01-08 14:36:10,650 Failed to cleanup MetadataTempFile temp files from /home/galaxy/galaxy-dist/database/job_working_directory/021/21050/metadata_out_HistoryDatasetAssociation_39981_LGNwud: No JSON object could be decoded

pbs_server log:

01/08/2015 14:36:09;0100;PBS_Server.27598;Job;;enqueuing into batch, state 1 hop 1
01/08/2015 14:36:09;0008;PBS_Server.27598;Job;req_commit;job_id:
01/08/2015 14:36:09;0008;PBS_Server.29816;Job;;Job Modified at request of
01/08/2015 14:36:09;0008;PBS_Server.29816;Job;;Job Run at request of
01/08/2015 14:36:09;000d;PBS_Server.29816;Job;;Not sending email: User does not want mail of this type.
01/08/2015 14:36:09;0010;PBS_Server.27662;Job;;Exit_status=2 resources_used.cput=00:00:00 resources_used.mem=0kb resources_used.vmem=0kb resou
01/08/2015 14:36:09;000d;PBS_Server.27662;Job;;Not sending email: User does not want mail of this type.
01/08/2015 14:36:09;0008;PBS_Server.27662;Job;;on_job_exit valid pjob: (substate=50)
01/08/2015 14:36:44;0002;PBS_Server.28396;Svr;PBS_Server;Torque Server Version = 4.2.9, loglevel = 0
01/08/2015 14:37:17;0100;PBS_Server.28396;Job;;dequeuing from batch, state COMPLETE
3.9 years ago by
tuto345 wrote:

Hi all,

Just for completeness: the problem comes from uwsgi and/or, most probably, supervisord. When I start the uwsgi server through supervisord it does not activate the virtualenv, even when I explicitly set the appropriate command line in the supervisord configuration file. And of course .bashrc contains the virtualenv activation command, but somehow it is not taken into account.

So I went back to the traditional Galaxy web server and everything is working fine.

I found uwsgi more responsive and would like to switch to it in the future, so if anybody has an idea how to solve this problem, I'm all ears!
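One likely explanation: supervisord does not start programs through a login shell, so .bashrc is never sourced and the virtualenv activation there has no effect. A common workaround is to point `command` directly at the virtualenv's uwsgi binary and set the variables that `activate` would normally export. A minimal sketch, with hypothetical paths (adjust .venv and the ini file location to the actual layout):

```ini
[program:galaxy_uwsgi]
; run uwsgi from inside the virtualenv instead of relying on .bashrc
command=/home/galaxy/galaxy-dist/.venv/bin/uwsgi --ini-paste /home/galaxy/galaxy-dist/config/galaxy.ini
directory=/home/galaxy/galaxy-dist
user=galaxy
; set the variables that virtualenv activation would normally export
environment=VIRTUAL_ENV="/home/galaxy/galaxy-dist/.venv",PATH="/home/galaxy/galaxy-dist/.venv/bin:/usr/bin:/bin"
autostart=true
autorestart=true
```

Using the venv's own uwsgi binary makes activation unnecessary, since that interpreter already resolves packages from the virtualenv.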






Powered by Biostar version 16.09