Question: Error with Torque/drmaa
Dear all,

I followed the instructions at on a single machine with 40 cores running CentOS 7. I couldn't scramble the pbs_python egg (I tried 4.3.5 and 4.4.0), so I installed the recommended pbs_drmaa (tried 1.0.17 and 1.0.18) and set its path in the job_conf.xml file. Torque (4.2.9) is correctly installed and the galaxy user can submit jobs, but jobs submitted through the Galaxy interface end immediately, with a "state change: job finished, but failed" message in the Galaxy logs and Exit_status=2 in the pbs_server log.

Any idea what might be wrong?


Logs from the Galaxy handler and pbs_server (I changed the server name!):

Galaxy log:
DEBUG 2015-01-08 14:36:08,489 (21050) Working directory for job is: /home/galaxy/galaxy-dist/database/job_working_directory/021/21050
DEBUG 2015-01-08 14:36:08,516 (21050) Dispatching to torque runner
DEBUG 2015-01-08 14:36:08,651 (21050) Persisting job destination (destination id: torque)
INFO 2015-01-08 14:36:08,671 (21050) Job dispatched
DEBUG 2015-01-08 14:36:09,013 Building dependency shell command for dependency 'cufflinks'
DEBUG 2015-01-08 14:36:09,239 (21050) command is: PACKAGE_BASE=/home/galaxy/galaxy-dist/dependency_dir/cufflinks/2.1.1/devteam/cufflinks/9aab29e159a7; export PACKAGE_BASE; . /home/galaxy/galaxy-dist/dependency_dir/cufflinks/2.1.1/devteam/cufflinks/9aab29e159a7/; cufflinks 2>&1 | head -n 1 > /home/galaxy/galaxy-dist/database/tmp/GALAXY_VERSION_STRING_21050 2>&1; python /home/galaxy/shed_tools/ --input=/home/galaxy/galaxy-dist/database/files/031/dataset_31505.dat --assembled-isoforms-output=/home/galaxy/galaxy-dist/database/files/032/dataset_32805.dat --num-threads="${GALAXY_SLOTS:-4}" -I 300000 -F 0.1 -j 0.15; return_code=$?; if [ -f /home/galaxy/galaxy-dist/database/job_working_directory/021/21050/global_model.txt ] ; then cp /home/galaxy/galaxy-dist/database/job_working_directory/021/21050/global_model.txt /home/galaxy/galaxy-dist/database/files/032/dataset_32806.dat ; fi..... <snipped a lot of text/>
DEBUG 2015-01-08 14:36:09,270 (21050) submitting file /home/galaxy/galaxy-dist/database/job_working_directory/021/21050/
INFO 2015-01-08 14:36:09,276 (21050) queued as
DEBUG 2015-01-08 14:36:09,307 (21050) Persisting job destination (destination id: torque)
DEBUG 2015-01-08 14:36:10,285 (21050/ state change: job finished, but failed
galaxy.datatypes.metadata DEBUG 2015-01-08 14:36:10,633 Cleaning up external metadata files
galaxy.datatypes.metadata DEBUG 2015-01-08 14:36:10,650 Failed to cleanup MetadataTempFile temp files from /home/galaxy/galaxy-dist/database/job_working_directory/021/21050/metadata_out_HistoryDatasetAssociation_39981_LGNwud: No JSON object could be decoded

pbs_server log:

01/08/2015 14:36:09;0100;PBS_Server.27598;Job;;enqueuing into batch, state 1 hop 1
01/08/2015 14:36:09;0008;PBS_Server.27598;Job;req_commit;job_id:
01/08/2015 14:36:09;0008;PBS_Server.29816;Job;;Job Modified at request of
01/08/2015 14:36:09;0008;PBS_Server.29816;Job;;Job Run at request of
01/08/2015 14:36:09;000d;PBS_Server.29816;Job;;Not sending email: User does not want mail of this type.
01/08/2015 14:36:09;0010;PBS_Server.27662;Job;;Exit_status=2 resources_used.cput=00:00:00 resources_used.mem=0kb resources_used.vmem=0kb resou
01/08/2015 14:36:09;000d;PBS_Server.27662;Job;;Not sending email: User does not want mail of this type.
01/08/2015 14:36:09;0008;PBS_Server.27662;Job;;on_job_exit valid pjob: (substate=50)
01/08/2015 14:36:44;0002;PBS_Server.28396;Svr;PBS_Server;Torque Server Version = 4.2.9, loglevel = 0
01/08/2015 14:37:17;0100;PBS_Server.28396;Job;;dequeuing from batch, state COMPLETE
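To pull exit statuses out of a longer pbs_server log, here is a minimal Python 3 sketch. It assumes the semicolon-separated layout visible in the excerpt above (timestamp; event code; source; object type; object name; message); the record and function names are hypothetical, not part of Torque. The object-name field, normally the job id, is empty in the posted lines because it was redacted. In Torque a non-negative Exit_status is typically the exit code of the job script itself, while negative values are Torque's own failure codes, so Exit_status=2 with cput=00:00:00 suggests (but does not prove) that the failure happened inside the job script rather than in Torque.

```python
import re
from typing import Iterable, Iterator, NamedTuple, Optional, Tuple


class PbsLogRecord(NamedTuple):
    """One pbs_server log line, split on its first five semicolons (assumed layout)."""
    timestamp: str
    event_code: str
    source: str
    object_type: str
    object_name: str
    message: str


# Matches e.g. "Exit_status=2", or a negative Torque-internal code.
EXIT_RE = re.compile(r"Exit_status=(-?\d+)")


def parse_line(line: str) -> Optional[PbsLogRecord]:
    """Return a record, or None if the line does not have six fields."""
    # maxsplit=5 keeps any semicolons inside the free-text message intact.
    parts = line.rstrip("\n").split(";", 5)
    if len(parts) != 6:
        return None
    return PbsLogRecord(*parts)


def exit_statuses(lines: Iterable[str]) -> Iterator[Tuple[str, str, int]]:
    """Yield (timestamp, object name, exit status) for each job-exit record."""
    for line in lines:
        record = parse_line(line)
        if record is None or record.object_type != "Job":
            continue
        match = EXIT_RE.search(record.message)
        if match:
            yield record.timestamp, record.object_name, int(match.group(1))
```

You could run `exit_statuses` over an open log file from Torque's `server_logs` directory (its location depends on your installation) to list every failed job of the day at a glance.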
Hi all,

For completeness: the problem comes from uWSGI and/or, most probably, supervisord. When I start the uWSGI server through supervisord, the virtualenv is not activated, even when I explicitly set the appropriate command line in the supervisord configuration file. And of course .bashrc contains the virtualenv activation command, but somehow it is not taken into account.

So I went back to the traditional Galaxy web server, and everything works fine.

I found uWSGI more responsive and would like to switch in the future, so if anybody has an idea how to solve this problem, I'm all ears!
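One likely explanation, offered as an assumption rather than a confirmed diagnosis: supervisord starts programs directly rather than through an interactive login shell, so .bashrc, and the activate line in it, is never read. A quick way to check what the uWSGI process (and any shell command it spawns) actually sees is to log a few facts about its interpreter and environment. The sketch below is hypothetical diagnostic code (the function names are illustrative, not Galaxy's), written for Python 3 because it uses `shutil.which`. It reports whether the interpreter runs inside a virtualenv, whether `VIRTUAL_ENV` was set by an `activate` script, and which `python` a shell command such as the `python ...` call in the job command line above would resolve from `PATH`.

```python
import os
import shutil
import sys


def in_virtualenv():
    """Return True if this interpreter runs inside a virtualenv or venv."""
    # Legacy virtualenv (< 20) sets sys.real_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    # venv and modern virtualenv: sys.prefix differs from sys.base_prefix.
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix


def describe_environment(env=None):
    """Summarise interpreter and shell-level virtualenv state (hypothetical helper)."""
    env = os.environ if env is None else env
    return {
        "executable": sys.executable,
        "in_virtualenv": in_virtualenv(),
        # Only set when an `activate` script was sourced, e.g. from .bashrc.
        "VIRTUAL_ENV": env.get("VIRTUAL_ENV"),
        # The `python` a shell command would find first on this PATH
        # (None if PATH is empty or contains no `python`).
        "python_on_path": shutil.which("python", path=env.get("PATH", "")),
    }
```

If `in_virtualenv` is true but `VIRTUAL_ENV` is unset and `python_on_path` points outside the virtualenv, the process was started with the virtualenv's interpreter but without activation. One possible (untested here) fix is to prepend the virtualenv's `bin` directory to `PATH` through the `environment=` setting of the supervisord program section, instead of relying on .bashrc.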




