Dear all,
I followed the instructions at https://wiki.galaxyproject.org/Events/GCC2014/TrainingDay/AdminWalkthrough#Configure_Galaxy on a single machine with 40 cores running CentOS 7. I couldn't scramble the pbs_python egg (tried 4.3.5 and 4.4.0), so I installed the recommended pbs_drmaa (tried 1.0.17 and 1.0.18) and indicated its path in the job_conf.xml file. Torque (4.2.9) is correctly installed and the galaxy user can submit jobs directly, but jobs submitted through the Galaxy interface end immediately, with a "state change: job finished, but failed" message in the Galaxy logs and Exit_status=2 in the pbs_server log.
Any idea what might be wrong?
Cristian
Logs from the Galaxy handler and pbs_server follow (I changed the server name):
Galaxy log:
galaxy.jobs DEBUG 2015-01-08 14:36:08,489 (21050) Working directory for job is: /home/galaxy/galaxy-dist/database/job_working_directory/021/21050
galaxy.jobs.handler DEBUG 2015-01-08 14:36:08,516 (21050) Dispatching to torque runner
galaxy.jobs DEBUG 2015-01-08 14:36:08,651 (21050) Persisting job destination (destination id: torque)
galaxy.jobs.handler INFO 2015-01-08 14:36:08,671 (21050) Job dispatched
galaxy.tools.deps DEBUG 2015-01-08 14:36:09,013 Building dependency shell command for dependency 'cufflinks'
galaxy.jobs.runners DEBUG 2015-01-08 14:36:09,239 (21050) command is: PACKAGE_BASE=/home/galaxy/galaxy-dist/dependency_dir/cufflinks/2.1.1/devteam/cufflinks/9aab29e159a7; export PACKAGE_BASE; . /home/galaxy/galaxy-dist/dependency_dir/cufflinks/2.1.1/devteam/cufflinks/9aab29e159a7/env.sh; cufflinks 2>&1 | head -n 1 > /home/galaxy/galaxy-dist/database/tmp/GALAXY_VERSION_STRING_21050 2>&1; python /home/galaxy/shed_tools/toolshed.g2.bx.psu.edu/repos/devteam/cufflinks/9aab29e159a7/cufflinks/cufflinks_wrapper.py --input=/home/galaxy/galaxy-dist/database/files/031/dataset_31505.dat --assembled-isoforms-output=/home/galaxy/galaxy-dist/database/files/032/dataset_32805.dat --num-threads="${GALAXY_SLOTS:-4}" -I 300000 -F 0.1 -j 0.15; return_code=$?; if [ -f /home/galaxy/galaxy-dist/database/job_working_directory/021/21050/global_model.txt ] ; then cp /home/galaxy/galaxy-dist/database/job_working_directory/021/21050/global_model.txt /home/galaxy/galaxy-dist/database/files/032/dataset_32806.dat ; fi.....
<snipped a lot of text/>
galaxy.jobs.runners.drmaa DEBUG 2015-01-08 14:36:09,270 (21050) submitting file /home/galaxy/galaxy-dist/database/job_working_directory/021/21050/galaxy_21050.sh
galaxy.jobs.runners.drmaa INFO 2015-01-08 14:36:09,276 (21050) queued as 51.myserver.local.net
galaxy.jobs DEBUG 2015-01-08 14:36:09,307 (21050) Persisting job destination (destination id: torque)
galaxy.jobs.runners.drmaa DEBUG 2015-01-08 14:36:10,285 (21050/51.myserver.local.net) state change: job finished, but failed
galaxy.datatypes.metadata DEBUG 2015-01-08 14:36:10,633 Cleaning up external metadata files
galaxy.datatypes.metadata DEBUG 2015-01-08 14:36:10,650 Failed to cleanup MetadataTempFile temp files from /home/galaxy/galaxy-dist/database/job_working_directory/021/21050/metadata_out_HistoryDatasetAssociation_39981_LGNwud: No JSON object could be decoded
pbs_server log:
01/08/2015 14:36:09;0100;PBS_Server.27598;Job;51.myserver.local.net;enqueuing into batch, state 1 hop 1
01/08/2015 14:36:09;0008;PBS_Server.27598;Job;req_commit;job_id: 51.myserver.local.net
01/08/2015 14:36:09;0008;PBS_Server.29816;Job;51.myserver.local.net;Job Modified at request of root@myserver.local.net
01/08/2015 14:36:09;0008;PBS_Server.29816;Job;51.myserver.local.net;Job Run at request of root@myserver.local.net
01/08/2015 14:36:09;000d;PBS_Server.29816;Job;51.myserver.local.net;Not sending email: User does not want mail of this type.
01/08/2015 14:36:09;0010;PBS_Server.27662;Job;51.myserver.local.net;Exit_status=2 resources_used.cput=00:00:00 resources_used.mem=0kb resources_used.vmem=0kb resources_used.walltime=00:00:00
01/08/2015 14:36:09;000d;PBS_Server.27662;Job;51.myserver.local.net;Not sending email: User does not want mail of this type.
01/08/2015 14:36:09;0008;PBS_Server.27662;Job;51.myserver.local.net;on_job_exit valid pjob: 51.myserver.local.net (substate=50)
01/08/2015 14:36:44;0002;PBS_Server.28396;Svr;PBS_Server;Torque Server Version = 4.2.9, loglevel = 0
01/08/2015 14:37:17;0100;PBS_Server.28396;Job;51.myserver.local.net;dequeuing from batch, state COMPLETE