Question: Flagstat Crashes In Non-Linear Workflows With Torque
Andrew Warren wrote (7.3 years ago):
I was seeing the same behavior as you with asynchronous workflows. My setup is a multicore machine that acts as both the head node and a compute node for the Torque system. Even after recompiling with a higher NCONNECTS, I got the same error. I suspect this happens because Galaxy opens multiple connections to check the status of currently running jobs. An asynchronous workflow can trigger many status checks, so whether the PBS server is busy when a job submission arrives is essentially a matter of timing. To deal with this, I modified lib/galaxy/jobs/runners/pbs.py to make multiple attempts at submitting:

    @@ -286,6 +286,12 @@ class PBSJobRunner( BaseJobRunner ):
             log.debug("(%s) submitting file %s" % ( galaxy_job_id, job_file ) )
             log.debug("(%s) command is: %s" % ( galaxy_job_id, command_line ) )
             job_id = pbs.pbs_submit(c, job_attrs, job_file, pbs_queue_name, None)
    +        ##Modified to give ten tries for qsubbing a job
    +        num_try=0
    +        while(not job_id and num_try<10):
    +            job_id = pbs.pbs_submit(c, job_attrs, job_file, pbs_queue_name, None)
    +            num_try+=1
             pbs.pbs_disconnect(c)
     
             # check to see if it submitted

I haven't had any problems since.

Cheers,
Andrew
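The patch above retries immediately, up to ten more times. A common refinement is to pause between attempts so a busy server has time to release connections. The following is a minimal, hypothetical sketch of that idea in Python. It assumes only what the patch's `while not job_id` check already assumes: the submission call returns a falsy value (such as None) on failure. The helper name `submit_with_retries`, the delay values, and the fake submitter are illustrative and not part of Galaxy or pbs_python.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def submit_with_retries(
    submit: Callable[[], Optional[T]],
    max_attempts: int = 10,
    initial_delay: float = 0.5,
    backoff: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[T]:
    """Call submit() until it returns a truthy job id or attempts run out.

    Hypothetical helper: `submit` is any zero-argument callable that returns a
    job id on success and a falsy value on failure. `sleep` is injectable so
    the retry schedule can be tested without actually waiting.
    """
    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        job_id = submit()
        if job_id:
            return job_id
        if attempt < max_attempts:
            # Back off before the next attempt so the server can free a connection.
            sleep(delay)
            delay *= backoff
    return None  # every attempt failed; the caller should log and handle this


if __name__ == "__main__":
    # Hypothetical stand-in for pbs.pbs_submit: fails twice, then returns a job id.
    responses = iter([None, None, "123.headnode"])
    waits = []
    job_id = submit_with_retries(lambda: next(responses), sleep=waits.append)
    print(job_id, waits)  # 123.headnode [0.5, 1.0]
```

Inside the runner, the same call from the patch could be wrapped as `submit_with_retries(lambda: pbs.pbs_submit(c, job_attrs, job_file, pbs_queue_name, None))`. Note that `max_attempts` counts every attempt, whereas the patch makes one initial call plus up to ten retries.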