Question: Flagstat Crashes In Non-Linear Workflows With Torque
Andrew Warren wrote:
I was getting the same behavior as you with asynchronous workflows on a multicore computer that acts as both the head node and a compute node for Torque. Even after recompiling with a higher NCONNECTS, I got the same error. I suspect this is caused by Galaxy opening multiple connections to check the status of running jobs. Because an asynchronous workflow can generate many status checks, the PBS server may be busy at whatever moment a job submission arrives, so submissions fail intermittently. To deal with this, I modified the lib/galaxy/jobs/runners/ script to make multiple submission attempts, as follows:

```diff
@@ -286,6 +286,12 @@ class PBSJobRunner( BaseJobRunner ):
         log.debug("(%s) submitting file %s" % ( galaxy_job_id, job_file ) )
         log.debug("(%s) command is: %s" % ( galaxy_job_id, command_line ) )
         job_id = pbs.pbs_submit(c, job_attrs, job_file, pbs_queue_name, None)
+        ## Modified to retry qsubbing a job up to ten more times if the first attempt fails
+        num_try = 0
+        while not job_id and num_try < 10:
+            job_id = pbs.pbs_submit(c, job_attrs, job_file, pbs_queue_name, None)
+            num_try += 1
         pbs.pbs_disconnect(c)
         # check to see if it submitted
```

I haven't had any problems since.

Cheers,
Andrew
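The loop in the patch retries immediately, with no pause between attempts, so all of the retries may land while the server is still busy. A common refinement is to wait between attempts with an increasing delay. Below is a minimal Python sketch of that idea under stated assumptions. It is not part of Galaxy or pbs_python: `submit_with_retries` and its parameters are hypothetical names. Like the patch, it assumes that a failed submission returns a falsy value.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def submit_with_retries(
    submit: Callable[[], Optional[T]],
    max_attempts: int = 10,
    initial_delay: float = 0.5,
    backoff: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[T]:
    """Call `submit` until it returns a truthy value or attempts run out.

    Hypothetical helper: `submit` stands in for a call such as
    pbs.pbs_submit(...). A falsy result (e.g. None or "") is treated as a
    failed submission, mirroring the `while not job_id` check in the patch.
    """
    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        result = submit()
        if result:
            return result
        if attempt < max_attempts:
            # Pause before the next attempt so a busy server has time to recover;
            # the pause grows by `backoff` after each failure.
            sleep(delay)
            delay *= backoff
    return None


# Usage with a fake submitter that fails twice, then returns a job id.
responses = iter([None, "", "123.headnode"])
job_id = submit_with_retries(lambda: next(responses), sleep=lambda s: None)
print(job_id)  # 123.headnode
```

In the runner, the call could be wrapped as `submit_with_retries(lambda: pbs.pbs_submit(c, job_attrs, job_file, pbs_queue_name, None))`. The `sleep` parameter is injectable only so the helper can be exercised without real delays. Whether a fresh connection is also needed between attempts is not something this sketch addresses.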
