Hello everyone,
I am running into a problem with the number of concurrent jobs running on my local Galaxy instance. Currently I can run at most 10 jobs simultaneously, which clearly underuses the capacity of my server. I am using the Paste method with the default configuration (1 server, 1 handler) and my runner is HTCondor.
I looked into simple solutions thanks to a conversation with pvh_sa on IRC:
- setting "concurrent_jobs" and "registered_user_concurrent_jobs" to 30 in the <limits> block of job_conf.xml
- increasing threadpool_workers to 20 in galaxy.ini
- increasing the number of HTCondor workers from 4 to 10
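For reference, here is roughly what I have now, as a simplified sketch rather than a verbatim copy of my job_conf.xml (the destination/handler ids are just how I named things; the load string is the standard Galaxy HTCondor runner):

```xml
<?xml version="1.0"?>
<job_conf>
    <plugins>
        <!-- HTCondor runner, workers raised from 4 to 10 -->
        <plugin id="condor" type="runner"
                load="galaxy.jobs.runners.condor:CondorJobRunner" workers="10"/>
    </plugins>
    <handlers>
        <handler id="main"/>
    </handlers>
    <destinations default="condor">
        <destination id="condor" runner="condor"/>
    </destinations>
    <limits>
        <!-- the two limits I raised to 30, as discussed on IRC -->
        <limit type="concurrent_jobs">30</limit>
        <limit type="registered_user_concurrent_jobs">30</limit>
    </limits>
</job_conf>
```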
But none of them worked, so I wonder what is limiting my instance to 10 concurrent jobs. I was thinking about switching from Paste to uWSGI to add servers and handlers, but since I don't know where the limit on concurrent jobs comes from, I am not sure that would solve my problem. I would like to understand this before adding complexity.
Any help in understanding this problem would be very much appreciated.
Thank you !
Christ
Hi Christophe
Just double checking: is the "job_config_file" setting in the 'galaxy.ini' file pointing to the right 'job_conf.xml' file?
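For example, the relevant lines should look something like this (the path is just an example, yours may differ):

```ini
[server:main]
# Paste thread pool size (the setting mentioned in your first message)
threadpool_workers = 20

[app:main]
# must point at the job_conf.xml you actually edited
job_config_file = config/job_conf.xml
```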
Regards, Hans-Rudolf
This might be helpful, if the limit is within htcondor itself: http://georgi.hristozov.net/2015/08/28/increasing-the-number-of-shared-port-workers-in-htcondor.html
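If I remember correctly, the knob that post is about is SHARED_PORT_MAX_WORKERS (default 50), set in the HTCondor configuration, e.g.:

```
# condor_config.local (assumed location; "condor_config_val -config" lists the files actually read)
SHARED_PORT_MAX_WORKERS = 100
```

followed by a condor_reconfig.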
Hi, I double-checked galaxy.ini and job_config_file is pointing to the right job_conf.xml, where I set the user concurrent job limits. So the limit isn't there :(
Concerning the limit within HTCondor itself, I do not reach the limit of 50 workers. I tried from 4 to 10 workers with no change in the observed limit on the number of concurrent jobs.
So I wondered if this limit was set in the Debian system itself, but it makes no sense to have a limit of 10 jobs when you have 32 slots available...
Has none of you observed this limit when building an instance from scratch and launching a lot of analyses?
"So I wondered if this limit was set in the Debian system itself, but it makes no sense to have a limit of 10 jobs when you have 32 slots available..."
...well, this rings a (non-Galaxy-related) bell: we recently bought a box with 48 CPUs. When we tried running STAR with more than 16 CPUs, it broke. As it turned out, "nofile" (i.e. the limit on the number of files that a single process can have open at a time) was set to 16. I realize your problem is the number of jobs and not the number of open files, but maybe...
And just to double-check: what is the output of "ulimit -u"?
How do you get the nofile information? "ulimit -u" gives: 1033427. But I don't understand what information this command provides...
Run "ulimit -a" to get more explanations, and check the file "/etc/security/limits.conf" - though I am not sure this is all the same on Debian.
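Concretely, something like this, run as the user your Galaxy server runs as (soft and hard limits can differ, so it is worth checking both):

```shell
# all per-process resource limits for the current shell
ulimit -a

# the two most relevant values here:
ulimit -n    # "nofile": max open files (soft limit)
ulimit -u    # max user processes (soft limit)

# the corresponding hard limits, for comparison
ulimit -Hn
ulimit -Hu
```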
Here is what I obtain from the command "ulimit -a":
The max user processes value is huge, and I see nothing here that could be a limit for Galaxy or any Unix user...
The limits.conf file is entirely commented out, and it contains no information regarding any default values for users.
Do you know if there is a per-handler limit on the number of jobs in an instance?
I am sorry, but I have reached my 'sys-admin knowledge' :(
I wonder about your experience with Galaxy. Did you set up your instance with several servers and handlers from the start? That would explain why you never observed this limit before, if that is the cause of my problem.
No, it is all on one box, with several "LocalJobRunner"
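To give an idea, a stripped-down sketch of such a job_conf.xml (the ids and worker counts are made up for illustration, not our actual values):

```xml
<?xml version="1.0"?>
<job_conf>
    <plugins>
        <!-- one local runner plugin with several worker threads -->
        <plugin id="local" type="runner"
                load="galaxy.jobs.runners.local:LocalJobRunner" workers="8"/>
    </plugins>
    <handlers>
        <handler id="handler0"/>
        <handler id="handler1"/>
    </handlers>
    <destinations default="local">
        <destination id="local" runner="local"/>
    </destinations>
</job_conf>
```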