Question: Delay in job processing for EC2 instance
3.7 years ago by mhayes20, United States
mhayes20 wrote:

Hello,

I submitted a few jobs using an EC2 instance of Galaxy. My jobs have been waiting in the queue indefinitely; one of my jobs had been waiting for 4 days before I deleted it.

I am the only user of this instance. Am I correct in assuming that the jobs should execute immediately?

Thank you.

Tags: jobs, ec2
3.7 years ago by Jennifer Hillman Jackson, United States
Jennifer Hillman Jackson wrote:

Hello,

Are you the administrator? Is it by chance a spot instance? If it is not, and the issue is still present, please provide more details.

Thanks, Jen, Galaxy team


Thank you, Jen. No, it is not a spot instance, but I am indeed the administrator. I logged in as a superuser and submitted a job on 3/15/15, but it is still waiting in the queue.

written 3.7 years ago by mhayes20

Also, I have submitted two jobs that are very small; they just echo a message to the screen. I submitted these jobs an hour ago and I am still waiting for them to run.

This is a CloudMan instance, so I am very perplexed by this, especially since no other jobs are running and no other users are logged in.

written 3.7 years ago by mhayes20

If you look at the CloudMan admin console (<instance ip>/cloud/admin), what is the status of the SGE service? What does the qstat output show? If things aren't looking OK, you can try restarting the SGE service to see if that brings it back to life.

modified 3.7 years ago • written 3.7 years ago by Enis Afgan
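
The qstat check suggested above can also be summarized programmatically. The following is only a minimal sketch, not part of CloudMan or Galaxy: the function names are hypothetical, and it assumes the default (non-full) qstat table layout in which the fifth column holds the job state (for example qw for queued and waiting, r for running).

```python
import subprocess
from collections import Counter


def run_qstat():
    """Return raw `qstat -u '*'` output (jobs of all users).

    Assumes the SGE client tools are installed and on PATH.
    """
    result = subprocess.run(
        ["qstat", "-u", "*"], capture_output=True, text=True, check=True
    )
    return result.stdout


def count_job_states(qstat_text):
    """Count jobs per state code in default qstat output.

    Assumed column order: job-ID, prior, name, user, state, ...
    """
    states = Counter()
    for line in qstat_text.splitlines():
        fields = line.split()
        # Job rows start with a numeric job ID; header and ruler lines do not.
        if len(fields) >= 5 and fields[0].isdigit():
            states[fields[4]] += 1
    return states
```

Calling count_job_states(run_qstat()) on the master node would give a per-state tally. If every job is in qw and none is in r, jobs are being accepted but never dispatched, which points at the scheduler or queue configuration rather than at Galaxy.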

The SGE service status is "running". When I checked the SGE log, I found the following messages:

03/20/2015 20:47:16|  main|ip-172-31-30-120|I|read job database with 0 entries in 0 seconds
03/20/2015 20:47:16|  main|ip-172-31-30-120|E|error opening file "/opt/sge/default/common/./sched_configuration" for reading: No such file or directory
03/20/2015 20:47:16|  main|ip-172-31-30-120|E|error opening file "/opt/sge/default/spool/qmaster/./sharetree" for reading: No such file or directory
03/20/2015 20:47:16|  main|ip-172-31-30-120|I|qmaster hard descriptor limit is set to 8192
03/20/2015 20:47:16|  main|ip-172-31-30-120|I|qmaster soft descriptor limit is set to 8192
03/20/2015 20:47:16|  main|ip-172-31-30-120|I|qmaster will use max. 8172 file descriptors for communication
03/20/2015 20:47:16|  main|ip-172-31-30-120|I|qmaster will accept max. 99 dynamic event clients
03/20/2015 20:47:16|  main|ip-172-31-30-120|I|starting up GE 6.2u5 (lx24-amd64)
03/20/2015 20:47:16|  main|ip-172-31-30-120|W|can't open job sequence number file "jobseqnum": for reading: No such file or directory -- guessing next number
03/20/2015 20:47:16|  main|ip-172-31-30-120|W|can't open ar sequence number file "arseqnum": for reading: No such file or directory -- guessing next number
03/20/2015 20:47:19|worker|ip-172-31-30-120|E|adminhost "ip-172-31-30-120.ec2.internal" already exists

Edit: Now that I check again, I did receive the following error when I submitted the job:

"Unable to run job: warning: ubuntu your job is not allowed to run in any queue"

Though I'm still unsure why this is happening. Could it be a disk capacity issue?

modified 3.7 years ago • written 3.7 years ago by mhayes20
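
The SGE log excerpt above is pipe-delimited (timestamp, thread, host, level, message), so the warnings and errors can be pulled out with a few lines of code. This is a rough sketch under that assumption; the names LogEntry, parse_qmaster_log and problems are made up for illustration, and the reading of the level letters (I, W, E as info, warning, error) is inferred from the excerpt rather than taken from SGE documentation.

```python
from typing import NamedTuple


class LogEntry(NamedTuple):
    timestamp: str
    thread: str
    host: str
    level: str  # single letter as seen in the excerpt: I, W or E
    message: str


def parse_qmaster_log(text):
    """Split pipe-delimited log lines into LogEntry records."""
    entries = []
    for line in text.splitlines():
        # At most 4 splits, so a '|' inside the message stays intact.
        parts = line.split("|", 4)
        if len(parts) != 5:
            continue  # skip blank or malformed lines
        entries.append(LogEntry(*(p.strip() for p in parts)))
    return entries


def problems(entries):
    """Keep only entries whose level is W or E."""
    return [e for e in entries if e.level in ("W", "E")]
```

Applied to the eleven lines quoted above, problems(parse_qmaster_log(log_text)) would keep the three E lines (the missing sched_configuration and sharetree files and the duplicate adminhost) and the two W lines about the sequence number files.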

Do you see that message when running a job via Galaxy or by hand from the command line? If on the command line, you should switch to the 'galaxy' user with "sudo su galaxy" and submit the job again.

However, I'd say that's probably an issue with SGE and that something went wrong in its configuration. Did you try restarting the SGE service from CloudMan?

written 3.7 years ago by Enis Afgan

I did. I think I will just create another instance and I will see if I still have problems.

written 3.7 years ago by mhayes20