Question: Problems with TopHat
2.8 years ago, joe.bedont (United States) wrote:

Hello there, any advice on how to get TopHat running? Half the time I can't even get the job to start (I've waited a couple of days for it to do so). When it does start, it runs for days and never finishes. I've been trying to use TACC (beta) to run the job. Is that still the best thing to do? More generally, are there any tweaks I could try to get this working? Thanks in advance.

Cheers,

Joe

Tags: tophat, alignment
2.8 years ago, Jennifer Hillman Jackson (United States) wrote:

Hello,

We are looking into this to determine the current expected wait time. Wait times vary on Stampede, but I can see that these jobs were launched on Monday. For reference, the specifications for that cluster are on the wiki page Direct_job_execution_on_Stampede.

Jobs that never finish are unusual. Once execution begins (the dataset turns yellow), a job should complete, either successfully or with an error, within the cluster's wall-time limit. Is that not what you experienced? I can take a look at one of these if you share the history name and the dataset number for one that ran this way. This information will not compromise your account privacy, so please do not post anything else unless you wish to share it publicly.

Thanks. We will get back to you today with the expected wait time, and let you know if anything unexpected is going on with the cluster.

Jen, Galaxy team


OK, quick update thanks to our speedy admin Nate! There was an issue, and it is now corrected. If you leave the queued jobs alone, they will execute as expected, although the wait could be up to a day. Our apologies for the trouble; this is still in beta. Stampede is our attempt to provide more compute to users with larger jobs while still maintaining access to compute for users with smaller jobs.

If you still want to share the "never finished job", please do, as that is a distinct issue.

Jen

Reply by Jennifer Hillman Jackson, 2.8 years ago

Hi Jennifer, unfortunately I already deleted the job that never stopped running, after waiting a few days for it to finish. The queue times I've been seeing have been significantly longer than a day, but I'll sit tight.

Reply by joe.bedont, 2.8 years ago

Yes, the queue times ("grey" status) have been longer than usual for Stampede this week. We apologise for that.

For the other jobs you quit, note the wall-time limit of the target cluster. Jobs in progress ("yellow" status) that have not yet reached that limit should be allowed to finish. Deleting and restarting them discards whatever processing was already done and puts you back at the end of the queue.

More about interpreting dataset status from the colors shown in the UI is on the wiki page Dataset_status_and_how_jobs_execute.

Best, Jen

Reply by Jennifer Hillman Jackson, 2.8 years ago

Hi again, Jennifer. How bad are the wait times on TACC right now, exactly? I've been waiting 4 days now for my current TopHat to start running (still gray).

Cheers,

Joe

Reply by joe.bedont, 2.8 years ago

Does the Galaxy Default option still just redirect me to TACC? If not, I'm inclined to try it instead.

Cheers,

Joe

Reply by joe.bedont, 2.8 years ago

Hi Joe,

Both queues are full and jobs are processing. There was a backlog, as I explained, and when one cluster is slow, many users tend to delete and restart their jobs, which doesn't help. Deleting and restarting also delays individual accounts, because the database has to process all of the changing transactions (removing canceled jobs, adding new ones). Starting jobs and letting them run is almost always the best choice.

That said, there are currently no known issues with the clusters themselves. You could submit a job to the other cluster to compare how quickly each one runs, but note the wall-time differences: if the jobs are very large (long run times), Stampede is still the best choice.

Be aware that while you can launch as many jobs as you want in your history, a quota limits how many of them actually enter an active cluster queue. Also, if you run out of disk space, all downstream jobs will be "paused" until data is permanently deleted.

If your work is urgent, a cloud Galaxy could be another option, and AWS offers grants to researchers. Links about CloudMan and the grants are in this wiki section: https://wiki.galaxyproject.org/Support#About_Galaxy

I have this bookmarked for updates, but also feel free to ask for one on Monday if your jobs are still not executing by then.

Reply by Jennifer Hillman Jackson, 2.8 years ago

Okay, thanks, Jennifer. I'm currently running only one TopHat job on TACC. I'll start the run I need for the other sample in this cohort on Galaxy Default and see whether it finishes any faster.

I'll drop you a line Monday, if I haven't heard from you and am still experiencing issues.

Cheers,

Joe

Reply by joe.bedont, 2.8 years ago
2.8 years ago, joe.bedont (United States) wrote:

Update: the TopHat job that was trying to run on TACC finally failed. The error read: "Remote job server could not determine this job's state." I restarted it, but I'm not optimistic that it will work.

The other TopHat job, which I submitted to Galaxy Default, is still in the queue.

Have you heard from anyone else with similar problems?

Cheers,

Joe


Hi Joe,

Sorry for the trouble - I tried to recover a number of jobs that had been submitted but never ran, and this process failed. Your job appears to have been one of these. You can resubmit it and it should complete successfully.

If your job is expected to be relatively short (under 10 hours, in the case of TopHat), you don't need to use Stampede; the regular Galaxy cluster would be quicker.

--nate

Reply by Nate Coraor, 2.8 years ago

We are investigating and will reply shortly. Thank you! Jen, Galaxy team

Reply by Jennifer Hillman Jackson, 2.8 years ago
2.8 years ago, joe.bedont (United States) wrote:

I resubmitted it yesterday. Okay, I'll sit tight and see whether these finish. Thanks, Nate.
