Hi, I seem to be running into a problem on the Galaxy test server (due to availability of a new genome annotation - PapAnu2). I am receiving the following errors from the test server when trying to run TopHat jobs: "Failed to find or download one or more job outputs from remote server" and "Failed to communicate with remote job server". Am I doing something wrong here, or is this related to the cluster deployment issues of the past few days? Thanks! Brian
Hello Brian,
I believe that this was addressed in a bug report, but rerunning the job is the solution for this type of cluster error.
The problems are related to the upgrade changes at http://usegalaxy.org, but they can occur at other times too. We process a truly huge number of jobs every day, 24/7.
For anyone else reading this post - a rerun is the only solution for these specific error types.
Exception - if this occurs multiple times, posting here to let us know about the problem is still OK, so we can be aware of how this is impacting users. We track failures in other ways, but a post helps to communicate to the community both the problem and when a fix has been implemented.
The best way to post is to:
1) First check other recent posts to see if the problem has already been reported in a thread in the last 24 hrs or so.
2) If your issue is the same, add a comment to the existing thread instead of starting a new thread.
3) OR, if it is a completely different type of error, start a new thread. (If the error is different or specific to a tool, the issue is not the same, whether the tool is the same or a different one.)
4) Go ahead and rerun without waiting for a reply - this is the quickest path to a successful result in most cases, especially if it is after US business hours or over a weekend/holiday.
5) If the rerun is a success, posting that back here in the comments is a way to join in on the discussion and let other users know what happened for your use-case.
6) If the rerun is a failure, know that we are following posts and will reply when a fix is ticketed, pending, and/or confirmed as implemented, for cluster and other issues.
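For anyone who prefers to see the steps above spelled out, here is a rough sketch of the same decision as a small Python helper. It is only an illustration: the two error strings are the ones quoted in this thread, while the function names, the arguments, and the "2 or more failures counts as multiple times" threshold are assumptions made for the example, not anything Galaxy itself provides.

```python
# Illustrative sketch only. The error strings are quoted from this thread;
# every name and the failure threshold below are hypothetical.
CLUSTER_ERRORS = (
    "Failed to find or download one or more job outputs from remote server",
    "Failed to communicate with remote job server",
)

# Assumption: "occurs multiple times" is read here as two or more failed runs.
MULTIPLE_FAILURES = 2


def is_cluster_error(message):
    """Return True if the message contains one of the cluster errors above."""
    text = message.lower()
    return any(err.lower() in text for err in CLUSTER_ERRORS)


def next_steps(message, failed_runs, matching_recent_thread):
    """Suggest what to do after a failed job.

    message: the error text shown on the failed dataset
    failed_runs: how many times this job has failed so far
    matching_recent_thread: True if a thread from the last ~24 hrs
        already reports the same error
    """
    # Step 4: a rerun is the quickest path in most cases.
    steps = ["rerun the job without waiting for a reply"]

    # A one-off cluster hiccup needs nothing more than the rerun.
    if is_cluster_error(message) and failed_runs < MULTIPLE_FAILURES:
        return steps

    # Steps 1-3: comment on a matching thread, otherwise start a new one.
    if matching_recent_thread:
        steps.append("add a comment to the existing thread")
    else:
        steps.append("start a new thread")
    return steps


if __name__ == "__main__":
    print(next_steps("Failed to communicate with remote job server", 1, False))
    print(next_steps("Failed to communicate with remote job server", 3, True))
```

The substring check is deliberately loose so that the message still matches when extra text is wrapped around it.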
The cluster is monitored and problems are fixed without sending notice for every short-lived hiccup during this upgrade time. But please know we do appreciate the community feedback and having a place to share cluster status here in Galaxy Biostars, even if problems are brief. A post/comment here is more effective than sending in a bug report for these types of cluster problems, since all posts are public (bug reports are private and 1-1).
I am sharing more details here as a sort of reference for users to know what to do when cluster problems are encountered. Might generalize it and make a quick tutorial - let me think about that. The wiki has these details, but it is not always obvious to go read there for odd errors.
Our apologies for the confusion this caused you and others who ran into this issue, mid-last week and over the weekend! All is getting progressively better with the upgraded code base. The changes will be worth it, very soon! Jen, Galaxy team
Thanks Jen, this is very helpful. Thanks for all you and your team do to provide outstanding informatics infrastructure. Best, Brian
From: Jennifer Hillman Jackson on Galaxy Biostar [mailto:notifications@biostars.org] Sent: Monday, May 02, 2016 7:43 PM To: Brian Hermann Brian.Hermann@utsa.edu Subject: [galaxy-biostar] TopHat jobs failing - Failed to find or download one or more job outputs from remote server.
Hi Jen, I wanted to touch base with you again about my troubles running these kinds of jobs on usegalaxy.org. I am now routinely (100% of the time) getting the following error: "Remote job server indicated a problem running or monitoring this job." I have tried rerunning the jobs, but the error is persistent. Is this related to the previous errors or somehow different? Thank you VERY much for your time. Best, Brian