Hi everyone, as I have built a few 16S workflows, I was asked to test Galaxy and assess its strengths and weaknesses. I ran my pipeline on my 36 samples and it failed because there was too much data, which I can understand: my samples contain more than 12 million sequences. So I reduced the dataset to 13 samples, about 2 million sequences, but at one specific step (pre.cluster) it doesn't report an error, it just keeps running, and it has been going for more than 15 hours now! Is this still too much? How can I tell that something is actually happening and that I'm not wasting my time? Thank you very much for your answers!
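(For reference, a quick way to check how many sequences each sample holds before uploading, assuming plain uncompressed FASTQ where every record is exactly four lines; the file names in the usage comment are hypothetical:)

```python
def count_fastq_reads(path):
    """Count records in an uncompressed FASTQ file (4 lines per read)."""
    with open(path) as fh:
        n_lines = sum(1 for _ in fh)
    if n_lines % 4 != 0:
        raise ValueError(f"{path}: line count {n_lines} is not a multiple of 4")
    return n_lines // 4

# Hypothetical usage: total reads across all samples
# total = sum(count_fastq_reads(p) for p in ["sample01.fastq", "sample02.fastq"])
```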
Question: Galaxy running for too long
alpac • 0 wrote:
Martin Čech ♦♦ 4.9k wrote:
I assume you are working on usegalaxy.org.
I do not know which tool you are running that takes this much time, but some tools can legitimately take this long with large inputs. Generally, if a tool runs into trouble it will stop and show an error in the Galaxy interface.
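(Besides watching the history panel, the "is anything happening?" question can be sketched as a simple polling loop. In real use the `get_state` callable would query the Galaxy API, for example via the BioBlend library's jobs client; here it is an injected function so the sketch stays self-contained, and the set of terminal states is an assumption:)

```python
import time

# States assumed to mean the job is finished, one way or another.
TERMINAL_STATES = {"ok", "error", "deleted"}

def wait_for_job(get_state, poll_seconds=60, max_polls=None):
    """Poll get_state() until the job reaches a terminal state.

    get_state: zero-argument callable returning the current job state string.
    Returns the terminal state so the caller can tell success from failure.
    """
    polls = 0
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        polls += 1
        if max_polls is not None and polls >= max_polls:
            raise TimeoutError(f"job still '{state}' after {polls} polls")
        time.sleep(poll_seconds)
```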
Hi, thank you for your answer! Yes, indeed, I'm working on usegalaxy.org. It finished an hour ago! But I decided to reduce my samples again. What I want to know is: how much data is tolerated by Galaxy? Thank you very much!
Your limit on usegalaxy.org will most probably be your disk quota (250 GB by default). Galaxy can work with much bigger files, but then you have to provide the infrastructure yourself; see https://galaxyproject.org/choices/
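(As a rough pre-flight check, you can total your input file sizes locally and compare them against the quota before uploading. A minimal sketch, where the 250 GB figure is the default mentioned above and `quota_report` is a hypothetical helper name:)

```python
import os

DEFAULT_QUOTA_BYTES = 250 * 1024**3  # 250 GB default quota on usegalaxy.org

def quota_report(paths, quota_bytes=DEFAULT_QUOTA_BYTES):
    """Sum the sizes of local input files and report quota headroom."""
    total = sum(os.path.getsize(p) for p in paths)
    return {
        "total_bytes": total,
        "fits": total <= quota_bytes,
        "fraction_used": total / quota_bytes,
    }
```

Note that uploads are only part of quota use: intermediate and output datasets count against it too, so leave generous headroom.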
Jobs in Galaxy use roughly the same resources they would use when running the exact same job on the command line. Memory use and runtime depend on factors such as the tool itself (some use more resources than others, wherever they are executed), the parameters selected, and the data content (query input and targets, if included).
When using a public Galaxy server, the resources allocated to tools can vary by server. To find out whether your data will run successfully on any of them, you'll need to execute test jobs or workflows and review the outcome: success; failure for server-resource reasons; or failure for reasons that would also make the job fail on the command line under the same conditions. It is possible to construct jobs that will run out of resources no matter where they run, Galaxy or command line, even given very generous allocations. Review the underlying wrapped third-party tool to better understand how it works, its limitations, and best-practice usage.
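(One practical way to build those test jobs is to subsample each input first, so the same workflow can be tried on a small slice before committing the full data. A minimal sketch for uncompressed FASTQ that simply takes the first n reads; a random subsample, for example via a dedicated tool such as seqtk, would be more representative:)

```python
from itertools import islice

def head_fastq(src_path, dst_path, n_reads):
    """Copy the first n_reads records (4 lines each) of a FASTQ file."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in islice(src, n_reads * 4):
            dst.write(line)
```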
If your jobs are too large to run on one particular public Galaxy server, you can try others. And if no public server has sufficient resources to process your data, you can set up your own Galaxy instance and allocate the resources your chosen data and tools need.
Thanks! Jen, Galaxy team