Question: Newbie Questions
10.3 years ago by
Michael Rusch wrote:
We're strongly considering switching to Galaxy from a piece of home-built software that we're in the process of developing. So, I have a couple of newbie questions to see what people's experience is.

How does Galaxy scale? Does anybody have experience with scaling to thousands of datasets, or working with datasets in the hundreds of megabytes?

We have traditionally done most of our work using a MySQL backend. I haven't (yet) received the green light from our sysadmin to install Postgres, and I'm wondering if anybody has any experience running on MySQL. Is it possible? Are there pitfalls?

Has anybody by any chance implemented support for Condor as a job scheduler?

I think that's it for now.

Thanks,
Michael
10.3 years ago by
Nate Coraor (United States) wrote:
Hi Michael,

Hopefully Ross' email helped. It's sometimes difficult for us to know how easily people adapt Galaxy for their own use, outside of our public sites.

On scaling: our public sites are up to hundreds of thousands of datasets, ranging in size up to a few gigabytes. The caveats Ross listed about moving data around apply, so a good cluster infrastructure is important with larger datasets.

On MySQL: yes, it is possible. We test all of our builds on SQLite (the default database, but not recommended outside of development), Postgres, and MySQL.

On Condor: no, only TORQUE/PBS and Sun Grid Engine have been implemented. Galaxy's job runner is modular and can support any number of configurable job runners, though.

--nate
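To illustrate what a "modular" job runner can look like in general, here is a minimal sketch of a pluggable runner registry. It is not Galaxy's actual runner API: the names `JobRunner`, `PBSRunner`, `CondorRunner`, `RUNNERS`, and `get_runner` are all hypothetical, and the `submit` methods only build a label rather than talking to a real scheduler.

```python
from abc import ABC, abstractmethod


class JobRunner(ABC):
    """Hypothetical runner interface; NOT Galaxy's actual API."""

    @abstractmethod
    def submit(self, command: str) -> str:
        """Queue a command and return a scheduler-specific job id."""


class PBSRunner(JobRunner):
    def submit(self, command: str) -> str:
        # A real runner would hand the command to the scheduler here;
        # this sketch only returns a labelled string.
        return f"pbs:{command}"


class CondorRunner(JobRunner):
    def submit(self, command: str) -> str:
        # Same placeholder behaviour, for a second hypothetical backend.
        return f"condor:{command}"


# Registry mapping a configured runner name to its implementing class.
RUNNERS = {"pbs": PBSRunner, "condor": CondorRunner}


def get_runner(name: str) -> JobRunner:
    """Look up the runner named in configuration and instantiate it."""
    try:
        return RUNNERS[name]()
    except KeyError:
        raise ValueError(f"unknown job runner: {name!r}") from None
```

With a layout like this, supporting another scheduler means writing one new class and adding one registry entry; the code that dispatches jobs only ever calls `get_runner(name).submit(...)`.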
5.5 years ago by
Jennifer Hillman Jackson (United States) wrote:
Hello Boaz,

Galaxy itself will install and run in just a few minutes on almost any laptop, and that install includes some very useful tools for basic usage (data manipulation, statistics, visualization). But you are correct: if you plan to do anything like NGS mapping followed by downstream analysis such as ChIP-seq, RNA-seq, or variant calling, then yes, you will need to give Galaxy access to sufficient resources to run those additional third-party tools. In general, those resources are the same as what you would need to provide anyway if you were running the same tools on the command line.

Using a smaller reference genome may make your analysis project more manageable in scale, but there are other factors, such as the size of the sequence files you intend to use as inputs and how many are used at a time (some tools permit the use of multiple samples with multiple replicates). How much throughput you want to achieve is also a consideration. Testing may be the best way to know whether it is going to work for your particular needs.

Follow the setup instructions, then proceed to the advanced configuration for setting up a production server. Install any necessary wrappers from the Tool Shed, plus the dependencies for wrappers already in the distribution that you intend to use. Finally, install the data and indexes. All are documented and linked from the main 'Get Galaxy' wiki.

Many biologists find that using a cloud Galaxy is an alternative that helps them achieve more throughput while avoiding certain administration tasks or buying hardware, since the cloud image has many tools and core databases/indexes pre-installed and the dedicated resources can be scaled up or down as needed. You can also add your own genome. Keeping a permanent or semi-permanent data storage bucket, but turning Galaxy "off" when not needed, is one way to manage costs.

You will also want to start following the mailing list and begin asking local-install or cloud (if you go that route) questions there.

Best,
Jen
Galaxy team

--
Jennifer Hillman-Jackson
Galaxy Support and Training