Question: Local Galaxy Data Storage Plan
3.8 years ago by
United States
Our lab wants to set up a local production version of Galaxy so that a handful of people can access the system and use tools such as Bowtie, Tophat, and Cufflinks to process RNA-seq data.

The computer we will be using to run Galaxy has these specifications:

Dell Precision T7500
CPU: 2 x Intel Xeon X5690 @ 3.46 GHz
RAM: 12 x 8 GB = 96 GB
- connected to built-in SAS RAID controller: 4 disks x 1.5 TB each = 6 TB total
- internal disk: 1.5 TB
- external USB disks: 3 disks x 3 TB each = 9 TB total


1. What is the recommended hard disk configuration for this environment?

For example, our initial thoughts are to divide up the hard disks like so:

- RAID 1 mirror: 2 disks x 1.5 TB each = 1.5 TB total storage, for Operating System and Galaxy installation only
- RAID 1 mirror: 2 disks x 1.5 TB each = 1.5 TB total storage, for Galaxy database only
- one big partition: 3 USB disks x 3 TB each = 9 TB total, for user data only, backups not required
- internal disk: 1.5 TB, unused
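
For anyone checking the arithmetic, the proposed layout can be sketched as below. This is only a rough illustration: the function and volume names are made up for the example, and it assumes that "one big partition" means the three USB disks are spanned into a single volume with no redundancy.

```python
# Rough capacity sketch for the proposed layout (sizes in TB).
# All names here are hypothetical and only serve the illustration.

def raid1_usable_tb(disk_tb, disks=2):
    """RAID 1 mirrors the data on every member, so usable space is one disk."""
    if disks < 2:
        raise ValueError("RAID 1 needs at least two disks")
    return disk_tb

def spanned_usable_tb(disk_tb, disks):
    """Assumed spanning without redundancy: usable space is the plain sum."""
    return disk_tb * disks

layout = {
    "os_and_galaxy (RAID 1)": raid1_usable_tb(1.5),
    "galaxy_database (RAID 1)": raid1_usable_tb(1.5),
    "user_data (3 USB disks, spanned)": spanned_usable_tb(3.0, 3),
    "internal disk (unused)": 1.5,
}

raw_total_tb = 4 * 1.5 + 1.5 + 3 * 3.0      # every physical disk
usable_total_tb = sum(layout.values())       # after mirroring

for name, size_tb in layout.items():
    print(f"{name}: {size_tb} TB")
print(f"raw: {raw_total_tb} TB, usable: {usable_total_tb} TB")
```

Under that assumption, the two mirrors give up 3 TB of raw space in exchange for redundancy, while the spanned user data volume has none, so losing any one USB disk would lose the whole volume.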

Thank you :)

3.8 years ago by
United States
This prior post on the mailing list has a good summary (especially the first link, which is a breakdown of what the Galaxy public servers were using at that time).

Or review what they are currently using here (if the information was provided):

You can search this mailing list, along with other Admin resources, through this custom Google search to find more recent posts on the same subject:

You probably already know this, but for others reading: admin resources are linked from this wiki hub.

As you will find, much depends on what analysis you plan to do, which tools are in use and with what parameters (working on bacterial genomes is very different from working on mammalian genomes, and even within those categories the resource demands can vary quite a bit), how many users you will be supporting, whether you plan on adding cluster nodes for job processing, and how many reference genomes you plan on supporting and which indexes will be provided, among other factors. Searching prior Q&A will reveal this type of advice. Running some analyses in a cloud environment can also reveal what resources your specific use cases require.

We may get more answers here on Galaxy Biostar, although the mailing list remains (for now!) the primary place to get feedback about local install questions. If you do decide to post there, including some details about these areas will help others who are running an environment with similar usage demands to offer recommendations. Or you can add details here, with the expectation that, for the time being, fewer folks from the development community are actively reviewing and answering posts. Originally, Galaxy Biostar was intended for "end user" questions. That said, more crossover is occurring daily and we plan on just allowing it all to evolve - the community will guide how the site is used as we move forward.

Hopefully the links and the Google search help with some examples, and from there you can decide if/how to proceed.

Jen, Galaxy team

