Question: Using MySQL Instead Of SQLite
Erick Antezana wrote:
Hello,

I would like to use MySQL instead of SQLite to store my data. I couldn't find a HOWTO or any guidelines for this on the Galaxy web site. I only found some lines that might need to be changed/enabled in the universe_wsgi.ini file:

#database_file = database/universe.sqlite
database_connection = mysql:///galaxy
#database_engine_option_echo = true
#database_engine_option_echo_pool = true
#database_engine_option_pool_size = 10
#database_engine_option_max_overflow = 20

Could you point me to some documentation, or briefly describe what I need to do in order to switch to MySQL? Are there any plans to support other DBMSs (like Oracle, for instance)?

thanks,
Erick
written 9.4 years ago by Erick Antezana • modified 9.4 years ago by Greg Von Kuster
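As a quick way to confirm which database a Galaxy instance will actually use, the setting can be read back out of universe_wsgi.ini with the standard library. This is a minimal sketch, assuming the stock layout in which Galaxy's application settings live in the [app:main] section; the file location and section name are assumptions, not something stated in this thread.

# Minimal sketch: report the effective database_connection setting.
# Assumes universe_wsgi.ini is in the current directory and uses the
# usual [app:main] section for application settings.
try:
    from configparser import RawConfigParser      # Python 3
except ImportError:
    from ConfigParser import RawConfigParser      # Python 2

config = RawConfigParser()
config.read("universe_wsgi.ini")

if config.has_option("app:main", "database_connection"):
    print("database_connection = %s" % config.get("app:main", "database_connection"))
else:
    # When the option is commented out, Galaxy falls back to the sqlite file.
    print("database_connection not set; the default sqlite database will be used.")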
Greg Von Kuster wrote:
Erick,

To use a database other than the SQLite database that comes with the Galaxy distribution, all that is needed is to change the config setting, providing the URL that points to your MySQL database. See the MySQL documentation for the connection URL, as the URL differs depending upon whether your database is installed locally or not. The config setting is the "database_connection" setting, and it could look something like this:

database_connection = mysql:///greg_test?unix_socket=/var/run/mysqld/mysqld.sock

Greg Von Kuster
Galaxy Development Team
written 9.4 years ago by Greg Von Kuster
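For anyone wanting to check such a URL before putting it into the config, here is a minimal sketch that opens the connection directly with SQLAlchemy. It assumes a reasonably recent SQLAlchemy and a MySQL driver (e.g. MySQLdb/mysqlclient) are installed; the user name, password, database name and socket path below are placeholders, not values from this thread.

from sqlalchemy import create_engine, text

# Same URL format that goes into the database_connection setting.
url = "mysql://galaxy_user:secret@localhost/galaxy?unix_socket=/var/run/mysqld/mysqld.sock"

engine = create_engine(url)
with engine.connect() as connection:
    # Raises an OperationalError if the URL, credentials or socket are wrong.
    print(connection.execute(text("SELECT VERSION()")).scalar())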
Greg,

I managed to set my connection string so that we can use a remote MySQL server. Thanks.

With respect to purging datasets: I used the scripts to clean deleted libraries, folders, datasets, userless histories, and so on. I've seen that one must specify the span of time in days. What about data that was added by mistake, for instance today, and that we want to delete immediately? I tried to launch the script with "-d 0" but the data is still there... Am I missing something?

I am no longer using the SQLite DB created in our first trials. I guess I can safely delete (from the command line) all the files under the database directory?

Have the purge_*.sh scripts been tested with MySQL?

Last question (already asked before): are there any plans to support Oracle?

thanks,
Erick
written 9.4 years ago by Erick Antezana
Hi Erick,

> I tried to launch the script with "-d 0" but the data is still there... Am I missing something?

No, I don't think so. It's possible that your system clock is off from your database time. Is your database storing time as local time? The cleanup script uses the update_time for the objects being deleted.

> I guess I can safely delete (from the command line) all the files under the database directory?

Maybe. Did you keep any data that refers to them in your tables when you migrated to MySQL? If so, you'll need to keep them.

> Have the purge_*.sh scripts been tested with MySQL?

Yes.

> Are there any plans to support Oracle?

Not sure why it wouldn't already be supported, although we don't use it here. It just needs a different URL - sqlalchemy supports Oracle.
written 9.4 years ago by Greg Von Kuster
Greg, please see in-line:

2009/7/23 Greg Von Kuster <ghv2@psu.edu>

> It's possible that your system clock is off from your database time.

Both servers (the MySQL one and the one where Galaxy is running) have the same time.

> Is your database storing time as local time?

How can I see that? In which file can I find the SQL command that actually deletes and purges the data?

> Did you keep any data that refers to them in your tables when you migrated to MySQL?

No, I have no data referring to anything... I just deleted (to save space) all those files and I have no problems at all (so far ;-) ).

> sqlalchemy supports Oracle.

Good to know; I will try to find some time to test it and let you know.

cheers,
Erick
written 9.4 years ago by Erick Antezana
We use Python's datetime.utcnow() method to set the update_time on table rows in the database, but that may differ from your system's local time (not sure).

The deletion logic is in ~/scripts/cleanup_datasets/cleanup_datasets.py, in the various methods that delete and purge histories, libraries and datasets. There are no raw SQL commands there because we're using sqlalchemy (in order to transparently support multiple databases).

Thanks Erick!
written 9.4 years ago by Greg Von Kuster
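To illustrate the point about update_time and the -d cutoff, here is a small sketch of the kind of age comparison involved (not Galaxy's actual code). Rows are only candidates for cleanup when their update_time, written with utcnow(), falls before "now minus N days"; if timestamps were instead written in a local time running ahead of UTC, even -d 0 would skip freshly deleted rows.

from datetime import datetime, timedelta

days = 0                                     # the value passed via -d
cutoff = datetime.utcnow() - timedelta(days=days)

# A row stamped with utcnow() a minute ago is already older than the cutoff,
# so "-d 0" would pick it up.
row_utc = datetime.utcnow() - timedelta(minutes=1)
print(row_utc < cutoff)                      # True

# A hypothetical row stamped with local time on a server two hours ahead of UTC
# appears to lie in the future, so even "-d 0" skips it.
row_local = datetime.utcnow() + timedelta(hours=2) - timedelta(minutes=1)
print(row_local < cutoff)                    # False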
Hi Erick, Greg et alia,

I've set up Galaxy with a MySQL DB too, but I cannot get rid of old stuff. According to the wiki, running the script with -1, -3 or -5 should show me what the script would do with -2, -4 or -6. When I ran it with -1 it told me:

# 2009-07-29 14:03:22 - Handling stuff older than 1 days
# Datasets will NOT be removed from disk.
# The following datasets and associated userless histories have been deleted
# Deleted 0 histories. Elapsed time: 0.21

That was a bit weird, because I know there should be stuff to delete. So I tried my luck with -2 to perform the actual cleanup and voilà:

# 2009-07-29 14:04:25 - Handling stuff older than 1 days
# Datasets will NOT be removed from disk.
# The following datasets and associated deleted histories have been purged
1 4 5 6 7 8 9 10 11 12 13 14 <..cut a lot of white space..> 15 16
# Purged 14 histories. Elapsed time: 1.17

Running with -3, -4 and -5 all gave me 0 purged datasets or folders, but I know there must be stuff associated with user accounts older than 1 day that should be purged... The -6 option does not seem to work at all, as I got this error: "cleanup_datasets.py: error: no such option: -6". Am I missing something?

In addition:

1. What if I ran the script without -r and later decide I want to delete the associated files anyway to free up some space? How do I then know which files to delete?

2. If I understand correctly, I should be able to remove associated datasets with -r, but even when purging stuff the entries will still remain in the database... How do I really, really, yes-OK-I-accept-I-know-what-I'm-doing delete outdated stuff :) ?

Cheers,

Pi

Biomolecular Mass Spectrometry and Proteomics
Utrecht University
Visiting address: H.R. Kruyt building, room O607, Padualaan 8, 3584 CH Utrecht, The Netherlands
Mail address: P.O. box 80.082, 3508 TB Utrecht, The Netherlands
phone: +31 (0)6-143 66 783
email: pieter.neerincx@gmail.com
skype: pieter.online
written 9.3 years ago by Pieter Neerincx
Hi Pi,

The wiki page on deleting datasets is out of date, and I will be updating it shortly. There is a collection of shell scripts included in the scripts/cleanup_datasets directory. In order to delete no-longer-needed datasets from disk, the scripts can be used in the following order (assuming you have not used library functions):

delete_userless_histories.sh
purge_histories.sh
purge_datasets.sh

I will send a message after the wiki has been updated.

> How do I then know which files to delete?

This is an excellent feature for us to add to the script.

> Even when purging stuff the entries will still remain in the database...

There are several database tables which Galaxy expects to exist (for job reporting, etc.) and whose entries should not be deleted. Datasets are an example of this: when a dataset is purged, the purged flag is set to True, but the entry is kept. Deleting entries from the dataset tables is not recommended.

Thanks for using Galaxy,
Dan
written 9.3 years ago by Daniel Blankenberg
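A minimal sketch of running the three scripts in the order given above. The assumption that they should be launched from the Galaxy root directory is mine; adjust GALAXY_ROOT to your own installation.

import os
import subprocess

GALAXY_ROOT = os.path.expanduser("~/galaxy_dist")   # placeholder path

for script in ("delete_userless_histories.sh",
               "purge_histories.sh",
               "purge_datasets.sh"):
    # Run each wrapper script in order and stop at the first failure.
    subprocess.check_call(["sh", os.path.join("scripts", "cleanup_datasets", script)],
                          cwd=GALAXY_ROOT)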
Hi Dan,

OK, thanks for the info. I'll stick to the scripts above for the time being.

> This is an excellent feature for us to add to the script.

That would be very useful. I uploaded a 6 GB file and modified it many times during some debugging. I already ran the Python script without -r, so I'm stuck with many copies of this now-worthless test data...

> When a dataset is purged, the purged flag is set to True, but the entry is kept.

OK, I guess I'll have to live with that, and I can imagine you might want to keep certain metadata/logs even when the job itself is marked EOL. In general, though, I prefer user interface elements that do what they claim to do; hence I expect a delete button/script to delete something. But in this case it might save the day: I'll try to unpurge everything manually and re-run the script with -r...

Thanks,
Pi
written 9.3 years ago by Pieter Neerincx
Hi Dan,

I have been facing the same problem as Pieter (as I reported some time ago) while trying to purge several NGS data files I was playing with. At that time I used the scripts directly with different options (-d 1, -d 0, etc.) and the deleted files were still there. I have just tried once again, executing them in the order you have indicated, but unfortunately the files are still there... I am using MySQL to store my data. I get the same behaviour when using the default DB (SQLite).

cheers,
Erick
written 9.3 years ago by Erick Antezana
Hi Dan and Erick,

Here's a follow-up on my attempts to free up some disk space. I deleted all histories of all users (in a test installation :)). All I have left now is a few datasets in libraries. More than 10 days later I ran the scripts. At first this didn't help. Apparently the script does not check whether it is being executed with the proper permissions to clean up, because in the logs I found numerous entries like this one:

# Error, file has already been removed: [Errno 13] Permission denied: 'database/files'

At the end of the log it still claims:

# Freed disk space: 609064379

But it did not clean a single byte. Re-running the script as root does not help, because the script did modify the database... So I manually modified the database by setting the purged column to 0 and the update_time to something at least 10 days ago. Then re-running the script did free up some disk space :).

When I look in the galaxy/database/files/ directory I still see duplicated files, which are easily identified because they have exactly the same size. These appear to be associated with libraries. As I was experimenting I uploaded and deleted the same files several times and eventually kept only a single copy of these files in my libraries... Maybe if I delete all libraries as well I'll be able to get rid of the binary trash.

It appears to me that Galaxy cannot clean datasets unless the history or library they were once assigned to is also deleted. Is this correct? If so, this is quite a pain, because it is only natural for users to experiment, resulting first in several failures before producing the datasets they want to keep. As long as they keep the histories/libraries with the good data, all the intermediate failures associated with those histories/libraries will claim disk space too :(...

Cheers,
Pi
written 9.3 years ago by Pieter Neerincx
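Since the cleanup run above only failed on file permissions after the database had already been updated, it can help to check up front whether the account running the scripts can actually remove the dataset files. A minimal sketch, assuming the default file store under database/files (adjust the path if your files live elsewhere):

import os

FILES_DIR = "database/files"   # default Galaxy file store; placeholder if customised

blocked = []
for dirpath, dirnames, filenames in os.walk(FILES_DIR):
    # os.remove() needs write+execute permission on the containing directory,
    # not on the file itself.
    if not os.access(dirpath, os.W_OK | os.X_OK):
        blocked.extend(os.path.join(dirpath, name) for name in filenames)

if blocked:
    print("%d file(s) cannot be removed by this user, e.g. %s" % (len(blocked), blocked[0]))
else:
    print("Everything under %s looks removable by this user." % FILES_DIR)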
Hi Pi,

An updated wiki page on this topic is available at http://bitbucket.org/galaxy/galaxy-central/wiki/PurgeHistoriesAndDatasets .

Executing the scripts with the -f flag will cause the script to attempt to re-purge datasets that are already marked as purged (changing the time delay would also be required, as the table entries would have been marked as updated at the failed attempt).

Using the -6 flag / delete_datasets.sh script added in changeset 2551:5b405a43c406 will allow a base dataset to be marked as deleted without requiring a history/library/folder to be purged. This script could take considerable time, depending on the number of datasets in Galaxy.

When history items are shared or library items are imported (to/from a history or between libraries), the base dataset (file) is shared, preventing duplication of file data on disk. Uploading a file several times creates a different base dataset (file) each time; each of the instances associated with each individually uploaded file will need to be deleted before that specific file is purged from disk.

Thanks for using Galaxy,
Dan
written 9.3 years ago by Daniel Blankenberg
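A simplified sketch of the flag and sharing semantics described above (illustrative only, not Galaxy's actual model code): purging removes the file from disk and flips a flag, the database row stays, and a shared file only becomes removable once every instance pointing at it has been deleted.

import os

class Dataset(object):
    """One physical file on disk, possibly shared by several history/library items."""
    def __init__(self, file_name):
        self.file_name = file_name
        self.deleted = False
        self.purged = False
        self.instances = []                 # associations that point at this file

    def purgable(self):
        # Only purgable once every instance sharing the file is deleted.
        return all(instance.deleted for instance in self.instances)

    def purge(self):
        if not self.purgable():
            return
        if os.path.exists(self.file_name):
            os.remove(self.file_name)       # the bytes go away...
        self.deleted = True
        self.purged = True                  # ...but the record (this object) is kept

class DatasetInstance(object):
    def __init__(self, dataset):
        self.dataset = dataset
        self.deleted = False
        dataset.instances.append(self)

# Two items sharing one uploaded file: the file survives until both are deleted.
shared = Dataset("database/files/000/dataset_1.dat")
a, b = DatasetInstance(shared), DatasetInstance(shared)
a.deleted = True
shared.purge()                              # no-op, b still references the file
b.deleted = True
shared.purge()                              # file removed; shared.purged is now True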
Hi Dan,

Thanks! I'll update right away :)...

Cheers,
Pi
written 9.3 years ago by Pieter Neerincx
Hi Galaxy developers,

One more request... The cleaning of outdated stuff seems to work fine, but my logs are now filling up. The reason is that the cleanup scripts report the Python path on STDERR like this:

python path is: /mnt/npc01d01/prog/galaxy-2009-08-11-c021935a25ac/eggs/py2.4-linux-x86_64-ucs4/Cheetah-1.0-py2.4-linux-x86_64.egg, /mnt/npc01d01/prog/galaxy-2009-08-11-c021935a25ac/eggs/py2.4-linux-x86_64-ucs4/bx_python-0.5.0_dev_r4bf1f32e6b76-py2.4-linux-x86_64.egg, /mnt/npc01d01/prog/galaxy-2009-08-11-c021935a25ac/eggs/py2.4-linux-x86_64-ucs4/python_lzo-1.08_static-py2.4-linux-x86_64.egg, /mnt/npc01d01/prog/galaxy-2009-08-11-c021935a25ac/eggs/py2.4-linux-x86_64-ucs4, /mnt/npc01d01/prog/galaxy-2009-08-11-c021935a25ac/eggs/py2.4-noplatform/docutils-0.4-py2.4.egg, /mnt/npc01d01/prog/galaxy-2009-08-11-c021935a25ac/eggs/py2.4-noplatform/elementtree-1.2.6_20050316-py2.4.egg, /mnt/npc01d01/prog/galaxy-2009-08-11-c021935a25ac/eggs/py2.4-noplatform/wchartype-0.1-py2.4.egg, /mnt/npc01d01/prog/galaxy-2009-08-11-c021935a25ac/eggs/py2.4-noplatform/Paste-1.5.1-py2.4.egg, /mnt/npc01d01/prog/galaxy-2009-08-11-c021935a25ac/eggs/py2.4-noplatform/Routes-1.6.3-py2.4.egg, /mnt/npc01d01/prog/galaxy-2009-08-11-c021935a25ac/eggs/py2.4-noplatform/flup-0.5.dev_r2311-py2.4.egg, /mnt/npc01d01/prog/galaxy-2009-08-11-c021935a25ac/eggs/py2.4-noplatform/WebOb-0.8.5-py2.4.egg, /mnt/npc01d01/prog/galaxy-2009-08-11-c021935a25ac/eggs/py2.4-noplatform/simplejson-1.5-py2.4.egg, /mnt/npc01d01/prog/galaxy-2009-08-11-c021935a25ac/eggs/py2.4-noplatform/WebHelpers-0.2-py2.4.egg, /mnt/npc01d01/prog/galaxy-2009-08-11-c021935a25ac/eggs/py2.4-noplatform/PasteDeploy-1.3.1-py2.4.egg, /mnt/npc01d01/prog/galaxy-2009-08-11-c021935a25ac/eggs/py2.4-noplatform/Mako-0.1.10-py2.4.egg, /mnt/npc01d01/prog/galaxy-2009-08-11-c021935a25ac/eggs/py2.4-noplatform/Beaker-0.5-py2.4.egg, /mnt/npc01d01/prog/galaxy-2009-08-11-c021935a25ac/eggs/py2.4-noplatform/MyghtyUtils-0.52-py2.4.egg, /mnt/npc01d01/prog/galaxy-2009-08-11-c021935a25ac/eggs/py2.4-noplatform/Babel-0.9.4-py2.4.egg, /mnt/npc01d01/prog/galaxy-2009-08-11-c021935a25ac/eggs/py2.4-noplatform/SQLAlchemy-0.4.7p1-py2.4.egg, /mnt/npc01d01/prog/galaxy-2009-08-11-c021935a25ac/eggs/py2.4-noplatform, /mnt/npc01d01/prog/galaxy-2009-08-11-c021935a25ac/lib, /usr/lib64/python24.zip, /usr/lib64/python2.4, /usr/lib64/python2.4/plat-linux2, /usr/lib64/python2.4/lib-tk, /usr/lib64/python2.4/lib-dynload, /usr/lib64/python2.4/site-packages, /usr/lib64/python2.4/site-packages/Numeric, /usr/lib64/python2.4/site-packages/gtk-2.0, /usr/lib/python2.4/site-packages

In addition, when I run the scripts with cron I get e-mail, because cron thinks something went wrong due to the output on STDERR. I do appreciate getting e-mail in case something really went wrong, so redirecting STDERR to /dev/null isn't a nice solution, but I also don't need Galaxy to remind me of my Python path daily :). I guess it's useful for debugging, but can this be switched off by default and hidden behind a verbose/log switch or something similar?

Cheers,
Pi
written 9.1 years ago by Pieter Neerincx
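A minimal sketch of the kind of switch being requested: diagnostic output such as the Python path only goes to STDERR when an explicit flag is passed, so a cron job stays silent unless something actually fails. The --verbose flag name is hypothetical, not an existing option of cleanup_datasets.py.

import sys
import optparse

parser = optparse.OptionParser()
parser.add_option("--verbose", action="store_true", default=False,
                  help="print extra diagnostics (such as the Python path) to STDERR")
options, args = parser.parse_args()

if options.verbose:
    # Emit this only on request, so cron mail keeps signalling real problems.
    sys.stderr.write("python path is: %s\n" % ", ".join(sys.path))

# ... the actual cleanup work would follow here ...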
Hi Pieter,

This output has been removed in the latest commit. It should be available in the regular distribution repository in a few days.

--nate
written 9.1 years ago by Nate Coraor
Thanks!

Pi
written 9.1 years ago by Pieter Neerincx