Adding Custom Genome Builds To A Galaxy Instance



Hi Galaxy Users,

I am trying to understand how to add new entries to the genome builds drop-down list and how to make sure each new entry is mapped to the correct reference sequence. I have been reading through the code to work it out myself, but advice from another user, or from a developer who has experience with this process, would help me avoid possible pitfalls.

As a future enhancement, it would be nice to expose this functionality to administrators.

Any advice would be appreciated!

Thanks,
-David



modified 4.7 years ago by Jennifer Hillman Jackson ♦ 25k • written 4.7 years ago by Norris, David • 30

4.7 years ago by

Jennifer Hillman Jackson ♦ 25k

United States


Hi David,

I think you are asking about the locally cached reference genome builds, sometimes also called "built-in" genomes. If so, the wikis below have instructions to help you create, organize, and obtain these data (if you want copies of what we host on Main).

A reference genome build can seem to have several "names", but if you look closely, you will see that each one contains the most important identifier, the "dbkey". This short tag is the value assigned to the "database" metadata attribute in the UI. Making sure that the dbkey is consistent, and that the location files (e.g. ".loc" files) contain the correct key and are in the correct format, will move you in the right direction.

Watch for: tabs as the white space between fields, no extra white space, no extra lines, dbkey values that are the same as those used in the reference and index file names, etc. These points are described in the headers of the .loc files and in the wikis below, or you can follow what we have in our own files from the rsync server.

https://wiki.galaxyproject.org/Admin/DataIntegration
The most important parts of this wiki are the "builds.txt" file (this is the list of genomes in the pull-down menu) and the rsync server instructions (this is where you can obtain copies of the data used on Main).

https://wiki.galaxyproject.org/Admin/NGS%20Local%20Setup
Organizing data and building indexes for various tools.

Adding "Custom Reference Genomes" through the UI, as done by individual users, is a different process, but I don't think you are asking about that. The internals for that are not in a wiki, but the processing is automatic, and much of it is done on the fly as tools execute, when a fasta file from the history is selected as the target reference build. (The indexes are the same; they are just not saved, beyond the genome being loaded as it is when added using User -> Custom Builds, which is good for using Trackster.)

Hopefully this will help you get started.

Jen
Galaxy team

--
Jennifer Hillman-Jackson
http://galaxyproject.org
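The formatting checks mentioned in the answer above (tab-separated fields, no extra white space, no extra lines, dbkeys that match the builds list) can be sketched as a small script. This is only an illustration, not part of Galaxy: the function names `read_builds` and `check_loc` are hypothetical, the builds list is assumed to hold the dbkey in the first tab-separated column, and the four-field .loc layout with the dbkey in the second column is an assumption. The real layout differs between tools, so check the header of each .loc file and adjust `dbkey_col` and `n_fields` accordingly.

```python
def read_builds(handle):
    """Collect dbkeys from a builds.txt-style listing.

    Assumes one build per line, with the dbkey in the first
    tab-separated column; blank lines and '#' comments are skipped.
    """
    dbkeys = set()
    for raw in handle:
        line = raw.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue
        dbkeys.add(line.split("\t")[0].strip())
    return dbkeys


def check_loc(handle, known_dbkeys, dbkey_col=1, n_fields=4):
    """Return a list of (line_number, message) problems in a .loc-style file.

    dbkey_col and n_fields are assumptions about the layout; the
    header of each real .loc file documents the actual columns.
    """
    problems = []
    for lineno, raw in enumerate(handle, start=1):
        line = raw.rstrip("\n")
        if line.startswith("#"):
            continue  # header/comment lines are allowed
        if not line.strip():
            problems.append((lineno, "extra blank line"))
            continue
        fields = line.split("\t")
        if len(fields) != n_fields:
            problems.append(
                (lineno, "expected %d tab-separated fields, found %d"
                 % (n_fields, len(fields)))
            )
            continue
        if any(field != field.strip() for field in fields):
            problems.append((lineno, "extra white space around a field"))
        dbkey = fields[dbkey_col].strip()
        if dbkey not in known_dbkeys:
            problems.append((lineno, "dbkey %r is not in the builds list" % dbkey))
    return problems
```

To use it, open the builds list and one .loc file from your own instance and pass the file handles in, for example `check_loc(open(loc_path), read_builds(open(builds_path)))`, where `loc_path` and `builds_path` are placeholders for wherever your installation keeps those files. An empty result means none of the checked problems were found; it does not confirm that the index files themselves exist.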

