Question: Custom Data Retrieval within Galaxy
Hi everyone!


I've been tasked with developing tools for a Galaxy server used in rice research. My current task is to create a module that facilitates data retrieval from within Galaxy. The goal is a centralized location where researchers can query terms from within Galaxy and get results from various sources that aren't natively available in Galaxy. The target sources are NCBI GEO DataSets, RAP-DB, RGAP, and QTARO. I was also asked to limit searches to specific tags, for example restricting searches to the species "Oryza sativa".


I'm completely new to Galaxy and to bioinformatics workflows, so I'm somewhat out of my depth here. I have a few questions:

  • Is this achievable with any existing services?
  • Do I have to create a completely separate web server that acts as a data hub for sources that don't support Galaxy (and manually crawl HTML and links if a site doesn't offer API/FTP access)?
  • Is there any other way to expose those sources (RAP-DB, QTARO, etc.) as a tool in the Get Data section of Galaxy without creating a completely separate data hub or web crawler?

Thanks in advance.

As far as I know, creating a "Get Data" tool would be the way to target external sources with this sort of specificity. Another option is to pre-cache the data in a Data Library (or have certain users do this as they go along, but that means granting them admin rights). The final option is to have users share histories that contain commonly used data sources as "Public"; these would show up under Shared Data -> Published Histories.
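
For the NCBI GEO DataSets source specifically, the script behind such a "Get Data" tool could go through NCBI's E-utilities rather than crawling HTML. Below is a minimal sketch, assuming the `esearch` endpoint with the `gds` database and an `[Organism]` field tag to restrict results to one species. The function names `build_geo_query` and `search_geo` are made up for illustration and are not part of Galaxy. The other sources (RAP-DB, RGAP, QTARO) would each need their own handling, which is not covered here.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities search endpoint.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_geo_query(term, organism="Oryza sativa", retmax=20):
    """Build an esearch URL for GEO DataSets limited to one organism.

    Hypothetical helper for illustration; the defaults are assumptions.
    """
    # Combine the user's term with an organism restriction.
    restricted = f"({term}) AND {organism}[Organism]"
    params = {
        "db": "gds",          # Entrez database name for GEO DataSets
        "term": restricted,
        "retmax": retmax,     # maximum number of IDs to return
        "retmode": "json",
    }
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


def search_geo(term, organism="Oryza sativa", retmax=20):
    """Run the search and return the list of matching GEO DataSets IDs."""
    with urlopen(build_geo_query(term, organism, retmax)) as response:
        payload = json.load(response)
    return payload["esearchresult"]["idlist"]


if __name__ == "__main__":
    # Print the URL only; call search_geo() to actually query NCBI.
    print(build_geo_query("drought tolerance"))
```

A Galaxy tool wrapper would then call a script like this with the user's search term and write the results to the history. Keeping the species filter inside the query string means the restriction is applied on NCBI's side, so nothing has to be filtered after download.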

Thanks, Jen, Galaxy team

