I have been tasked with developing tools for a rice research Galaxy server. My current task is to create a module that makes data retrieval possible from within Galaxy. The goal is a centralized place where researchers can query terms from within Galaxy and get results from various sources that are not natively available in Galaxy. The target websites/sources are: NCBI GEO DataSets, RAP-DB (http://rapdb.dna.affrc.go.jp/), RGAP (http://rice.plantbiology.msu.edu/), and QTARO (http://qtaro.abr.affrc.go.jp/). I also need to restrict searches by specific tags, for example limiting results to the species "Oryza sativa".
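For the NCBI GEO DataSets source, NCBI's E-utilities already provide a documented search API, so crawling should not be needed there. Below is a minimal sketch, assuming the `esearch.fcgi` endpoint with `db=gds` and the Entrez `[Organism]` field tag to apply the species restriction. The function names are hypothetical, and RAP-DB, RGAP and QTARO would each need their own access method.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities search endpoint; db=gds is the GEO DataSets database.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_gds_query(term: str, organism: str = "Oryza sativa") -> str:
    """Combine a free-text term with an organism filter in Entrez query syntax."""
    return f'({term}) AND "{organism}"[Organism]'


def build_esearch_url(term: str, organism: str = "Oryza sativa", retmax: int = 20) -> str:
    """Build an esearch URL that returns matching GEO DataSets UIDs as JSON."""
    params = {
        "db": "gds",
        "term": build_gds_query(term, organism),
        "retmode": "json",
        "retmax": retmax,
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


def parse_esearch_ids(payload: str) -> list:
    """Extract the list of UIDs from an esearch JSON response body."""
    return json.loads(payload)["esearchresult"]["idlist"]


def fetch_gds_ids(term: str, organism: str = "Oryza sativa") -> list:
    """Hypothetical helper: run the search over the network and return UIDs."""
    with urlopen(build_esearch_url(term, organism), timeout=30) as resp:
        return parse_esearch_ids(resp.read().decode("utf-8"))
```

Calling `fetch_gds_ids("drought")` would return the UIDs of rice GEO DataSets matching "drought"; those UIDs could then be passed to other E-utilities (such as `esummary`) for details. Keeping URL construction separate from the network call makes the query logic easy to test and reuse inside a Galaxy tool wrapper.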
I'm completely new to Galaxy and to bioinformatics workflows in general, so my knowledge here is limited. That leaves me with a few questions:
- Is this achievable with any existing services?
- Do I have to create a completely separate web server that acts as a data hub for sources that don't support Galaxy (and manually crawl HTML pages and links when a site doesn't offer API or FTP access)?
- Is there any other way to expose those sources (RAP-DB, QTARO, etc.) as scripts in Galaxy's Get Data section without creating a completely separate data hub or web crawler?
Thanks in advance.