Question: Integrate pipeline (set of custom tools) into Galaxy
David Managadze (United States) wrote, 3.3 years ago:

I have created a set of apps for running a pipeline. Each app can be started from the command line, and they all share a unified interface and parameters. For instance, every app accepts:

--input    # input directory
--workdir  # working directory where the app must write all its output
--conf     # configuration file
--cache    # global sequence cache

Each app then creates $workdir, makes several subdirectories in it, and saves its files there.
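
To make the shared interface concrete, here is a minimal sketch in Python, assuming the apps use argparse. The helper names `build_parser` and `prepare_workdir` and the subdirectory names are hypothetical; only the four option names come from the list above.

```python
import argparse
from pathlib import Path


def build_parser(prog):
    """Return a parser exposing the four options every app shares."""
    parser = argparse.ArgumentParser(prog=prog)
    parser.add_argument("--input", required=True, type=Path,
                        help="input directory")
    parser.add_argument("--workdir", required=True, type=Path,
                        help="working directory that receives all output")
    parser.add_argument("--conf", type=Path, help="configuration file")
    parser.add_argument("--cache", type=Path, help="global sequence cache")
    return parser


def prepare_workdir(workdir, subdirs=("logs", "results")):
    """Create the working directory and its subdirectories.

    The subdirectory names are placeholders, not the real layout.
    """
    for name in subdirs:
        (workdir / name).mkdir(parents=True, exist_ok=True)
    return workdir
```

Each app would call `build_parser("app_name").parse_args()` and then `prepare_workdir(args.workdir)` before doing its own work.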

At the start of the pipeline, the first apps create the configuration files, the sequence cache, and so on. This data needs to be available to every downstream app, like global state.

How can I implement this kind of behavior in Galaxy? How can I make all this data available to every app? How do I express it in the custom tools' config XML? Or should I first create the apps as tools and then set this up in Workflows?

Martin Čech (United States) wrote, 3.3 years ago:

I re-opened this question as I feel it is worth discussing.

In the Galaxy world we have tools: atomic elements with inputs and outputs. If you need to chain several tools into a pipeline, workflows offer that.

As I understand it, in your context you have multiple 'apps' that share configuration. Can your set of apps be treated as a single Galaxy tool?

Related question: "Integrate pipeline chain of apps with multiple output files"

David Managadze (United States) wrote, 3.3 years ago:

Thanks Martin!

RE: whether the set of apps can be treated as one Galaxy tool:
I am not sure. Some of them can be used in different contexts, and the reason I am trying to integrate them into Galaxy is that I might want to connect them in different ways.

I thought about something like this:

I see that <param> accepts type="hidden" and type="hidden_data". I am not sure how to use them yet, but I think that if I create a hidden output (read: <param>) holding all the build-related metadata and pass it to the next application in the pipeline, it could work. I am not sure what this metadata structure should be, though...

Perhaps I will create a central "Builder" app that keeps all the metadata about the build and its shared resources: organism names, taxids, paths to databases, sequence caches, etc. Every tool would get one input (hidden or otherwise) that lets it connect to the Builder's API and look up everything it needs. I am not sure exactly what this input will be yet.
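
One way to picture the shared build metadata is a small JSON manifest that the first step writes and every later step reads. The sketch below is a file-based stand-in for the "Builder" idea, not a Builder API; the file name `build_manifest.json`, the field names, and both function names are assumptions made for illustration.

```python
import json
from pathlib import Path

MANIFEST_NAME = "build_manifest.json"  # hypothetical file name


def write_manifest(workdir, organism, taxid, conf, cache):
    """Record the shared build state once, in the first pipeline step.

    The working directory must already exist.
    """
    manifest = {
        "organism": organism,
        "taxid": taxid,
        "conf": str(conf),    # path to the configuration file
        "cache": str(cache),  # path to the global sequence cache
    }
    path = Path(workdir) / MANIFEST_NAME
    path.write_text(json.dumps(manifest, indent=2))
    return path


def read_manifest(path):
    """Load the shared build state in a downstream step."""
    return json.loads(Path(path).read_text())
```

If the manifest were exposed as an ordinary dataset, each downstream tool could take it as its single extra input and resolve the configuration and cache paths from it.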

This might also help in choosing output paths outside of Galaxy. I am not very fond of having every build I will ever run piled together in directories like galaxy/database/files/000/dataset_*_files :-)

Any opinions about the architecture of something like this?

frederik.coppens (VIB, Gent, Belgium) wrote, 3.3 years ago:

We have had similar issues implementing tools in Galaxy (e.g. shoremap). Unfortunately, we haven't solved them yet.

I think tools with fixed outputs (so the layout is predictable) should be feasible, though they might need a lot of output fields with 'from_work_dir' set. Is this the case for your apps? Even then, the next step still doesn't have the directory structure it needs, but you could perhaps recreate it first in a wrapper script.
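
As a rough sketch of such a wrapper, assuming it is written in Python and is told which flat input file belongs at which relative path, the directory structure could be rebuilt like this. The function name `rebuild_layout` and the example paths are hypothetical.

```python
import shutil
from pathlib import Path


def rebuild_layout(mapping, workdir):
    """Copy flat input files into the relative paths the next app expects.

    mapping: {source file path: destination path relative to workdir}
    """
    workdir = Path(workdir)
    for source, relative in mapping.items():
        target = workdir / relative
        # Create intermediate directories before copying the file in.
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy(source, target)
    return workdir
```

For example, `rebuild_layout({"dataset_1.dat": "cache/seqs.fa"}, "work")` would place the first dataset at `work/cache/seqs.fa` before the wrapped app is started with `--workdir work`. Copying is the simplest choice here; symlinks would avoid duplicating large files but depend on the filesystem.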

For tools where the number of output files is not known beforehand, I'm not aware of any solution. 

 

You also mention 'builds'; I'm not sure what you mean by that. Do you want to reuse them?


Frederik, I made some progress passing a directory path to the next app. See "Integrate pipeline chain of apps with multiple output files".
That post also describes a problem I have not solved yet. I would really appreciate some hints about 'from_work_dir' if you got it to work.
Thanks.

 

Reply written 3.3 years ago by David Managadze