Question: Integrate pipeline chain of apps with multiple output files
3.3 years ago, David Managadze (United States) wrote:

I wrote a pipeline, a set of apps in Python. Every app outputs thousands of files. Each app creates a main work directory ($app_workdir, with the path provided on the command line) and puts all final output data in $app_workdir/out/, all logs in $app_workdir/log/, all temporary files in $app_workdir/tmp/, and so on.
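
For illustration, the work directory layout described above could be set up roughly like this. This is only a sketch: the helper name `prepare_workdir` is hypothetical and not taken from the actual apps, which may create their directories differently.

```python
from pathlib import Path


def prepare_workdir(app_workdir):
    """Create the out/, log/ and tmp/ subdirectories under app_workdir.

    Hypothetical helper: the name and return value are assumptions made
    for this sketch, not part of the original pipeline.
    """
    root = Path(app_workdir)
    subdirs = {name: root / name for name in ("out", "log", "tmp")}
    for path in subdirs.values():
        # parents=True also creates app_workdir itself if it is missing
        path.mkdir(parents=True, exist_ok=True)
    return subdirs
```

Each app would then write its final results under `subdirs["out"]`, its logs under `subdirs["log"]`, and its temporary files under `subdirs["tmp"]`.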

Now I am trying to integrate this pipeline into Galaxy as a set of custom tools. I would prefer not to modify the apps too much, but rather to create Galaxy custom tool XML files so that the apps can be chained with each other. The problem I cannot solve is how to 'pack' a directory with thousands of files in it as *one* output, and then how to provide this as one input to the next app.

I am trying the approaches mentioned here: https://wiki.galaxyproject.org/Admin/Tools/Multiple%20Output%20Files but cannot make them work.

I think the most appropriate approach for me is "Pass the application a specific output directory", but it does not seem to work. The first app runs and creates its output, but the second app cannot read the input. For the second app, Galaxy complains:

NotFound: cannot find 'files_path' while searching for 'input.files_path'

What am I doing wrong?

Here are the excerpts from my tool description XMLs:

# app #1
<command>
fasttree.py
--input "$input"
--workdir "$output.files_path"
</command>
<inputs>
    <param name="input" type="data" format="hg_mft" label="Alignment files manifest"/>
</inputs>

<outputs>
    <data name="output" format="html" label="Tree files"/>
</outputs>


# app #2
<command>
analyze_trees.py
--input "$input.files_path/out/trees"
--workdir "$output.files_path"
</command>
<inputs>
    <param name="input" type="data" format="html" label="Tree files"/>
</inputs>
<outputs>
    <data name="output" format="html" label="Tree analysis"/>
</outputs>
3.3 years ago, David Managadze (United States) wrote:

I fixed my problem, but only partially.

In the first app's XML, I set output like this:

<outputs>
  <data name="output" format="hg_mft" from_work_dir="out/trees.mft" label="FastTree Mft"/>
</outputs>

Here, "./out/trees.mft" is a "Manifest" file in the first app's work directory. I came up with the format "hg_mft", which is just a text file listing all the files in the ./out/trees/ directory, one per line. These trees are the actual files I want the second app to work with. Providing the Manifest file as the output makes sure that the next app can get the path to the first app's data.
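
A minimal sketch of how such a manifest could be written, under two assumptions that the post does not confirm: each line holds a bare file name (not a full path), and the entries are sorted. The function name `write_manifest` is hypothetical.

```python
from pathlib import Path


def write_manifest(out_dir):
    """Write out_dir/trees.mft listing the files in out_dir/trees/, one per line.

    Assumption: the manifest stores bare file names in sorted order;
    the real hg_mft format may store full or relative paths instead.
    """
    out_dir = Path(out_dir)
    trees_dir = out_dir / "trees"
    # Only regular files are listed; subdirectories are skipped
    names = sorted(p.name for p in trees_dir.iterdir() if p.is_file())
    manifest = out_dir / "trees.mft"
    manifest.write_text("".join(name + "\n" for name in names))
    return manifest
```

Writing the manifest as the last step of the first app means it only ever lists trees that were actually produced.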

In the second app's XML, I have:

--input "$input.extra_files_path/out/trees"
...
<inputs>
  <param name="input" type="data" format="hg_mft" label="Tree files"/>
</inputs>

With this, the second app correctly received the trees directory as its input and ran successfully.
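
On the consuming side, the second app only needs to enumerate the files in the directory it receives through `--input`. The following is a sketch under the assumption that every regular file in that directory is a tree; the name `list_tree_files` is hypothetical and not taken from analyze_trees.py.

```python
from pathlib import Path


def list_tree_files(trees_dir):
    """Return the regular files found directly in trees_dir, sorted by name.

    Assumption: every regular file in the directory is a tree file;
    the real app may filter by extension or consult the manifest instead.
    """
    trees_dir = Path(trees_dir)
    if not trees_dir.is_dir():
        raise FileNotFoundError(f"tree directory not found: {trees_dir}")
    return sorted(p for p in trees_dir.iterdir() if p.is_file())
```

Failing early with a clear message when the directory is missing makes a wrong path in the tool XML easier to spot in the job's error output.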

Not solved: although the file ./out/trees.mft exists in the first app's work directory, Galaxy does not show its contents when I click the "eye icon". Instead, the dataset item in the History sidebar says that it is 'empty' and shows a small part of the tool's STDOUT.

QUESTION: How can I make Galaxy show the contents of the manifest file when I click the "eye icon"? It would be OK if I could see the app's log instead, but that did not work either. Am I doing something wrong? Is from_work_dir="out/trees.mft" not the correct way to describe this output?
