Question: Dealing with a deterministic amount of output files in tool wrapper
cdean11 wrote (2.6 years ago):

I have a tool whose number of output files depends on the choice of input parameters. It writes a varying number of gzip-compressed FASTQ files. When the tool completes, the output files have been created, but I am unable to load them into my history pane. The output files are named in the format output.fastq.gz.

I used the code from the galaxy-central repository to try to discover the outputs:

<data name="output" format="fastq" >
    <discover_datasets pattern="(?P&lt;designation&gt;.+)\.fastq\.gz" ext="toolshed.gz" visible="true" assign_primary_output="true" />
</data>

I've tried numerous other approaches and parameters without success. I'm not sure that the toolshed.gz extension is correct, but it was the closest equivalent to gzip I could find among the available datatypes in the datatypes_conf.xml.sample file. Any advice?

Dave B. wrote (2.6 years ago):

Good morning,

The syntax of the discover_datasets tag is very nearly correct, but the ext attribute should be set to "fastq", since that is what the gzip archive contains.

I would also recommend decompressing the files before they reach the history, so that Galaxy can generate accurate metadata for the output data.
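
As a sketch of the first suggestion, the output block from the question with only the ext attribute changed would look roughly like this. Whether the pattern should keep matching \.fastq\.gz or should match \.fastq instead depends on whether the files are decompressed before discovery, which is left open here.

```xml
<data name="output" format="fastq">
    <!-- Same pattern as in the question; only ext is changed to "fastq" -->
    <discover_datasets pattern="(?P&lt;designation&gt;.+)\.fastq\.gz" ext="fastq" visible="true" assign_primary_output="true" />
</data>
```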


Thanks for the response, Dave, but unfortunately Galaxy is still not able to load any output files into my history pane. I've tried refreshing the pane as well, but still no luck.

Reply written 2.6 years ago by cdean11

That is odd. Have you verified that the files are actually in the job's working directory? If not, you can keep the working directory from being cleaned up by uncommenting and setting the following option in your galaxy.ini file:

cleanup_job = never

When the tool completes, you can extract the job's working directory from the log and investigate which files were actually created. If it turns out that the files were created in a subdirectory, the solution would be to add the attribute directory="{directory name}" to the discover_datasets tag.
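
For illustration, if the files turned out to be written to a subdirectory called outputs (a hypothetical name, not taken from the tool in question), the tag might look something like this:

```xml
<data name="output" format="fastq">
    <!-- "outputs" is a placeholder; use the subdirectory the tool actually
         writes to, relative to the job's working directory -->
    <discover_datasets pattern="(?P&lt;designation&gt;.+)\.fastq\.gz" directory="outputs" ext="fastq" visible="true" assign_primary_output="true" />
</data>
```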

Reply written 2.6 years ago by Dave B.

The files are being stored in the correct directory. What's weird is that when I run the example code found in the galaxy-central repository, everything works fine and I'm able to retrieve multiple txt or tabular files.

In my job's directory I can see my output files being stored as something like dataset_x.dat.foo.fastq.gz. It may be worth gunzipping each file to see if that works, but I'm not sure how to incorporate that into my code. Ideally, I'd want to execute the following line in the command section of my tool XML:

gunzip "${output}/*.fastq.gz"

but that doesn't seem to be working either. Any suggestions would be helpful. Thanks again.
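
One possible explanation, offered as a guess without seeing the full tool XML: the double quotes stop the shell from expanding the * wildcard, and ${output} normally refers to a single dataset file rather than a directory. A minimal sketch of a command block that decompresses the files in the job's working directory could look like the following, where my_tool, its --input flag, and the input parameter name are hypothetical stand-ins for the real tool:

```xml
<command><![CDATA[
    ## my_tool, --input and $input are placeholders for the real command line
    my_tool --input '$input' &&
    ## The glob is left unquoted so the shell expands it; this assumes the
    ## .fastq.gz files are written to the job's working directory
    gunzip ./*.fastq.gz
]]></command>
```

With this approach the discover_datasets pattern would need to match files ending in .fastq rather than .fastq.gz.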

Reply written 2.6 years ago by cdean11

Is there any way you can share your tool xml with me?

Reply written 2.6 years ago by Dave B.