Varying Number Of Output Files In Xml

9.2 years ago by

Matthias Dodt • 110 wrote:

Hi galaxy-users!

I wrote a tool that splits a FASTA file into n output files, each of a predefined maximum size. The program could return the number of files or a list of filenames. Is it possible to define the number of outputs dynamically (number of output files dependent on the input file size)?

Thanks!

So far I have experimented with:

<tool id="seqan_splitter_1" name="FASTA splitter">
  <description>Splits input files into pieces of desired size</description>
  <command interpreter="python">
    ./tools/RNA-seq/fasta-splitter/fasta-splitter.py --maxsize $size 2> $log_report
  </command>
  <inputs>
    <param name="source" type="data" format="fasta" label="input fasta file"/>
    <param name="size" type="integer" label="Size in Megabyte of each output file" value="500" optional="false"/>
    <param name="files" type="hidden" value="10"/>
  </inputs>
  <outputs>
    #for $i < $files
    <data format="fasta" name="\$i" label="Splitted file"/>
    #end for
    <data format="text" name="log_report" label="Detailed log report from splitter"/>
  </outputs>
</tool>

galaxy • 1.7k views

modified 7.1 years ago by fubar ♦ 1.1k • written 9.2 years ago by Matthias Dodt • 110

9.2 years ago by

Guruprasad Ananda • 230 wrote:

Dear Matthias,

Yes, you can define the number of outputs dynamically in Galaxy. To do this, declare one output dataset in your XML and pass its ID ($out_file.id) to your Python script. Also set force_history_refresh="True" in the tool tag of your XML, like this:

<tool id="split1" name="Split" force_history_refresh="True">

In your script, if your outputs are named in the following format, all your datasets will show up in the history pane:

primary_associatedWithDatasetID_designation_visibility_extension(_DBKEY)

associatedWithDatasetID: the $out_file.id passed from the XML.
designation: a unique identifier for each output (set in your script).
visibility: "visible" if you want the dataset shown in your history, "notvisible" otherwise.
extension: the required format for your dataset (bed, tabular, fasta, etc.).
DBKEY: optional; can be set if required (e.g. hg18, mm9).

One of our tools, "MAF to Interval converter" (tools/maf/maf_to_interval.xml), already uses this feature. You can use it as a reference.

Hope this answers your question. Please feel free to email us if you have any more queries.

Guru
Galaxy team

Regards,
Guruprasad Ananda
Graduate Student, Bioinformatics and Genomics
The Pennsylvania State University
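The naming convention described above can be sketched in Python as follows. This is only an illustration, not the code of any existing Galaxy tool: the function names primary_output_name and split_fasta, the "partN" designations, and the splitting rule (start a new chunk at the first record header after the size limit is reached) are all assumptions made for the example. The check that the designation contains no underscore is also an assumption, based on the name being composed of underscore-separated fields.

```python
import os


def primary_output_name(dataset_id, designation, extension, visible=True, dbkey=None):
    """Build a name of the form
    primary_<datasetID>_<designation>_<visibility>_<extension>[_<dbkey>]."""
    # Assumption: fields are underscore-separated, so keep them out of the designation.
    if "_" in str(designation):
        raise ValueError("designation must not contain underscores")
    parts = [
        "primary",
        str(dataset_id),
        str(designation),
        "visible" if visible else "notvisible",
        extension,
    ]
    if dbkey:
        parts.append(dbkey)
    return "_".join(parts)


def split_fasta(source_path, target_dir, dataset_id, max_bytes):
    """Split a FASTA file into chunks of roughly max_bytes each.

    A record is never split: a new chunk is opened only at a '>' header
    line once the current chunk has reached max_bytes.
    Returns the list of paths written.
    """
    written = []
    out = None
    size = 0
    with open(source_path) as src:
        for line in src:
            if out is None or (line.startswith(">") and size >= max_bytes):
                if out is not None:
                    out.close()
                # Hypothetical designation scheme: part1, part2, ...
                name = primary_output_name(
                    dataset_id, "part%d" % (len(written) + 1), "fasta"
                )
                path = os.path.join(target_dir, name)
                out = open(path, "w")
                written.append(path)
                size = 0
            out.write(line)
            size += len(line)
    if out is not None:
        out.close()
    return written
```

In a tool wrapper, dataset_id would be the $out_file.id value passed on the command line, and target_dir the directory Galaxy scans for extra outputs (the follow-up post in this thread passes $__new_file_path__ for that purpose).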

written 9.2 years ago by Guruprasad Ananda • 230

Hi Guru!

Thank you very much for the detailed reply! Now it works.

Just one thing is strange: twice the expected number of files appears in the history, and half of them have size zero and contain nothing. I wrote a program which splits a FASTA/FASTQ file into n files. On the command line it works fine (the names of the output files can be specified directly via a parameter). However, there are always 2n files in the history, half of them empty. Any idea?

The XML file looks as follows:

<tool id="seqan_splitter_1" name="FASTA splitter" force_history_refresh="True">
  <description>Splits input files into pieces of desired size</description>
  <command>
    ./tools/RNA-seq/fasta-splitter/seqan_splitter --source $source --name-pattern primary_${resultset.id}_splitfile%_visible_fasta --target-dir $__new_file_path__/ --maxsize $size
    #if $format_input.type =="fasta"
      --format fasta
    #else
      --format fastq
    #end if
    ##2> $log_report
  </command>
  <inputs>
    <conditional name="format_input">
      <param name="type" type="select" label="input file format" optional="false">
        <option value="fasta" selected="true">FASTA</option>
        <option value="fastqsanger">FASTQSanger</option>
        <option value="fastqsolexa">FASTQSolexa</option>
      </param>
      <when value="fasta">
        <param format="fasta" name="source" type="data" label="source file"/>
      </when>
      <when value="fastqsolexa">
        <param format="fastqsolexa" name="source" type="data" label="source file"/>
      </when>
      <when value="fastqsanger">
        <param format="fastqsanger" name="source" type="data" label="source file"/>
      </when>
    </conditional>
    <param name="size" type="integer" label="Size in Megabyte of each output file" value="500" optional="false"/>
  </inputs>
  <outputs>
    <data format="fasta" name="resultset" label="Splitted file"/>
  </outputs>
  <help>
  </help>
</tool>

Thanks again!

Greetings,
mat

written 9.2 years ago by Matthias Dodt • 110

Hi,

How does (or could) this work when creating a workflow? How can I redirect all the created outputs to a new tool that also accepts a dynamic number of inputs?

Cheers,
Jelle

(In reply to Matthias Dodt, 2009/10/1.)

written 9.2 years ago by Jelle Scholtalbers • 360

7.1 years ago by

fubar ♦ 1.1k

Australia

fubar ♦ 1.1k wrote:

Hi, Nicholas,

You'll almost certainly want to write a wrapper to create the plink command line and run it. A wrapper script can construct a correct plink command line and then do all sorts of post-plink transformation on the outputs as needed, which in my experience is usually the case. Most of the rgenetics tools do just that, so looking at the source under tools/rgenetics may provide some prototypes you can change to suit your needs, e.g. rgQC.py and rgQC.xml.

Plink spews out all sorts of stuff, so you may want to explore the Html datatype. See http://lists.bx.psu.edu/pipermail/galaxy-dev/2010-September/003311.html for a brief explanation.

(In reply to Nicholas Robinson, Wed, Nov 9, 2011 at 1:31 AM.)

--
Ross Lazarus MBBS MPH; Associate Professor, Harvard Medical School; Head, Medical Bioinformatics, BakerIDI; Tel: +61 385321444;
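A wrapper of the kind described above might look roughly like the following Python sketch. It is not taken from rgQC.py or any other rgenetics tool; the function names, the "plink" output prefix, and the choice of --missing as the example analysis are assumptions. The flags used (--noweb, --file, --out, --missing) are those of PLINK 1.07 and may behave differently in other versions. The post-processing step here simply lists whatever plink wrote into one HTML page, as a stand-in for the kind of transformation the Html datatype is meant to hold.

```python
import html
import os
import subprocess


def build_plink_command(input_prefix, out_prefix, extra_args=()):
    """Return the plink command line as an argument list (no shell quoting needed)."""
    return ["plink", "--noweb", "--file", input_prefix, "--out", out_prefix] + list(extra_args)


def write_html_index(out_dir, html_path):
    """Write a minimal HTML page linking every file found in out_dir."""
    names = sorted(os.listdir(out_dir))
    items = "\n".join(
        '<li><a href="%s">%s</a></li>' % (html.escape(n, quote=True), html.escape(n))
        for n in names
    )
    with open(html_path, "w") as fh:
        fh.write("<html><body><ul>\n%s\n</ul></body></html>\n" % items)
    return names


def run_plink(input_prefix, out_dir, html_path, extra_args=("--missing",)):
    """Run plink, then index its outputs. Requires plink on the PATH."""
    os.makedirs(out_dir, exist_ok=True)
    cmd = build_plink_command(input_prefix, os.path.join(out_dir, "plink"), extra_args)
    subprocess.run(cmd, check=True)  # raises CalledProcessError if plink fails
    return write_html_index(out_dir, html_path)
```

Building the command as a list and keeping the post-processing in a separate function makes each piece easy to check without plink installed.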

written 7.1 years ago by fubar ♦ 1.1k
