Question: Varying Number Of Output Files In Xml
Matthias Dodt wrote (9.2 years ago):
Hi galaxy-users!

I wrote a tool that splits a FASTA file into n output files, each of a predefined maximum size. The program could return the number of files or a list of filenames. Is it possible to define the number of outputs dynamically (that is, have the number of output files depend on the input file size)? Thanks!

So far I have experimented with:

<tool id="seqan_splitter_1" name="FASTA splitter">
  <description>Splits input files into pieces of desired size</description>
  <command interpreter="python">
    ./tools/RNA-seq/fasta-splitter/fasta-splitter.py --maxsize $size 2> $log_report
  </command>
  <inputs>
    <param name="source" type="data" format="fasta" label="input fasta file"/>
    <param name="size" type="integer" label="Size in Megabyte of each output file" value="500" optional="false"/>
    <param name="files" type="hidden" value="10"/>
  </inputs>
  <outputs>
    #for $i < $files
    <data format="fasta" name="\$i" label="Splitted file"/>
    #end for
    <data format="text" name="log_report" label="Detailed log report from splitter"/>
  </outputs>
</tool>
Guruprasad Ananda wrote (9.2 years ago):
Dear Matthias,

Yes, you can define the number of outputs dynamically in Galaxy. To do this, declare one output dataset in your XML and pass its ID ($out_file.id) to your Python script. Also set force_history_refresh="True" in the tool tag of your XML, like this:

    <tool id="split1" name="Split" force_history_refresh="True">

If your script names its outputs in the following format, all of your datasets will show up in the history pane:

    primary_associatedWithDatasetID_designation_visibility_extension(_DBKEY)

    associatedWithDatasetID: the $out_file.id passed from the XML
    designation: a unique identifier for each output (set in your script)
    visibility: "visible" if you want the dataset shown in your history, "notvisible" otherwise
    extension: the required format for your dataset (bed, tabular, fasta, etc.)
    DBKEY: optional; set it if required (e.g. hg18, mm9)

One of our tools, "MAF to Interval converter" (tools/maf/maf_to_interval.xml), already uses this feature. You can use it as a reference.

I hope this answers your question. Please feel free to email us if you have any more queries.

Regards,
Guruprasad Ananda (Guru), Galaxy team
Graduate Student, Bioinformatics and Genomics
The Pennsylvania State University
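
As a rough illustration of the naming convention in this answer, here is a minimal Python sketch of a splitter that writes its pieces under names of the form primary_<datasetID>_<designation>_<visibility>_<extension>. The function names (galaxy_output_name, split_fasta), the "splitN" designations, and the rule that a designation must not contain underscores are assumptions made for this example, not part of Galaxy; the target directory is whatever directory the tool XML passes in (a later reply in this thread uses $__new_file_path__ for that).

```python
import os


def galaxy_output_name(dataset_id, designation, ext, visible=True, dbkey=None):
    """Build primary_<datasetID>_<designation>_<visibility>_<ext>[_<dbkey>]."""
    # Assumption: the fields are underscore-delimited, so a designation that
    # itself contains "_" could be misread. Reject it to stay on the safe side.
    if "_" in str(designation):
        raise ValueError("designation should not contain underscores")
    parts = ["primary", str(dataset_id), str(designation),
             "visible" if visible else "notvisible", ext]
    if dbkey:
        parts.append(dbkey)
    return "_".join(parts)


def split_fasta(source, target_dir, dataset_id, max_chars):
    """Split a FASTA file into pieces of at most about max_chars characters.

    Records are never cut in half, so a single record larger than max_chars
    still ends up in a file of its own. Returns the list of paths written.
    """
    written = []   # paths of the pieces created so far
    out = None     # currently open output file
    size = 0       # characters written to the current piece
    record = []    # lines of the record being collected

    def flush(lines):
        nonlocal out, size
        if not lines:
            return
        rec_size = sum(len(line) for line in lines)
        # Start a new piece for the first record, or when the current piece
        # is non-empty and this record would push it over the limit.
        if out is None or (size > 0 and size + rec_size > max_chars):
            if out is not None:
                out.close()
            name = galaxy_output_name(dataset_id, "split%d" % (len(written) + 1), "fasta")
            path = os.path.join(target_dir, name)
            out = open(path, "w")
            written.append(path)
            size = 0
        out.writelines(lines)
        size += rec_size

    with open(source) as handle:
        for line in handle:
            if line.startswith(">"):   # a header line starts a new record
                flush(record)
                record = []
            record.append(line)
    flush(record)
    if out is not None:
        out.close()
    return written
```

In a tool wrapper, dataset_id would be the value of $out_file.id handed over on the command line, and max_chars would be derived from the size parameter.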
Matthias Dodt replied (9.2 years ago):

Hi Guru!

Thank you very much for the detailed reply! It works now. Just one thing is strange: twice as many files as expected appear in the history, and half of them have size zero and contain nothing. I wrote a program which splits a FASTA/FASTQ file into n files. On the command line it works fine (the names of the output files can be specified directly via a parameter). However, there are always 2n files in the history, half of them empty. Any idea?

The XML file looks as follows:

<tool id="seqan_splitter_1" name="FASTA splitter" force_history_refresh="True">
  <description>Splits input files into pieces of desired size</description>
  <command>
    ./tools/RNA-seq/fasta-splitter/seqan_splitter
      --source $source
      --name-pattern primary_${resultset.id}_splitfile%_visible_fasta
      --target-dir $__new_file_path__/
      --maxsize $size
      #if $format_input.type =="fasta"
        --format fasta
      #else
        --format fastq
      #end if
      ##2> $log_report
  </command>
  <inputs>
    <conditional name="format_input">
      <param name="type" type="select" label="input file format" optional="false">
        <option value="fasta" selected="true">FASTA</option>
        <option value="fastqsanger">FASTQSanger</option>
        <option value="fastqsolexa">FASTQSolexa</option>
      </param>
      <when value="fasta">
        <param format="fasta" name="source" type="data" label="source file"/>
      </when>
      <when value="fastqsolexa">
        <param format="fastqsolexa" name="source" type="data" label="source file"/>
      </when>
      <when value="fastqsanger">
        <param format="fastqsanger" name="source" type="data" label="source file"/>
      </when>
    </conditional>
    <param name="size" type="integer" label="Size in Megabyte of each output file" value="500" optional="false"/>
  </inputs>
  <outputs>
    <data format="fasta" name="resultset" label="Splitted file"/>
  </outputs>
  <help> </help>
</tool>

Thanks again!
Greetings,
mat
Jelle Scholtalbers replied (9.2 years ago):

Hi,

How does (or could) this work when creating a workflow? How can I pass all of the created outputs to another tool that also accepts a dynamic number of inputs?

Cheers,
Jelle
fubar (Australia) wrote (7.1 years ago):
Hi, Nicholas,

You'll almost certainly want to write a wrapper to create the plink command line and run it. A wrapper script can construct a correct plink command line and then do all sorts of post-plink transformation on the outputs as needed, which in my experience is usually necessary. Most of the rgenetics tools do just that, so looking at the source under tools/rgenetics may provide some prototypes you can change to suit your needs, e.g. rgQC.py and rgQC.xml.

Plink spews out all sorts of stuff, so you may want to explore the Html datatype. See http://lists.bx.psu.edu/pipermail/galaxy-dev/2010-September/003311.html for a brief explanation.

Ross Lazarus MBBS MPH; Associate Professor, Harvard Medical School; Head, Medical Bioinformatics, BakerIDI
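
To make the wrapper idea concrete, here is a minimal Python sketch, not taken from rgQC.py. The helper names (build_plink_command, run_plink), the default executable name "plink", and the choice to send the console output to a log file are assumptions for illustration; --bfile and --out are standard plink options, and any analysis flags are supplied by the caller.

```python
import subprocess


def build_plink_command(bfile, out_prefix, extra_args=(), plink="plink"):
    """Assemble a plink command line as an argument list.

    Passing a list to subprocess avoids shell quoting problems with paths.
    """
    cmd = [plink, "--bfile", bfile, "--out", out_prefix]
    cmd.extend(extra_args)   # analysis flags chosen by the caller, e.g. ["--freq"]
    return cmd


def run_plink(cmd, log_path):
    """Run the command, capture plink's console chatter in a log file,
    and return the exit code so the wrapper can decide what to do next."""
    with open(log_path, "w") as log:
        return subprocess.call(cmd, stdout=log, stderr=subprocess.STDOUT)
```

A wrapper would call build_plink_command with the dataset paths Galaxy hands it, run the result with run_plink, and then post-process the files plink wrote under out_prefix before reporting them back to Galaxy.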