It seems this question has been asked many times, but I haven't seen a valid answer. When creating a workflow to process multiple samples as a batch: if the first step of your workflow is "Input_Dataset", which is automatically populated by the corresponding sample file when the workflow runs for each sample, why can't we carry that identifier (e.g. Sample 5) throughout the workflow, so that each subsequent step would read, for example, trim on Sample 5, bwa on Sample 5, flagstat on Sample 5? Programmatically it doesn't seem that hard to code this functionality into the workflow user interface. Am I missing something?
Hello,
Dataset names can be inherited using the workflow variable #{dataset_name}, where dataset_name is the specific name of the input to the tool. These exact names are presented in the workflow editor when "Re-naming" a dataset (a post-job action option, found in the far right tool parameter/control panel).
You can also specify one or more run-time parameters using the variable ${whatever}.
A combination of both is sometimes best when developing a workflow.
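As a minimal sketch of combining the two (the input parameter name input_file and the run-time variable run_id are illustrative, not fixed names), a rename string on a BWA step might look like:

#{input_file}.${run_id}.bwa

For an input dataset named Sample5.fastq and a run-time value of batch1, this would produce an output named Sample5.batch1.bwa.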
The first variable is covered in the wiki below. Using the "basename" function (the default), which preserves all content before the first dot ".", the original name can be propagated through the workflow. Simply make certain that downstream dataset names keep just the sample name before the first dot, with all other content (e.g. the tool/function used) following it, as traced in the example after the link.
http://wiki.galaxyproject.org/Learn/AdvancedWorkflow/VariablesEdit
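To sketch how the basename keeps the chain going (the input parameter names input_file and input1 are hypothetical examples):

Sample5.fastq (workflow input)
trim output renamed with #{input_file}.trimmed -> Sample5.trimmed
bwa output renamed with #{input1}.bwa -> Sample5.bwa

At every step only the sample name precedes the first dot, so the basename stays Sample5 all the way down the workflow.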
For the second variable, add it to the name string when re-naming a dataset in the workflow editor. It will appear as a variable in the upper right corner of the workflow canvas, and then be presented to the user at workflow execution run-time, where custom information can be entered. This can be a date, a batch identifier, custom sample names (not inherited from inputs), and the like.
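For instance (the name batch is illustrative): if several rename strings in the workflow include ${batch}, a single prompt for batch appears when the workflow is executed, and the value entered there, say 2014_06, is substituted into every output name that references it, tagging all outputs of that run consistently.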
Hopefully this helps, Jen, Galaxy team
Hi, I have successfully used the advice above to name output files in a long workflow. The trick is to maintain a 'continuous chain' of named datasets, passing the input dataset name on to the output dataset. Some tools do not allow you to do this and essentially break your naming chain; an example is 'Merge BAM Files'. So you have to choose carefully which tools you use. Thanks, Guy
Hi Jen,
Thanks for your help. I am now aware of how to use workflow parameters and variables as you have discussed, and this functionality is certainly helpful. My problem now, however, is that when using a specific tool such as Picard FastQ to BAM in my workflow, there is a parameter within the tool called sample_name, and I cannot see a way to reference the input that carries the sample name, so this field must be edited manually for each sample. I am dealing with 200 samples/fastq files. I tried entering #{input_to_tool} in this field, but the tool interpreted it as a string literal instead of pointing to the input file. My overall goal is to use the GATK tools, so it is important to maintain the sample information so that when all of the data is compiled into a single VCF file, the data will correspond to the correct sample.