Question: Carrying a sample name through a workflow
gkuffel22 wrote (3.7 years ago, United States):

It seems this question has been asked many times, but I don't see an answer that is valid. When creating a workflow to process multiple samples as a batch, the first step of the workflow is an "Input Dataset" that is automatically populated by the corresponding sample file each time the workflow is run. Why can't that parameter (e.g. Sample 5) be carried throughout the workflow, so that each subsequent step would read, for example, "trim on Sample 5", "bwa on Sample 5", "flagstat on Sample 5"? Programmatically it doesn't seem that hard to code this functionality into the workflow user interface. Am I missing something?

Tags: workflow, galaxy
modified 3.6 years ago by Jennifer Hillman Jackson • written 3.7 years ago by gkuffel22
Jennifer Hillman Jackson wrote (3.6 years ago, United States):


Dataset names can be inherited using the workflow variable #{dataset_name}, where dataset_name is the specific name of the input to the tool. These exact names are presented in the workflow editor when renaming a dataset (a post-job action, found in the far-right tool parameter/control panel).

You can also specify one or more run-time parameters using the variable ${whatever}.

A combination of both is sometimes best when developing a workflow.

The first variable is covered in the wiki below. By making use of the "basename" function (the default), where all content before the first dot "." is preserved, the original name can be propagated through the workflow. Simply make certain that downstream dataset names keep just the sample name before the first dot, with all other content (e.g. the tool or function used) following it.
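To make the convention concrete, here is a minimal sketch in Python of the naming logic described above. This is illustrative only, not Galaxy's internal code; the sample name "Sample5" and the suffixes are hypothetical examples.

```python
# Illustrative sketch of the default "basename" convention: everything
# before the first dot in a dataset name is treated as the sample name,
# so it survives each renaming step of the workflow.

def basename(dataset_name: str) -> str:
    """Return everything before the first dot (the sample name)."""
    return dataset_name.split(".", 1)[0]

def rename(upstream_name: str, step_suffix: str) -> str:
    """Build a downstream name: sample basename, a dot, then the step suffix."""
    return f"{basename(upstream_name)}.{step_suffix}"

name = "Sample5.fastq"
name = rename(name, "trimmed.fastq")   # sample name still "Sample5"
name = rename(name, "bwa.bam")         # sample name still "Sample5"
name = rename(name, "flagstat.txt")    # sample name still "Sample5"
print(name)
```

As long as every renaming step in the chain keeps the sample name in front of the first dot, any downstream tool inherits it intact.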

For the second variable, add it to the name string when renaming a dataset in the workflow editor. It will appear as a variable in the upper-right corner of the workflow canvas and will then be presented to the user at workflow execution time, where custom information can be added. This can be a date, a batch identifier, custom sample names (not inherited from the inputs), and the like.
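As an example, a rename template in a post-job action combining both kinds of variables might look like the following (batch_id is a hypothetical run-time parameter name, and #{input} assumes the tool's input is labeled "input"):

```
${batch_id}_#{input}.trimmed.fastq
```

At execution time the user would be prompted for batch_id, while #{input} would be filled in automatically from the incoming dataset's name.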

Hopefully this helps, Jen, Galaxy team

modified 3.6 years ago • written 3.6 years ago by Jennifer Hillman Jackson

Hi, I have successfully used the advice above to name output files in long workflows. The trick is to maintain a 'continuous chain' of named datasets, passing the input dataset name on to each output dataset. Some tools do not allow you to do this and essentially break your naming chain; an example is 'Merge BAM Files'. So you have to choose carefully which tools you use. Thanks, Guy





written 3.6 years ago by Guy Reeves

Hi Jen,

Thanks for your help. I am now aware of how to utilize workflow parameters and variables as you have discussed, and this functionality is certainly helpful. My problem now, however, is that when using a specific tool such as Picard FastQ to BAM in my workflow, there is a parameter within the tool called sample_name, and I cannot see a way to reference the input that carries the sample name, so this field must be edited manually for each sample. I am dealing with 200 samples/fastq files. I tried entering #{input_to_tool} in this field, but the tool interpreted it as a string literal instead of pointing to the input file. My overall goal is to use the GATK tools, so it is important to maintain the sample information so that when all of the data is compiled into a single VCF file, the data will correspond to the correct sample.

written 3.6 years ago by gkuffel22

I have also encountered this problem.

It would be great if, when the 'sample name' field is left blank in a tool that has this option, the normal naming protocol #{input} were used. This would enable easier use in workflows. Guy

written 3.6 years ago by Guy Reeves


Powered by Biostar version 16.09