It seems this question has been asked many times, but I haven't seen a valid answer. When creating a workflow to process multiple samples as a batch: if the first step of your workflow is "Input_Dataset", which is automatically populated by the corresponding sample file when the workflow runs for each sample, why can't we carry that identifier (e.g. Sample 5) throughout the workflow, so that each subsequent step would read, for example, trim on Sample 5, bwa on Sample 5, flagstat on Sample 5? Programmatically it doesn't seem that hard to code this functionality into the workflow user interface. Am I missing something?
Hello,
Dataset names can be inherited using the workflow variable #{dataset_name}, where dataset_name is the specific name of the input to the tool. These exact names are presented in the workflow editor when "Re-naming" a dataset (a post-job action option, found in the far right tool parameter/control panel).
You can also specify one or more run-time parameters using the variable ${whatever}.
A combination of both is sometimes best when developing a workflow.
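As a minimal sketch of combining the two (the input parameter name input_file and the run-time variable run_id are illustrative, not fixed names), a rename string on a BWA step might look like:

#{input_file}.${run_id}.bwa

For an input dataset named Sample5.fastq and a run-time value of batch1, this would produce an output named Sample5.batch1.bwa.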
The first variable is covered in the wiki below. Using the "basename" function (the default), which preserves all content before the first dot ".", the original name can be propagated through the workflow. Simply make certain that downstream dataset names keep just the sample name before the first dot, with all other content (e.g. the tool/function used) following it, as traced in the example after the link.
http://wiki.galaxyproject.org/Learn/AdvancedWorkflow/VariablesEdit
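To sketch how the basename keeps the chain going (the input parameter names input_file and input1 are hypothetical examples):

Sample5.fastq (workflow input)
trim output renamed with #{input_file}.trimmed -> Sample5.trimmed
bwa output renamed with #{input1}.bwa -> Sample5.bwa

At every step only the sample name precedes the first dot, so the basename stays Sample5 all the way down the workflow.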
For the second variable, add it to the name string when re-naming a dataset in the workflow editor. It will appear as a variable in the upper right corner of the workflow canvas, and then be presented to the user at workflow execution run-time, where custom information can be entered. This can be a date, a batch identifier, custom sample names (not inherited from inputs), and the like.
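For instance (the name batch is illustrative): if several rename strings in the workflow include ${batch}, a single prompt for batch appears when the workflow is executed, and the value entered there, say 2014_06, is substituted into every output name that references it, tagging all outputs of that run consistently.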
Hopefully this helps, Jen, Galaxy team
Hi, I have successfully used the advice above to name output files in a long workflow. The trick is to maintain a 'continuous chain' of named datasets, passing the input dataset name on to the output dataset. Some tools do not allow you to do this and essentially break your naming chain; an example is 'Merge BAM Files'. So you have to choose carefully which tools you use. Thanks, Guy
Hi Jen,
Thanks for your help. I am now aware of how to use workflow parameters and variables as you have discussed, and this functionality is certainly helpful. My problem now, however, is that when using a specific tool such as Picard FastQ to BAM in my workflow, there is a parameter within the tool called sample_name, and I cannot see a way to reference the input that carries the sample name, so this field must be edited manually for each sample. I am dealing with 200 samples/fastq files. I tried entering #{input_to_tool} in this field, but the tool interpreted it as a string literal instead of pointing to the input file. My overall goal is to use the GATK tools, so it is important to maintain the sample information so that when all of the data is compiled into a single VCF file, the data will correspond to the correct sample.