Question: Workflow Improvement Requests (Long)
10.0 years ago by
Assaf Gordon320
United States
Assaf Gordon320 wrote:
Dear all,

Recently, users of our local Galaxy server started using workflows, and are very pleased. However, as workflows get more complicated, it gets harder to track their inputs and outputs. I'd like to share an example to illustrate the problems that we encounter.

The workflow (pictured in the attached 'workflow.jpg') takes 4 input datasets and produces 4 output datasets.

The first problem is that there's no way to differentiate between the input datasets (they appear simply as "Step 1: Input dataset", "Step 2: Input dataset", etc.). Since each dataset has a specific role, I've had to print the workflow and give the users instructions as to which dataset (in their history) goes into which input (see attached 'crosstab_workflow_input_datasets.jpg').

The second problem is that whenever I change something in the workflow and save it, the order of the input datasets changes! So what was once dataset 1 can now be dataset 2, 3 or 4. Users have no way of knowing this... (keen users might notice that the description of the first tool changed from "Output dataset 'output' from step 2" to "Output dataset 'output' from step 4" - but this is very obscure...).

The third problem is that once the workflow completes, the resulting datasets have cryptic names such as "Join two queries on Data 10 and Data 2". Since "Data 10" is "Awk on Data 8", and data-8 is "Generic Annotations on Data 7 and Data 1", and data-7 is "Intersect data 1 and data 6", it gets a bit hard to know what's going on (see attached 'crosstab_history.png'). For the meantime, I've simply given written instructions on what each dataset means (see attached 'crosstab_workflow_dataset_explnanations.jpg').

If I may suggest a feature: it would be great if I could name a dataset inside the workflow. Instead of naming it "Input dataset" I could give it a descriptive name, so even if the order of the input datasets changes, users will know which dataset goes into which input.

Regarding the output dataset names, the 'label' option in the tools' XML is a good start, but it still creates very long, hard-to-understand names. Another great feature would be the possibility to add an 'output label' for each step in the workflow.

Regardless of the above, I'd like to say (once again) that Galaxy is a great tool, and workflows are really cool - we have several long workflows which do wonderful things.

Thanks for reading so far,
Gordon.
galaxy • 1.2k views
modified 10.0 years ago by James Taylor • written 10.0 years ago by Assaf Gordon
10.0 years ago by
Gunnar Raetsch60 wrote:
Dear Assaf and everybody else,

I can only reinforce what you said: Great work! ... and I have had similar problems. In particular, when working with workflows that have, say, 50 different steps, things can become very confusing. It would help if one could define the outputs of the workflow and hide all the steps in the history that are inside the workflow and not related to its inputs and outputs.

Another feature that I would find very helpful in designing larger workflows would be the ability to use workflows within a larger workflow. In my case I have a set of tasks that have to be repeated using several different settings within a larger workflow.

I realize that workflows are still in beta and that it might be too early to ask for such features... but it would be great to see them in beta soon.

Thanks a lot for your efforts!
Gunnar

Gunnar Rätsch
Friedrich Miescher Laboratory, Max Planck Society
Spemannstraße 39, 72076 Tübingen, Germany
http://www.fml.mpg.de/raetsch
Gunnar.Raetsch@tuebingen.mpg.de
Tel: (+49) 7071 601 820
Fax: (+49) 7071 601 801
10.0 years ago by
James Casbon370
James Casbon370 wrote:
Hi Everyone,

Slightly off-topic, but I see you have awk in your workflows. Awk could work on text, tabular, and other formats, but I'd rather not define a new tool for each input type. Is there a way to define a tool which accepts any type of input? It should ideally preserve the format in the output as well.

thanks,
James
James, The datatypes are a hierarchy, and tools will accept any type that is more specific than their defined input type. If you set the input type to "data" the tool will accept anything, if you set it to "text" it will accept any text format. For outputs, there is a special format "input" which copies the type of the input dataset (first input I believe, this needs to be enhanced to allow specifying a particular input). There is also the "metadata_source" attribute for copying the input metadata. This is how many of our tools that work on tabular data preserve the type and metadata of "interval" format files. -- jt
Reply written 10.0 years ago by James Taylor
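For readers who want to see the reply above in tool-config form, here is a minimal sketch, not taken from any tool in this thread. The tool id, name, and wrapper script (`passthrough_wrapper.sh`) are hypothetical; only `format="data"`, `format="input"`, and `metadata_source` come from the reply itself. To restrict the tool to text formats, the input format would be the text datatype instead (written `txt` in the awk tool discussed in this thread).

```xml
<!-- Hypothetical tool: id, name and wrapper script are placeholders for illustration. -->
<tool id="passthrough_example" name="Pass-through example">
  <command interpreter="sh">passthrough_wrapper.sh $input $output</command>
  <inputs>
    <!-- format="data" is the root of the datatype hierarchy, so any dataset is accepted. -->
    <param name="input" type="data" format="data" label="Any dataset"/>
  </inputs>
  <outputs>
    <!-- format="input" copies the datatype of the input dataset;
         metadata_source names the input whose metadata is copied to the output. -->
    <data name="output" format="input" metadata_source="input"/>
  </outputs>
</tool>
```

With a config along these lines, an "interval" file passed in should come out as "interval" again with its metadata intact, which is the behaviour the reply describes for the tabular tools.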
Great, thanks a lot. You're way ahead of me here ;)
Reply written 10.0 years ago by James Casbon
Indeed, the 'awk' tool accepts 'format="txt"' and therefore can handle almost any file in Galaxy.

Regarding your other question ('user parameters ending up on command line'), here's my suggestion:

In the <command> section, enclose the parameter in single quotes (make sure they are single and not double quotes):

    <command interpreter="sh">awk_wrapper.sh $input $output '$file_data'</command>

In the program parameter (where users can enter whatever they want), add a validator to reject single quotes:

    <param name="file_data" type="text" area="true" size="5x35" label="AWK Program" help="">
        <validator type="expression" message="Invalid Program!">value.find('\'')==-1</validator>
    </param>

This way the parameter the user enters will always be single-quoted and cannot close the quoting itself, so it is not parsed by the shell.

-Gordon.
Reply written 10.0 years ago by Assaf Gordon
10.0 years ago by
Eric Schauberger10 wrote:
I second the request for some type of labeling system for the workflow, or at least a numbering system. I made a workflow with many inputs, and when I tried it out I realized that the first input, which I was joining with the second input, was mixed up with the others and unidentifiable. Then I realized that the inputs are listed in the order they were created, not in any order based on where they are placed. Since I was making many, many inputs, I simply made a bunch of them at once and didn't keep track of their order or where I put them.

Thanks again for the sweet tool.

Eric

Eric M Schauberger
Physician Scientist Training Program (DO/PhD)
Genetics Program, Ewart Lab
MSU College of Osteopathic Medicine (MSUCOM)
Email: Schaube2@msu.edu
Skype: Emschaub
See my availability: http://www.timebridge.com/mytime/eschauberger
10.0 years ago by
James Taylor70
James Taylor70 wrote:
Okay, input dataset labels are (finally ;) implemented in 15bf910890d5 which is r1647 in central and 1907 in the security development branch. In the editor form, you can provide a name for any input dataset, and it will be displayed as the row label in the run form. The support for this is pretty generic, and this changeset may help anyone wanting to try adding more parameters to a workflow module (like constraining input dataset type). -- jt