Suggestion: New Tag In Tool'S Xml File (Long)

Question: Suggestion: New Tag In Tool'S Xml File (Long)

10.0 years ago by

United States

Assaf Gordon • 320 wrote:

Hello, As recently discussed on this mailing list, With long workflows it becomes somewhat difficult to make sense of the datasets names. The 'label' option in the <output> tag is good start, but there's still room for improvement. I propose a new 'tag' element which might help a little bit. Here's an example: I have five tools in a simple workflow: Starting with a FASTQ file, FASTQ-to-FASTA FASTA-clipper, FASTA-collapser, FASTA-filter-by-length, FASTA-low-complexity-repeats-remover. Each tool has an <output> option with a label. The label takes the description of the tool, and the name of the input. Example from the low-complexity-repeats-remove tool: <output> <data format="input" name="output" label="$input.name (without low-complexity repeats)" metadata_source="input"/> </output> Similarly, most of the other tools have a 'label' option in the XML file. The problem is that after running the workflow, the name of each dataset becomes longer and longer, and more cumbersome. Even if I start with a very short name for the initial FASTQ file (e.g. 'AG14'), The other datasets are named like: AG14 (Fasta), AG14 (Fasta) clipped, AG14 (Fasta) clipped collapsed, Filter sequences by length on data 5 Filter sequences by length on data 5 (without low complexity repeats) (The filter-sequences-by-length tool doesn't have a 'label' option with makes the output even worse.) My suggestion: I would like each data set to have a 'tag' - a short name which is carried over from one dataset to the next, without taking the entire dataset's name. This 'tag' is usually just the name of the initial dataset. going back to the previous example, if I use the initial library name as the tag (=AG14), the output of the workflow would look like: AG14 AG14 (Fasta) AG14 (Clipped) AG14 (Collapsed) AG14 (Filtered Sequences) AG14 (without low complexity repeats) I tried to add this functionality to the galaxy source code. I'm aware that the Right Thing to do is probably to add a new column to the relevant database table, and setup the tools to automatically take the tag from one dataset to the next. However, as an intermediate solution, my hack works nicely (I'm attaching pictures of 'before' and 'after :-) ). With this hack, one can use the following 'label' in the XML <output> tag: <outputs> <data format="input" name="output" label="$input.tag clipped" metadata_source="input"/> </outputs> The "$input.tag" extract the tag from the input's name, and puts in the square brackets, so that the next tool will also be able to extract the tag. Here's the added code ( ./lib/galaxy/model/__init__.py ): ... class HistoryDatasetAssociation( object ): ... @property def tag ( self ): #Hack by gordon: #add a '.tag' attribute tag_match = re.search( '\[([^[]+)\]', self.name); if tag_match: tag = "[" + tag_match.group(1) + "]" ; else: tag = "[" + self.name + "]" ; return tag Thanks for reading so far, Any comments and welcomed, Gordon.

galaxy • 1.1k views

ADD COMMENT • link •

written 10.0 years ago by Assaf Gordon • 320

Similar posts • Search »