I am trying to select the lines whose 4th column starts with "hsa", but I can't get any results. What expression should I use? I have been using c4='hsa' but it doesn't work.
What always comes in handy is the Python magic of the "Filter data on any column using simple expressions" tool. You can use this as the filter condition: c4.startswith('hsa')
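As a rough illustration of what a condition like c4.startswith('hsa') does, the sketch below applies the same expression text to a few made-up tab-separated rows using Python's eval. It is not Galaxy's implementation. The helper filter_lines and the way columns are bound to the names c1, c2, ... are assumptions for illustration only.

```python
# A minimal sketch, NOT Galaxy's implementation: apply a Filter-style
# condition such as "c4.startswith('hsa')" to tab-separated lines.
# Assumption: each column is exposed as c1, c2, ... holding the field text.
# eval is used only for this illustration; it is unsafe on untrusted input.

def filter_lines(lines, expression):
    """Return the lines for which the Python expression is true (hypothetical helper)."""
    kept = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Bind c1..cN to the fields (1-based, matching the c4 naming).
        columns = {f"c{i}": value for i, value in enumerate(fields, start=1)}
        # Evaluate the expression with only the column names in scope.
        if eval(expression, {"__builtins__": {}}, columns):
            kept.append(line)
    return kept


# Made-up example rows: the 4th column holds the identifier.
rows = [
    "chr1\t100\t200\thsa-mir-21\t+",
    "chr2\t300\t400\tmmu-mir-21\t-",
    "chr3\t500\t600\thsa-let-7a\t+",
]

selected = filter_lines(rows, "c4.startswith('hsa')")
print(selected)  # keeps the chr1 and chr3 rows
```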
I have added this use case to the Galaxy-Tricks repository. Maybe you will find this useful:
https://github.com/bgruening/galaxy-tricks/commit/bc3e4fa2ab01a1468fbed0d3219d1573d366743c
Cheers,
Bjoern
Hello,
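For an exact-match condition, the Filter tool keeps a row only if the data field contains exactly the text between the two quotes. For example, c4=='hsa' (note that the comparison operator is ==; a single = is assignment in Python) matches a 4th column of hsa, but not hsa-mir-21.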
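To make the difference concrete, here is a small Python sketch, assuming the Filter condition is read as a Python expression (as the previous answer suggests). It compares an exact match with a prefix match on a made-up 4th-column value, and checks whether c4='hsa' is even a valid expression.

```python
# Sketch: exact match (==) versus prefix match (startswith) on one field.
# The value below is a made-up 4th-column entry.
c4 = "hsa-mir-21"

exact = c4 == "hsa"              # False: the whole field must equal "hsa"
prefix = c4.startswith("hsa")    # True: only the beginning must be "hsa"
print(exact, prefix)

# A single "=" is assignment, not comparison, so "c4='hsa'" is not a
# valid Python expression on its own; compiling it raises SyntaxError.
try:
    compile("c4='hsa'", "<condition>", "eval")
    single_equals_ok = True
except SyntaxError:
    single_equals_ok = False
print("c4='hsa' valid:", single_equals_ok)
```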
The Select tool, which filters lines with a regular expression, could be a better choice if the field contains other data. An expression like the one below will find rows whose 4th column starts with hsa (and may contain more text after it). It is important to know the exact number of columns; otherwise a "greedy" expression such as ".*" will be imprecise, because it can also match across tabs and reach into the wrong column.
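^.*\t.*\t.*\thsa.*\t[add more .*\t expressions until you reach the last column, then end with this to capture the last column].*$
For example, for a file with exactly 5 columns, the complete expression is: ^.*\t.*\t.*\thsa.*\t.*$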
Breakdown:
^ = start of line
.* = zero or more characters of any kind, including tabs (greedy), so it can capture nearly any content
\t = must be a tab
hsa.* = a value that starts with hsa. Because it follows the third tab, and every other field in the row is also delimited by tabs, this isolates the 4th column and filters on it
.*$ = the last column, which can have any content. Preceded by a tab and followed by $, it isolates the last field. (It cannot absorb a tab, because every tab in the line is already matched by one of the \t in the expression. $ alone always means the end of the line.)
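The recipe above can be generated for any column count. The sketch below uses Python's re module; column_prefix_pattern is a hypothetical helper and the rows are made-up examples. It writes one .* per column, puts the prefix in the target column, and joins the columns with \t, so the expression has exactly one \t per column boundary.

```python
import re


def column_prefix_pattern(n_columns, target_col, prefix):
    """Hypothetical helper: build ^...$ for a file with exactly n_columns
    tab-separated columns, where column target_col (1-based) starts with prefix."""
    fields = []
    for col in range(1, n_columns + 1):
        if col == target_col:
            # Escape the prefix so characters like "." are matched literally.
            fields.append(re.escape(prefix) + ".*")
        else:
            fields.append(".*")
    # One \t between each pair of columns, anchored at both ends.
    return "^" + r"\t".join(fields) + "$"


pattern = column_prefix_pattern(5, 4, "hsa")
print(pattern)  # ^.*\t.*\t.*\thsa.*\t.*$

# Made-up rows with exactly 5 columns.
rows = [
    "chr1\t100\t200\thsa-mir-21\t+",
    "chr2\t300\t400\tmmu-mir-21\t-",
]
matches = [row for row in rows if re.search(pattern, row)]
print(matches)  # only the chr1 row
```

Because re.escape is applied to the prefix, a prefix containing characters such as "." is matched literally rather than as a wildcard.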
More help on regular expressions is available on the tool form and in many places online. Expressions can be simple or complicated, and a few tests are sometimes needed to tune one to do exactly what you want with your data. There are often several ways to build an expression; this is a simple way to build the one you want.
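As an example of such a test, the Python sketch below runs three expressions against one made-up row whose 4th column does not start with hsa but whose 5th column does. It illustrates the column-count point made earlier: a pattern that stops right after hsa lets the greedy .* swallow a tab and match the wrong column. The third pattern swaps in [^\t]* (any run of characters except a tab), a common alternative that cannot cross a tab; that variant is an addition here, not part of the recipe above.

```python
import re

# A made-up row: the 4th column does NOT start with hsa, the 5th does.
row = "chr2\t300\t400\tmmu-mir-21\thsa-note"

patterns = {
    # Too short: nothing pins down the remaining columns, so the greedy .*
    # can absorb a tab and "hsa" ends up matching the 5th column.
    "loose": r"^.*\t.*\t.*\thsa",
    # Full pattern with one \t per column boundary (5 columns): no match.
    "exact_count": r"^.*\t.*\t.*\thsa.*\t.*$",
    # [^\t]* cannot cross a tab, so each field stays in its own column.
    "no_tab": r"^[^\t]*\t[^\t]*\t[^\t]*\thsa",
}

results = {name: bool(re.search(p, row)) for name, p in patterns.items()}
print(results)  # {'loose': True, 'exact_count': False, 'no_tab': False}
```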
Hope this helps!
Jen, Galaxy team