Question: Select lines in Galaxy (Filter-sort function)
rafaela.michailidou (United Kingdom) wrote 3.1 years ago:

I am trying to select the lines whose 4th column starts with hsa, but I can't seem to get any results. What expression do I have to use for that? I have been using c4='hsa' but it doesn't work.

Bjoern Gruening (Germany) wrote 3.1 years ago:

What always comes in handy is the Python magic of the "Filter data on any column using simple expressions" tool. You can use this as the filter criterion: c4.startswith('hsa')
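To illustrate why an exact comparison finds nothing while startswith does, here is a minimal sketch in plain Python. It only imitates the idea of the Filter tool (the 4th tab-separated column exposed as c4); the sample rows and the helper filter_rows are made up for illustration and are not Galaxy's own code.

```python
# Hypothetical tab-separated rows; column 4 holds an identifier.
rows = [
    "chr1\t100\t200\thsa-mir-21\t+",
    "chr1\t300\t400\tmmu-mir-21\t+",
    "chr2\t500\t600\thsa\t-",
]

def filter_rows(lines, predicate):
    """Keep the lines whose 4th column (c4) satisfies the predicate."""
    kept = []
    for line in lines:
        fields = line.split("\t")
        c4 = fields[3]  # columns are 1-based in the tool, 0-based here
        if predicate(c4):
            kept.append(line)
    return kept

# Exact comparison: only a field that is literally "hsa" passes.
exact = filter_rows(rows, lambda c4: c4 == "hsa")

# Prefix test: any field that begins with "hsa" passes.
prefix = filter_rows(rows, lambda c4: c4.startswith("hsa"))

print(len(exact), len(prefix))
```

In this made-up data the exact comparison keeps only the row whose 4th field is exactly hsa, while the prefix test also keeps hsa-mir-21.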

I have added this use case to the Galaxy-Tricks repository. Maybe you will find this useful:

https://github.com/bgruening/galaxy-tricks/commit/bc3e4fa2ab01a1468fbed0d3219d1573d366743c

Cheers,

Bjoern


Nice! Even I didn't know that Python function would work. It could be added as an example directly on the tool form (along with others that are known to work).

Love that Galaxy-Tricks repo :) Jen

Reply written 3.1 years ago by Jennifer Hillman Jackson

Glad you like it. Feel free to contribute :)

Reply written 3.1 years ago by Bjoern Gruening

On my "list"! :) 

Reply written 3.1 years ago by Jennifer Hillman Jackson
Jennifer Hillman Jackson (United States) wrote 3.1 years ago:

Hello,

For the Filter tool to isolate rows with an equality test, the contents of the data field must match the text between the two quotes exactly.

The Select tool could be a better choice if the field contains other data. An expression like this will find rows whose 4th column starts with hsa (but may have more content). It is important to know the exact number of columns; otherwise a "greedy" expression such as ".*" will be imprecise.

^.*\t.*\t.*\thsa.*\t[add in more .*\t expressions until you reach the last column, then use this to capture the last column].*$

breakdown:

^ = start of line

.* = zero or more characters, can capture nearly any content, including tabs (greedy)

\t = must be a tab

hsa.* = specifies a value that starts with hsa. When it is bounded by tabs and all other fields in the row are also bounded by tabs, this can isolate and filter on the 4th column

.*$ = designates the last column, which can have any content (without a tab); when it starts after a tab and ends with $, this isolates the last field. ($ alone is always the end of the line)
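The point about greedy expressions can be checked with a small sketch. It uses Python's re module as a stand-in, so the Select tool's own regular expression flavor may differ in details. The sample rows are invented, the data is assumed to have 5 columns, and the stricter pattern using [^\t]* (any run of non-tab characters) is an alternative suggested here, not part of the original answer.

```python
import re

# Invented rows. The last one has 6 columns with "hsa" in column 5.
hit = "chr1\t100\t200\thsa-mir-21\t+"
miss = "chr1\t300\t400\tmmu-mir-21\t+"
shifted = "a\tb\tc\td\thsa-x\te"

# The pattern from the answer, written out for 5 columns.
greedy = re.compile(r"^.*\t.*\t.*\thsa.*\t.*$")

# Stricter variant: [^\t]* cannot cross a tab, so each piece is
# pinned to exactly one column and the column count no longer matters.
strict = re.compile(r"^[^\t]*\t[^\t]*\t[^\t]*\thsa")

for name, row in [("hit", hit), ("miss", miss), ("shifted", shifted)]:
    print(name, bool(greedy.search(row)), bool(strict.search(row)))
```

Because .* may swallow tabs, the greedy pattern also accepts the 6-column row where hsa sits in column 5, which is why knowing the exact column count matters with that form. The stricter pattern only accepts rows whose 4th column starts with hsa.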

More help on regular expressions is available on the tool form and in many places online. Expressions can be simple or complicated, and a few tests are sometimes needed to tune one to do exactly what you want with the given data. There are often several ways to build an expression; this is a simple way to build the one you want.

Hope this helps!

Jen, Galaxy team

 
