Question: "Fold" a column from a table?
17 months ago by
louisa.pyle10 wrote:

Is there a tool to "Fold" columns, essentially achieving the opposite of the Unfold tool?

I have the following (example):

ENSG00000107796 ACTA2 NP_001135417 HGNC:130 59

ENSG00000107796 ACTA2 NP_001604 HGNC:130 59

ENSG00000107796 ACTA2 NP_001307784 HGNC:130 59

ENSG00000107796 ACTA2 HGNC:130 59

And I am seeking this output: ENSG00000107796 ACTA2 NP_001135417, NP_001604, NP_001307784 HGNC:130 59


17 months ago by
Mo Heydarian830 wrote:

Hello, you could use the "Group" or "Datamash" tools. You would group by the first column and "concatenate distinct" on the 2nd, 3rd, and 4th columns. In the case of your example, I would clean it up to remove the 4th line, as it has a different number of columns than the first three. I'd suggest using the "Select" tool to keep only the lines matching "NP_".
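For readers who want to see the logic outside Galaxy, here is a minimal Python sketch of the "group by the first column, concatenate distinct values" idea. It is not the Group or Datamash implementation; the function name `fold_rows` is made up for illustration, and it assumes every row has the same number of columns (so the 4th example line has already been removed or fixed).

```python
def fold_rows(rows, key_col=0):
    """Group rows by key_col and join the distinct values of each column."""
    groups = {}  # key value -> one list of distinct values per column
    for row in rows:
        cols = groups.setdefault(row[key_col], [[] for _ in row])
        for i, value in enumerate(row):
            # Keep first-seen order; skip empty cells and repeats.
            if value and value not in cols[i]:
                cols[i].append(value)
    return [[", ".join(values) for values in cols] for cols in groups.values()]


# The three well-formed example rows from the question.
rows = [
    ["ENSG00000107796", "ACTA2", "NP_001135417", "HGNC:130", "59"],
    ["ENSG00000107796", "ACTA2", "NP_001604", "HGNC:130", "59"],
    ["ENSG00000107796", "ACTA2", "NP_001307784", "HGNC:130", "59"],
]

for folded in fold_rows(rows):
    print("\t".join(folded))
```

With the example rows this prints a single tab-separated line whose third column is "NP_001135417, NP_001604, NP_001307784".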

Hope this helps!


Mo Heydarian

Thank you! Ooph, agreed, it's getting hung up on blank spots in the columns. If I remove all the rows that include blanks, I'll lose some data I need. I'm trying to fill the blanks with '.' but having no luck. Any advice? :) I made sure my original file is in tabular format, and then tried using "replace" parts of text to replace (empty) with . (or . or '.'). Can't seem to fix it! Thx!

Try splitting the data into two files, the first with correctly formatted data and the second with rows that have blank entries. For the second file, you can add a placeholder using "Add column" and then rearrange the columns with the "Cut" tool.

Once the second file has the same format as the first you can combine them with the "Concatenate datasets" tool.
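The same split, patch, and recombine idea can be sketched in Python. This is only an illustration, not what the Galaxy tools do internally: the function name `fill_missing_column` is hypothetical, and it assumes complete rows have 5 columns and that short rows are missing exactly the 3rd column, as in the example from the question.

```python
def fill_missing_column(rows, expected_cols=5, insert_at=2, placeholder="."):
    """Split rows into complete and short ones, patch the short ones, recombine."""
    complete, patched = [], []
    for row in rows:
        if len(row) == expected_cols:
            complete.append(row)
        elif len(row) == expected_cols - 1:
            # Insert the placeholder where the missing column should be.
            patched.append(row[:insert_at] + [placeholder] + row[insert_at:])
        else:
            raise ValueError(f"unexpected number of columns: {row}")
    # Like "Concatenate datasets": complete rows first, then the patched rows.
    return complete + patched


rows = [
    ["ENSG00000107796", "ACTA2", "NP_001135417", "HGNC:130", "59"],
    ["ENSG00000107796", "ACTA2", "HGNC:130", "59"],  # missing the NP_ column
]

for row in fill_missing_column(rows):
    print("\t".join(row))
```

After this step every row has the same number of columns, so a grouping step no longer trips over the short line.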

Is there any way to split the data in such a way, within Galaxy?

For the example data you posted, you could select lines matching (or not matching) "NP_" using the "Select" tool. This tool will search and select rows matching (or not) a string you provide.

