I have a table in Galaxy with 19,132 rows. I can remove duplicates using Group (from the "Join, Subtract and Group" tools) and obtain 6,934 entries, but I lose the information from all of the other 23 columns. How can I remove all of the duplicate rows while keeping all of the information in my 23 columns?
There is no simple tool to perform a "sort unique" on a tabular dataset, although one would be helpful. Let me ask around and open a ticket if there is interest (I'll post it back here).
Meanwhile, try the tool DataMash. It is similar to Group, but the columns to retain can be specified.
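If working outside Galaxy is an option, the same idea can be sketched in a few lines of Python. This is only an illustration under assumptions: it treats rows as duplicates when they share the value in one key column, keeps the first row seen for each value, and expects tab-separated input. The function name dedupe_rows, the key_col parameter, and the sample data are all made up for the example.

```python
import io


def dedupe_rows(lines, key_col=0, delimiter="\t"):
    """Keep the first row seen for each value in key_col (hypothetical helper).

    Every column of a kept row is retained, unlike a plain group-by.
    """
    seen = set()
    kept = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        row = line.split(delimiter)
        key = row[key_col]
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


# Made-up three-column sample; a real table would be read from a file instead.
sample = io.StringIO("geneA\t10\tx\ngeneB\t20\ty\ngeneA\t30\tz\n")
result = dedupe_rows(sample, key_col=0)
for row in result:
    print("\t".join(row))
```

If rows should only count as duplicates when every column matches, the whole line can be used as the key instead of a single column.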
Thanks, Jen, Galaxy team