Question: Text Manipulation: Filter Out Duplicates (Uniq) From a Plain Text File?
Roman Valls wrote, 7.6 years ago:
Hey Galaxy users, that's a fairly good question from one of my colleagues. I've looked through the menus (mainly "Text Manipulation" and "Filter and Sort" (Select)) and googled (the mailing list archives too), but couldn't find an answer: how should I remove duplicates from a plain text file without resorting to either:

running "cat file | sort | uniq" before uploading the file/text, or

putting a regexp together to replace the duplicate occurrences, as in: http://www.regular-expressions.info/duplicatelines.html

I'm pretty sure I'm missing some really basic stuff here... is this basic operation supposed to be done outside Galaxy, perhaps? Thanks in advance!
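For reference, the pre-upload workaround mentioned above looks like this on the command line (the file names are placeholders):

```shell
# Example input with duplicate lines (placeholder file name):
printf 'b\na\nb\nc\na\n' > input.txt

# Sort, then collapse duplicates:
sort input.txt | uniq          # -> a, b, c

# Same result in one step:
sort -u input.txt              # -> a, b, c

# Order-preserving alternative: awk prints a line only the first
# time it is seen, so the original line order is kept.
awk '!seen[$0]++' input.txt    # -> b, a, c
```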
modified 7.6 years ago by Rory Kirchner • written 7.6 years ago by Roman Valls
Rory Kirchner wrote, 7.6 years ago:
Is there a reason why just using the command-line tool isn't workable for you? Personally, I'm happy when I can just do something quick like that. Also, you can simplify your command to "sort filename | uniq". -rory
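Worth noting why the sort comes first: uniq only collapses *adjacent* duplicate lines, so on unsorted input a repeated line can survive:

```shell
# The duplicate "a" is not adjacent, so uniq alone keeps it:
printf 'a\nb\na\n' | uniq          # prints: a, b, a

# Sorting first makes duplicates adjacent:
printf 'a\nb\na\n' | sort | uniq   # prints: a, b
```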
Well, since similarly basic command-line tools, such as "sort" or "cut", are available in Galaxy, I just wondered why "uniq" isn't on the tool panel in some form or under some name. Thanks for the feedback, Rory!
written 7.6 years ago by Roman Valls
That's a timely question - I was also looking for something within Galaxy to take a text file and remove duplicate lines. Peter
written 7.6 years ago by Peter Cock
Hi Peter and Roman,

The "Count" tool under the "Statistics" section provides uniq-like functionality. If you run this tool by selecting all columns under the "Count occurrences of values in column(s)" field, your output will contain one line per record, with the 1st column containing the number of occurrences of each record.

Hope this answers your question. Thanks for using Galaxy,
Guru.

--
Graduate student, Bioinformatics and Genomics
Makova lab/Galaxy team
Penn State University
505 Wartik lab, University Park PA 16802
guru@psu.edu
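On the command line, the closest equivalent to the Count tool's output described above is "uniq -c", which also puts the occurrence count in the first column. This is just an illustration with made-up records, not what Galaxy runs internally:

```shell
# Placeholder data: three records, one duplicated.
printf 'chr1\nchr2\nchr1\n' > records.txt

# Count occurrences of each distinct record (count first, record second):
sort records.txt | uniq -c
# prints something like:
#   2 chr1
#   1 chr2
```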
written 7.6 years ago by Guru Ananda
Hi Paul, I'm happy to hear that so far you are happy with the data. We used the Illumina TruSeq targeting kit; I made the exome target regions available as both a .bed and a .gff file on the FTP. Best regards, Jonas
written 7.6 years ago by Jonas Grauholm
Powered by Biostar version 16.09