Question: Barcode Splitter And Clustering Analyses?
7.0 years ago, Simon Bulman wrote:
Dear Galaxy,

We have a 454 metagenomic dataset. We have used Barcode splitter to divide the dataset into its constituent amplicons. We have also been using a clustering application (dnaclust) in Galaxy to subdivide the dataset by similarity.

My question is: are there Galaxy tools that allow combining, sorting and counting these two outputs? For example, can each cluster, and then each sequence within that cluster, be given an identifier, so that one can then split the output by barcode and summarise the data along the lines of: amplicon/barcode X has N sequences within cluster 1, M sequences within cluster 2, and so on?

Am I making any sense? This is the sort of problem that sounds solvable in Excel and, indeed, a UK colleague of mine has been doing just this. But is there a straightforward means to do so in Galaxy? It is not obvious to me in the Filtering or Sorting tools.

Best wishes,
Simon
galaxy • 806 views
7.0 years ago, Jennifer Hillman Jackson (United States) wrote:
Hello Simon,

You are correct: Galaxy does not have a tool that does this exact operation in one step. However, the "Join, Subtract and Group -> Group" tool may be able to generate the statistics you want from a tabular file containing the linked data, for example with the columns clusterID -> sequenceID -> barcodeID.

Creating this file would require an uploaded clusterID -> sequenceID file and the sequenceID -> barcodeID data extracted from the output of the "NGS: QC and manipulation -> Barcode splitter" tool. These two could be joined on the common identifier, sequenceID, with "Join, Subtract and Group -> Join two Datasets".

The processing would take several steps, but once developed, the Galaxy steps could be saved as a workflow to run again in the future.

Best wishes for your research project,
Jen
Galaxy team

--
Jennifer Jackson
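
For anyone who wants to check the logic outside Galaxy, the join-then-group idea described above can be sketched in a few lines of Python. This is only an illustration, not what the Galaxy tools run internally. The function names, the two-column layouts (clusterID, sequenceID) and (sequenceID, barcodeID), and the example IDs are all assumptions made up for this sketch; real dnaclust and Barcode splitter outputs would first need to be reshaped into such tables.

```python
from collections import Counter


def join_on_sequence(cluster_rows, barcode_rows):
    """Inner-join (clusterID, sequenceID) rows with (sequenceID, barcodeID)
    rows on sequenceID, giving (clusterID, sequenceID, barcodeID) rows."""
    barcode_of = {seq_id: barcode for seq_id, barcode in barcode_rows}
    return [
        (cluster, seq_id, barcode_of[seq_id])
        for cluster, seq_id in cluster_rows
        if seq_id in barcode_of  # sequences without a barcode are dropped
    ]


def count_by_barcode_and_cluster(joined_rows):
    """Group the joined rows by (barcodeID, clusterID) and count sequences."""
    return Counter((barcode, cluster) for cluster, _seq_id, barcode in joined_rows)


# Hypothetical example data (not from the original post).
cluster_rows = [
    ("cluster1", "seqA"),
    ("cluster1", "seqB"),
    ("cluster2", "seqC"),
    ("cluster2", "seqD"),
]
barcode_rows = [
    ("seqA", "BC1"),
    ("seqB", "BC2"),
    ("seqC", "BC1"),
    ("seqD", "BC1"),
]

joined = join_on_sequence(cluster_rows, barcode_rows)
counts = count_by_barcode_and_cluster(joined)

# One line per barcode/cluster pair: barcode, cluster, number of sequences.
for (barcode, cluster), n in sorted(counts.items()):
    print(f"{barcode}\t{cluster}\t{n}")
```

Each printed row corresponds to the summary asked for in the question: how many sequences from a given barcode fall into a given cluster. In Galaxy, the same result would come from joining on the sequenceID column and then grouping on the barcode and cluster columns with a count operation.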