Question: Barcode Splitter And Clustering Analyses?
7.0 years ago by
Dear Galaxy,

We have a 454 metagenomic dataset. We have used Barcode splitter to divide the dataset into its constituent amplicons. We have also been using a clustering application (dnaclust) in Galaxy to subdivide the dataset by similarity. My question is: are there Galaxy tools that allow these two outputs to be combined, sorted and counted? For example, can each cluster, and then each sequence within that cluster, be given an identifier, so that one can then split the output by barcode and summarise the data along the lines of "amplicon/barcode X has N sequences within cluster 1, M sequences within cluster 2, ..." and so on? Am I making any sense?

This is the sort of problem that sounds solvable in Excel and, indeed, a UK colleague of mine has been doing just this. But is there a straightforward way to do it in Galaxy? It is not obvious to me from the Filtering or Sorting tools.

Best wishes,
Simon
7.0 years ago by
United States
Hello Simon,

You are correct: Galaxy does not have a tool that does this exact operation in one step. However, the "Join, Subtract and Group -> Group" tool may be able to generate the statistics you want from a tabular file containing the linked data, such as clusterID -> sequenceID -> barcodeID. Creating this file would require an uploaded clusterID -> sequenceID file and the sequenceID -> barcodeID data extracted from the output of the "NGS: QC and manipulation -> Barcode splitter" tool. These two could be joined with "Join, Subtract and Group -> Join two Datasets" on the common identifier, sequenceID. The processing would take several steps, but once developed, the Galaxy steps could be saved as a workflow and run again in the future.

Best wishes for your research project,
Jen
Galaxy team
-- Jennifer Jackson
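For readers who want to see the logic of the join-then-group approach outside Galaxy, here is a minimal Python sketch. It is not what the Galaxy tools run internally; it only mirrors the two steps described above (join on sequenceID, then count per barcode and cluster). The function names, the two-column layouts, and the sample identifiers (`seq1`, `BC_A`, `cluster1`, ...) are all hypothetical and chosen for illustration.

```python
from collections import Counter


def join_on_sequence(cluster_rows, barcode_rows):
    """Inner join of (clusterID, sequenceID) rows with (sequenceID, barcodeID) rows.

    Returns (clusterID, sequenceID, barcodeID) tuples for every sequence
    that appears in both inputs.
    """
    barcode_of = {seq_id: barcode for seq_id, barcode in barcode_rows}
    return [
        (cluster, seq_id, barcode_of[seq_id])
        for cluster, seq_id in cluster_rows
        if seq_id in barcode_of
    ]


def count_by_barcode_and_cluster(joined_rows):
    """Count sequences for each (barcodeID, clusterID) pair."""
    return Counter((barcode, cluster) for cluster, _seq_id, barcode in joined_rows)


# Hypothetical example data: assumed column order is stated in the comments.
cluster_rows = [  # clusterID, sequenceID (e.g. derived from clustering output)
    ("cluster1", "seq1"),
    ("cluster1", "seq2"),
    ("cluster2", "seq3"),
    ("cluster2", "seq4"),
]
barcode_rows = [  # sequenceID, barcodeID (e.g. derived from barcode splitting)
    ("seq1", "BC_A"),
    ("seq2", "BC_B"),
    ("seq3", "BC_A"),
    ("seq4", "BC_A"),
]

joined = join_on_sequence(cluster_rows, barcode_rows)
counts = count_by_barcode_and_cluster(joined)

# Print one tab-separated line per barcode/cluster combination.
for (barcode, cluster), n in sorted(counts.items()):
    print(barcode, cluster, n, sep="\t")
```

The inner join silently drops sequences that are missing from either file, so it is worth comparing the number of joined rows with the number of input rows before trusting the counts.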
