Question: Sort a tabular dataset and retrieve unique records
shtorkman (United States) wrote, 3.1 years ago:

Hi Galaxy,

After retrieving the taxonomic representation, I would like to select a unique set of records from the tabular file for further analysis, based on the smallest e-value, the highest % identity and the longest read/contig. Is there a way to run this kind of sort-and-retrieve query in Galaxy?
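For illustration, the selection asked for here (one full row per read, ranked by lowest e-value, then highest % identity, then longest read/contig) can be sketched in a few lines of Python. This is a minimal sketch, not an existing Galaxy tool: the column positions assume BLAST-style tabular output (outfmt 6) with an extra length column appended, and `rank_key` and `best_hit_per_query` are hypothetical names.

```python
import csv

# Hypothetical column layout (an assumption, adjust to your file):
# BLAST-style tabular output (outfmt 6) where column 0 is the read/contig ID,
# column 2 is % identity and column 10 is the e-value, plus an extra
# column 12 holding the read/contig length.
QUERY, PIDENT, EVALUE, LENGTH = 0, 2, 10, 12


def rank_key(row):
    """Smaller is better: lowest e-value, then highest % identity, then longest sequence."""
    return (float(row[EVALUE]), -float(row[PIDENT]), -int(row[LENGTH]))


def best_hit_per_query(lines):
    """Return one complete row per query ID: the row that ranks first under rank_key."""
    best = {}
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        current = best.get(row[QUERY])
        if current is None or rank_key(row) < rank_key(current):
            best[row[QUERY]] = row
    return list(best.values())


# Toy input: read1 has two hits that tie on e-value, so % identity decides.
sample = [
    "read1\tsubjA\t98.0\t100\t2\t0\t1\t100\t1\t100\t1e-50\t180\t150",
    "read1\tsubjB\t99.5\t100\t1\t0\t1\t100\t1\t100\t1e-50\t185\t150",
    "read2\tsubjC\t90.0\t80\t8\t0\t1\t80\t1\t80\t1e-20\t90\t120",
]
result = best_hit_per_query(sample)
for row in result:
    print(row[QUERY], row[1])  # prints "read1 subjB" then "read2 subjC"
```

Keeping whole rows, rather than only per-group summaries, is the design choice that matters here: every other column (taxonomy included) stays attached to the selected record.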

I've been using Galaxy for different metagenomics analyses, and this is the only thing missing for me. I used to write scripts for this in a FileMaker database when it was available.

Thank you

Tags: fetch taxonomy • 879 views
Modified 2.8 years ago • written 3.1 years ago by shtorkman
Jennifer Hillman Jackson (United States) wrote, 3.1 years ago:


A combination of the tools in Filter and Sort (Filter), Text manipulation (Compute), and Join, Subtract, Group (Group) might be able to do this work, but I am fairly certain the result will not retain all the other information contained in the output (which seems important). Joining back into the original file on the key used for Grouping would recover multiple data rows per key; this is undesirable and nullifies the filtering steps. That said, the final output could be manipulated into a combined "key", the same done to the original input, and then a Join performed on that key. This will take several steps, but they could be placed into a Workflow for reuse so that it behaves "like a tool".
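To make the Group/Join problem concrete, here is a toy Python sketch of the idea (hypothetical data and variable names; it only mimics the logic of Group and Join and is not how those Galaxy tools are implemented). Grouping keeps just the key and the summary value, joining back on the group key alone returns every original row, and joining on a composite key narrows the result.

```python
# Hypothetical toy rows (an assumption): (query_id, subject_id, evalue).
rows = [
    ("read1", "subjA", 1e-50),
    ("read1", "subjB", 1e-50),
    ("read1", "subjC", 1e-10),
    ("read2", "subjD", 1e-20),
]

# Like Group: per-query minimum e-value. Only the key and the summary survive.
min_evalue = {}
for query, _, evalue in rows:
    min_evalue[query] = min(evalue, min_evalue.get(query, evalue))

# Joining back on the query ID alone recovers every row for that query.
join_on_query = [r for r in rows if r[0] in min_evalue]

# Build a composite key (query, min e-value) and join on that instead.
composite_keys = set(min_evalue.items())
join_on_composite = [r for r in rows if (r[0], r[2]) in composite_keys]

print(len(join_on_query))                 # prints 4
print([r[1] for r in join_on_composite])  # prints ['subjA', 'subjB', 'subjD']
```

Note that read1 still comes back twice because two hits tie on e-value; the remaining criteria (% identity, length) would have to be folded into the composite key as well, which is why this route takes several steps.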

Because this appears to be a distinct function, and one that others would likely also use, I have opened a tool enhancement request. The dev team may pick it up, or another group may, or anyone with the coding resources and time could contribute it to Galaxy. The requirements as I outlined them are flexible and built on existing functions; other implementation methods are certainly possible and should be considered. Please feel free to add comments, requirements or ideas: 

FileMaker was a very useful tool for me as well, many years ago, in particular when working with non-technical scientific users on analysis projects. Slow, but it did the job. For the very large datasets most people are working with now, having the best of those FM functions all in one place (Galaxy!) is better for the most obvious of reasons: no transferring data out and then back in. FM was (is) loosely based on command-line utilities and SQL concepts (with a GUI front end), and the "best of" should be relatively easy to translate into Galaxy tools using Python or similar.

Sorry we cannot help more, but thank you for the tool idea with a clear example usage! Jen, Galaxy team

Modified 3.1 years ago • written 3.1 years ago by Jennifer Hillman Jackson

Update: Another tool that is available for cloud/local Galaxy use was just shared with me. Please see:

According to Bjoern G., the IUC will be publishing an updated version to replace this tool, so do not use it quite yet. I still wanted to let you know about tool options in the publishing stage that will likely solve this computing challenge.

Modified 3.1 years ago • written 3.1 years ago by Jennifer Hillman Jackson

Thank you so much! That is great =)

Written 3.1 years ago by shtorkman

The IUC version of the tool is now in the Tool Shed. Search by "datamash". Thanks, Jen

Written 3.1 years ago by Jennifer Hillman Jackson
shtorkman (United States) wrote, 2.8 years ago:
Hi Jennifer, I wanted to thank you for all your help and for introducing a new function; it's awesome 😃 I do have a quick question. Not long ago Galaxy had a function called "Fetch taxonomic representation". I used it to retrieve taxonomy for BLAST searches and it worked great. Today I tried using "Convert Kraken", but the server does not suggest any taxonomy databases, so I cannot run my data. The columns are selected correctly and the database ID numbers seem to be correct. Is there anything you can help me with? Thank you, Yury M Shtarkman, Ph.D.
Written 2.8 years ago by shtorkman


Thanks goes to the IUC for the new tool, not me :)

The Kraken databases are in the process of being promoted from the Test server over to the Main server.

If your data is small, the jobs could be run on the Test server now:

The tool is new, so feedback is welcome.

Next time, let's start a new post for new questions; it helps keep things organized.

Best, Jen, Galaxy team

Modified 2.8 years ago • written 2.8 years ago by Jennifer Hillman Jackson

Update: Track the Kraken data update through this ticket:

Written 2.8 years ago by Jennifer Hillman Jackson