Tab database sort and retrieve unique records

3.1 years ago by

United States

shtorkman • 0 wrote:

Hi Galaxy,

After retrieving taxonomic representation, I would like to select a unique set of records from the tabular file for further analysis, based on the smallest e-value, highest % identity, and longest read/contig. Is there a way to run this kind of sort-and-retrieve query in Galaxy?

I've been using Galaxy for various metagenomics analyses, and this is the one thing missing for me. I used to write scripts for this in a FileMaker database when it was available.

Thank you

fetch taxonomy • 879 views


modified 2.8 years ago • written 3.1 years ago by shtorkman • 0

3.1 years ago by

Jennifer Hillman Jackson ♦ 25k

United States

Jennifer Hillman Jackson ♦ 25k wrote:

Hello,

A combination of the tools in Filter and Sort (Filter), Text manipulation (Compute), and Join, Subtract, Group (Group) might be able to do this work, but I am fairly certain that it will not retain all the other information contained in the output (which seems important). Joining back into the original file (by the key used to Group on) would recover multiple data rows per key, which is undesirable and nullifies the filtering steps. That said, the columns of the grouped output could be combined into a single composite "key", the same composite built in the original input, and then a Join performed on that key. This will take several steps, but they could be placed into a Workflow for re-use so that it behaves "like a tool".
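For anyone who can run a script outside Galaxy in the meantime, here is a minimal Python sketch of the selection itself. It is not an existing Galaxy tool, and the column positions (`QUERY_COL`, `IDENTITY_COL`, `LENGTH_COL`, `EVALUE_COL`) are assumptions about the tabular layout that would need adjusting to the real file. It keeps one full row per key, ranked by smallest e-value, then highest % identity, then longest length, so the other columns are retained.

```python
import csv
import io

# Hypothetical 0-based column positions for a BLAST-like tabular file;
# these are assumptions and must be adjusted to the real layout.
QUERY_COL = 0      # key to make unique, e.g. read/contig ID
IDENTITY_COL = 2   # % identity
LENGTH_COL = 3     # read/contig length
EVALUE_COL = 4     # e-value


def rank(row):
    """Sort key: smallest e-value, then highest % identity, then longest length."""
    return (
        float(row[EVALUE_COL]),
        -float(row[IDENTITY_COL]),
        -float(row[LENGTH_COL]),
    )


def best_rows(lines):
    """Return one full row per key, keeping every column of the winning row."""
    best = {}
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        key = row[QUERY_COL]
        if key not in best or rank(row) < rank(best[key]):
            best[key] = row
    return list(best.values())


if __name__ == "__main__":
    # Made-up demo records, only to show the call pattern.
    demo = io.StringIO(
        "read1\ttaxA\t98.5\t250\t1e-50\n"
        "read1\ttaxB\t99.0\t250\t1e-50\n"
        "read1\ttaxC\t99.0\t300\t1e-50\n"
        "read2\ttaxD\t90.0\t120\t1e-10\n"
        "read2\ttaxE\t95.0\t100\t1e-20\n"
    )
    for row in best_rows(demo):
        print("\t".join(row))
```

Because the whole winning row is stored, no join back to the original file is needed. Ties on all three criteria are resolved in favor of the first row encountered.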

Because this appears to be a unique function and one that others would likely also use, I have opened a tool enhancement request. The dev-team may pick this up, or another group may, or anyone with coding resources and time could contribute it to Galaxy. The requirements I outlined are flexible and built upon existing functions; other methods of implementing this are certainly possible and should be considered. Please feel free to add comments, requirements, or ideas: https://github.com/galaxyproject/tools-devteam/issues/260

FileMaker was a very useful tool for me as well many years ago, in particular when working with non-tech scientific users on analysis projects. Slow, but it did the job. For the really large datasets most are working with now, having the best of those FM functions contained all in one place (Galaxy!) is better for the most obvious of reasons: no data transfers out then back in. FM was (is) based loosely on command-line utilities and SQL concepts (with a GUI front-end), and the "best of" should be relatively easy to translate into Galaxy tools using Python or similar.

Sorry we cannot help more, but thank you for the tool idea with a clear example usage! Jen, Galaxy team

modified 3.1 years ago • written 3.1 years ago by Jennifer Hillman Jackson ♦ 25k

Update: Another tool that is available for cloud/local Galaxy use was just shared with me. Please see: https://toolshed.g2.bx.psu.edu/view/agordon/datamash_wrapper/687db8c37dcf

The IUC (according to Bjoern G.) will be publishing/replacing this tool with an updated version, so do not use it quite yet. But I wanted to let you know about tool options in the publishing stage that will likely solve this compute challenge.

modified 3.1 years ago • written 3.1 years ago by Jennifer Hillman Jackson ♦ 25k

Thank you so much! That is great =)

written 3.1 years ago by shtorkman • 0

The IUC version of the tool is now in the Tool Shed. Search for "datamash". Thanks, Jen

written 3.1 years ago by Jennifer Hillman Jackson ♦ 25k

2.8 years ago by

shtorkman • 0

United States

shtorkman • 0 wrote:

Hi Jennifer, I wanted to thank you for all your help and for introducing a new function, it's awesome 😃 I do have a quick question. Not long ago Galaxy had a function called "Fetch taxonomic representation". I used it to retrieve taxonomy for Blast searches and it worked great. Today I tried using "Convert Kraken", but the server does not suggest any taxonomy databases and I cannot run the job. The columns are selected correctly and the db id numbers seem to be correct. Is there anything you can help me with? Thank you, Yury M Shtarkman, Ph.D.

written 2.8 years ago by shtorkman • 0

Hello,

Thanks go to the IUC for the new tool, not me :)

The Kraken databases are in the process of being promoted from the Test server over to the Main server.

If your data is small, the jobs could be run on the Test server now: https://test.galaxyproject.org

The tool is new, so feedback is welcome.

Next time, let's start a new post for new questions; it helps to keep things organized.

Best, Jen, Galaxy team

modified 2.8 years ago • written 2.8 years ago by Jennifer Hillman Jackson ♦ 25k

Update: Track the Kraken data update through this ticket: https://github.com/galaxyproject/galaxy/issues/1679

written 2.8 years ago by Jennifer Hillman Jackson ♦ 25k
