Question: Uncollapse sequences in Galaxy
elisabetta.cilli (Italy) wrote 2.4 years ago:

Hello

I am looking for a tool to uncollapse a previously collapsed FASTA file.

Thank you

Elisabetta

modified 7 months ago by Jennifer Hillman Jackson • written 2.4 years ago by elisabetta.cilli

Can you add an example?

written 2.4 years ago by Bjoern Gruening

Sorry for posting in an old thread, but I was looking for the same thing. I have FASTA files that were collapsed with FASTX, and I want to uncollapse headers in this format:

>name_x99999

or

>name_9999

The copy number is the number that follows the _x or the _.
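
As a minimal sketch of how such headers could be read, the Python snippet below splits a header into a name and a copy number. The function name `parse_collapsed_header` and the regular expression are illustrative assumptions based only on the two header shapes shown above; they are not part of FASTX or Galaxy.

```python
import re

# Assumed header shapes (taken from the examples above):
#   ">name_x99999"  or  ">name_9999"
# The copy number is the run of digits after the last "_x" or "_".
HEADER_RE = re.compile(r"^>?(?P<name>.+)_x?(?P<count>\d+)$")


def parse_collapsed_header(header):
    """Return (name, copy_number) for a collapsed FASTA header line."""
    match = HEADER_RE.match(header.strip())
    if match is None:
        raise ValueError(f"no copy number found in header: {header!r}")
    return match.group("name"), int(match.group("count"))
```

Headers that do not end in `_x<digits>` or `_<digits>` raise a `ValueError`, so files with a different naming scheme fail loudly instead of being expanded with a wrong count.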

modified 7 months ago • written 7 months ago by vebaev
Jennifer Hillman Jackson (United States) wrote 7 months ago:

Hello,

So you want to create a dataset in which each collapsed sequence is expanded back into individual sequences, with the number of copies of each given by its count. I don't know of a wrapped Galaxy tool that does this. And if you used the FASTX-toolkit based tool Collapse sequences (Galaxy Version 1.0.0), then the original sequence identifiers are no longer available.
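
The expansion described above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the headers are assumed to end in `_x<count>` or `_<count>` as in the examples earlier in the thread, the function names `expand_record` and `uncollapse_fasta` are made up for this sketch, and because the original identifiers are gone, each copy gets an invented identifier of the form `<name>_<i>`.

```python
def expand_record(header, sequence):
    """Repeat one collapsed record as many times as its count says.

    Assumes the header ends in "_x<count>" or "_<count>".
    """
    name, _, count = header.lstrip(">").rpartition("_")
    copies = int(count.lstrip("x"))  # raises ValueError if no count is present
    # Original read IDs are lost, so copies get invented IDs "<name>_<i>".
    return [(f"{name}_{i}", sequence) for i in range(1, copies + 1)]


def uncollapse_fasta(text):
    """Return uncollapsed FASTA text for collapsed FASTA text."""
    records = []
    header, parts = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.extend(expand_record(header, "".join(parts)))
            header, parts = line, []
        else:
            parts.append(line)  # sequence may span several lines
    if header is not None:
        records.extend(expand_record(header, "".join(parts)))
    return "".join(f">{name}\n{seq}\n" for name, seq in records)
```

For example, a record `>a_x2` with sequence `ACGT` would come out as two records, `>a_1` and `>a_2`, each with sequence `ACGT`. The sketch holds everything in memory, so a streaming version would be a better fit for large files with high counts.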

A command-line script could be written to do this (and wrapped as a Galaxy tool). If you are interested in creating one, start here (use Planemo): https://galaxyproject.org/tools/

Next time, keep a backup of the original uncollapsed FASTA dataset. It can be downloaded locally. Then, after you confirm the download was successful, the dataset can be permanently deleted from the history. This way, you can always upload it again if needed.

Thanks, Jen, Galaxy team

modified 7 months ago • written 7 months ago by Jennifer Hillman Jackson

Thanks, I have an AWK one-liner that can do it; maybe I can make a tool from it.

PS: Strange, but these data came from BGI 5-6 years ago and they gave us only collapsed FASTA and FASTQ :(

written 7 months ago by vebaev

Update: I just double-checked, and the FASTX authors released an uncollapse tool in 2009. It isn't covered in the online documentation except in the release notes, and I didn't download the latest version to see if it is there, but you could. It is also not wrapped in the Galaxy FASTX repo in the Tool Shed.

http://hannonlab.cshl.edu/fastx_toolkit/ 24-Nov-2009 - Version 0.0.11 New tools: fastx_uncollapser

However, if you know awk and have a script already, you could potentially use the Galaxy tool Text Manipulation > Text reformatting with awk.

modified 7 months ago • written 7 months ago by Jennifer Hillman Jackson