Question: Uncollapse sequences in Galaxy
24 months ago by
elisabetta.cilli0 wrote:


I am looking for a tool to uncollapse a previously collapsed FASTA file.

Thank you


Can you add an example?

Sorry for writing in an old post, but I was searching for the same thing. I have FASTA files collapsed with FASTX and I want to uncollapse this format:




The copy number comes after the _x or _.

11 weeks ago by
United States
Jennifer Hillman Jackson (23k) wrote:


So you want to create a dataset that has each of the collapsed sequences put back into individual sequences, where the frequency of each is based on the count. There isn't a wrapped Galaxy tool that I know of to do this. And if you used the tool Collapse sequences (Galaxy Version 1.0.0) - FASTX-toolkit based, then the original sequence identifiers are no longer available.

A command-line script could be written to do this (and wrapped as a Galaxy tool). If you are interested in creating this, start here (use Planemo):

Next time, keep a copy of the original uncollapsed FASTA dataset. It can be downloaded locally. Then, after you confirm the download was successful, the dataset can be permanently deleted from the history. This way, you can always upload it again if needed.

Thanks, Jen, Galaxy team

Thanks, I have an AWK one-liner that can do it; maybe I can make a tool.

PS: Strange, but these data came from BGI 5-6 years ago and they gave us only collapsed FASTA and FASTQ :(

Update: I just double-checked, and the FASTX authors released an uncollapse tool in 2009. It isn't covered in the online documentation except in the release notes, and I didn't download the latest version to see if it is there, but you could. It is also not wrapped in the Galaxy FASTX repo in the Tool Shed. From the release notes:

24-Nov-2009 - Version 0.0.11 New tools: fastx_uncollapser

However, if you know awk and have a script already, you could potentially use the Galaxy tool Text Manipulation > Text reformatting with awk.
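For anyone who would rather not use awk, here is a rough Python sketch of the same idea. It is not fastx_uncollapser and not the awk line mentioned above. It assumes the copy number is the trailing number in the sequence identifier, after `_x` or `_` as described in this thread, or after `-` (the layout FASTX Collapse normally writes, as far as I know). The function names `read_fasta` and `uncollapse` are made up for this example. Since the original identifiers are gone, each copy is simply numbered.

```python
import re

# Assumption: the identifier ends with the copy number, written as
# "_x<count>", "_<count>", or "-<count>" (e.g. ">1_x3" or ">1-3").
HEADER_RE = re.compile(r"^(?P<name>.*?)(?:_x|_|-)(?P<count>\d+)$")


def read_fasta(lines):
    """Yield (header, sequence) pairs; multi-line sequences are joined."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def uncollapse(lines):
    """Yield FASTA lines with each collapsed record written out count times."""
    for header, seq in read_fasta(lines):
        parts = header.split()
        token = parts[0] if parts else ""  # ignore any description text
        match = HEADER_RE.match(token)
        if match is None:
            raise ValueError(f"no copy number found in header: {header!r}")
        name, count = match.group("name"), int(match.group("count"))
        for i in range(1, count + 1):
            # Original read names are lost, so number the copies instead.
            yield f">{name}_{i}"
            yield seq


# Small made-up example: the first record expands to three copies.
example = [">1_x3", "ACGT", ">2_x1", "TTGA"]
print("\n".join(uncollapse(example)))
```

To run it on a real file, pass an open file handle to `uncollapse` instead of the example list. If your headers use a different layout, only `HEADER_RE` should need changing.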

