Question: Galaxy Question
6.0 years ago, D. A. Cowart wrote:
Hello,

I would like to use Galaxy to divide a very large Illumina FASTA file (~3 GB) into separate FASTA files. Is this possible on Galaxy? Here is an example of the reads:

AAATAGAATATCACATTAAAATCACAAGCAGGACAGTGTGTGTAAAAGAAATCTTTTGTGAATTCAACGT
TTATCAATTAGANNNNACGCCTACGTGTAG

ATTTATCATAACAACTTAAATCAGTCAGTGGATTTCTGTCGGTCCGGTTAGCTCGGTTGGTAAAGGCGTT
TGTTCGATCGTCTGTATTTTGCAATCGGGC

I have tried the "Filter and Sort" option to select sequences by their beginning sequence (ATGC, for example) and separate them into a specific file, but I have been unsuccessful.

Thank you,
Dominique
galaxy • 768 views
6.0 years ago, Jennifer Hillman Jackson (United States) wrote:
Hello Dominique,

Would the tool 'NGS: QC and manipulation -> Barcode Splitter' meet your needs? Please see the tool's help for usage.

Another option is to first convert the file to tabular with 'FASTA manipulation -> FASTA-to-Tabular', then use the 'Filter and Sort -> Filter' tool. The match criteria would look something like: c2=='ATGC'. Once done, convert back to FASTA with 'FASTA manipulation -> Tabular-to-FASTA'.

Hopefully one of these methods will work out for you,

Jen
Galaxy team

--
Jennifer Jackson
http://galaxyproject.org
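For readers who want to try the same convert, filter, convert-back idea outside Galaxy, here is a minimal Python sketch. It is not how the Galaxy tools are implemented; the function names and the example reads are hypothetical, and it assumes each record starts with a ">" header line. It tests whether the sequence starts with the given prefix, not whether the whole sequence equals it.

```python
from typing import Iterable, Iterator, List, Tuple


def fasta_to_tabular(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (identifier, sequence) pairs, joining wrapped sequence lines."""
    name = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # tolerate stray blank lines
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def filter_by_prefix(records, prefix: str) -> List[Tuple[str, str]]:
    """Keep only the records whose sequence begins with the prefix."""
    return [(name, seq) for name, seq in records if seq.startswith(prefix)]


def tabular_to_fasta(records) -> str:
    """Write records back out as FASTA, one sequence line per record."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)


# Hypothetical input: two short reads, the first wrapped over two lines.
example = """>read1
ATGCTTATCAATTAGA
NNNNACGC
>read2
AAATAGAATATCACAT
"""

selected = filter_by_prefix(fasta_to_tabular(example.splitlines()), "ATGC")
print(tabular_to_fasta(selected), end="")
```

Because the parser is a generator, it reads one line at a time, which matters for a file of several gigabytes; pass it an open file handle instead of a list of lines.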
6.0 years ago, Jennifer Hillman Jackson (United States) wrote:
Hi Dominique,

This means that there were no results. My fault - the Filter tool is the wrong choice. The Select tool, which you started with, is the better choice for this case. The issues you had originally were most likely with the format or the regular expression. So, be sure to do the following this time:

1. Convert to tabular format (choose "1" for the identifier on this tool's form, to keep the output in a simple two-column format).
2. Use a regular expression in the Select tool, "Matching", like this: \tATGC

Here the "\t" indicates a tab (the tab between the two columns), which anchors the matching text to the start of the sequence string. You could be more specific with the regular expression if you need to, following the guidelines in the tool help, but it probably isn't necessary if the rest of your sequences are formatted like your examples.

The sequences in your example are separated by an empty line - but I am assuming that was just the way the data was pasted into the email. In a properly formatted FASTA file, there should be no empty lines. To remove empty lines from a FASTA file, you can also use the Select tool (directly on the FASTA format file), with "NOT Matching" and this regular expression: ^$

Here ^ means the start of a line and $ means the end. Together, they indicate a blank line. "NOT Matching" selects all lines that are not blank, i.e. lines that have content.

Please give this a try and let us know how it works.

Jen
Galaxy team

--
Jennifer Jackson
http://galaxyproject.org
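The two regular expressions above can be tried out on a few lines before running them on the full dataset. The sketch below uses Python's re module as a stand-in for the Select tool, which is an assumption: the tool's own regex engine may differ in details. The helper names and sample lines are hypothetical.

```python
import re

# "\tATGC": a tab followed by ATGC, i.e. ATGC at the start of column 2
# in a two-column (identifier, sequence) tabular line.
STARTS_WITH_ATGC = re.compile(r"\tATGC")

# "^$": start of line immediately followed by end of line, i.e. a blank line.
BLANK_LINE = re.compile(r"^$")


def select_matching(lines, pattern):
    """Keep the lines in which the pattern is found ("Matching")."""
    return [line for line in lines if pattern.search(line)]


def select_not_matching(lines, pattern):
    """Keep the lines in which the pattern is not found ("NOT Matching")."""
    return [line for line in lines if not pattern.search(line)]


# Hypothetical tabular lines: read2 contains ATGC, but not at the start.
tabular = ["read1\tATGCTTAT", "read2\tAAATGCAA", "read3\tATGCGGGC"]
kept = select_matching(tabular, STARTS_WITH_ATGC)
print(kept)

# Hypothetical FASTA lines with stray blank lines between records.
fasta_lines = [">read1", "ATGCTTAT", "", ">read2", "AAATGCAA", ""]
cleaned = select_not_matching(fasta_lines, BLANK_LINE)
print(cleaned)
```

Note that read2 is dropped by the first pattern even though it contains ATGC, because the tab in the pattern ties the match to the beginning of the sequence column.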