Question: Unable to convert a FASTQ file into NCBI-standard FASTA format
meeran_micro (Malaysia) wrote, 3.0 years ago:

Hi, I want to make a multiple alignment of several Salmonella Typhi isolates in BioEdit. My sequences are in Sanger FASTQ format. To load them into BioEdit, I need FASTA files in the NCBI standard format. I used the FASTQ to FASTA tool in Galaxy, but the output for a single genome is split over multiple lines. I want the whole genome sequence on a single line.

Can anyone please help me?

Meeran
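For reference, here is a minimal Python sketch of what a FASTQ-to-FASTA conversion does, writing each read's sequence on a single line. It assumes strict four-line FASTQ records (@header, sequence, +, quality) with no wrapped lines; the function name fastq_to_fasta is hypothetical and is not the code of Galaxy's tool.

```python
from typing import Iterable, Iterator


def fastq_to_fasta(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTA lines (header, then a one-line sequence) from FASTQ text.

    Assumption: every record is exactly four lines (@id, sequence, +, quality),
    i.e. no wrapped sequence or quality lines.
    """
    it = (line.rstrip("\r\n") for line in lines)
    for header in it:
        if not header:
            continue  # skip blank lines between records
        if not header.startswith("@"):
            raise ValueError(f"expected '@' header line, got {header!r}")
        try:
            seq, plus, _quality = next(it), next(it), next(it)
        except StopIteration:
            raise ValueError(f"truncated record starting at {header!r}") from None
        if not plus.startswith("+"):
            raise ValueError(f"expected '+' separator line, got {plus!r}")
        yield ">" + header[1:]  # FASTA headers start with '>' instead of '@'
        yield seq  # quality scores are dropped; FASTA has no place for them


# Usage with a shortened, made-up record (the quality string is hypothetical):
record = ["@20351#1/1\n", "TAAAAGCNGG\n", "+\n", "IIIIIII#II\n"]
print("\n".join(fastq_to_fasta(record)))
```

Reading the quality line with next() rather than checking for a leading '@' matters because FASTQ quality strings can themselves begin with '@'.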
 

Tags: alignment, fasta, fastaq • 918 views
jen (United States) wrote, 3.0 years ago:

Hello,

This is the format you want (strict FASTA format), correct? http://www.ncbi.nlm.nih.gov/BLAST/blastcgihelp.shtml

The FASTQ to FASTA tool converts your reads to FASTA format, but it does not wrap the sequence lines. To wrap them, use the FASTA Width formatter tool; a width of 60 is common.
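To illustrate what width formatting does, here is a minimal Python sketch that re-wraps every FASTA record to a fixed line width (60 by default, as suggested above). It is not the Galaxy FASTA Width formatter's actual implementation; read_fasta and wrap_fasta are hypothetical helper names. Passing a width at least as long as the longest sequence leaves each sequence on a single line.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs, joining any wrapped sequence lines."""
    header: Optional[str] = None
    parts: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(parts)
            header, parts = line[1:], []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            parts.append(line)
    if header is not None:
        yield header, "".join(parts)


def wrap_fasta(records: Iterable[Tuple[str, str]], width: int = 60) -> Iterator[str]:
    """Yield FASTA lines with each sequence split into chunks of `width` characters."""
    if width < 1:
        raise ValueError("width must be at least 1")
    for header, seq in records:
        yield ">" + header
        for start in range(0, len(seq), width):
            yield seq[start:start + width]


# Usage with a short, made-up record split over two lines; re-wrapped to width 6.
example = [">20351#1/1\n", "TAAAAGCNGG\n", "TTATGTTG\n"]
print("\n".join(wrap_fasta(read_fasta(example), width=6)))
```

Parsing into whole records first keeps the wrapping step independent of however the input happened to be wrapped.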

If there are other formatting issues, they can almost always be corrected within Galaxy. The troubleshooting section of the Custom Genomes wiki page lists fixes for several common problems; if you are not using the data in Galaxy, just focus on the Problem/Solution columns: Learn/CustomGenomes#Troubleshooting

If there is some other problem, or I have misunderstood your question, please share a sample of the input file (complete FASTQ records, please) and a sample of how you think that data should be formatted. I haven't used BioEdit myself, and from its website it appears to be no longer supported (otherwise I would point you to its help).

This is not really a Galaxy question in the usual sense (formatting external data for use in Galaxy), but I think we can help you anyway!

Jen, Galaxy team

meeran_micro (Malaysia) wrote, 3.0 years ago:

Here is a sample of my input file:

>20351#1/1
TAAAAGCNGGTTATGTTGTCGCTTTACGGTTTTCATTCAGGACGCGCTATGGGCAATAAGTATTCCGGCCTGCAAATTGGTATTCACTGGTTAGTCTTTT
>23523#1/1
TATCGCGNCGTTTTTACGCTGGCGTCACCGTCACCAATAAACCTTAGCGCGCTGGAGGAAATATCCCAGCGCGAAATTTATCGCCCCATAAACCGCGCCC

My output FASTA file should be formatted like this:

TAAAAGCNGGTTATGTTGTCGCTTTACGGTTTTCATTCAGGACGCGCTATGGGCAATAAGTATTCCGGCCTGCAAATTGGTATTCACTGGTTAGTCTTTTTATCGCGNCGTTTTTACGCTGGCGTCACCGTCACCAATAAACCTTAGCGCGCTGGAGGAAATATCCCAGCGCGAAATTTATCGCCCCATAAACCGCGCCC

I want to remove all of these description lines (such as >20351#1/1 and >23523#1/1) from the entire file.

I want the file to start with the > symbol followed by the strain ID, and then all of the sequences, with no description lines in between.

Can you please help me?

Meeran.
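A minimal Python sketch of the transformation requested above: drop every per-read description line and join all sequence lines under a single header. strain_id and the "MyStrain" value are hypothetical placeholders for whatever ID you choose. One caution: the headers (e.g. >20351#1/1) look like individual short reads, and joining reads end to end does not reconstruct a genome, because overlapping reads normally have to be assembled (or mapped to a reference) first. This sketch is only a formatting step.

```python
from typing import Iterable


def concatenate_fasta(lines: Iterable[str], strain_id: str) -> str:
    """Return one FASTA record: '>' + strain_id, then all sequences joined on one line.

    Every header line in the input (e.g. '>20351#1/1') is discarded.
    """
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and per-read description lines
        parts.append(line)
    return ">" + strain_id + "\n" + "".join(parts) + "\n"


# Usage with the first 10 bases of the two reads posted above.
sample = [
    ">20351#1/1\n",
    "TAAAAGCNGG\n",
    ">23523#1/1\n",
    "TATCGCGNCG\n",
]
print(concatenate_fasta(sample, "MyStrain"), end="")
```

To process a whole file, pass an open file object instead of the list, e.g. concatenate_fasta(open("reads.fasta"), "MyStrain"), where reads.fasta is a hypothetical filename.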

Powered by Biostar version 16.09