Question: Genbank Submission - How To Generate Fasta (Not Fastq) Files
7.5 years ago, John David Osborne wrote:
I still haven't found an easy solution to this problem, and I am afraid I'm going to have to write one of my own, which makes little sense as I bet this has been solved thousands of times!

Can anybody point me to a script or software to convert a samtools pileup file into a FASTA consensus file? It would be nice to set coverage thresholds, etc., but I'll take anything I can work with.

The best Google could do for me was this:

http://biostar.stackexchange.com/questions/1389/how-to-generate-a-consensus-fasta-sequence-from-sam-tools-pileup

Not that helpful,

-John

P.S. If there is a better way of doing this (something other than samtools) I'm all ears.
samtools bam • 1.8k views
7.4 years ago, Jennifer Hillman Jackson (United States) wrote:
Hello John,

One solution, if you want FASTA sequence based on the reference genome (this could be a native Galaxy genome, a custom genome in your history, or really any FASTA file in your history, as long as the mapped "chromosome" names are identical), is to use the tool "NGS: SAM Tools -> Pileup-to-Interval". Then, to extract FASTA sequence based on these coordinates, use the tool "Fetch Sequences -> Extract Genomic DNA". This uses SAMtools, but it runs on the public Galaxy server, which may make it an acceptable option.

If you are interested in examining the variation in your data versus the reference, please see the tools under "NGS: Indel Analysis". Combined with the tool "Genome Diversity -> Extract DNA flanking chosen SNPs", this can incorporate your SNPs into the background reference to produce novel FASTA sequences.

If still needed, moving from FASTQ to FASTA in Galaxy is very simple using the tool "NGS: QC and manipulation -> FASTQ to FASTA converter".

If the command line is your preference, all of Galaxy's tools can be run there, too, using the source: http://getgalaxy.org

I will post these options at BioStar at the question you quoted, for that user and others who may have a similar analysis project.

Apologies for the delay in reply. Please let us know if we can help again,

Best,

Jen
Galaxy team

--
Jennifer Jackson
http://usegalaxy.org/
http://galaxyproject.org/
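For anyone working outside Galaxy, the FASTQ-to-FASTA step mentioned above can be sketched in a few lines of Python. This is only an illustration, not the Galaxy tool's implementation: the function name `fastq_to_fasta` is hypothetical, and the sketch assumes plain four-line FASTQ records (no wrapped sequence lines).

```python
def fastq_to_fasta(fastq_lines):
    """Yield FASTA lines from an iterable of FASTQ lines.

    Assumption: every record is exactly four lines
    (header, sequence, '+' separator, qualities).
    """
    it = iter(fastq_lines)
    for header in it:
        header = header.rstrip("\r\n")
        if not header:
            continue  # tolerate blank lines between records
        if not header.startswith("@"):
            raise ValueError("expected a FASTQ header, got: " + header)
        seq = next(it, "").rstrip("\r\n")
        next(it, "")  # '+' separator line, ignored
        next(it, "")  # quality line, dropped in FASTA
        yield ">" + header[1:]
        yield seq


if __name__ == "__main__":
    example = ["@read1 sample\n", "ACGT\n", "+\n", "IIII\n"]
    print("\n".join(fastq_to_fasta(example)))
```

Note that this only changes the file format of the reads; it does not build a consensus sequence.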
Thanks for your reply, Jen.

I managed to use Pileup-to-Interval (on my new strain) and then Extract Genomic DNA, but I'm not too sure what I got in terms of a FASTA file. What I am looking for is a consensus file for my sequence that can be submitted to GenBank, not an interval on a reference strain. Is that what this is returning? I haven't done any alignment yet. Also, because it uses samtools, does it fail to incorporate indels found in my unknown strain?

You mention using Genome Diversity, but it's not clear to me how extracting the region flanking SNPs will get me the desired consensus sequence.

Right now I am using John Nash's pileup2fasta (which works great, thanks John!), but I was hoping for something incorporated into Galaxy.

-John
written 7.4 years ago by John David Osborne
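The pileup-to-consensus conversion with a coverage threshold that the question asks about can be sketched roughly as follows. This is a minimal illustration and not pileup2fasta or any Galaxy tool. It assumes the six-column pileup layout (chromosome, 1-based position, reference base, depth, read bases, qualities) with positions sorted within each chromosome. The function names and the `min_depth` parameter are hypothetical. It takes a simple majority vote per position, writes `N` where coverage is too low or missing, and skips indels entirely, so it would not address the indel concern raised above.

```python
import re
from collections import Counter

INDEL = re.compile(r"[+-](\d+)")


def count_bases(ref_base, read_bases):
    """Count the bases observed in one pileup read-bases string."""
    counts = Counter()
    ref_base = ref_base.upper()
    i = 0
    while i < len(read_bases):
        c = read_bases[i]
        if c == "^":
            i += 2  # read-start marker plus its mapping-quality character
        elif c in "+-":
            m = INDEL.match(read_bases, i)
            i = m.end() + int(m.group(1))  # skip the inserted/deleted bases
        elif c in ".,":
            counts[ref_base] += 1  # match to the reference base
            i += 1
        elif c.upper() in "ACGTN":
            counts[c.upper()] += 1  # mismatch on either strand
            i += 1
        else:
            i += 1  # '$' read end, '*' deletion placeholder, etc.
    return counts


def consensus_from_pileup(lines, min_depth=3):
    """Return {chromosome: consensus string} by majority vote per position."""
    seqs = {}
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 5:
            continue
        chrom, pos, ref, bases = fields[0], int(fields[1]), fields[2], fields[4]
        seq = seqs.setdefault(chrom, [])
        seq.extend("N" * (pos - 1 - len(seq)))  # pad positions absent from the pileup
        counts = count_bases(ref, bases)
        if sum(counts.values()) < min_depth:
            seq.append("N")  # below the coverage threshold
        else:
            seq.append(counts.most_common(1)[0][0])
    return {name: "".join(seq) for name, seq in seqs.items()}


def to_fasta(seqs, width=60):
    """Format {name: sequence} as FASTA text wrapped at `width` columns."""
    out = []
    for name, seq in seqs.items():
        out.append(">" + name)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    pileup = [
        "chr1\t1\tA\t3\t...\tIII",
        "chr1\t2\tC\t3\t.,T\tIII",
        "chr1\t3\tG\t3\tTTt\tIII",
        "chr1\t5\tT\t1\t.\tI",
    ]
    print(to_fasta(consensus_from_pileup(pileup, min_depth=3)), end="")
```

With the four example lines, position 4 is missing and position 5 is below the threshold, so the printed record is `>chr1` followed by `ACTNN`. A real submission would need indels and ambiguous calls handled properly, which a dedicated consensus caller does.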