Question: Finding Open Reading Frames And Corresponding Genes
7.4 years ago, Joanne Rampersad wrote:
Hi, I am sequencing a bacterial genome and have aligned my Illumina reads (40 bp, single-end) to a reference genome using Bowtie. This generated a SAM file. I would like to obtain a list of the open reading frames in the bacterial genome and the genes that each one is most similar to. Could you please suggest the tools and steps needed to do this? Many thanks, Jo
7.4 years ago, Jennifer Hillman Jackson (United States) wrote:
Hello,

Assuming that your data is DNA (and not RNA), the tools under "NGS: SAM Tools" and "Operate on Genomic Intervals" can generate coordinates of the mapped reads, which can then be correlated with known genes/ORFs from your bacterial genome (or from related genomes, if you can obtain those mapped to the same reference genome to use as predictions).

General analysis path: starting with your SAM file, use these tools first to obtain an interval file representing your read coverage:

1 - SAM-to-BAM
2 - Generate pileup
3 - Pileup-to-Interval

Next, import a reference known gene set. Sources may include UCSC, NCBI, or others. Download the file from that source (if it is not directly available via a "Get Data" source) and upload it into Galaxy using FTP ("Get Data -> Upload"). If this data is in GFF/GTF format, you can convert it to Interval using "Convert Formats -> GFF-to-BED" (BED is a stricter form of Interval; to change the datatype to Interval, click the pencil icon in the dataset's box and use Edit Attributes).

Then, with the output from "Pileup-to-Interval" and a reference known gene/transcript dataset mapped to the same reference genome, use the tools in "Operate on Genomic Intervals" to perform comparisons based on genomic coordinates. Each tool has a description on the main tool form, and there are also screencasts explaining the functions under "3. Interval Operation Tutorial" here: http://wiki.g2.bx.psu.edu/Learn/Screencasts

Also see "Regional Variation -> Feature coverage" for localized comparisons.

Once an intersection of coordinates is complete, you may need to use the tools in "Text Manipulation", "Filter and Sort", or "Join, Subtract and Group" to merge in gene identifiers. The exact order in which to use these tools depends greatly on the format of the input reference gene/transcript dataset.

If you are doing transcript predictions, the tool "EMBOSS -> getorf" ("Finds and extracts open reading frames (ORFs)") may be helpful. This tool requires sequence as input.
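To give a sense of what an ORF scan over a sequence involves, here is a minimal Python sketch. It is not getorf's algorithm and not a Galaxy tool; the names `find_orfs` and `min_len` are made up for illustration. It assumes an ORF runs from an ATG to the first in-frame stop codon, and it ignores the alternative start codons (such as GTG and TTG) that bacteria also use.

```python
# Illustrative ORF scan: ATG to first in-frame stop, both strands, three frames each.
# Assumption: plain A/C/G/T input; ambiguous bases are not handled.
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_len=30):
    """Yield (start, end, strand) for each ORF of at least min_len bases.

    Coordinates are 0-based, end-exclusive, and always given on the
    forward strand, so they can be compared with BED/Interval data.
    """
    seq = seq.upper()
    n = len(seq)
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, n - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG since the previous stop opens an ORF
                elif start is not None and codon in STOP_CODONS:
                    end = i + 3  # include the stop codon
                    if end - start >= min_len:
                        if strand == "+":
                            yield (start, end, "+")
                        else:
                            # map reverse-strand positions back to forward coordinates
                            yield (n - end, n - start, "-")
                    start = None


if __name__ == "__main__":
    demo = "CCATGAAATTTGGGTAACC"
    for orf in find_orfs(demo, min_len=9):
        print(orf)
```

The forward-strand coordinates are a deliberate choice here: they let the predicted ORFs be treated like any other interval dataset when comparing against known genes.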
Once predicted transcript coordinates are obtained, extract the sequence from the reference genome using "Fetch Sequences -> Extract Genomic DNA" to use as input to getorf.

Hopefully this helps to get you started. Please let us know if we can help again.

Best,
Jen
Galaxy team

--
Jennifer Jackson
http://usegalaxy.org/
http://galaxyproject.org/
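As a footnote to the coordinate-comparison step described in the answer: the core of what the "Operate on Genomic Intervals" tools do when intersecting read coverage with known genes can be sketched in a few lines of Python. This is only an illustration, not how Galaxy implements it. The function name `covered_bases_per_gene` and the tuple layouts are assumptions, and the sketch assumes BED-style half-open coordinates and coverage intervals that do not overlap one another.

```python
from collections import defaultdict


def covered_bases_per_gene(coverage, genes):
    """Count read-covered bases inside each gene.

    coverage: iterable of (chrom, start, end) tuples, assumed non-overlapping.
    genes:    iterable of (chrom, start, end, name) tuples.
    All coordinates are 0-based and end-exclusive (BED-style).
    Returns {gene_name: covered_bases} for genes with any coverage.
    """
    # Group coverage intervals by chromosome so each gene is only
    # compared against intervals on the same sequence.
    by_chrom = defaultdict(list)
    for chrom, start, end in coverage:
        by_chrom[chrom].append((start, end))

    result = {}
    for chrom, g_start, g_end, name in genes:
        total = 0
        for c_start, c_end in by_chrom[chrom]:
            # Overlap length of two half-open intervals; zero or negative means none.
            overlap = min(g_end, c_end) - max(g_start, c_start)
            if overlap > 0:
                total += overlap
        if total:
            result[name] = total
    return result


if __name__ == "__main__":
    # Made-up coordinates, purely for demonstration.
    coverage = [("chr", 0, 100), ("chr", 150, 200)]
    genes = [("chr", 50, 160, "geneA"), ("chr", 300, 400, "geneB")]
    print(covered_bases_per_gene(coverage, genes))
```

Carrying the gene name through the comparison, as done here, is the same idea as merging gene identifiers back in after an intersection.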