Is there a way to extract the sequence data for a single gene from an RNA-Seq data file?
Hi Wilson,
That depends a little on what you mean by an RNA-Seq data file. Can you post an example? If it is a BED file with chr-start-end columns, you can use the tool 'Extract Genomic DNA using coordinates from assembled/unassembled genomes'.
Ciao,
Bjoern
Hello,
Map genes back to their constituent transcripts using the gene/transcript mapping files produced by Cuffdiff. Then follow the transcripts back to the Cufflinks results, where full GTF lines are present. (The tools "Select" and/or "Filter" can be used to subset lines from these files.) Use these coordinate data with the tool "Extract Genomic DNA" to obtain a version of the transcript based on genomic content.
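For anyone who prefers to see the steps above outside of Galaxy, here is a minimal Python sketch. It assumes the Cuffdiff transcript FPKM tracking table has `tracking_id` and `gene_id` columns (as in the Cufflinks FPKM tracking format) and that the transcript IDs in that table match the `transcript_id` attribute in the GTF. The function names `transcripts_for_gene` and `exons_to_bed` are made up for illustration and are not Galaxy tools.

```python
import csv
import io
import re


def transcripts_for_gene(tracking_text, gene_id):
    """Return the tracking_id of every transcript row whose gene_id matches."""
    reader = csv.DictReader(io.StringIO(tracking_text), delimiter="\t")
    return [row["tracking_id"] for row in reader if row["gene_id"] == gene_id]


def exons_to_bed(gtf_text, transcript_ids):
    """Convert the exon lines of the chosen transcripts to BED6 tuples."""
    wanted = set(transcript_ids)
    rows = []
    for line in gtf_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = re.search(r'transcript_id "([^"]+)"', fields[8])
        if match and match.group(1) in wanted:
            # GTF is 1-based inclusive; BED is 0-based half-open.
            rows.append((fields[0], int(fields[3]) - 1, int(fields[4]),
                         match.group(1), "0", fields[6]))
    return rows
```

The resulting rows have the chr-start-end layout that coordinate-based extraction expects; in Galaxy the same subsetting is done with "Select" or "Filter" as described above.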
Note that base-level variation present in the NGS reads but absent from the reference genome will not be included, even if it contributed to the creation of those transcripts. If you are searching for a known transcript (from a reference annotation file you provided), that annotation is also a valid source, and you can often obtain the transcript directly using that source's identifier at third-party data sources, many of which are available in the tool group "Get Data" (UCSC, BioMart, etc.).
Even if you used a Custom Reference genome for your analysis, this is possible. Help is here:
http://wiki.galaxyproject.org/Support#Tools_on_the_Main_server:_Extract_DNA
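As a rough illustration of what coordinate-based extraction does, the sketch below joins exon intervals from a reference sequence and reverse-complements the result for minus-strand transcripts. It assumes the reference is already loaded into a dict of chromosome name to sequence and that exons are 0-based half-open intervals; `spliced_sequence` is a hypothetical helper, not the Galaxy tool itself. Because the bases come from the reference, any variation seen only in the reads is absent from the output.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T, N)."""
    return seq.translate(str.maketrans("ACGTNacgtn", "TGCANtgcan"))[::-1]


def spliced_sequence(genome, exons, strand):
    """Join exon sequences taken from the reference genome.

    genome: dict mapping chromosome name to sequence (assumed preloaded).
    exons:  iterable of (chrom, start, end), 0-based half-open.
    strand: "+" or "-".
    """
    ordered = sorted(exons, key=lambda exon: exon[1])
    seq = "".join(genome[chrom][start:end] for chrom, start, end in ordered)
    return reverse_complement(seq) if strand == "-" else seq
```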
More can be done if you are willing to move to a cloud Galaxy, although some tools may also work on a local Galaxy with proper management of resources. Review each tool's documentation (requirements) closely, then test. One option is a targeted (regional) transcriptome assembly that includes only the NGS reads that mapped within the bounds of the gene of interest, which reduces the memory footprint. The Tool Shed offers several options for this path under the group "Assembly", and the "Learn" section of our wiki lists usage examples in the form of tutorials for the more common choices.
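To make the "only the reads that mapped within the gene bounds" idea concrete, here is a small Python sketch that keeps SAM records overlapping a region. It assumes plain-text SAM input and works out each alignment's reference span from POS and CIGAR; `reads_in_region` is an illustrative name, and header lines are dropped for simplicity. The kept reads would then be handed to whichever assembler you choose.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def reference_span(pos, cigar):
    """Return the 0-based half-open reference interval covered by an alignment."""
    # Only M, D, N, = and X operations consume reference bases.
    length = sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "MDN=X")
    start = pos - 1  # SAM POS is 1-based
    return start, start + length


def reads_in_region(sam_lines, chrom, region_start, region_end):
    """Return SAM records overlapping chrom:[region_start, region_end)."""
    kept = []
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue
        rname, pos, cigar = fields[2], int(fields[3]), fields[5]
        if rname != chrom or cigar == "*":
            continue  # other chromosome or unmapped
        start, end = reference_span(pos, cigar)
        if start < region_end and end > region_start:
            kept.append(line)
    return kept
```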
Best, Jen, Galaxy team
Jen,
Thanks for the help. I have run Cuffdiff on samples and have produced the following files. Which ones have the mapping data? Thank you.
Scott
86 Cuffdiff on data 40, data 36, and data 41: transcript FPKM tracking
85 Cuffdiff on data 40, data 36, and data 41: transcript differential expression testing
84 Cuffdiff on data 40, data 36, and data 41: gene FPKM tracking
83 Cuffdiff on data 40, data 36, and data 41: gene differential expression testing
82 Cuffdiff on data 40, data 36, and data 41: TSS groups FPKM tracking
81 Cuffdiff on data 40, data 36, and data 41: TSS groups differential expression testing
80 Cuffdiff on data 40, data 36, and data 41: CDS FPKM tracking
79 Cuffdiff on data 40, data 36, and data 41: CDS FPKM differential expression testing
78 Cuffdiff on data 40, data 36, and data 41: CDS overloading diffential expression testing
77 Cuffdiff on data 40, data 36, and data 41: promoters differential expression testing
76 Cuffdiff on data 40, data 36, and data 41: splicing differential expression testing
75 Cuffdiff on data 40, data 36, and data 41: TSS groups read group tracking
74 Cuffdiff on data 40, data 36, and data 41: CDs read group tracking
73 Cuffdiff on data 40, data 36, and data 41: genes read group tracking
72 Cuffdiff on data 40, data 36, and data 41: isoforms read group tracking
(In reply to the answer from Jennifer Hillman Jackson on Galaxy Biostar, Monday, July 7, 2014 3:50 PM. Subject: [galaxy-biostar] A: How to get sequence for a single gene from an RNA Seq file?)