Question: How to get sequence for a single gene from an RNA Seq file?
wilson (United States) wrote 4.4 years ago:

Is there a way to extract the sequence data for a single gene from an RNA Seq1 data file?

rna-seq • 2.9k views
Bjoern Gruening (Germany) wrote 4.4 years ago:

Hi Wilson,

That depends a little on your definition of an RNA-Seq1 data file. Can you post an example? If it is a BED file with chr-start-end columns, you can use the tool 'Extract Genomic DNA using coordinates from assembled/unassembled genomes'.

Ciao,

Bjoern
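
As a rough illustration of what coordinate-based extraction does with a chr-start-end BED line, here is a minimal Python sketch. The helper name `extract_bed_interval`, the toy `genome` dictionary, and the sequence in it are made up for illustration; this is not the Galaxy tool's code. It assumes BED's usual 0-based, half-open coordinates and ignores strand.

```python
def extract_bed_interval(genome, bed_line):
    """Return the reference sequence for one BED line (chrom, start, end).

    BED coordinates are 0-based and half-open, so they map directly
    onto a Python slice. Strand (BED column 6) is ignored here.
    """
    fields = bed_line.rstrip("\n").split("\t")
    chrom, start, end = fields[0], int(fields[1]), int(fields[2])
    if chrom not in genome:
        raise KeyError(f"chromosome {chrom!r} not in genome")
    return genome[chrom][start:end]


# Toy reference (hypothetical); a real genome would be loaded from a FASTA file.
genome = {"chr1": "ACGTACGTGGCCTTAA"}
print(extract_bed_interval(genome, "chr1\t4\t10"))
```

Because this reads only the reference, the result is genomic sequence for the interval, not a consensus of the reads.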


Bjoern,

I am new to RNA-Seq, so thanks for the help. I have run Tophat and found that I have too many reads for the gene I am interested in to get it to display in the UCSC browser. I think it is a BAM file. Any suggestions? Thanks!
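
One way to cut the read count down is to keep only alignments that overlap the gene's coordinates. The sketch below works on SAM text (the plain-text form of a BAM file) and uses only the Python standard library. The function names, the toy records, and the region are assumptions for illustration, not part of Tophat or Galaxy. In practice an indexed BAM can be queried by region directly with tools such as samtools.

```python
import re

# CIGAR operations that consume reference bases (per the SAM specification).
REF_CONSUMING = set("MDN=X")
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def reference_span(pos, cigar):
    """Return the 1-based inclusive (start, end) that an alignment covers."""
    length = sum(int(n) for n, op in CIGAR_RE.findall(cigar) if op in REF_CONSUMING)
    return pos, pos + max(length, 1) - 1


def reads_in_region(sam_lines, chrom, start, end):
    """Yield header lines plus alignments overlapping chrom:start-end (1-based, inclusive)."""
    for line in sam_lines:
        if line.startswith("@"):
            yield line  # keep the header so the output is still valid SAM
            continue
        fields = line.split("\t")
        rname, pos, cigar = fields[2], int(fields[3]), fields[5]
        if rname != chrom or pos == 0:
            continue  # different chromosome, or unmapped read
        read_start, read_end = reference_span(pos, cigar)
        if read_start <= end and read_end >= start:
            yield line


# Toy SAM records (hypothetical); r2 is a spliced read with a 100 bp intron.
sam = [
    "@SQ\tSN:chr1\tLN:1000",
    "r1\t0\tchr1\t100\t50\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "r2\t0\tchr1\t190\t50\t5M100N5M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "r3\t0\tchr2\t200\t50\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
]
kept = list(reads_in_region(sam, "chr1", 250, 320))
```

Note that the overlap test uses the full span of a spliced read, introns included, so a read whose intron merely crosses the region is also kept.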

Reply written 4.4 years ago by wilson
Jennifer Hillman Jackson (United States) wrote 4.4 years ago:

Hello,

Map genes back to their constituent transcripts using the gene/transcript mapping files produced by Cuffdiff. Then follow the transcripts back to the Cufflinks results, where full GTF lines are present. (The tools "Select" and/or "Filter" can be used to subset lines from these files.) Use these coordinates with the tool "Extract Genomic DNA" to obtain a version of the transcript based on genomic content.
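
The last two steps (subsetting GTF lines for one transcript, then pulling the genomic sequence for its exons) can be sketched in plain Python. Everything here is illustrative: the function names, the toy genome, and the GTF records are assumptions, and the Galaxy tools may differ in detail. The sketch assumes GTF's 1-based, inclusive coordinates and Cufflinks-style `transcript_id` attributes.

```python
def revcomp(seq):
    """Reverse-complement a DNA string (A/C/G/T, either case)."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def transcript_sequence(genome, gtf_lines, transcript_id):
    """Concatenate the exon sequences of one transcript from GTF lines."""
    exons = []
    for line in gtf_lines:
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(fields) < 9:
            continue  # skip comments and malformed lines
        chrom, feature, start, end, strand, attrs = (
            fields[0], fields[2], int(fields[3]), int(fields[4]), fields[6], fields[8])
        # Match the quoted ID so "CUFF.1.1" does not also match "CUFF.1.10".
        if feature == "exon" and f'transcript_id "{transcript_id}"' in attrs:
            exons.append((chrom, start, end, strand))
    if not exons:
        raise ValueError(f"no exons found for {transcript_id!r}")
    exons.sort(key=lambda exon: exon[1])  # order exons by genomic start
    # GTF is 1-based inclusive, so start - 1 .. end is the matching Python slice.
    seq = "".join(genome[chrom][start - 1:end] for chrom, start, end, _ in exons)
    return revcomp(seq) if exons[0][3] == "-" else seq


# Toy reference and annotation (hypothetical values).
genome = {"chr1": "AAACCCGGGTTTACGT"}
gtf = [
    'chr1\tCufflinks\texon\t1\t3\t.\t+\t.\tgene_id "CUFF.1"; transcript_id "CUFF.1.1";',
    'chr1\tCufflinks\texon\t7\t9\t.\t+\t.\tgene_id "CUFF.1"; transcript_id "CUFF.1.1";',
    'chr1\tCufflinks\texon\t4\t6\t.\t-\t.\tgene_id "CUFF.2"; transcript_id "CUFF.2.1";',
    'chr1\tCufflinks\texon\t13\t16\t.\t-\t.\tgene_id "CUFF.2"; transcript_id "CUFF.2.1";',
]
print(transcript_sequence(genome, gtf, "CUFF.1.1"))
```

The output is built from the reference only, so it is the spliced genomic sequence of the transcript model rather than a consensus of the reads.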

Note that base-level variation that is present in the NGS reads but not in the reference genome, and that may have contributed to the creation of those transcripts, will not be included. If you are searching for a known transcript (from a reference annotation file provided), that annotation is also a valid source, and you can often obtain the transcript directly using that source's identifier at 3rd party data sources, many of which are available in the tool group "Get Data" (UCSC, BioMart, etc.).

Even if you used a Custom Reference genome for your analysis, this is possible. Help is here:
http://wiki.galaxyproject.org/Support#Tools_on_the_Main_server:_Extract_DNA

More can be done if you are willing to move to a cloud Galaxy, although some tools may work on a local Galaxy with proper management of resources. Review the tool documentation (requirements) closely, then test. One option is a targeted (regional) transcriptome assembly that includes just the NGS reads that mapped within the bounds of the gene(s) of interest, which reduces the memory footprint. The Tool Shed offers several options for this path under the group "Assembly", and the "Learn" section of our wiki lists tutorials with usage examples for the more common choices.

Best, Jen, Galaxy team


Jen,

Thanks for the help. I have run Cuffdiff on samples and have produced the following files. Which ones have the mapping data? Thank you.

Scott

86 Cuffdiff on data 40, data 36, and data 41: transcript FPKM tracking
85 Cuffdiff on data 40, data 36, and data 41: transcript differential expression testing
84 Cuffdiff on data 40, data 36, and data 41: gene FPKM tracking
83 Cuffdiff on data 40, data 36, and data 41: gene differential expression testing
82 Cuffdiff on data 40, data 36, and data 41: TSS groups FPKM tracking
81 Cuffdiff on data 40, data 36, and data 41: TSS groups differential expression testing
80 Cuffdiff on data 40, data 36, and data 41: CDS FPKM tracking
79 Cuffdiff on data 40, data 36, and data 41: CDS FPKM differential expression testing
78 Cuffdiff on data 40, data 36, and data 41: CDS overloading diffential expression testing
77 Cuffdiff on data 40, data 36, and data 41: promoters differential expression testing
76 Cuffdiff on data 40, data 36, and data 41: splicing differential expression testing
75 Cuffdiff on data 40, data 36, and data 41: TSS groups read group tracking
74 Cuffdiff on data 40, data 36, and data 41: CDs read group tracking
73 Cuffdiff on data 40, data 36, and data 41: genes read group tracking
72 Cuffdiff on data 40, data 36, and data 41: isoforms read group tracking


Reply written 4.4 years ago by wilson