How to get sequence for a single gene from an RNA Seq file?

4.4 years ago by

Germany

Bjoern Gruening ♦ 5.1k wrote:

Hi Wilson,

that depends a little bit on your definition of an RNA-Seq data file. Can you post an example? If it is a BED file with chr-start-end columns, you can use the tool 'Extract Genomic DNA using coordinates from assembled/unassembled genomes'.
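For illustration, here is a minimal Python sketch of what extracting DNA from chr-start-end coordinates amounts to. It is not the Galaxy tool's implementation; the function name `extract_bed_interval` and the toy chromosome are hypothetical, and it assumes BED conventions (0-based start, end-exclusive).

```python
def extract_bed_interval(chrom_seq, start, end, strand="+"):
    """Return the bases for a BED interval (0-based start, end-exclusive)."""
    if not 0 <= start <= end <= len(chrom_seq):
        raise ValueError("interval falls outside the sequence")
    seq = chrom_seq[start:end]
    if strand == "-":
        # Minus-strand features are reported as the reverse complement.
        complement = str.maketrans("ACGTNacgtn", "TGCANtgcan")
        seq = seq.translate(complement)[::-1]
    return seq


# Toy chromosome, made up for this example only.
toy_chrom = "AACCGGTTAC"
print(extract_bed_interval(toy_chrom, 2, 6))       # CCGG
print(extract_bed_interval(toy_chrom, 0, 4, "-"))  # GGTT
```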

Ciao,

Bjoern

ADD COMMENT • link modified 4.4 years ago • written 4.4 years ago by Bjoern Gruening ♦ 5.1k

Bjoern,

I am new to RNA-Seq, so thanks for the help. I have run Tophat and found that I have too many reads for the gene I am interested in to get it to display in the UCSC browser. I think it is a BAM file. Any suggestions? Thanks!

ADD REPLY • link written 4.4 years ago by wilson • 0

4.4 years ago by

Jennifer Hillman Jackson ♦ 25k

United States

Jennifer Hillman Jackson ♦ 25k wrote:

Hello,

Map genes back to their constituent transcripts using the gene/transcript mapping files produced by Cuffdiff. Then follow the transcripts back to the Cufflinks results, where full GTF lines are present. (The tools "Select" and/or "Filter" can be used to subset lines from these files.) Use these coordinate data with the tool "Extract Genomic DNA" to obtain a version of the transcript based on genomic content.
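Outside Galaxy, the subsetting step can be sketched roughly as follows. This is only an illustration of the idea, not what "Select" or "Filter" do internally: the function name `gtf_to_bed`, the gene ID `XLOC_000001`, and the example lines are all hypothetical, and it assumes a standard 9-column GTF with a `gene_id "..."` attribute.

```python
import re


def gtf_to_bed(gtf_lines, gene_id, feature="exon"):
    """Yield BED-style tuples for GTF features that belong to gene_id."""
    pattern = re.compile(r'gene_id "([^"]+)"')
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != feature:
            continue
        match = pattern.search(fields[8])
        if match and match.group(1) == gene_id:
            # GTF is 1-based inclusive; BED is 0-based, end-exclusive.
            yield (fields[0], int(fields[3]) - 1, int(fields[4]),
                   gene_id, fields[5], fields[6])


# Made-up GTF lines for illustration only.
example_gtf = [
    'chr1\tCufflinks\texon\t101\t200\t.\t+\t.\tgene_id "XLOC_000001"; transcript_id "TCONS_00000001";',
    'chr1\tCufflinks\texon\t301\t400\t.\t+\t.\tgene_id "XLOC_000001"; transcript_id "TCONS_00000001";',
    'chr2\tCufflinks\texon\t50\t90\t.\t-\t.\tgene_id "XLOC_000002"; transcript_id "TCONS_00000002";',
]

for interval in gtf_to_bed(example_gtf, "XLOC_000001"):
    print("\t".join(str(value) for value in interval))
```

The resulting chr-start-end intervals are the kind of coordinate data that an extraction step expects.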

Note that any base-level variation present in the NGS reads, not present in the reference genome, that may have contributed to the creation of those transcripts, will not be included. If you are searching for a known transcript (from a reference annotation file provided), that is also a valid source - and you can often obtain the transcript directly using that source's identifier at 3rd party data sources - many available in the tool group "Get Data" (UCSC, BioMart, etc.).

Even if you used a Custom Reference genome for your analysis, this is possible. Help is here:
http://wiki.galaxyproject.org/Support#Tools_on_the_Main_server:_Extract_DNA

More can be done here if you are willing to move to a cloud Galaxy, although some tools may function on a local Galaxy with proper management of resources. Review the tool documentation (requirements) closely, then test; the results will guide you. One option is a targeted (regional) transcriptome assembly that includes just the NGS reads that mapped within the bounds of the gene(s) of interest, which will reduce the memory footprint. The Tool Shed offers several options for this path under the group "Assembly", and the "Learn" section of our wiki lists usage examples in the form of tutorials for the more common choices.
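To make the regional subsetting idea concrete, here is a rough Python sketch that keeps only the alignments overlapping a gene's bounds in SAM text. It is a sketch under stated assumptions, not a Galaxy tool: the function names, the region, and the toy records are hypothetical, and in practice an indexed BAM with a dedicated tool would be the usual route. The alignment's reference span is computed from the CIGAR operations that consume reference bases (M, D, N, =, X).

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def reference_span(pos, cigar):
    """Return the 0-based, end-exclusive reference interval of an alignment."""
    start = pos - 1  # SAM POS is 1-based
    length = sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in REF_CONSUMING)
    return start, start + length


def reads_in_region(sam_lines, chrom, region_start, region_end):
    """Yield header lines plus alignments overlapping chrom:[region_start, region_end)."""
    for line in sam_lines:
        if line.startswith("@"):
            yield line  # keep the header so the output is still valid SAM
            continue
        fields = line.split("\t")
        if len(fields) < 11:
            continue
        rname, pos, cigar = fields[2], int(fields[3]), fields[5]
        if rname != chrom or cigar == "*":
            continue  # different chromosome, or unaligned read
        start, end = reference_span(pos, cigar)
        if start < region_end and end > region_start:
            yield line


# Made-up SAM records for illustration only.
toy_sam = [
    "@SQ\tSN:chr1\tLN:1000",
    "r1\t0\tchr1\t101\t50\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "r2\t0\tchr1\t501\t50\t5M100N5M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "r3\t0\tchr2\t101\t50\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
]

kept = list(reads_in_region(toy_sam, "chr1", 600, 700))
print(len(kept))  # 2: the header line and the spliced read r2
```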

Best, Jen, Galaxy team

ADD COMMENT • link modified 4.4 years ago • written 4.4 years ago by Jennifer Hillman Jackson ♦ 25k

Jen,

Thanks for the help. I have run Cuffdiff on samples and have produced the following files. Which ones have the mapping data? Thank you.

Scott

86 Cuffdiff on data 40, data 36, and data 41: transcript FPKM tracking
85 Cuffdiff on data 40, data 36, and data 41: transcript differential expression testing
84 Cuffdiff on data 40, data 36, and data 41: gene FPKM tracking
83 Cuffdiff on data 40, data 36, and data 41: gene differential expression testing
82 Cuffdiff on data 40, data 36, and data 41: TSS groups FPKM tracking
81 Cuffdiff on data 40, data 36, and data 41: TSS groups differential expression testing
80 Cuffdiff on data 40, data 36, and data 41: CDS FPKM tracking
79 Cuffdiff on data 40, data 36, and data 41: CDS FPKM differential expression testing
78 Cuffdiff on data 40, data 36, and data 41: CDS overloading diffential expression testing
77 Cuffdiff on data 40, data 36, and data 41: promoters differential expression testing
76 Cuffdiff on data 40, data 36, and data 41: splicing differential expression testing
75 Cuffdiff on data 40, data 36, and data 41: TSS groups read group tracking
74 Cuffdiff on data 40, data 36, and data 41: CDs read group tracking
73 Cuffdiff on data 40, data 36, and data 41: genes read group tracking
72 Cuffdiff on data 40, data 36, and data 41: isoforms read group tracking

(Sent in reply to the email notification of Jennifer Hillman Jackson's answer, dated Monday, July 7, 2014 3:50 PM.)

ADD REPLY • link written 4.4 years ago by wilson • 0
