Question: Genes From Regions.
Shawn Anderson wrote (7.4 years ago):
Hello, I'm not sure if this is the place to ask this, but if so, here goes.

I have a list of genomic regions (from CNV gains and losses), each given as chromosome, start and stop (e.g. chr7 68000000 71000000) for a given genome build (hg18). For each line, I want to add the genes (ideally HUGO gene symbols or RefSeq IDs) that lie within that region. So I want to input something like this:

Sample        Chromosome Region              Event     Length
JC 507 CD19   chr10:11,997,707-12,330,274    CN Gain   332568
JC 507 CD19   chr10:47,563,503-48,085,608    CN Loss   522106
JC 507 CD19   chr10:69,510,584-69,951,738    CN Gain   441155

and get output similar to this:

Sample        Chromosome Region              Event     Length   Gene Symbols
JC 507 CD19   chr10:11,997,707-12,330,274    CN Gain   332568   CDC123, DHTKD1, NUDT5, SEC61A2, UPF2
JC 507 CD19   chr10:47,563,503-48,085,608    CN Loss   522106   AGAP9, ANXA8, ANXA8L1, CTSL1P2, FAM25B, FAM25C, FAM25G, GDF10, GDF2, LOC642826, RBP3, ZNF488
JC 507 CD19   chr10:69,510,584-69,951,738    CN Gain   441155   ATOH7, DNA2, HNRNPH3, MYPN, PBLD, RUFY2, SLC25A16

Is this possible?

Shawn Anderson
Application Scientist - Laboratory for Advanced Genome Analysis
Vancouver Prostate Centre - Vancouver General Hospital
2660 Oak Street
Vancouver BC V6H 3Z6
P: 604-875-4111 ext. 63436
F: 604-875-5654
sanderson@prostatecentre.com
www.LAGAPC.ca
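The Region strings in the question can be split into chromosome, start and end mechanically. Below is a minimal Python sketch, assuming the rows have already been read into (Sample, Region, Event, Length) tuples; the helper name `parse_region` is hypothetical. It strips the thousands separators and checks that Length equals end - start + 1, which holds for all three example rows and suggests (an inference, not something the poster states) that the regions are 1-based and inclusive.

```python
import re
from typing import Tuple

# Matches region strings such as "chr10:11,997,707-12,330,274".
REGION_RE = re.compile(r"^(chr\w+):([\d,]+)-([\d,]+)$")


def parse_region(region: str) -> Tuple[str, int, int]:
    """Split 'chrN:start-end' into (chrom, start, end), dropping commas."""
    m = REGION_RE.match(region.strip())
    if m is None:
        raise ValueError(f"unrecognised region: {region!r}")
    chrom, start, end = m.groups()
    return chrom, int(start.replace(",", "")), int(end.replace(",", ""))


# The three example rows from the question.
rows = [
    ("JC 507 CD19", "chr10:11,997,707-12,330,274", "CN Gain", 332568),
    ("JC 507 CD19", "chr10:47,563,503-48,085,608", "CN Loss", 522106),
    ("JC 507 CD19", "chr10:69,510,584-69,951,738", "CN Gain", 441155),
]

for sample, region, event, length in rows:
    chrom, start, end = parse_region(region)
    # The Length column matches a closed (inclusive) interval.
    assert end - start + 1 == length
    # Tab-separated chrom/start/end, one region per line.
    print(f"{chrom}\t{start}\t{end}")
```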
Jennifer Hillman Jackson (United States) wrote (7.4 years ago):
Hello Shawn,

To do this in three steps:

1 - Format your existing file, set its type to "interval", and assign the columns ("Edit attributes"). Start by changing this:

chr10:11,997,707-12,330,274

into this, with the fields separated by tabs:

chr10	11997707	12330274

Add the strand if possible:

chr10	11997707	12330274	+

2 - Obtain a mapped transcript file that includes gene identifiers.

a) One choice is UCSC's "Known Genes" track. From your working history, use the tool "Get Data -> UCSC Main". Select the genome (hg18) and the track "UCSC Genes", set the output to "selected fields from primary and related tables", and merge in identifiers from tables such as "hg18.kgXref" (which carries the geneSymbol and refseq columns). The track "RefSeq Genes" is another option (the RefSeq accession is in "name" and the gene symbol is in "name2"). Send the query to Galaxy, set the type to "interval", and assign the columns.

b) Another choice would normally be Ensembl Genes from "Get Data -> BioMart", but only hg19 is available there.

3 - Merge the files based on overlap. The tool you will most likely want is "Operate on Genomic Intervals -> Join", although you may want to explore the others.

Help: http://wiki.g2.bx.psu.edu/Learn/Interval%20Operations
Also see the screencasts at http://usegalaxy.org (quickies #3 and #5 to start with).

Hopefully this helps to get you started!

Thanks,
jen

--
Jennifer Jackson
http://usegalaxy.org/
http://galaxyproject.org/
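For anyone who wants to see what the overlap merge in step 3 does, or do it outside Galaxy, here is a minimal Python sketch. It assumes the regions are already split into chrom/start/end as in step 1, and that the gene table was exported from the UCSC "RefSeq Genes" track with chrom, txStart, txEnd and name2 columns. UCSC tables store 0-based starts with exclusive ends, while the question's regions appear to be 1-based and inclusive, so the sketch shifts the region start by one before comparing. The gene rows and the names `GENES`, `genes_in_region` and `annotate` are hypothetical placeholders, not real hg18 data or a Galaxy API.

```python
# Hypothetical gene rows (chrom, txStart, txEnd, name2) for illustration only;
# these are NOT real hg18 coordinates. UCSC-style: 0-based start, exclusive end.
GENES = [
    ("chr10", 11990000, 12000000, "GENE_A"),  # straddles the region's left edge
    ("chr10", 12100000, 12150000, "GENE_B"),  # fully inside the region
    ("chr10", 12100000, 12160000, "GENE_B"),  # second transcript, same symbol
    ("chr10", 12400000, 12500000, "GENE_C"),  # beyond the region's right edge
]


def genes_in_region(chrom, start, end, genes):
    """Sorted unique gene symbols overlapping a 1-based, inclusive region."""
    bed_start = start - 1  # region as 0-based half-open [bed_start, end)
    return sorted(
        {
            name2
            for g_chrom, tx_start, tx_end, name2 in genes
            # Half-open intervals overlap when each starts before the other ends.
            if g_chrom == chrom and tx_start < end and tx_end > bed_start
        }
    )


def annotate(chrom, start, end, genes):
    """Tab-separated output line with a comma-joined 'Gene Symbols' column."""
    symbols = genes_in_region(chrom, start, end, genes)
    return f"{chrom}\t{start}\t{end}\t{', '.join(symbols)}"


# Prints the first region from the question followed by "GENE_A, GENE_B".
print(annotate("chr10", 11997707, 12330274, GENES))
```

Collecting symbols in a set collapses multiple transcripts of the same gene into one entry, which matches the de-duplicated symbol lists in the desired output. The linear scan is fine for a few hundred CNV regions; for genome-wide tables, Galaxy's Join tool or an interval-tree approach avoids comparing every region with every transcript.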