Question: Retrieving Genomic Sequence Using UCSC Table Browser
7.3 years ago
Jim Johnson
Jim Johnson wrote:
I tried to retrieve a set of 20 bp genomic sequences using the UCSC Table Browser, selecting the Assembly track and providing a set of defined regions. Instead of just the requested bases, the Table Browser returned large sequence regions that contained the requested regions. Is there a setting in the UCSC Table Browser that will return just the requested bases? Thanks, JJ
7.3 years ago
Jennifer Hillman Jackson wrote:
Hello JJ,

To extract genome sequence for specified coordinates you have (at least) two options:

1) Use the Galaxy tool Fetch Sequences -> Extract Genomic DNA. Create a file of the coordinates (Interval or BED - see the tool for formatting details). Be sure to set the target database. This will extract the regions - and just those regions - directly into your history.

2) Create a BED file of the coordinates, load it into the UCSC Genome Browser as a custom track, bring it up in the UCSC Table Browser (group = Custom track), make certain that region = genome, and select output = fasta. Leave the rest of the settings at default, including the checked "Galaxy" box, to load the result back into your Galaxy main history.

The behavior you encountered with the UCSC Table Browser is not really a problem; it is simply the way "regions" are interpreted by the query function. When extracting data from a track in the Table Browser, the "regions" are used to select overlapping records from the selected target track. It doesn't matter what the target track is - Assembly (genome) or RefSeq genes (mRNA) - the same type of result is obtained: the entire overlapping record from that track's primary table. Useful, but not what you want for this particular query.

Try #1 and please let us know if you need more help!

Thanks,
Jen
Galaxy team

--
Jennifer Jackson
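Both options above start from a coordinate file, so a minimal sketch of building BED lines may help. It assumes the starting coordinates are 1-based and inclusive (as they usually appear in a browser position box), while BED uses a 0-based start and an exclusive end. The function names and the example regions are hypothetical, chosen only for illustration.

```python
def to_bed_line(chrom, start_1based, end_1based, name="."):
    """Convert a 1-based, inclusive region into a tab-separated BED line.

    BED stores a 0-based start and an end-exclusive end, so only the
    start is shifted by one; the end value stays the same.
    """
    if start_1based < 1 or end_1based < start_1based:
        raise ValueError("expected 1 <= start <= end")
    return f"{chrom}\t{start_1based - 1}\t{end_1based}\t{name}"


def bed_length(bed_line):
    """Return the number of bases covered by a BED line (end - start)."""
    fields = bed_line.split("\t")
    return int(fields[2]) - int(fields[1])


# Hypothetical 20 bp regions, given as 1-based inclusive coordinates.
regions = [
    ("chr1", 1000001, 1000020, "region_1"),
    ("chr2", 5000, 5019, "region_2"),
]

bed_lines = [to_bed_line(*region) for region in regions]

# Sanity check: every region should still span exactly 20 bases.
for line in bed_lines:
    assert bed_length(line) == 20
```

Writing `bed_lines` to a text file, one per line, gives a file that can be uploaded to Galaxy with the BED datatype or pasted into the custom track form. Checking the length of each line before uploading is a cheap way to catch off-by-one mistakes in the start coordinate.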
