Is there a way in Galaxy to retrieve the promoter sequences for a list
of genes? I tried using the UCSC Genome Browser, but in many cases it
is giving more than one promoter sequence per gene.
The UCSC Genome Browser will return one promoter sequence per
transcript, and many genes have more than one associated transcript.
Depending on which track you are obtaining data from, a
representative transcript for each gene may or may not be annotated.
If using "UCSC Genes", the tables associated with the primary table
knownGene to examine are knownIsoforms and knownCanonical, which will
show how transcripts are clustered. You can link these tables together
using the output option "selected fields from primary and related
tables".
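As a rough illustration of what linking those tables buys you, here is a
minimal Python sketch that keeps one transcript per gene by filtering on
a set of canonical transcript identifiers. The row layout, field names
and identifiers are made up for the example and only loosely modeled on
the UCSC tables; adjust them to match whatever you actually export.

```python
# Hypothetical transcript rows, loosely shaped like a primary-table export.
# All identifiers and coordinates here are invented for illustration.
transcripts = [
    {"name": "txA.1", "chrom": "chr1", "strand": "+", "txStart": 1000, "txEnd": 5000},
    {"name": "txA.2", "chrom": "chr1", "strand": "+", "txStart": 1200, "txEnd": 5000},
    {"name": "txB.1", "chrom": "chr2", "strand": "-", "txStart": 8000, "txEnd": 9000},
]

# Hypothetical set of canonical transcript identifiers, one per gene cluster,
# as you might collect from the "transcript" column of a canonical table.
canonical_ids = {"txA.1", "txB.1"}


def keep_canonical(rows, canonical):
    """Return only the rows whose transcript name is in the canonical set."""
    return [row for row in rows if row["name"] in canonical]


one_per_gene = keep_canonical(transcripts, canonical_ids)
for row in one_per_gene:
    print(row["name"], row["chrom"], row["strand"], row["txStart"], row["txEnd"])
```

The same filtering can be done inside Galaxy with the join tools; the
sketch is only meant to show the logic.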
Alternatively, you may have already selected a set of
transcripts (not genes) to obtain data from. In that case, enter the
identifiers directly into the UCSC Table Browser's "identifiers" field
when performing the query (paste or upload a text file). Or pull all
of the data into Galaxy and join it with your identifier list to
subset the results. Identifiers must be in the same format as used by
the track of interest for this method to work.
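To make the identifier-format point concrete, here is a small Python
sketch of the "pull everything, then join with your list" approach. It
assumes a tab-delimited export with the identifier in the first column
and an optional header line starting with "#". The function name and
sample identifiers are hypothetical. It also reports which requested
identifiers matched nothing, which is usually the first sign of a format
mismatch (for example, a missing version suffix).

```python
import csv
import io


def subset_by_ids(table_text, wanted_ids, id_column=0):
    """Keep rows whose identifier is in wanted_ids; also list unmatched ids."""
    wanted = {i.strip() for i in wanted_ids if i.strip()}
    kept = []
    seen = set()
    for fields in csv.reader(io.StringIO(table_text), delimiter="\t"):
        # Skip blank lines and "#"-prefixed header/comment lines.
        if not fields or fields[0].startswith("#"):
            continue
        if fields[id_column] in wanted:
            kept.append(fields)
            seen.add(fields[id_column])
    # Identifiers that never matched, exact string comparison only.
    missing = sorted(wanted - seen)
    return kept, missing


# Invented sample data: two transcripts with versioned identifiers.
table_text = (
    "#name\tchrom\tstrand\ttxStart\ttxEnd\n"
    "txA.1\tchr1\t+\t1000\t5000\n"
    "txB.2\tchr1\t-\t8000\t9000\n"
)

# "txB" lacks the version suffix used in the table, so it will not match.
kept, missing = subset_by_ids(table_text, ["txA.1", "txB"])
print(kept)
print(missing)
```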
If you are having trouble with the Table browser, the UCSC Genome team
can help through their mailing list at http://genome.ucsc.edu
Hopefully this helps to explain the data,
The tool "Operate on Genomic Intervals -> Get flanks" will return
reference genomic sequence upstream/downstream from a set of
coordinates, which is essentially what the UCSC Genome Browser is doing
when you use the export promoters option. This tool requires that you
already have a set of genomic intervals (a data track) from an
annotation data source. UCSC is only one choice. Please see all
linked data sources listed under "Get Data". Or, locate your own and
load via URL (http/ftp) or a file that you have already downloaded to
your computer/server (use ftp if large).
Please note that this tool will function with most (but not all)
genomes. The genome must be already part of the native reference
genome set in Galaxy and must be assigned to the dataset. To assign,
use the "Edit Attributes" form by clicking on the pencil icon for the
dataset.
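For readers who want to see what an upstream flank means in terms of
coordinates, here is a minimal Python sketch. It is not the Galaxy
tool's implementation, only an illustration under stated assumptions:
intervals are 0-based and half-open (BED-style), "upstream" is relative
to the strand, and the function name and default length are
hypothetical.

```python
def upstream_flank(chrom, start, end, strand, length=1000):
    """Return the interval of `length` bases upstream of a stranded feature.

    Assumes 0-based, half-open coordinates. The result is clamped at 0 on
    the left; it is NOT clamped to the chromosome length on the right,
    because chromosome sizes are not known to this sketch.
    """
    if strand == "+":
        # Upstream of a plus-strand feature lies before its start.
        flank_start = max(0, start - length)
        flank_end = start
    elif strand == "-":
        # Upstream of a minus-strand feature lies after its end.
        flank_start = end
        flank_end = end + length
    else:
        raise ValueError("strand must be '+' or '-'")
    return chrom, flank_start, flank_end, strand


# Invented example intervals.
print(upstream_flank("chr1", 1000, 5000, "+", 500))
print(upstream_flank("chr1", 8000, 9000, "-", 500))
```

The strand check matters: taking the region before the start coordinate
for every feature would return the downstream side for minus-strand
transcripts.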