Hello,
Many data providers host this annotation, or it can be derived from a reference gene annotation in one of several formats (BED12, GTF, GFF). iGenomes includes it in the GTF datasets it provides. You may need to map gene identifiers (HUGO, etc.) to the transcript names native to the reference annotation used.
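If you go the GTF route, the extraction itself is simple. Here is a minimal Python sketch, not a Galaxy tool: the function name `tss_from_gtf` is made up for illustration, and it assumes a standard 9-column GTF with a `transcript_id "..."` attribute on each row. It takes the lowest start and highest end seen for each transcript, so it also works on files that only carry exon/CDS rows, and it reports the TSS as a 1 bp BED6 interval (start on the + strand, end on the - strand).

```python
import re

# Assumption: attributes follow the usual GTF style, e.g. transcript_id "NM_000000";
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def tss_from_gtf(lines):
    """Return one BED6-style tuple per transcript marking its TSS.

    GTF coordinates are 1-based and inclusive; the output is 0-based,
    half-open (BED convention).
    """
    spans = {}  # transcript_id -> [chrom, min_start, max_end, strand]
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        match = TRANSCRIPT_ID.search(fields[8])
        if match is None:
            continue  # row is not tied to a transcript
        chrom, start, end, strand = fields[0], int(fields[3]), int(fields[4]), fields[6]
        span = spans.setdefault(match.group(1), [chrom, start, end, strand])
        span[1] = min(span[1], start)
        span[2] = max(span[2], end)

    records = []
    for name, (chrom, start, end, strand) in spans.items():
        # The TSS is the 5' end: the start on "+", the end on "-".
        tss = end if strand == "-" else start
        records.append((chrom, tss - 1, tss, name, 0, strand))
    return records
```

To use it on a file, something like `tss_from_gtf(open("genes.gtf"))` should work (the file name is a placeholder). Map your gene identifiers to transcript IDs before or after this step, depending on which is easier for your annotation source.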
Some genomes at UCSC contain specific tracks for TSS annotation. These can be used as-is, or filtered/intersected with a reference annotation track containing transcript/gene coordinates based on the same reference genome. Alternatively, the gene annotation track can be used directly and the TSS positions extracted from it (the transcript start on the + strand, the transcript end on the - strand). All data from UCSC can be output in BED6/BED12 format using the tool "Get Data -> UCSC Main", which opens the Table Browser. The Table Browser will accept a list of gene identifiers as a filter for reference transcript/gene tracks such as RefSeq. Intersect the data (if needed) using tools from the group "Operate on Genomic Intervals" or possibly "BED Tools".
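For the case where a gene annotation track is exported as BED and the TSS is extracted from it, the logic can be sketched in Python as below. This is only an illustration under stated assumptions: the function name `bed_to_tss` is hypothetical, the input is assumed to be tab-separated BED with at least six columns (strand in column 6), and the TSS is written as a 1 bp interval so the result can be fed to the interval/intersect tools.

```python
def bed_to_tss(bed_lines):
    """Convert BED6/BED12 transcript records to 1 bp TSS records (BED6).

    BED coordinates are 0-based, half-open, so the last base of a
    feature is at chromEnd - 1.
    """
    tss_records = []
    for line in bed_lines:
        # Skip blank lines and the header lines UCSC exports may include.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            raise ValueError("need at least 6 BED columns to know the strand")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        name, score, strand = fields[3], fields[4], fields[5]
        if strand == "-":
            # Minus strand: the transcript begins at its rightmost base.
            tss_start, tss_end = end - 1, end
        else:
            # Plus strand: the transcript begins at its leftmost base.
            tss_start, tss_end = start, start + 1
        tss_records.append(
            "\t".join([chrom, str(tss_start), str(tss_end), name, score, strand])
        )
    return tss_records
```

The output keeps the transcript name in column 4, so a list of identifiers can still be used to filter the result afterwards.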
Hopefully this helps, but let us know if you need more details about the method you plan to use. Please include details (reference genome used, data source) and some sample data from the files being used. If the problem is complex, we may ask you to privately share a history from the public Main Galaxy instance containing the loaded data and the manipulations you have tested so far.
Thanks, Jen, Galaxy team