Hi all,
Could you help me find Alu elements that occur in genes as part of their coding exons?
Regards
Hello,
Alu sequences would not usually be found directly in the coding region of genes (as far as I know), but LINE elements can be and may be what you are looking for instead. RepeatMasker is a common tool used to identify these.
First, locate a reference annotation dataset that contains the genomic coordinates for these features. A BED dataset would be ideal. UCSC http://genome.ucsc.edu/ is a common source (review the annotation tracks available for your reference genome; if the track is present there, export it to Galaxy using the Table Browser through the "Get Data -> UCSC Main" tool). Other data sources are certainly options. See the tools under "Get Data" on the public Main Galaxy instance at http://usegalaxy.org for example data providers.
Once you load the dataset, it may contain many types of repeat elements. Filter it as needed, while it is in tabular format, using the tool "Filter and Sort -> Select (or) Filter" (depending on the file contents).
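As a rough illustration of what this filtering step does, here is a minimal Python sketch. It is not the Galaxy tool itself. It assumes a tab-separated, BED-like file in which the fourth column holds the repeat name, and that Alu family names start with "Alu" (for example AluY or AluSx). The function name, column position, and sample rows are made up for the example.

```python
def filter_repeats(lines, prefix="Alu", name_col=3):
    """Keep tab-separated rows whose name column starts with `prefix`.

    Assumption: BED-like layout (chrom, start, end, name, ...), so the
    repeat name sits in column index 3. Adjust name_col for other layouts.
    """
    kept = []
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and common header/comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if len(fields) > name_col and fields[name_col].startswith(prefix):
            kept.append(fields)
    return kept


# Hypothetical sample rows, for illustration only.
sample = [
    "chr1\t100\t400\tAluY\t0\t+",
    "chr1\t900\t1500\tL1PA3\t0\t-",
    "chr2\t50\t350\tAluSx\t0\t+",
]
alu_rows = filter_repeats(sample)
for row in alu_rows:
    print("\t".join(row))
```

Changing `prefix` to "L1" would keep LINE-1 entries instead, under the same naming assumption.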
Next, use a coordinate comparison tool to find the overlapping regions. There are a few choices, but I would recommend starting with the tool "Operate on Genomic Intervals -> Intersect". Others in this tool group may also be of interest. Further options are available in the tool group "BEDTools" (some of these tools also accept a BAM dataset, if you are using that as one of your inputs).
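To make the idea of a coordinate comparison concrete, here is a small Python sketch of an interval intersection. It is only an illustration of the concept, not how the Galaxy or BEDTools tools are implemented. It assumes BED-style coordinates (0-based start, end not included), and the function name and sample intervals are hypothetical.

```python
from collections import defaultdict


def intersect(repeats, exons):
    """Return the overlapping pieces between two interval lists.

    Each input item is (chrom, start, end, name), with BED-style
    half-open coordinates (assumption). Each output item is
    (chrom, overlap_start, overlap_end, repeat_name, exon_name).
    """
    # Group exons by chromosome so only same-chromosome pairs are compared.
    exons_by_chrom = defaultdict(list)
    for chrom, start, end, name in exons:
        exons_by_chrom[chrom].append((start, end, name))

    hits = []
    for chrom, r_start, r_end, r_name in repeats:
        for e_start, e_end, e_name in exons_by_chrom.get(chrom, []):
            start = max(r_start, e_start)
            end = min(r_end, e_end)
            if start < end:  # a non-empty overlap exists
                hits.append((chrom, start, end, r_name, e_name))
    return hits


# Hypothetical intervals, for illustration only.
repeats = [("chr1", 100, 400, "AluY"), ("chr2", 50, 350, "AluSx")]
exons = [("chr1", 350, 500, "geneA_exon2"), ("chr2", 400, 600, "geneB_exon1")]
hits = intersect(repeats, exons)
for hit in hits:
    print(hit)
```

This pairwise comparison is fine for small inputs. For whole-genome datasets the dedicated tools are the better choice, since they are built to handle large interval sets.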
There are several tutorials available for using the "Operate on Genomic Intervals" tools, including the help directly on each tool's form. This publication has details for usage of each tool in Protocol 4 (Protocols 1 & 2 may also be helpful):
http://usegalaxy.org/u/galaxyproject/p/using-galaxy-2012
Hopefully this guides you in the correct direction,
Jen, Galaxy team