Question: Peak Annotation With Galaxy
4.7 years ago, Wooi Lim wrote:
Dear Galaxy, I am analysing ChIP-Seq data from Illumina using the Galaxy web server. I mapped the reads with Bowtie and did the peak calling with MACS. The next thing I want to do is annotate the peaks with genomic regions (promoter, intergenic, intron, etc.) and gene names. I am not sure whether this can be achieved through Galaxy and, if so, how it can be done. Thank you. Catheryn
alignment bowtie • 4.4k views
modified 4.7 years ago by Jennifer Hillman Jackson • written 4.7 years ago by Wooi Lim
4.7 years ago, Jennifer Hillman Jackson (United States) wrote:
Hello Catheryn,

Yes, all of this can be done. Once you have identified an annotation source (or sources!), the rest is part of the core functionality of Galaxy.

One of the outputs from MACS is a BED file with the peaks. BED format is similar to interval format and can be used with the tools in the group "Operate on Genomic Intervals" or, if kept as BED, with tools in the group "BEDTools" (such as 'Intersect multiple sorted BED files'). If you need help understanding these datatypes, this wiki explains them - see the last bullet for links: http://wiki.galaxyproject.org/Support#Dataset_special_cases

The idea is to obtain annotation data in BED/interval format as well, then perform the comparisons. Where a peak overlaps an annotated region (or overlaps none, in the case of intergenic), the annotation can be assigned.

I am not sure what genome you are working with, but if it is available from UCSC or another common public site, this can be fairly straightforward. One very important point: the exact same reference genome that you mapped against must be the one you extract annotation from. The name in Galaxy will be exactly the same as the name at the source in nearly all cases - please ask if you have a question about this.

At UCSC, the Table Browser contains all the annotation tracks found in the Browser itself. You will most likely want to use those from the "Gene and Gene Prediction" group, although there are likely others in the ENCODE group that are also of interest. Each track has a description at UCSC, including methods, often very detailed. When extracting the data (using the tool "Get Data -> UCSC Main table browser"), options are available to subset the BED output regions by exons, introns, predicted promoter regions, etc.

BioMart can be another great source of annotation, especially for genomes in Ensembl annotation builds. The tool would be "Get Data -> BioMart Central server". The same basic extraction concepts apply, although the form is organized differently. The help there will guide you. The important parts are the chromosome, start, and end. The best tip I can offer when working with BioMart data is to avoid HTML content - this is often found in the longer descriptions. If you get an import error about HTML content, this isn't a huge problem. Just try again, eliminating suspected fields - the field/s with the HTML can usually be identified quickly with a few test imports.

There are other sources in this "Get Data" tool group, and many other external annotation projects have data (from these you can simply download/upload or load directly via a URL).

You can start with a larger file with all of the details, compare using just the coordinates, then go back and pick up the details with a final join.

Some examples of how to do these types of operations are in our ChIP-seq example and in our paper from last year, links here:
https://usegalaxy.org/u/james/p/exercise-chip-seq
https://usegalaxy.org/u/galaxyproject/p/using-galaxy-2012

Please note that the public Main server at usegalaxy.org will be unavailable during US East coast business hours tomorrow, as stated on the current banner: "TACC will be performing storage system updates on Tuesday, December 3 from 9 AM to 6 PM EST (UTC -0500). During this time, Galaxy will be unavailable."

Hopefully this helps!

Jen
Galaxy team

--
Jennifer Hillman-Jackson
http://galaxyproject.org
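For anyone who wants to check the logic outside Galaxy, the overlap comparison described in this answer (assign an annotation where a peak overlaps a feature, otherwise call the peak intergenic) can be sketched in plain Python. This is only an illustration of the idea, not what the Galaxy or BEDTools tools do internally. The function names `overlaps` and `annotate_peaks`, the tuple layouts, and all coordinates and gene names are hypothetical. BED coordinates are assumed to be 0-based and half-open.

```python
from collections import defaultdict


def overlaps(a_start, a_end, b_start, b_end):
    """Return True if two 0-based, half-open intervals share at least one base."""
    return a_start < b_end and b_start < a_end


def annotate_peaks(peaks, features):
    """Label each peak with every feature it overlaps, or 'intergenic' if none.

    peaks:    list of (chrom, start, end, peak_name)
    features: list of (chrom, start, end, feature_type, gene_name)
    Both layouts are assumptions made for this sketch.
    """
    # Group features by chromosome so peaks are only compared
    # against features on the same chromosome.
    by_chrom = defaultdict(list)
    for chrom, start, end, ftype, gene in features:
        by_chrom[chrom].append((start, end, ftype, gene))

    annotated = {}
    for chrom, start, end, name in peaks:
        hits = [
            (ftype, gene)
            for f_start, f_end, ftype, gene in by_chrom.get(chrom, [])
            if overlaps(start, end, f_start, f_end)
        ]
        # No overlap with any annotated region: treat the peak as intergenic.
        annotated[name] = hits or [("intergenic", None)]
    return annotated


if __name__ == "__main__":
    # Made-up example data, for illustration only.
    peaks = [
        ("chr1", 100, 200, "peak_1"),
        ("chr1", 5000, 5100, "peak_2"),
    ]
    features = [
        ("chr1", 0, 150, "promoter", "GeneA"),
        ("chr1", 150, 900, "intron", "GeneA"),
    ]
    for peak, labels in annotate_peaks(peaks, features).items():
        print(peak, labels)
```

The pairwise comparison is fine for small files. For genome-scale data, the sorted-interval tools mentioned in the answer are the better choice. Here the gene name travels with each feature, which plays the role of the final join that picks the details back up.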
Hi Jennifer, Thank you for this great post! I know this request is likely to be tremendously redundant, but is there some way to move your response up in the search list? Always grateful for your extreme patience with us all, David FHCRC
written 4.7 years ago by zod
Thanks Zod, Very nice of you to like and reply! No patience required :) Nothing I know of will move it up in the custom Google searches in one step, but we have some ideas in play about the lists that may help with promoting certain Q/A threads soon (no firm details to share quite yet). What can be done now is to put more of this type of content into the wiki and tutorials, and then organize and label it well. The Google searches will pick up content from those sources as well, usually with better focus. Thanks for the suggestion! Jen Galaxy team
written 4.7 years ago by Jennifer Hillman Jackson