Question: Annotating ChIP seq peaks in Galaxy
2.8 years ago by
kathryn.gilroy0 wrote:

I'm analyzing ChIP-seq data using Bowtie and then MACS on the Galaxy web instance, and now want to annotate the peaks with gene names, introns, exons, and so on. I have downloaded the relevant genome from UCSC in BED format. I tried a couple of approaches to annotate the MACS output. First, I tried to intersect the MACS and genome files; this produced an output, but it didn't add any annotations to the ChIP-seq data. I also tried to join the two datasets, but got a very large file with tens of millions of lines, which wasn't useful.

Where am I going wrong? Is there a tool to annotate MACS files in Galaxy?



chip seq annotation • 1.4k views
2.8 years ago by
United States
Jennifer Hillman Jackson25k wrote:


An intersect with known annotation mapped to the same reference genome is a very common way to annotate peaks.

The output can be restricted by filtering the MACS peaks and/or by setting up the join/intersect so that only peaks of interest receive annotation (for example, a particular region instead of the entire genome). Even then, depending on the format of the annotation joined in, you may still get several rows per peak. Why? If comparing to a GTF file of transcripts, there can be many transcripts within a gene's bounds, and any of them can overlap an individual peak interval. Several peaks can also map to the same set of transcripts. This is a many-to-many relationship.
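To make the many-to-many point concrete, here is a minimal Python sketch of an overlap join. It is not what the Galaxy Intersect or Join tools run internally; the `Interval` tuple, the `intersect` helper, and all coordinates and names are hypothetical toy data, and intervals are assumed to be half-open as in BED.

```python
from collections import namedtuple

# Hypothetical record type for this illustration only.
Interval = namedtuple("Interval", ["chrom", "start", "end", "name"])


def overlaps(a, b):
    """True if two half-open (BED-style) intervals share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def intersect(peaks, features):
    """Return one (peak name, feature name) row per overlapping pair."""
    return [(p.name, f.name) for p in peaks for f in features if overlaps(p, f)]


# Toy data: made-up coordinates, not real peaks or transcripts.
peaks = [
    Interval("chr1", 100, 200, "peak_1"),
    Interval("chr1", 150, 260, "peak_2"),
    Interval("chr2", 500, 600, "peak_3"),
]
transcripts = [
    Interval("chr1", 50, 400, "geneA_tx1"),
    Interval("chr1", 120, 400, "geneA_tx2"),
    Interval("chr1", 250, 300, "geneB_tx1"),
]

rows = intersect(peaks, transcripts)
for peak_name, transcript_name in rows:
    print(peak_name, transcript_name, sep="\t")
```

Three peaks against three transcripts yield five rows here: peak_1 and peak_2 each hit both geneA transcripts, peak_2 also hits geneB_tx1, and peak_3 hits nothing. Scaled up to a whole genome, the same effect accounts for a large share of the growth in row count.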

Some of the tools in the "Operate on Genomic Intervals" tool group can help to cluster intervals. These might help with the data reduction after the initial merge of peaks vs annotation.
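As a rough illustration of what clustering or merging intervals buys you, the sketch below collapses overlapping annotation intervals into one interval per covered region. It is only an assumption-laden toy, not the implementation behind the Galaxy tools: `merge_intervals` is a hypothetical helper, the coordinates are made up, and intervals are treated as half-open with touching intervals merged.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching (chrom, start, end) intervals.

    Sorting puts intervals on the same chromosome next to each other in
    start order, so each interval only needs comparing to the last merged one.
    """
    merged = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Extend the current cluster if this interval reaches further.
            merged[-1][2] = max(merged[-1][2], end)
        else:
            merged.append([chrom, start, end])
    return [tuple(m) for m in merged]


# Toy annotation: three overlapping transcripts on chr1, one on chr2.
transcripts = [
    ("chr1", 50, 400),
    ("chr1", 120, 400),
    ("chr1", 250, 300),
    ("chr2", 10, 90),
]

merged = merge_intervals(transcripts)
for chrom, start, end in merged:
    print(chrom, start, end, sep="\t")
```

Four annotation rows reduce to two regions, so a peak overlapping the chr1 cluster is reported once rather than once per transcript. The trade-off is that transcript-level detail is lost, so keep the unmerged annotation if you need it later.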

Perhaps try using the Profile Annotations tool to discover potential annotation tracks. Load those into Galaxy. Then, before doing more, visualize them all in Trackster to get an idea of the volume/duplication in the annotation versus the peaks. Focusing on a region or gene that you know well, or that is well characterized, would be a good place to drill down to the detail level.

Hopefully this helps, Jen, Galaxy team


