Intersecting peaks with known annotation mapped to the same reference genome is a very common way to annotate them.
The output can be restricted by filtering the MACS peaks first and/or by setting up the joins/intersections so that only peaks of interest receive annotation (for example, a particular region instead of the entire genome). Even then, depending on the format of the annotation joined in, you may still get several rows per peak. Why? If you compare against a GTF file of transcripts, a single gene locus can have many transcripts, and any of them may overlap an individual peak interval. Conversely, several peaks can map to the same set of transcripts. This is a many-to-many relationship.
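To make the many-to-many point concrete, here is a small Python sketch (not what any Galaxy tool runs internally). It assumes peaks and transcripts are plain `(chrom, start, end, ...)` tuples with BED-style half-open coordinates, and all names and coordinates are made up for illustration. A naive all-pairs overlap join yields one row per (peak, transcript) pair, so `peak1` appears twice for the same gene and `tx1` is hit by two peaks. The last step collapses the rows to one set of gene IDs per peak.

```python
from collections import defaultdict

# Hypothetical example data: BED-style half-open intervals.
# Peaks: (chrom, start, end, peak_name)
peaks = [
    ("chr1", 100, 200, "peak1"),
    ("chr1", 150, 260, "peak2"),
    ("chr1", 900, 950, "peak3"),
]
# Transcripts: (chrom, start, end, transcript_id, gene_id); tx1 and tx2 are isoforms of geneA.
transcripts = [
    ("chr1", 50, 300, "tx1", "geneA"),
    ("chr1", 120, 250, "tx2", "geneA"),
    ("chr1", 240, 400, "tx3", "geneB"),
]

def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals share at least one base."""
    return a_start < b_end and b_start < a_end

def intersect(peaks, transcripts):
    """Naive all-pairs join: one output row per overlapping (peak, transcript) pair."""
    rows = []
    for p_chrom, p_start, p_end, p_name in peaks:
        for t_chrom, t_start, t_end, tx_id, gene_id in transcripts:
            if p_chrom == t_chrom and overlaps(p_start, p_end, t_start, t_end):
                rows.append((p_name, tx_id, gene_id))
    return rows

def genes_per_peak(rows):
    """Collapse the many-to-many rows into one set of gene IDs per peak."""
    collapsed = defaultdict(set)
    for p_name, _tx_id, gene_id in rows:
        collapsed[p_name].add(gene_id)
    return dict(collapsed)

rows = intersect(peaks, transcripts)
for row in rows:
    print(row)  # five rows for only two annotated peaks
print(genes_per_peak(rows))  # peak3 overlaps nothing, so it is absent
```

The all-pairs loop is only for readability. For genome-scale data, a sorted sweep or an interval tree (or the Galaxy interval tools themselves) is the practical choice.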
Some of the tools in the "Operate on Genomic Intervals" tool group can help cluster intervals. These may help with data reduction after the initial join of peaks against annotation.
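As a rough illustration of what "clustering intervals" means, the Python sketch below uses a sort-and-sweep approach. It is an assumption about the general technique, not the code behind any specific Galaxy tool. Intervals on the same chromosome that overlap, or lie within a hypothetical `max_gap` bases of each other, are merged into one region, and the member names are kept so you can see which peaks were collapsed. Coordinates are assumed half-open, so with `max_gap=0` book-ended intervals (one ends exactly where the next starts) are merged too.

```python
from typing import List, Tuple

# (chrom, start, end, name) with half-open coordinates; example data is hypothetical.
Interval = Tuple[str, int, int, str]

def cluster_intervals(intervals: List[Interval], max_gap: int = 0):
    """Sort by position, then sweep: an interval joins the current cluster if it is
    on the same chromosome and starts no more than max_gap bases past the cluster end.
    Returns (chrom, start, end, [member names]) tuples."""
    clusters = []
    for chrom, start, end, name in sorted(intervals, key=lambda iv: (iv[0], iv[1], iv[2])):
        if clusters and clusters[-1][0] == chrom and start <= clusters[-1][2] + max_gap:
            last = clusters[-1]
            last[2] = max(last[2], end)  # extend the current cluster's end
            last[3].append(name)         # record which interval was absorbed
        else:
            clusters.append([chrom, start, end, [name]])  # start a new cluster
    return [tuple(c) for c in clusters]

peaks = [
    ("chr1", 150, 260, "peak2"),
    ("chr1", 100, 200, "peak1"),
    ("chr1", 260, 300, "peak4"),  # book-ended with peak2
    ("chr1", 900, 950, "peak3"),
    ("chr2", 100, 200, "peak5"),
]
for cluster in cluster_intervals(peaks):
    print(cluster)
```

With the default `max_gap=0`, the first three chr1 peaks collapse into one region (chr1:100-300), while `peak3` and `peak5` stay separate. Raising `max_gap` trades resolution for fewer rows.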
Perhaps try using the Profile Annotations tool to discover potential annotation tracks, then load those into Galaxy. Before doing more, visualize them all in Trackster to get an idea of the volume/duplication in the annotation versus the peaks. A region or gene that you know well, or that is well characterized, is a good place to drill down to the detail level.
Hopefully this helps, Jen, Galaxy team