Question: Isolating core sequences ("summits") of BED or FASTA files
3.3 years ago, murtaugh (United States) wrote:

This seems like it should be self-explanatory, but I can't figure it out - apologies in advance. I'm working with ChIP-seq files, using MACS to identify peaks, and of course the peaks in the resulting BED file are of variable length. I'd like to see whether I can identify motifs within the peak "summits" corresponding to the IP'd transcription factor (in this case, CTCF), but all the motif-finding software I can find requires FASTA-formatted sequences of identical size, or at least of relatively limited size. I have been able to pull the genomic sequences corresponding to my BED file into a FASTA file, but again each sequence is a different size, and some are quite large.

I can't seem to find any tool in Galaxy that will let me capture, for example, only the central 100 nucleotides of each sequence in a FASTA file, or similarly reduce the coordinates in a BED file to that length around the center. I was able to do this "by hand" in Excel, using the XLS output of MACS and the "summit" position called for each peak, but it seems like there must be a tool to perform the same operation within Galaxy.
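For reference, the operation described above is small enough to sketch outside Galaxy. The following is a minimal Python sketch, not a Galaxy tool: the function names (`center_window`, `resize_bed_line`, `center_of_sequence`) are hypothetical, BED coordinates are taken as 0-based and half-open, and `summit` is assumed to be an absolute coordinate. If the summit column in your MACS output is an offset from the peak start, add the start to it first.

```python
def center_window(start, end, width=100, summit=None):
    """Return (new_start, new_end) for a fixed-width window.

    Coordinates are 0-based, half-open, as in BED. If `summit` is given
    (assumed absolute), it is the window center; otherwise the interval
    midpoint is used.
    """
    center = summit if summit is not None else (start + end) // 2
    # Clamp at 0 so a window near the chromosome start stays valid.
    new_start = max(0, center - width // 2)
    return new_start, new_start + width


def resize_bed_line(line, width=100):
    """Rewrite one tab-separated BED line to `width` bp around its midpoint."""
    fields = line.rstrip("\n").split("\t")
    start, end = int(fields[1]), int(fields[2])
    fields[1], fields[2] = map(str, center_window(start, end, width))
    return "\t".join(fields)


def center_of_sequence(seq, width=100):
    """Return the central `width` characters of a sequence string."""
    if len(seq) <= width:
        return seq  # shorter sequences are returned unchanged
    offset = (len(seq) - width) // 2
    return seq[offset:offset + width]


if __name__ == "__main__":
    print(resize_bed_line("chr1\t1000\t1500\tpeak_1"))
```

Resizing the BED coordinates before extracting sequences is probably the simpler route, since the FASTA file then comes out with equal-length entries. Note that this sketch does not check the window against the chromosome end, so peaks very close to the end of a chromosome would need an extra bound.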

Assuming this is a solvable problem within Galaxy itself, I might go further and ask whether there are good motif discovery tools within Galaxy.

Thanks a lot!

bed macs • 1.0k views