This seems like it should be straightforward, but I can't figure it out - apologies in advance. I'm working with ChIP-seq files, using MACS to identify peaks, and of course the peaks in the resulting BED file are of variable length. I'd like to see whether I can identify motifs within the peak "summits" corresponding to the IP'd transcription factor (in this case, CTCF), but all the motif-finding software I can find requires FASTA-formatted sequences of identical size, or at least of relatively limited size. I have been able to extract the genomic sequences corresponding to my BED file into a FASTA file, but again each sequence is a different size, and some are quite large.
I can't seem to find any tool in Galaxy that will allow me to capture, for example, only the central 100 nucleotides of each line of a FASTA file, or similarly reduce the coordinates within a BED file down to that length around the center. I was able to do this "by hand" in Excel, using the XLS output of MACS and the "summit" position called for each peak, but it seems like there must be a tool to perform the same operation within Galaxy.
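To make the operation concrete, here is a minimal Python sketch of what I mean by reducing each BED interval to a fixed window around its center. It is not a Galaxy tool. The function names `center_window` and `resize_bed`, the `summit_offset` parameter, and the 100 bp default are all made up for illustration. It assumes tab-separated BED lines with 0-based, half-open coordinates, and it does not check the window against chromosome lengths.

```python
def center_window(bed_line, width=100, summit_offset=None):
    """Shrink one BED interval to `width` bases around its midpoint or summit.

    `summit_offset` (hypothetical parameter) is the summit position relative
    to the interval start; when it is None the interval midpoint is used.
    """
    fields = bed_line.rstrip("\n").split("\t")
    start, end = int(fields[1]), int(fields[2])
    if summit_offset is None:
        center = (start + end) // 2
    else:
        center = start + summit_offset
    # Clamp at 0 so the window never starts before the chromosome begins.
    new_start = max(0, center - width // 2)
    new_end = new_start + width
    fields[1], fields[2] = str(new_start), str(new_end)
    return "\t".join(fields)


def resize_bed(lines, width=100):
    """Apply center_window to every data line, skipping headers and blanks."""
    out = []
    for line in lines:
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue
        out.append(center_window(line, width))
    return out


# Two made-up peaks, purely for illustration.
example = ["chr1\t1000\t1500\tpeak_1\n", "chr2\t20\t60\tpeak_2\n"]
resized = resize_bed(example)
for row in resized:
    print(row)
```

The resized intervals could then be used to extract sequences exactly as I did for the full-length peaks, so every FASTA record would come out 100 nt long, as long as the window does not run past the end of a chromosome.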
Assuming this is a solvable problem within Galaxy itself, I'd also like to ask whether there are good motif discovery tools available within Galaxy.
Thanks a lot!