Question: Search For TF Binding Site Patterns In Galaxy
Brown, Stuart • 30 wrote:
I am trying to put together a workflow/tutorial that uses Galaxy to
search for transcription factor binding sites (TFBS) on a genome-wide
scale with pattern search tools. I want to train my students to think
genomically and to use clever tools to leverage their abilities.
Galaxy is absolutely awesome for grabbing the upstream promoter
regions for all genes from any organism with a whole genome in UCSC.
It is also possible to use the integrated EMBOSS tools such as fuzznuc
and dreg to search for a known TFBS (or any other simple nucleotide
pattern). However, I can't get beyond this simple search to a smarter,
information-based search. In particular, I have the following
workflow in mind:
1. Collect upstream regions for all mouse (or human) genes
2. Search for a published TF binding site with a single base
mismatch using fuzznuc
3. Make a multiple alignment of the sequences returned by fuzznuc
(not possible in any way that I have been able to find)
4. Make a logo from the alignment to identify informative positions
and conserved substitutions (not in Galaxy)
5. Make a PSSM profile, HMM profile, or other smart searching tool
from the aligned sequences (not in Galaxy)
6. Search the upstream regions again with this more sensitive
pattern search method (not in Galaxy)
7. Make a list of genes targeted by this TFBS
8. Compare the list of genes to microarray data showing co-regulation
of this gene set, or to pathways
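As a rough illustration of steps 2, 5 and 6 outside Galaxy, here is a minimal sketch in plain Python. It is not fuzznuc and not a Galaxy tool: the function names, the toy sequences, the example pattern, the pseudocount and the score threshold are all assumptions made up for the example. It handles only exact A/C/G/T patterns on the forward strand, and it builds a simple log-odds matrix with a uniform background.

```python
import math

def find_sites(seq, pattern, max_mismatch=1):
    """Return (start, site) for each window within max_mismatch of pattern.

    Simplification: no IUPAC ambiguity codes, forward strand only.
    """
    seq = seq.upper()
    k = len(pattern)
    hits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        mismatches = sum(1 for a, b in zip(window, pattern) if a != b)
        if mismatches <= max_mismatch:
            hits.append((i, window))
    return hits

def build_pssm(sites, pseudocount=1.0, background=0.25):
    """Build a log2-odds matrix (one dict per column) from equal-length sites."""
    pssm = []
    for column in zip(*sites):
        total = len(column) + 4 * pseudocount
        pssm.append({
            base: math.log2((column.count(base) + pseudocount) / total / background)
            for base in "ACGT"
        })
    return pssm

def scan(seq, pssm, threshold):
    """Return (start, window, score) for windows scoring at or above threshold."""
    seq = seq.upper()
    k = len(pssm)
    results = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows containing N or other symbols
        score = sum(col[base] for col, base in zip(pssm, window))
        if score >= threshold:
            results.append((i, window, score))
    return results

# Toy upstream regions (hypothetical data, not real promoters)
upstream = {
    "geneA": "TTGACGTCATT",
    "geneB": "CCTGACGTAAGG",
    "geneC": "AAAAAAAAAAAA",
}
pattern = "TGACGTCA"  # example pattern chosen for illustration

# Step 2: one-mismatch search across all regions
sites = [site for seq in upstream.values()
         for _, site in find_sites(seq, pattern, max_mismatch=1)]

# Steps 5 and 6: build the matrix, then rescan every region with it
pssm = build_pssm(sites)
targets = {gene: scan(seq, pssm, threshold=7.0) for gene, seq in upstream.items()}
```

The genes with a non-empty entry in `targets` would form the list for step 7. On real data the threshold would need to be calibrated, for example against shuffled sequences.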
I am stuck at step 3. Even if I download the fuzznuc results to my
desktop, there is no easy way to extract just the sequences and make a
multiple alignment. Many of the 'allowed' fuzznuc optional output
formats produce an error or no usable output.
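One possible way around step 3, sketched here under stated assumptions: hits from a fixed-length pattern are all the same length, so they can be stacked as a gap-free alignment without running an aligner. The sketch assumes the hits have already been extracted into (gene, start, site) tuples; it does not parse any fuzznuc report format. The function names and the example hits are hypothetical. The per-column information content it computes is the quantity a sequence logo displays (step 4), here without small-sample correction.

```python
import math

def hits_to_fasta(hits):
    """Turn (gene_id, start, site) tuples into FASTA text.

    Raises ValueError if the sites differ in length, because then they
    are not a ready-made gap-free alignment.
    """
    lengths = {len(site) for _, _, site in hits}
    if len(lengths) > 1:
        raise ValueError("sites differ in length; a real aligner is needed")
    return "".join(f">{gene}_{start}\n{site}\n" for gene, start, site in hits)

def column_information(sites):
    """Information content in bits for each alignment column (maximum 2 for DNA)."""
    info = []
    for column in zip(*sites):
        n = len(column)
        entropy = 0.0
        for base in "ACGT":
            p = column.count(base) / n
            if p > 0:
                entropy -= p * math.log2(p)
        info.append(2.0 - entropy)
    return info

# Hypothetical hits, as they might look after a one-mismatch search
hits = [("geneA", 1, "TGACGTCA"), ("geneB", 2, "TGACGTAA")]

fasta_text = hits_to_fasta(hits)
info = column_information([site for _, _, site in hits])
```

The FASTA text could then be handed to a logo or profile-building tool, and columns with low values in `info` mark the positions where substitutions are tolerated.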
Thanks for any suggestions.
Stuart M. Brown, Ph.D.
Associate Professor
Center for Health Informatics and Bioinformatics
NYU School of Medicine
550 First Ave, NY, NY 10016
stuart.brown@med.nyu.edu
(212)263-7689 FAX (212) 263-8139
modified 9.1 years ago by Peter Rice • 30 • written 9.1 years ago by Brown, Stuart • 30