Question: Search For Tf Binding Site Patterns In Galaxy
8.3 years ago by
Brown, Stuart30 wrote:
I am trying to come up with a nice workflow/tutorial for using Galaxy to search for transcription factor binding sites (TFBS) on a genome-wide scale with pattern search tools. I want to train my students to think genomically and to use clever tools to leverage their abilities.

Galaxy is absolutely awesome for grabbing the upstream promoter regions for all genes from any organism with a whole genome in UCSC. It is also possible to use the integrated EMBOSS tools such as fuzznuc and dreg to search for a known TFBS (or any other simple nucleotide pattern). However, I can't get past the simple search to a more clever, information-based search. In particular, I have the following workflow in mind:

1. Collect upstream regions for all mouse (or human) genes.
2. Search for a published TF binding site, allowing a single base mismatch, using fuzznuc.
3. Make a multiple alignment of the sequences returned by fuzznuc (not possible in any way that I have been able to find).
4. Make a logo from the alignment to identify informative positions and conserved substitutions (not in Galaxy).
5. Make a PSSM profile, HMM profile, or other smart searching tool from the aligned sequences (not in Galaxy).
6. Search the upstream regions again with this more sensitive pattern search method (not in Galaxy).
7. Make a list of genes targeted by this TFBS.
8. Compare the list of genes to microarray data showing co-regulation of this gene set, or to pathways.

I am frustrated at step 3. Even if I bring the fuzznuc results to my desktop, there is no easy way to extract just the sequences and make a multiple alignment. Many of the 'allowed' fuzznuc optional output formats produce an error or no usable output.

Thanks for any suggestions.

Stuart M. Brown, Ph.D.
Associate Professor
Center for Health Informatics and Bioinformatics
NYU School of Medicine
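Steps 2 to 6 of the workflow above can be sketched outside Galaxy in a few lines of Python. This is only a minimal illustration under stated assumptions, not what fuzznuc or any Galaxy tool does internally: the function names, the toy sequences, the pseudocount, and the uniform background are all hypothetical choices, and only the forward strand is searched. One point it relies on is that hits to a fixed-length pattern with mismatches only (no gaps) all have the same length, so stacking them already gives an ungapped alignment from which column counts can be taken.

```python
import math
from collections import Counter

BASES = "ACGT"

def find_hits(seq, pattern, max_mismatch=1):
    """Step 2: return (start, site) for each window within max_mismatch of pattern."""
    k = len(pattern)
    hits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        mismatches = sum(1 for a, b in zip(window, pattern) if a != b)
        if mismatches <= max_mismatch:
            hits.append((i, window))
    return hits

def information_content(sites):
    """Step 4: bits per column (2 - entropy), without small-sample correction."""
    n = len(sites)
    bits = []
    for column in zip(*sites):
        counts = Counter(column)
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        bits.append(2.0 - entropy)
    return bits

def build_pssm(sites, pseudocount=0.5, background=0.25):
    """Step 5: log2-odds score per base per column (assumed uniform background)."""
    n = len(sites)
    pssm = []
    for column in zip(*sites):
        counts = Counter(column)
        pssm.append({
            b: math.log2(((counts[b] + pseudocount) / (n + 4 * pseudocount)) / background)
            for b in BASES
        })
    return pssm

def scan(seq, pssm, threshold):
    """Step 6: return (start, site, score) for windows scoring at or above threshold."""
    k = len(pssm)
    results = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if any(b not in BASES for b in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(col[b] for col, b in zip(pssm, window))
        if score >= threshold:
            results.append((i, window, score))
    return results

if __name__ == "__main__":
    # Toy upstream regions and pattern, made up for illustration only.
    upstream = {
        "geneA": "TTTGACGTCATT",
        "geneB": "CCTGACGTAACC",
        "geneC": "GGGGGGGGGGGG",
    }
    pattern = "TGACGTCA"
    sites = [site for seq in upstream.values() for _, site in find_hits(seq, pattern)]
    print("bits per column:", information_content(sites))
    pssm = build_pssm(sites)
    for gene, seq in upstream.items():
        print(gene, scan(seq, pssm, threshold=5.0))
```

With real data the threshold would need to be chosen with care, for example by scoring shuffled upstream sequences, and the genes with hits from `scan` would feed step 7.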
galaxy • 1.2k views
8.3 years ago by
Penn State
Anton Nekrutenko1.7k wrote:
Dear Stuart:

Unfortunately, there are currently no tools in Galaxy for actually producing multiple alignments, although this is being requested more and more. I have created a ticket for this issue ( central/issue/218/multiple-alignemnts ), so you can follow its status.

Thanks,
Anton Nekrutenko
I just came across this message, but I wonder whether Galaxy users who make this request are aware of MEME/MAST and other related motif-searching tools. They may not be exactly what is being requested of Galaxy, but they could be useful.

Kenneth M. Weiss, PhD
Evan Pugh Professor of Anthropology and Genetics, Professor of Biology
Department of Anthropology, Penn State University
Reply written 8.3 years ago by Ken Weiss
8.3 years ago by
Peter Rice30 wrote:
Do you want just the regions that matched the pattern? EMBOSS has an option, -rformat listfile, that writes a list file of the matched subsequences. You can use this as input to any other EMBOSS program using the syntax @filename (though I'm not sure how easy that is within Galaxy).

If you want the whole sequences, we can add a new report format to EMBOSS that simply reports the sequences that have a feature. The easiest approach is to add the sequence to the EMBL, SwissProt and other feature outputs. FASTA is tricky, as we have to write the features to a separate GFF file (Galaxy is clever, but writing 1 or 2 output files is not the kindest way to deliver results).

If you need new programs, we are happy to add them to the next EMBOSS release and put them into Galaxy.

Hope this helps,
Peter Rice
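For anyone who brings a list file back to the desktop, the matched subsequences can also be pulled out with a short script. The sketch below is hedged: it assumes each entry has the shape `name[start:end]` with 1-based inclusive coordinates and that comment lines start with `#`, which should be checked against an actual fuzznuc list file before use. The function names and the toy sequences are hypothetical, and reverse-strand entries are not handled.

```python
import re

# Assumed entry shape: "name[start:end]", 1-based and inclusive.
# Verify this against a real list file; it is an assumption of this sketch.
ENTRY = re.compile(r"^(?P<name>[^\[\s]+)\[(?P<start>\d+):(?P<end>\d+)\]")

def parse_listfile(text):
    """Return (name, start, end) tuples, skipping blank and comment lines."""
    regions = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = ENTRY.match(line)
        if match:
            regions.append(
                (match.group("name"), int(match.group("start")), int(match.group("end")))
            )
    return regions

def extract(regions, sequences):
    """Slice each region out of a {name: sequence} dict (forward strand only)."""
    return [
        sequences[name][start - 1:end]
        for name, start, end in regions
        if name in sequences
    ]

if __name__ == "__main__":
    # Toy input, made up for illustration only.
    listfile_text = "# hypothetical list file\ngeneA[3:10]\ngeneB[3:10]\n"
    sequences = {"geneA": "TTTGACGTCATT", "geneB": "CCTGACGTAACC"}
    for site in extract(parse_listfile(listfile_text), sequences):
        print(site)
```

Because all hits to a fixed-length pattern have the same length, the extracted sites can be stacked directly as an ungapped alignment.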