Because I need to use PhyloCSF, I uploaded a BED file and want to get a multiple alignment of 29 mammalian genomes. But under MAF Type, only the 100-way multiZ (hg19) is available. How can I switch to the 46-way multiZ (hg18) in Stitch MAF blocks?
Hello,
First, are you working on the Main server at http://usegalaxy.org? Just a double check.
Second, check the "database" (genome build) assignment on your BED file. It should be set to "hg18". Click on the dataset's pencil icon to Edit Attributes and assign/reassign the database if needed. Do this only if your BED file is based on hg18!
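Before reassigning the database, a quick sanity check on the BED coordinates can catch an obvious build mismatch. The sketch below is only an illustration: the function name `out_of_range_intervals` is hypothetical, and the chromosome sizes must be supplied by you (for example, loaded from the chromosome sizes UCSC publishes for hg18). Passing the check does not prove the file is hg18-based; it only flags records that cannot fit the build.

```python
from typing import Dict, Iterable, List, Tuple


def out_of_range_intervals(
    bed_lines: Iterable[str], chrom_sizes: Dict[str, int]
) -> List[Tuple[int, str]]:
    """Return (line_number, reason) for BED records that cannot fit the build.

    chrom_sizes maps chromosome name -> length and is supplied by the caller.
    """
    problems: List[Tuple[int, str]] = []
    for lineno, line in enumerate(bed_lines, start=1):
        line = line.strip()
        # Skip blank lines and the optional header lines a BED file may carry.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        if chrom not in chrom_sizes:
            problems.append((lineno, f"unknown chromosome {chrom}"))
        elif start < 0 or end < start or end > chrom_sizes[chrom]:
            # BED is 0-based, half-open, so end may equal the chromosome length.
            problems.append(
                (lineno, f"{chrom}:{start}-{end} outside 0..{chrom_sizes[chrom]}")
            )
    return problems


if __name__ == "__main__":
    # Made-up sizes and records, for illustration only.
    toy_sizes = {"chrA": 1000}
    toy_bed = ["chrA\t10\t20", "chrA\t990\t1010"]
    for lineno, reason in out_of_range_intervals(toy_bed, toy_sizes):
        print(lineno, reason)
```

An empty result means nothing contradicts the chosen build; any reported line is a reason to double-check which assembly the coordinates came from.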
Best, Jen, Galaxy team
Dear Jen,
It works now. Thank you very much. It troubled me for a whole week. Many thanks.
Best wishes for you, Rocky
Answer posted on Galaxy Biostar by Jennifer Hillman Jackson, Tue, 13 May 2014 18:03:22 +0000 (text as quoted above).
Hello,
You can only directly query this MAF data using coordinates based on the top reference genome: hg18. These can be specific (known human genes) or broad (chromosome regions). But be aware of how quickly the output can grow, and plan accordingly (chunk the query, etc.). Regions are probably what you want to use with the tool you mention. Or, download the entire MAF dataset directly from UCSC if you are running the analysis on the command line (I do not see it wrapped for Galaxy in the Tool Shed).
When using hg18 coordinates, output from a query can be set up to include data from any of the included species and used for further downstream analysis. An example of that is in the "Using Galaxy 2012" publication. You can find a link to it here in the live supplemental. Protocol 5 is the one you want to look at. There is a screencast that walks through how to isolate data from an individual species in the results:
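As one way to "chunk the query", a large region can be split into smaller BED intervals before it is used to fetch alignment blocks. This is a minimal sketch; the helper name `chunk_interval` and the chunk size are assumptions for illustration, not part of any Galaxy tool.

```python
from typing import Iterator, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), 0-based half-open like BED


def chunk_interval(chrom: str, start: int, end: int, max_len: int) -> Iterator[Interval]:
    """Split one interval into consecutive pieces no longer than max_len."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    pos = start
    while pos < end:
        nxt = min(pos + max_len, end)
        yield (chrom, pos, nxt)
        pos = nxt


if __name__ == "__main__":
    # Write the pieces as BED lines; the region here is arbitrary.
    for chrom, s, e in chunk_interval("chr1", 0, 250, 100):
        print(f"{chrom}\t{s}\t{e}")
```

Because the pieces are half-open and contiguous, they cover the original region exactly once, so results from each chunk can be concatenated afterwards.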
https://usegalaxy.org/u/galaxyproject/p/using-galaxy-2012
"Filter MAF" and "Convert Formats" will be useful tools from this point, along with the tools in "Operate on Genomic Intervals" for comparative analysis in addition to the tool you mention (if you have, for example, a set of pig interval/BED regions of interest).
Second general option: use MAF data in which your genome of interest is the base reference genome. This works for certain genomes, but neither of the pig genomes at UCSC (susScr2, susScr3) had a multiz comparative track generated.
Third general option, with the disclaimer that it isn't a great method scientifically: convert your alternate coordinates to hg18 using the LiftOver tool. I wouldn't recommend this for any genome analysis, as the analysis will be somewhat circular. For pig (susScr2, susScr3) this is not an option anyway: these do not have chain/net/liftOver data to hg18.
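To make the idea of isolating species concrete, here is a rough sketch of filtering a MAF file by species outside Galaxy. It is not how the "Filter MAF" tool is implemented; the function name `keep_species` is hypothetical, it assumes sequence sources are named `assembly.chromosome` (for example `hg18.chr1`), and it keeps only the `a` and `s` lines of each block.

```python
from typing import Iterable, List, Set


def keep_species(maf_lines: Iterable[str], species: Set[str]) -> List[str]:
    """Keep only the 's' lines whose source assembly is in `species`.

    Blocks left without any sequence line are dropped. Columns that become
    all-gap after removing species are NOT cleaned up in this sketch.
    """
    preamble: List[str] = []      # leading '#' lines such as '##maf version=1'
    blocks: List[List[str]] = []  # each block: ['a ...', 's ...', ...]
    for raw in maf_lines:
        line = raw.rstrip("\n")
        if line.startswith("#"):
            if not blocks:
                preamble.append(line)
            continue
        fields = line.split()
        if not fields:
            continue  # blank line separating blocks
        if fields[0] == "a":
            blocks.append([line])
        elif fields[0] == "s" and blocks:
            assembly = fields[1].split(".", 1)[0]
            if assembly in species:
                blocks[-1].append(line)
    out = list(preamble)
    for block in blocks:
        if len(block) > 1:  # at least one sequence line survived
            out.extend(block)
            out.append("")  # blank line terminates a MAF block
    return out
```

For example, `keep_species(open("region.maf"), {"hg18", "canFam2"})` would return the lines of a reduced MAF containing only human and dog rows, where `region.maf` is a placeholder file name.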
I am not sure if this will meet your goal, but perhaps the first option will work for you. The other two will not work for the genome you mention, but you may have others, and the advice may be of interest to others doing similar work. Anyone can check UCSC to see if the required data is available. Most mammalian data such as this is current on Galaxy Main http://usegalaxy.org, but if not, request that it be added through Trello. Or, add it to your local or cloud instance using the instructions here: ReferenceMAFs
Best, Jen, Galaxy team