Question: Metagenome Analysis
5.6 years ago, Mike Dyall-Smith wrote:
I have looked through the metagenome tools and the tutorials, and was wondering how one could pull out reads that contain specific protein domains or COGs. BLASTX is not possible (?), but megablast could get GI codes, and these could potentially be used to retrieve CDD information. I just can't see how to do this in Galaxy. Any suggestions would be greatly appreciated.

Mike DS
5.6 years ago, Jennifer Hillman Jackson (United States) wrote:
Hi Mike,

To use BLASTX directly, a wrapper is available in the Tool Shed for use with a local or cloud instance of Galaxy. Please see:

http://toolshed.g2.bx.psu.edu
http://getgalaxy.org
http://usegalaxy.org

Another option is to map against the target genome, then compare the coordinates of those hits with the coordinates of known annotation that represents CCDS or alternate protein tracks of interest. UCSC, Biomart, and other sources under "Get Data" can be used to import BED/Interval data directly into Galaxy. Compare coordinates using tools in the group "Operate on Genomic Intervals". There are other tools that compare coordinates (Bed Tools, etc.), but these are a good place to start.

Several of our tutorials have examples of how to compare coordinates, including "Galaxy 101" and protocols 1 & 4 of "Using Galaxy". The tools themselves also have help directly on the tool forms.

https://main.g2.bx.psu.edu/u/aun1/p/galaxy101
https://main.g2.bx.psu.edu/u/galaxyproject/p/using-galaxy-2012

If you used an Ensembl annotation track, then tools in the group "Genome Diversity -> KEGG and GO" might be of interest to you. The UCSC "Known Genes" track also has some extra tables (http://genome.ucsc.edu) that you may find interesting to pull in and consider, if you decide to use that as the annotation track to compare against. Most (if not all) of this data can be linked together through either coordinates or identifiers, but it is not available for all genomes, so you will have to check at the data sources.

For predictive domain analysis using conserved genomic data, the tools in "Fetch Alignments" function with MAF inputs. A BED file of hits can be used to query out data from multiple species, obtain sequence, etc. for downstream analysis. Protocol 5 in the "Using Galaxy" paper above has a walk-through of how this can be done. If the public Main server does not have the MAF data for your genome, and it is small, it is possible to use one from the history. If it is larger, using a local or cloud Galaxy would be recommended.

Be sure to check the Tool Shed if there is a specific tool that you are looking for. If it is not there now, you could ask whether someone has it or whether it is in the process of being wrapped (on the development list: galaxy-dev@bx.psu.edu). And keep checking back, as more tools are added all the time.

Best,
Jen
Galaxy team

--
Jennifer Hillman-Jackson
Galaxy Support and Training
http://galaxyproject.org
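For readers who want to see what the coordinate comparison suggested in the answer amounts to, here is a minimal Python sketch run outside Galaxy. It is not the implementation behind "Operate on Genomic Intervals" or Bed Tools; it only illustrates the idea of intersecting mapped-read intervals with annotation intervals. The function names (`parse_bed`, `overlapping_hits`) and the sample records (`read_A`, `domain_X`, and so on) are made up for illustration, and the sketch assumes tab-separated BED lines with 0-based, half-open coordinates.

```python
from collections import defaultdict


def parse_bed(lines):
    """Parse tab-separated BED lines into (chrom, start, end, name) tuples."""
    records = []
    for line in lines:
        line = line.strip()
        # Skip blank lines, comments, and UCSC track/browser header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        name = fields[3] if len(fields) > 3 else "."
        records.append((chrom, start, end, name))
    return records


def overlapping_hits(hits, features):
    """Return (hit_name, feature_name) pairs that share at least one base."""
    # Group annotation intervals by chromosome so each hit is only
    # compared against features on the same sequence.
    by_chrom = defaultdict(list)
    for chrom, start, end, name in features:
        by_chrom[chrom].append((start, end, name))

    pairs = []
    for chrom, h_start, h_end, h_name in hits:
        for f_start, f_end, f_name in by_chrom[chrom]:
            # Half-open intervals overlap when each starts before the other ends.
            if h_start < f_end and f_start < h_end:
                pairs.append((h_name, f_name))
    return pairs


# Hypothetical mapped reads and annotation features (illustrative only).
hits = parse_bed([
    "chr1\t100\t200\tread_A",
    "chr1\t500\t600\tread_B",
    "chr2\t50\t150\tread_C",
])
features = parse_bed([
    "chr1\t150\t400\tdomain_X",
    "chr2\t150\t300\tdomain_Y",
])

print(overlapping_hits(hits, features))
```

This nested loop is fine for small inputs; for genome-scale data, sorted intervals or a dedicated interval tool would be the more practical choice.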