Question: Preparing for Blast2GO
2.9 years ago by
European Union
h.stotz20 wrote:

Here is my problem:

I used Cuffdiff to determine differential gene expression and extracted the genomic DNA sequences. This gave me almost 7000 genes, which is too many to make sense of even after filtering. I would now like to run Blast2GO to find processes that are enriched. I could import annotations for all predicted proteins into Galaxy. Is there a workflow for running Blast2GO, or could I at least use the protein annotation to retrieve the predicted protein sequences for my list of induced genes?



2.9 years ago by
United States
Jennifer Hillman Jackson25k wrote:


To use BLAST, or derivative annotation tools based on its output, review the tools in the Tool Shed. These are for use in a local or cloud Galaxy. The manual and associated publications for each tool are usually linked on the tool form, or they can be found online by searching for the tool name. Usage in Galaxy is generally the same as on the command line, with any differences noted in the help section on the tool forms.

The protein sequences for any gene names/symbols in your existing results can be obtained from many data providers, such as UCSC, Ensembl, and NCBI. Some offer tools where you enter a list of identifiers and get the protein sequences back as output. Alternatively, you can download the entire target track into Galaxy and use a function like "Join two datasets" to merge on the identifiers the datasets have in common (your output's identifiers and the target track's identifiers).
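To make the identifier join concrete, here is a minimal Python sketch of what merging on a common identifier column does. It is not the Galaxy tool's implementation; the function name `join_on_identifier` and the example gene names and sequences are made up for illustration.

```python
def join_on_identifier(left_rows, right_rows, left_col=0, right_col=0):
    """Inner join of two tabular datasets on an identifier column.

    Hypothetical helper for illustration; it only mimics the idea of
    merging rows that share an identifier.
    """
    # Index the right-hand dataset by identifier for fast lookup.
    lookup = {}
    for row in right_rows:
        lookup.setdefault(row[right_col], []).append(row)

    joined = []
    for row in left_rows:
        for match in lookup.get(row[left_col], []):
            # Append the matched columns, skipping the duplicate identifier.
            extra = [value for i, value in enumerate(match) if i != right_col]
            joined.append(row + extra)
    return joined


# Made-up example data: (gene, log2 fold change) and (gene, protein sequence).
diff_genes = [["geneA", "2.5"], ["geneB", "-1.8"], ["geneC", "3.1"]]
proteins = [["geneA", "MKT"], ["geneC", "MLS"], ["geneD", "MPP"]]

result = join_on_identifier(diff_genes, proteins)
for row in result:
    print("\t".join(row))
```

Only genes present in both datasets survive the join, so geneB (no protein entry) and geneD (not differentially expressed) drop out.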

If your output does not have gene names/symbols, these data sources also provide tracks that contain genome coordinates for protein CDS regions. Galaxy tools can link Cuffdiff output to this type of data by overlapping coordinates. Search with the keywords "interval" and/or "coordinate" to locate tools that perform joins of this nature.
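As a rough illustration of a coordinate-based join, the Python sketch below pairs gene intervals with CDS intervals that overlap on the same chromosome. The function names, the half-open interval convention, and all identifiers and coordinates are assumptions for the example, not taken from any specific Galaxy tool.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals [start, end) share at least one base."""
    return a_start < b_end and b_start < a_end


def join_by_coordinates(genes, cds_regions):
    """Pair each gene with every CDS region it overlaps on the same chromosome.

    genes:        iterable of (chrom, start, end, gene_id)
    cds_regions:  iterable of (chrom, start, end, protein_id)
    Hypothetical helper; a simple nested scan, fine for small inputs.
    """
    by_chrom = {}
    for chrom, start, end, protein_id in cds_regions:
        by_chrom.setdefault(chrom, []).append((start, end, protein_id))

    pairs = []
    for chrom, start, end, gene_id in genes:
        for c_start, c_end, protein_id in by_chrom.get(chrom, []):
            if overlaps(start, end, c_start, c_end):
                pairs.append((gene_id, protein_id))
    return pairs


# Made-up coordinates; the XLOC-style gene IDs are placeholders.
genes = [("chr1", 100, 500, "XLOC_000001"), ("chr2", 50, 80, "XLOC_000002")]
cds_regions = [
    ("chr1", 150, 300, "protA"),
    ("chr1", 600, 900, "protB"),
    ("chr2", 80, 120, "protC"),
]

pairs = join_by_coordinates(genes, cds_regions)
print(pairs)
```

Check whether your coordinate sources are 0-based or 1-based before comparing them, since an off-by-one difference changes which adjacent intervals count as overlapping.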

Converting from Fasta-to-Tabular (and back again after filtering) may be necessary, depending on the type of data retrieved from external sources. See the group "Get Data" for those external sources that place output from searches/queries directly into your Galaxy history for analysis. If your source is not listed, then target data can be uploaded via URL or FTP.
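The round trip described above (FASTA to tabular, filter, back to FASTA) can be sketched in plain Python as follows. This is only an illustration of the idea, not the code behind the Galaxy tools; the function names and the example sequences are invented.

```python
def fasta_to_tabular(text):
    """Parse FASTA text into a list of (header, sequence) rows."""
    rows = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; store the previous one first.
            if header is not None:
                rows.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        rows.append((header, "".join(chunks)))
    return rows


def tabular_to_fasta(rows, width=60):
    """Write (header, sequence) rows back out as FASTA text."""
    lines = []
    for header, seq in rows:
        lines.append(">" + header)
        for i in range(0, len(seq), width):
            lines.append(seq[i:i + width])
    return "\n".join(lines) + "\n"


# Made-up input: two records, the first wrapped over two lines.
fasta_text = ">geneA\nMKT\nLLV\n>geneB\nMPS\n"
rows = fasta_to_tabular(fasta_text)

# Filter in tabular form, e.g. keep only identifiers from a list of interest.
wanted = {"geneA"}
filtered = [row for row in rows if row[0] in wanted]

filtered_fasta = tabular_to_fasta(filtered)
print(filtered_fasta, end="")
```

Working in tabular form makes the filtering step a simple row selection, which is why the conversion is useful before and after joining on identifiers.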

There are also public Galaxy servers with a focus on protein annotation. See the list below for options. Each server is owned and administered by the host and they can help with workflows and usage specific to their pipeline's focus.

Hopefully this helps with determining the best pathway to link in higher-level annotation for your data reduction steps.

Jen, Galaxy team
