blast2go ontologies comparison

Question: blast2go ontologies comparison

3.5 years ago by

Canada

ajmaninisha • 30 wrote:

Hi,

I posted this question earlier but received no feedback; kindly help.

I want to compare the ontologies of the differentially expressed genes in my 2 samples using BLAST2GO. I followed this pipeline: tophat-cufflinks-cuffmerge-cuffdiff. Running BLAST2GO requires a fasta sequence, so I used gffread followed by Extract Genomic DNA for both samples (using the Cufflinks assembled file for each sample). I don't know whether I am following the right approach.

My aim is to:

a) find the ontologies of differentially expressed genes in the two samples individually

b) Is there any way to know which transcript came from which sample? If I use the cuffmerge assembled transcript file - gffread - Extract Genomic DNA - fasta file - BLAST2GO, it will give me a fasta sequence but will not tell me which transcript came from which sample. Is there any other way?

thanks

Nisha

ontology blast2go • 1.1k views


modified 3.5 years ago by Jennifer Hillman Jackson ♦ 25k • written 3.5 years ago by ajmaninisha • 30

3.5 years ago by

Jennifer Hillman Jackson ♦ 25k

United States

Jennifer Hillman Jackson ♦ 25k wrote:

Hi Nisha,

In short, check out the Cuffdiff differential expression outputs. Each comparison between conditions has statistics that let you sort out the transcripts that are significantly associated with one condition versus another. Filter by condition. Then use the tracking outputs to locate the Cuffmerge coordinates for each transcript and extract the fasta sequence. There are other methods - such as pulling out all the transcripts first, then filtering by condition through identifier matches in the tracking files - but I have found this way to be quicker (fewer steps).

More detail: Each transcript will have some mix of reads from the conditions - specifically for the regions of the transcript in common between the conditions. This will be the case for most of the data. The exceptions will be cases where a transcript (or possibly gene) is only expressed in a particular condition (specifically - one or more expressed transcripts that do not share enough exon sequence meeting the paired-end mapping criteria set in earlier steps). Reads from conditions map to transcripts in a many-to-many relationship.

Cufflinks does not create "consensus" sequences - the output is just where in the genome these are located (coordinates). If you extract fasta sequence from the genome based on those coordinates, splicing differences will be represented, which is the important part (minor changes in read content that did not impact splices should not be a factor for differential expression analysis).
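
To illustrate why coordinate-based extraction still captures splicing differences, here is a minimal Python sketch. It is not what "Extract Genomic DNA" or gffread does internally; the genome string, exon coordinates, and the helper name `spliced_sequence` are all made up for illustration. It only shows that two isoforms sharing a first exon but using different second exons yield different fasta sequences from the same reference.

```python
# Toy reference and exon coordinates; every value here is invented for illustration.
GENOME = {"chr1": "AAACCCGGGTTTAAACCCGGGTTT"}

# Translation table used to complement a DNA sequence.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def spliced_sequence(genome, chrom, exons, strand):
    """Concatenate exon sequences given 1-based, inclusive coordinates (as in GTF)."""
    ordered = sorted(exons)  # exons in genomic order
    seq = "".join(genome[chrom][start - 1:end] for start, end in ordered)
    if strand == "-":
        # Minus-strand transcripts are the reverse complement of the genomic sequence.
        seq = seq.translate(COMPLEMENT)[::-1]
    return seq


# Two hypothetical isoforms: same first exon, different second exon.
isoform_a = spliced_sequence(GENOME, "chr1", [(1, 6), (13, 18)], "+")
isoform_b = spliced_sequence(GENOME, "chr1", [(1, 6), (19, 24)], "+")

print(isoform_a)
print(isoform_b)
```

The sequences differ only because the exon structure differs; the bases themselves always come from the reference genome, not from the reads.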

Use tools like "Filter" and/or "Select" to separate transcripts by condition in the Cuffdiff output. Then tools in the group "Join, Subtract and Group", such as "Join two datasets", will help when tracing identifiers back through the tracking files between the outputs of the different tools. Once you have the targeted Cufflinks coordinates grouped by condition, do the fasta extraction using the tool "Extract Genomic DNA". Those fasta transcripts can then be run through BLAST+ and mapped to GO terms.
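
Outside of Galaxy, the filter-and-split step could be sketched in Python roughly as follows. This is an assumption-laden illustration, not the Galaxy tools themselves: the rows are invented, the column names follow the Cuffdiff `*_exp.diff` layout as described in the Cufflinks manual (check them against your own file), and `split_by_condition` is a hypothetical helper. It keeps tests with status OK that are flagged significant, and assigns each transcript to the condition with the higher expression value.

```python
import csv
import io

# Column layout assumed from the Cuffdiff manual; verify against your own output.
HEADER = ["test_id", "gene_id", "gene", "locus", "sample_1", "sample_2", "status",
          "value_1", "value_2", "log2(fold_change)", "test_stat", "p_value",
          "q_value", "significant"]

# Invented example rows, for illustration only.
ROWS = [
    ["TCONS_00000001", "XLOC_000001", "-", "chr1:100-900", "q1", "q2", "OK",
     "50.0", "5.0", "-3.32", "-4.1", "0.0001", "0.002", "yes"],
    ["TCONS_00000002", "XLOC_000002", "-", "chr1:2000-3500", "q1", "q2", "OK",
     "2.0", "40.0", "4.32", "4.5", "0.0001", "0.002", "yes"],
    ["TCONS_00000003", "XLOC_000003", "-", "chr2:10-800", "q1", "q2", "OK",
     "10.0", "11.0", "0.14", "0.2", "0.8", "0.9", "no"],
    ["TCONS_00000004", "XLOC_000004", "-", "chr2:5000-6000", "q1", "q2", "NOTEST",
     "0.1", "0.2", "1.0", "0.0", "1.0", "1.0", "no"],
]
DIFF_TEXT = "\n".join("\t".join(fields) for fields in [HEADER] + ROWS) + "\n"


def split_by_condition(handle):
    """Group significant transcripts by the condition in which they are higher."""
    groups = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        if row["status"] != "OK" or row["significant"] != "yes":
            continue  # skip untested or non-significant comparisons
        if float(row["value_1"]) > float(row["value_2"]):
            higher = row["sample_1"]
        else:
            higher = row["sample_2"]
        groups.setdefault(higher, []).append((row["test_id"], row["locus"]))
    return groups


groups = split_by_condition(io.StringIO(DIFF_TEXT))
for condition, transcripts in sorted(groups.items()):
    print(condition, transcripts)
```

With a real file, replace `io.StringIO(DIFF_TEXT)` with an open file handle. The resulting identifiers per condition are what you would then match against the tracking files to get the coordinates for fasta extraction.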

For more about the output files from any of the tools in the Tuxedo suite, the best resource is here: http://cole-trapnell-lab.github.io/cufflinks/manual/

Apologies for the delayed reply - nearly all of our team was traveling last week and a few questions slipped through. But, hopefully this helps to get your analysis going! Best, Jen, Galaxy team

written 3.5 years ago by Jennifer Hillman Jackson ♦ 25k
