Hello,
There are a few ways to correlate gene symbols with the output. Here are some examples:
1. Use a reference annotation dataset that includes the attribute "gene_name". The versions from iGenomes contain it, for example (Shared Data -> Data Libraries -> iGenomes on Main has a few loaded/uncompressed). These are same-species annotations.
2. After running the analysis, join the output with a tabular annotation file that contains at least one unique attribute also present in the Cuffdiff output (transcript_id is good) plus the gene symbol. Use the tool "Join two Datasets side by side on a specified field". To obtain the annotation file, you can build it up from UCSC output. For example, if using RefSeq transcripts, the value "name2" in the complete "RefSeq Genes" track's primary table is a gene symbol. This will not be present in a BED export. Instead, export the whole table, or at least the transcript_id and name2 fields, and send it to Galaxy.
#2 can be same-species or cross-species. You can make whatever connections fit your analysis goals: just put the data into the same file and use it in the join.
3. Another option for UCSC-based genomes: use coordinate overlap to annotate from other genome sources. The idea is to have the transcript (with an attached gene symbol) from the other target genome in a file where the transcript is mapped to that other reference genome. Then use "Lift-Over -> Convert genome coordinates" to convert the Cuffdiff coordinates to that other genome. Finish by looking for overlap between the two using the "Operate on Genomic intervals -> Join" tool.
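If it helps to see the logic of option 2 outside of Galaxy, here is a minimal sketch of a join on a shared field. It is only an illustration, not what the Galaxy tool runs internally. The column names (transcript_id, log2_fold_change, name2) and the toy rows are made up for the example, and the function name join_on_field is hypothetical.

```python
import csv
import io


def join_on_field(diff_rows, annot_rows, key="transcript_id", symbol_field="name2"):
    """Append the gene symbol(s) found in annot_rows to each row of diff_rows."""
    # Collect every symbol seen for each key, since one ID can map to several symbols.
    symbols = {}
    for row in annot_rows:
        symbols.setdefault(row[key], []).append(row[symbol_field])

    joined = []
    for row in diff_rows:
        # Rows without a match are kept and marked "NA" rather than dropped.
        for symbol in symbols.get(row[key], ["NA"]):
            joined.append({**row, "gene_symbol": symbol})
    return joined


# Toy, made-up data standing in for the two tab-delimited files.
cuffdiff_text = (
    "transcript_id\tlog2_fold_change\n"
    "NM_000001\t1.5\n"
    "NM_000002\t-0.7\n"
    "NM_000003\t2.1\n"
)
annot_text = (
    "transcript_id\tname2\n"
    "NM_000001\tGENE_A\n"
    "NM_000002\tGENE_B\n"
)

diff_rows = list(csv.DictReader(io.StringIO(cuffdiff_text), delimiter="\t"))
annot_rows = list(csv.DictReader(io.StringIO(annot_text), delimiter="\t"))

joined = join_on_field(diff_rows, annot_rows)
for row in joined:
    print(row["transcript_id"], row["log2_fold_change"], row["gene_symbol"], sep="\t")
```

Keeping unmatched rows (marked "NA" here) is a design choice for the sketch. Whether unmatched lines are kept depends on the options you pick in the join tool.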
In short, any way that you can link together the data from one file to another - by either a common field or by overlapping coordinates based on the same reference genome - can be used to pull in annotation. And it isn't limited to Gene Symbol. There are annotation tools that will link info from external files (in specific dataset formats, or from specific sources), and those can be very useful if they fit your data, but you can always join in your own using a method like those above.
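The "overlapping coordinates" route can be sketched the same way. This is a simple illustration that assumes both sets of intervals are already on the same reference genome (that is, after any lift-over) and use 0-based, half-open coordinates as in BED. The IDs, symbols, and the function name annotate_by_overlap are hypothetical, and a real interval tool will be far more efficient than this pairwise loop.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals [start, end) share at least one base."""
    return a_start < b_end and b_start < a_end


def annotate_by_overlap(query, annotated):
    """Map each query ID to the sorted symbols of the annotated intervals it overlaps.

    query:     list of (chrom, start, end, query_id)
    annotated: list of (chrom, start, end, gene_symbol)
    """
    result = {}
    for q_chrom, q_start, q_end, q_id in query:
        hits = set()
        for a_chrom, a_start, a_end, symbol in annotated:
            # Intervals only overlap when they are on the same chromosome.
            if q_chrom == a_chrom and overlaps(q_start, q_end, a_start, a_end):
                hits.add(symbol)
        result[q_id] = sorted(hits)
    return result


# Toy, made-up intervals on the same assembly.
query = [
    ("chr1", 100, 200, "TCONS_1"),
    ("chr1", 500, 600, "TCONS_2"),
    ("chr2", 100, 200, "TCONS_3"),
]
annotated = [
    ("chr1", 150, 300, "GENE_A"),
    ("chr1", 190, 250, "GENE_B"),
    ("chr2", 300, 400, "GENE_C"),
]

overlap_hits = annotate_by_overlap(query, annotated)
for query_id, symbols in overlap_hits.items():
    print(query_id, ",".join(symbols) or "NA", sep="\t")
```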
Just keep in mind that a many-to-many relationship can exist, even within datasets based on the same reference genome. For an example of that at UCSC, examine hg19's "kgXref" table.
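A quick way to see whether your own annotation file has this problem is to count the mappings in both directions before joining. The sketch below uses made-up ID/symbol pairs, and the function name find_multi_mappings is hypothetical. It only reports the ambiguous cases and does not decide how to resolve them.

```python
from collections import defaultdict


def find_multi_mappings(pairs):
    """Return (ids with several symbols, symbols with several ids) from (id, symbol) pairs."""
    symbols_by_id = defaultdict(set)
    ids_by_symbol = defaultdict(set)
    for identifier, symbol in pairs:
        symbols_by_id[identifier].add(symbol)
        ids_by_symbol[symbol].add(identifier)

    # Keep only the entries that map to more than one partner.
    multi_ids = {k: v for k, v in symbols_by_id.items() if len(v) > 1}
    multi_symbols = {k: v for k, v in ids_by_symbol.items() if len(v) > 1}
    return multi_ids, multi_symbols


# Toy, made-up pairs: tx1 maps to two symbols, and GENE_A maps to two transcripts.
pairs = [
    ("tx1", "GENE_A"),
    ("tx1", "GENE_B"),
    ("tx2", "GENE_A"),
    ("tx3", "GENE_C"),
]

multi_ids, multi_symbols = find_multi_mappings(pairs)
print("IDs with more than one symbol:", sorted(multi_ids))
print("Symbols with more than one ID:", sorted(multi_symbols))
```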
Hopefully this helps, Jen, Galaxy team