Question: Gene Names From Cuffdiff Data
How does one get gene names from Cuffdiff when looking at the "gene differential expression testing" results? I am doing some +/- exposure studies with zebrafish embryos and then processing the 50 bp single-end FASTQ files through Galaxy, with particular interest in the Cuffdiff readout of differential gene expression. It seems to be working well, but my readout only gives me gene IDs. It would be more efficient if I could get gene names from the output. I suspect I may be able to do this by selecting options in the UCSC Table Browser when I use its Zv9/danRer7 assembly as a reference genome. I see a place for selected genes but not for all the identified genes. Is this something that could be done directly, or is there just a "beyond Galaxy" method of generating gene lists from a list of gene IDs? Thanks, el linney Duke University Medical Center
United States
Hello,

If you used RefSeq from UCSC as your reference annotation GTF, the two tables from the UCSC Table Browser that you will most likely want first are refGene and refLink. These can be extracted entirely into Galaxy, as separate data files or together as one file, using the output option "all fields from selected table" or "selected fields from primary and related tables". BED or GTF format will not output the extra fields containing gene symbols, names, and descriptions. Tip: click on 'describe table schema' to understand a table's contents.

Then a tool such as "Join, Subtract and Group -> Join" or "Compare" can be used to match the Cuffdiff result's transcript accession against this data and add the additional labels.

If you used an annotation GTF other than RefSeq as your reference, use that track's associated tables from UCSC or Biomart instead. If you need to link genes using overlap, a method similar to the one described in this prior reply can be used:

Hopefully this helps!

Jen
Galaxy team
--
Jennifer Hillman-Jackson
Galaxy Support and Training
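For anyone who wants to do the same join outside Galaxy, the step described above (matching the Cuffdiff identifier against a UCSC table to pull in the gene symbol) can be sketched in plain Python. This is only a rough illustration under several assumptions: that both files are tab-separated with a header row, that the UCSC table was exported with "all fields from selected table" so that refGene's `name` (accession) and `name2` (gene symbol) columns are present, and that the Cuffdiff `test_id` column holds the accession. The function names `load_symbol_map` and `annotate`, the `gene_symbol` output column, and the toy rows are all made up for the example; check your own column headers before relying on it.

```python
import csv
import io


def load_symbol_map(table_text, id_col="name", symbol_col="name2"):
    """Build {accession: gene symbol} from a tab-separated table with a header.

    UCSC Table Browser exports usually start the header line with '#',
    so that prefix is stripped before parsing.
    """
    reader = csv.DictReader(io.StringIO(table_text.lstrip("#")), delimiter="\t")
    return {row[id_col]: row[symbol_col] for row in reader}


def annotate(diff_text, symbol_map, key_col="test_id", missing="-"):
    """Return the Cuffdiff rows as dicts with an added 'gene_symbol' column.

    key_col is an assumption: use whichever Cuffdiff column carries the
    accession that matches your annotation table.
    """
    reader = csv.DictReader(io.StringIO(diff_text), delimiter="\t")
    annotated = []
    for row in reader:
        row = dict(row)
        # Rows with no match keep a placeholder rather than being dropped.
        row["gene_symbol"] = symbol_map.get(row[key_col], missing)
        annotated.append(row)
    return annotated


# Toy data with invented identifiers, only to show the shape of the inputs.
REFGENE_TOY = "#name\tname2\nNM_TOY1\tgeneA\nNM_TOY2\tgeneB\n"
DIFF_TOY = (
    "test_id\tgene_id\tlog2(fold_change)\n"
    "NM_TOY1\tNM_TOY1\t1.5\n"
    "NM_TOY3\tNM_TOY3\t-0.7\n"
)

if __name__ == "__main__":
    symbols = load_symbol_map(REFGENE_TOY)
    for row in annotate(DIFF_TOY, symbols):
        print(row["test_id"], row["gene_symbol"], row["log2(fold_change)"])
```

With real files, read each file's contents into a string (or adapt the functions to take open file handles) and pass them in place of the toy strings. Unmatched identifiers are kept with a `-` placeholder, which makes it easy to spot rows where the annotation used for Cuffdiff and the table used for the join do not agree.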
