Dear All,
I ran the Tuxedo pipeline on the public online Galaxy server, using TopHat2-Cufflinks-Cuffmerge-Cuffdiff. I expected my Cuffdiff output to contain gene_name, so that I could directly identify genes in downstream analyses. However, it seems to be missing, and instead I only have a list of transcript ids (all isoforms) for each gene.
I used a reference genome downloaded from UCSC, with Ensembl annotations, at all steps (Cufflinks, Cuffdiff). Now when I open, for example, the gene FPKM tracking file, the tracking_id and gene_id columns are identical and contain XLOC ids. The gene_short_name column contains a list of Ensembl transcript ids (although it's a gene file, it just lists all transcript ids belonging to that gene).
So to me it looks like the columns are not filled appropriately. I wondered if somebody knows what I might have done wrong or has encountered a similar problem.
Below is a fragment of a gene FPKM tracking file. The gene_short_name column also contains ENST ids in the other files I checked.
tracking_id | class_code | nearest_ref_id | gene_id | gene_short_name |
XLOC_000001 | - | - | XLOC_000001 | ENST00000450305,ENST00000456328,ENST00000515242,ENST00000518655 |
XLOC_000002 | - | - | XLOC_000002 | ENST00000469289,ENST00000473358,ENST00000607096 |
XLOC_000003 | - | - | XLOC_000003 | ENST00000594647,ENST00000606857 |
XLOC_000004 | - | - | XLOC_000004 | ENST00000492842 |
XLOC_000005 | - | - | XLOC_000005 | ENST00000335137 |
XLOC_000006 | - | - | XLOC_000006 | ENST00000442987 |
XLOC_000007 | - | - | XLOC_000007 | ENST00000496488 |
XLOC_000008 | - | - | XLOC_000008 | ENST00000419160,ENST00000423728,ENST00000425496,ENST00000431321,ENST00000431812,ENST00000432964,ENST00000440038,ENST00000440163,ENST00000445840,ENST00000453935,ENST00000455207,ENST00000455464,ENST00000514436,ENST00000599771,ENST00000601486,ENST00000601814,ENST00000608420 |
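In case it helps anyone reproduce the check, here is a minimal Python sketch that flags rows whose gene_short_name holds only Ensembl transcript ids rather than a gene symbol. It assumes a tab-separated tracking file with the header shown above. The function name `rows_with_transcript_ids`, the ENST pattern, and the two sample rows (copied from the fragment) are for illustration only and are not part of Cuffdiff or Galaxy.

```python
import csv
import io
import re

# Assumption: human Ensembl transcript ids are "ENST" plus 11 digits,
# optionally followed by a version suffix such as ".2".
ENST_PATTERN = re.compile(r"^ENST\d{11}(\.\d+)?$")


def rows_with_transcript_ids(handle):
    """Return tracking_ids whose gene_short_name consists only of ENST ids."""
    reader = csv.DictReader(handle, delimiter="\t")
    flagged = []
    for row in reader:
        # The column may hold several comma-separated entries.
        names = [n for n in row["gene_short_name"].split(",") if n]
        if names and all(ENST_PATTERN.match(n) for n in names):
            flagged.append(row["tracking_id"])
    return flagged


# Two rows from the fragment above, rewritten as tab-separated text.
sample = (
    "tracking_id\tclass_code\tnearest_ref_id\tgene_id\tgene_short_name\n"
    "XLOC_000004\t-\t-\tXLOC_000004\tENST00000492842\n"
    "XLOC_000003\t-\t-\tXLOC_000003\tENST00000594647,ENST00000606857\n"
)
print(rows_with_transcript_ids(io.StringIO(sample)))
```

Pointing the same function at an opened tracking file from the Cuffdiff output should show whether every gene row is affected or only some of them.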
Any ideas on what might've gone wrong are very much welcome!
Monika