Hi,
I am following this workflow for differential expression analysis with Cuffdiff on Galaxy: map with TopHat2 > Cufflinks on the BAM files using the reference annotation as a guide > Cuffmerge of the Cufflinks outputs > Cuffdiff using the BAM files + the Cuffmerge GTF.
My reference annotation is the UCSC genes.gtf from iGenomes.
However, my Cufflinks gene expression file contains a large number of "CUFF" IDs with no gene_short_name or tss_id; those fields simply show '-'. I am interested in novel RNA expression and in differential expression of both known and novel genes, but some of these loci (which I presume Cufflinks treats as unknown) actually fall within the coordinates of known genes. For example (copied from my Cufflinks file):
tracking_id | class_code | nearest_ref_id | gene_id | gene_short_name | tss_id | locus | length | coverage | FPKM | FPKM_conf_lo | FPKM_conf_hi | FPKM_status |
CUFF.1079 | - | - | CUFF.1079 | Rb1cc1 | TSS8370 | chr1:6204742-6266185 | - | - | 1.18724 | 0.983072 | 37.4475 | OK |
CUFF.1085 | - | - | CUFF.1085 | - | - | chr1:6207003-6207485 | - | - | 1.18332 | 0.442443 | 1.42565 | OK |
CUFF.1086 | - | - | CUFF.1086 | - | - | chr1:6210480-6210669 | - | - | 13.1213 | 1.12835 | 3.63579 | OK |
CUFF.1087 | - | - | CUFF.1087 | - | - | chr1:6215499-6215718 | - | - | 10.3601 | 1.29837 | 3.89512 | OK |
CUFF.1088 | - | - | CUFF.1088 | - | - | chr1:6215604-6215679 | - | - | 206.913 | 94.0516 | 338.586 | OK |
CUFF.1085 to CUFF.1088 all fall within the range of the gene Rb1cc1 (CUFF.1079). There are many other examples of this in my Cufflinks file. Why is Cufflinks calling them separate genes? Should they not all count towards the expression of Rb1cc1?
Is there a way for Cufflinks to recognise that these fall within the gene, or am I missing something?
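To make clear what I mean by "fall within", here is a minimal Python sketch of the check I have in mind, using only the rows pasted above. The names parse_locus, contains and unnamed_within_named are my own illustrative helpers, not part of Cufflinks, and the check compares coordinates only (it ignores strand, which these columns do not show).

```python
from typing import NamedTuple


class Locus(NamedTuple):
    chrom: str
    start: int
    end: int


def parse_locus(text: str) -> Locus:
    """Parse a locus string such as 'chr1:6204742-6266185'."""
    chrom, span = text.rsplit(":", 1)
    start, end = span.split("-")
    return Locus(chrom, int(start), int(end))


def contains(outer: Locus, inner: Locus) -> bool:
    """True if inner lies entirely inside outer on the same chromosome."""
    return (
        outer.chrom == inner.chrom
        and outer.start <= inner.start
        and inner.end <= outer.end
    )


# (tracking_id, gene_short_name, locus) taken from the rows pasted above
rows = [
    ("CUFF.1079", "Rb1cc1", "chr1:6204742-6266185"),
    ("CUFF.1085", "-", "chr1:6207003-6207485"),
    ("CUFF.1086", "-", "chr1:6210480-6210669"),
    ("CUFF.1087", "-", "chr1:6215499-6215718"),
    ("CUFF.1088", "-", "chr1:6215604-6215679"),
]


def unnamed_within_named(rows):
    """Map each unnamed tracking_id to the first named gene whose locus contains it."""
    named = [(name, parse_locus(locus)) for _, name, locus in rows if name != "-"]
    hits = {}
    for tracking_id, name, locus in rows:
        if name != "-":
            continue  # only unnamed entries are of interest here
        inner = parse_locus(locus)
        for gene_name, outer in named:
            if contains(outer, inner):
                hits[tracking_id] = gene_name
                break
    return hits


result = unnamed_within_named(rows)
for tracking_id, gene_name in result.items():
    print(f"{tracking_id} lies within {gene_name}")
```

For the rows above, this reports all four unnamed entries as lying inside the Rb1cc1 locus, which is the situation I am asking about.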
Help is appreciated!