Question: Cufflinks assigning separate CUFF IDs to loci that fall within a known gene
sekalazare wrote, 4.2 years ago:


I am following this workflow for differential expression analysis with Cuffdiff in Galaxy:
1. Map reads with TopHat2.
2. Run Cufflinks on each BAM file, using the reference annotation as a guide.
3. Merge the Cufflinks assemblies with Cuffmerge.
4. Run Cuffdiff on the BAM files with the Cuffmerge GTF.

My reference annotation is the UCSC genes.gtf from iGenomes.
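For readers working outside Galaxy, the workflow above corresponds roughly to the command-line Tuxedo tools. This is a minimal sketch, assuming hypothetical file names (`genes.gtf`, `genome.fa`, `bowtie2_index`, `<sample>.fq`) and one FASTQ per sample. It only builds and prints the commands rather than running them, and real runs will need more options (threads, library type, replicates).

```python
import shlex

# Hypothetical inputs: replace with your own files.
REF_GTF = "genes.gtf"          # e.g. the iGenomes UCSC annotation
GENOME_FA = "genome.fa"
BOWTIE2_INDEX = "bowtie2_index"
SAMPLES = ["sample1", "sample2"]


def build_pipeline(samples):
    """Return (commands, assembly GTF paths) for the four-step workflow."""
    cmds, bams, gtfs = [], [], []
    for s in samples:
        th_out, cl_out = f"tophat_{s}", f"cufflinks_{s}"
        # 1. Map reads with TopHat2, supplying the annotation (-G).
        cmds.append(["tophat2", "-G", REF_GTF, "-o", th_out, BOWTIE2_INDEX, f"{s}.fq"])
        bam = f"{th_out}/accepted_hits.bam"
        bams.append(bam)
        # 2. Assemble with Cufflinks using the annotation as a guide (-g).
        #    Using -G here instead would quantify only the reference transcripts.
        cmds.append(["cufflinks", "-g", REF_GTF, "-o", cl_out, bam])
        gtfs.append(f"{cl_out}/transcripts.gtf")
    # 3. Cuffmerge reads a text file listing one assembly GTF per line
    #    (writing assemblies.txt is left out of this sketch).
    cmds.append(["cuffmerge", "-g", REF_GTF, "-s", GENOME_FA, "-o", "merged_asm", "assemblies.txt"])
    # 4. Cuffdiff on the BAMs, with the merged GTF as the transcript set.
    cmds.append(["cuffdiff", "-o", "diff_out", "-L", ",".join(samples),
                 "merged_asm/merged.gtf", *bams])
    return cmds, gtfs


if __name__ == "__main__":
    commands, assemblies = build_pipeline(SAMPLES)
    print("\n".join(assemblies))  # lines for assemblies.txt
    for c in commands:
        print(shlex.join(c))
```

The `-g` versus `-G` choice in step 2 is where discovery of novel transcripts is switched on or off at the Cufflinks stage.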

However, my Cufflinks gene expression files contain a large number of "CUFF" IDs with no gene short name or TSS ID; those fields contain only '-'. I am interested in novel RNA expression and in differential expression of both known and novel genes, but some of these loci (which I presume Cufflinks treats as unknown) actually fall within known genes. For example, from my Cufflinks output:

tracking_id  class_code  nearest_ref_id  gene_id    gene_short_name  tss_id   locus                 length  coverage  FPKM     FPKM_conf_lo  FPKM_conf_hi  FPKM_status
CUFF.1079    -           -               CUFF.1079  Rb1cc1           TSS8370  chr1:6204742-6266185  -       -         1.18724  0.983072      37.4475       OK
CUFF.1085    -           -               CUFF.1085  -                -        chr1:6207003-6207485  -       -         1.18332  0.442443      1.42565       OK
CUFF.1086    -           -               CUFF.1086  -                -        chr1:6210480-6210669  -       -         13.1213  1.12835       3.63579       OK
CUFF.1087    -           -               CUFF.1087  -                -        chr1:6215499-6215718  -       -         10.3601  1.29837       3.89512       OK
CUFF.1088    -           -               CUFF.1088  -                -        chr1:6215604-6215679  -       -         206.913  94.0516       338.586       OK


CUFF.1085 to CUFF.1088 all fall within the locus of Rb1cc1 (CUFF.1079), and there are many other examples like this in my Cufflinks output. Why does Cufflinks report them separately? Shouldn't they all count towards the expression of Rb1cc1?
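The containment described above can be checked programmatically by parsing the `locus` column and testing whether each unnamed CUFF locus lies inside a named one on the same chromosome. This is a minimal sketch using three columns from the five rows in the table, assuming `chr:start-end` strings with inclusive coordinates. It checks coordinates only and ignores strand and splice structure, so it cannot say whether a fragment really belongs to the gene.

```python
import re
from typing import NamedTuple

LOCUS_RE = re.compile(r"^(?P<chrom>[^:]+):(?P<start>\d+)-(?P<end>\d+)$")


class Locus(NamedTuple):
    chrom: str
    start: int
    end: int

    @classmethod
    def parse(cls, text: str) -> "Locus":
        m = LOCUS_RE.match(text)
        if m is None:
            raise ValueError(f"unrecognised locus: {text!r}")
        return cls(m["chrom"], int(m["start"]), int(m["end"]))

    def contains(self, other: "Locus") -> bool:
        # Same chromosome and other's interval lies fully inside this one.
        return (self.chrom == other.chrom
                and self.start <= other.start
                and other.end <= self.end)


# (tracking_id, gene_short_name, locus) copied from the table above
ROWS = [
    ("CUFF.1079", "Rb1cc1", "chr1:6204742-6266185"),
    ("CUFF.1085", "-", "chr1:6207003-6207485"),
    ("CUFF.1086", "-", "chr1:6210480-6210669"),
    ("CUFF.1087", "-", "chr1:6215499-6215718"),
    ("CUFF.1088", "-", "chr1:6215604-6215679"),
]


def enclosing_genes(rows):
    """Map each unnamed tracking_id to the named genes whose locus contains it."""
    named = [(name, Locus.parse(loc)) for _, name, loc in rows if name != "-"]
    result = {}
    for tid, name, loc in rows:
        if name == "-":
            here = Locus.parse(loc)
            result[tid] = [g for g, gl in named if gl.contains(here)]
    return result


if __name__ == "__main__":
    for tid, genes in enclosing_genes(ROWS).items():
        print(tid, genes)
```

For these five rows, every unnamed locus is reported as lying inside Rb1cc1.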


Is there a way to make Cufflinks recognise that these fall within the gene, or am I missing something?


Help is appreciated!

Jennifer Hillman Jackson (United States) wrote, 4.2 years ago:


What you are seeing is the "discovery" of alternatively spliced transcripts that are grouped within the bounds of a known gene. You can choose whether or not to use these. If you are only interested in known transcripts, skip Cuffmerge and give the iGenomes reference GTF file directly to Cuffdiff.
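A different option from skipping Cuffmerge, if you do run it, is to separate reference-matching transcripts from newly assembled ones using the `class_code` attribute in `merged.gtf` (`=` for a match to a reference transcript's intron chain, `j` for a potential novel isoform sharing at least one splice junction with a reference transcript, `u` for an unannotated intergenic transcript). This is a minimal sketch, assuming standard GTF attribute formatting; the two example lines and their IDs are invented for illustration and are not from the poster's data.

```python
import re
from collections import defaultdict

# Matches key "value" pairs in the GTF attribute column (column 9).
ATTR_RE = re.compile(r'(\S+)\s+"([^"]*)"')


def parse_attributes(field: str) -> dict:
    return {k: v for k, v in ATTR_RE.findall(field)}


def transcripts_by_class(gtf_lines):
    """Group transcript_ids from GTF lines by their class_code attribute."""
    groups = defaultdict(set)  # sets, because each exon line repeats the IDs
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue
        attrs = parse_attributes(cols[8])
        groups[attrs.get("class_code", "?")].add(attrs["transcript_id"])
    return groups


# Hypothetical merged.gtf exon lines (invented for illustration).
EXAMPLE = [
    'chr1\tCufflinks\texon\t1000\t1200\t.\t+\t.\tgene_id "XLOC_000001"; '
    'transcript_id "TCONS_00000001"; gene_name "GeneA"; class_code "=";',
    'chr1\tCufflinks\texon\t1500\t1700\t.\t+\t.\tgene_id "XLOC_000001"; '
    'transcript_id "TCONS_00000002"; gene_name "GeneA"; class_code "j";',
]

if __name__ == "__main__":
    groups = transcripts_by_class(EXAMPLE)
    known = groups.get("=", set())
    other = set().union(*(v for k, v in groups.items() if k != "="))
    print("known:", sorted(known))
    print("other:", sorted(other))
```

Filtering this way keeps Cuffmerge's gene grouping while letting you decide which transcript classes to carry forward.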

Much more about pipeline alternatives, both those that include discovery and those that only test differential expression of known transcripts and genes, is in the Tuxedo manual itself. It is linked from our wiki here, along with other resources:

Thanks, Jen, Galaxy team
