Question: Cufflinks assembles a very small number of transcripts from RNA-seq data
3.0 years ago by
jaeil.han10 wrote:

I mapped RNA-seq data to a reference genome and tried to assemble transcripts using Cufflinks. The samples are ribosomal RNA-depleted yeast RNA.

In the mapped BAM files, I can clearly see many reads over many noncoding RNA genes in a genome browser, but these genes are all missing from the Cufflinks results, both with and without bias correction.

I cannot work out the reason. Any advice would be appreciated.



3.0 years ago by
Jennifer Hillman Jackson25k wrote:


If the goal is to identify only non-coding transcripts, or just a specific subset of transcripts, you can supply a Mask File containing the transcripts that are not of interest or that are expected to be expressed at much higher abundance (known, common transcripts). This option is under Advanced Parameters.
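
For anyone running Cufflinks outside Galaxy: as far as I know, the Mask File option corresponds to the `-M/--mask-file` flag on the command line. Here is a minimal sketch of assembling such a call from Python. The helper name `build_cufflinks_cmd` and all file names are hypothetical placeholders, and it assumes `cufflinks` is on your PATH if you actually run the command.

```python
import shlex

def build_cufflinks_cmd(bam, out_dir, mask_gtf=None):
    """Return a Cufflinks argument list (hypothetical helper, not part of Cufflinks).

    mask_gtf, if given, is passed through -M/--mask-file so that reads
    overlapping those transcripts are ignored during assembly.
    """
    cmd = ["cufflinks", "-o", out_dir]
    if mask_gtf is not None:
        cmd += ["-M", mask_gtf]
    cmd.append(bam)  # the aligned reads go last
    return cmd

# Placeholder file names, for illustration only.
cmd = build_cufflinks_cmd("accepted_hits.bam", "cufflinks_out", mask_gtf="mask.gtf")
print(shlex.join(cmd))
```

The list form can be handed to `subprocess.run(cmd, check=True)` without any shell quoting concerns.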

Related gotcha: if a reference annotation containing only coding transcripts is used with Cufflinks "as truth" (instead of as a guide), the result could resemble what you describe, because only the annotated transcripts are reported. Load the reference annotation alongside the BAM hits in the visualization. If the annotation does not overlap the non-coding regions where you see reads (which is expected for a coding-only annotation), then use the annotation "as a guide" instead, or do not use it at all.
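
If eyeballing the browser is impractical, the same overlap check can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the GTF record and the loci below are made-up toy data, the function names are hypothetical, and coordinates are treated as 1-based and inclusive, as in GTF.

```python
def parse_gtf_intervals(gtf_lines):
    """Collect (chrom, start, end) from every non-comment GTF record."""
    intervals = []
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # GTF columns: 1 = seqname, 4 = start, 5 = end
        intervals.append((fields[0], int(fields[3]), int(fields[4])))
    return intervals

def is_annotated(locus, intervals):
    """True if the locus overlaps at least one annotated interval."""
    chrom, start, end = locus
    return any(c == chrom and s <= end and start <= e for c, s, e in intervals)

# Toy annotation with one coding exon, plus two regions where reads were seen.
gtf = ['chrI\tsrc\texon\t100\t500\t.\t+\t.\tgene_id "g1"; transcript_id "t1";\n']
loci_with_reads = [("chrI", 450, 600), ("chrI", 900, 1200)]

intervals = parse_gtf_intervals(gtf)
unannotated = [locus for locus in loci_with_reads if not is_annotated(locus, intervals)]
print(unannotated)
```

Any locus that ends up in `unannotated` cannot appear in the output when the annotation is used as truth (`-G/--GTF` on the command line). It can only be assembled with the annotation as a guide (`-g/--GTF-guide`) or with no annotation.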

Other factors could include how well the paired-end reads mapped concordantly in TopHat, but issues of that kind would affect all experiments, not just those looking for less abundant or novel transcripts.

Best, Jen, Galaxy team


