Question: Cufflinks assembles only one transcript for my gene of interest even though TopHat splice junctions suggest alternative splicing
roxy.zhang0 wrote, 13 months ago:

Hello Friends

I am currently doing RNA-seq, hoping to find and quantify a new transcript (we detected it with our qPCR, and another lab also found it, though experimentally induced). My TopHat splice-junction results (viewed in IGB) show junctions that suggest several alternative splicing events. However, when I look at the assembled transcripts, only one transcript seems to have been assembled. I was wondering why this is, and would appreciate suggestions on how to proceed. Thanks everyone!

Best, Roxy

rna-seq tophat cufflinks
Jennifer Hillman Jackson wrote, 13 months ago (United States):


RNA-seq tutorials with example protocols for these tools as wrapped for Galaxy can be found here:

More information is available on the tool authors' website, and there is also a Google group for the tool suite. See the Manual + Help here:

Low coverage over splice junctions can cause them to be excluded when Cufflinks assembles transcripts. This is the simplest reason why junctions detected by TopHat would not be promoted into multiple alternative transcripts by Cufflinks.
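One quick sanity check along these lines is to look at the read support for each junction in TopHat's junctions.bed output, where the BED score column holds the number of reads supporting the junction. A minimal Python sketch, assuming a standard TopHat junctions.bed; the file path and the threshold of 10 reads are illustrative, not a Cufflinks-documented cutoff:

```python
# Flag splice junctions with low read support in a TopHat junctions.bed.
# In TopHat's output, the BED "score" column (5th field) is the number
# of reads supporting the junction; weakly supported junctions are the
# usual suspects when Cufflinks drops an expected isoform.

def low_support_junctions(bed_lines, min_reads=10):
    """Return (name, chrom, start, end, reads) for weakly supported junctions."""
    weak = []
    for line in bed_lines:
        # Skip the track header, comments, and blank lines.
        if line.startswith(("track", "#")) or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        name, reads = fields[3], int(fields[4])
        if reads < min_reads:
            weak.append((name, chrom, start, end, reads))
    return weak

# Usage (path illustrative):
#   with open("junctions.bed") as fh:
#       for name, chrom, start, end, reads in low_support_junctions(fh):
#           print(f"{name}  {chrom}:{start}-{end}  only {reads} supporting reads")
```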

Are you using a reference annotation input at this step (optional with all of these tools)? As a guide (identifies and includes novel transcripts) or as truth (only uses/reports known transcripts)? What was its source? Illumina sources are linked here:

If not from Illumina, does the reference annotation contain the attributes for p_id and tss_id (and optionally gene_name if you want gene labels included in the output)? Are the chromosome identifiers for the reference annotation an exact match for the chromosome identifiers on the reference genome used?
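Both checks above can be scripted. The sketch below, a hypothetical helper rather than any Galaxy tool, scans a GTF for the attribute keys present and compares its chromosome identifiers against the FASTA headers of the genome:

```python
# Cross-check a reference annotation (GTF) against a reference genome
# (FASTA): do the chromosome identifiers match exactly, and does the
# annotation carry the p_id / tss_id attributes Cufflinks uses?
import re

def gtf_chroms_and_attrs(gtf_lines):
    chroms, attrs = set(), set()
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        chroms.add(fields[0])
        # GTF attribute keys look like: key "value";
        attrs.update(re.findall(r'(\w+) "', fields[8]))
    return chroms, attrs

def fasta_chroms(fasta_lines):
    # A FASTA identifier is the first word after '>'.
    return {line[1:].split()[0] for line in fasta_lines if line.startswith(">")}

def annotation_report(gtf_lines, fasta_lines):
    gchroms, attrs = gtf_chroms_and_attrs(gtf_lines)
    fchroms = fasta_chroms(fasta_lines)
    return {
        "chroms_missing_from_genome": gchroms - fchroms,
        "has_p_id": "p_id" in attrs,
        "has_tss_id": "tss_id" in attrs,
    }
```

A classic failure this catches is an annotation using "1" while the genome uses "chr1", which silently produces empty results downstream.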

Was a custom reference genome used? Did you run NormalizeFasta on it first?
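For context, a rough Python illustration of the kind of cleanup NormalizeFasta performs on a custom genome (this is a sketch of the idea, not the tool itself): keep only the first word of each header as the identifier and rewrap sequence lines to a uniform width.

```python
# Rough illustration of custom-genome normalization: trim each FASTA
# header to its first word (the sequence identifier) and rewrap the
# sequence to a fixed line width. Inconsistent headers or line lengths
# in a custom genome are a common source of downstream tool failures.
def normalize_fasta(lines, width=60):
    out, name, seq = [], None, []

    def flush():
        if name is not None:
            out.append(">" + name)
            s = "".join(seq)
            out.extend(s[i:i + width] for i in range(0, len(s), width))

    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            flush()
            name, seq = line[1:].split()[0], []
        elif line:
            seq.append(line)
    flush()
    return out
```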

Support hub:

Let us know if the above does not help. Sharing some details of what you are doing will help us understand the data and whether there is a usage problem.

Thanks! Jen


Hello Jen

Thank you for your prompt reply! I am using hg19 as a guide reference genome, selected from Galaxy's own source, and my data are from an Illumina HiSeq. I was thinking of adding more subjects; maybe that will help? I got my data from SRA, and the original paper sequenced over 50 individuals, of which I am only using a small portion. Do you think adding more would be better? Also, should I try aligning again with TopHat using a more recent reference genome, i.e. GRCh38? Would that change anything? I also tried using the assembled transcript as a mask dataset to be ignored by Cufflinks, but that resulted in nothing.

Best, Roxy


Hi Roxy,

From this information, the data is probably too sparse to generate multiple isoforms.

Adding a few more samples might help. However, that is not the best way to create a test/sample dataset for RNA-seq (and most other) analyses: it gives a horizontal slice of the data (reads covering the entire genome at shallow coverage depth) instead of a vertical slice (reads covering a shorter region of the genome, perhaps a single chromosome, at greater coverage depth, derived from single or multiple samples/conditions).
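The vertical-slice idea can be sketched in a few lines. In practice one would use something like `samtools view reads.bam chr19` on a sorted, indexed BAM; this pure-Python sketch just shows the filtering on SAM-format text, with the chromosome and coordinates as illustrative inputs:

```python
# Build a "vertical slice" test dataset: keep the SAM header plus only
# those reads mapped to one chromosome (optionally a sub-region), so a
# small dataset still has deep coverage over the region of interest.
def vertical_slice(sam_lines, chrom, start=0, end=float("inf")):
    kept = []
    for line in sam_lines:
        if line.startswith("@"):        # keep all header lines
            kept.append(line)
            continue
        fields = line.split("\t")
        rname, pos = fields[2], int(fields[3])  # reference name, 1-based position
        if rname == chrom and start <= pos <= end:
            kept.append(line)
    return kept
```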

The RNA-seq tutorials in the link above include sample data. Perhaps give these a try if you want to learn how to use the tools.

I see your other question, and it likely has the same solution. I'll either reply separately there or close it out and refer back to this question.

written 13 months ago by Jennifer Hillman Jackson
Powered by Biostar version 16.09