Question: Low number of assembled transcripts from Cufflinks with an annotated genome
4.4 years ago by
United States
ybnrml072 wrote:

So I am pretty new to bioinformatics and am having trouble assembling transcripts using Cufflinks with an annotated genome online. I am looking for differential gene expression from an RNA-Seq experiment. I created the annotated genome using Rast and downloaded the files in both GFF and GTF format. At first I had trouble with duplicate IDs, so I deleted those using Excel and uploaded the files. When I run Cufflinks with the annotated genome, it only outputs 4-6 transcripts. If I do not use the annotated genome, I get thousands, but this makes the results much harder to analyze down the line. Has anyone used Rast to annotate genomes in this way before, or does anyone have an idea of where I may have gone wrong?

Also, the genome I'm working with has been annotated in NCBI but I had trouble getting the correct format to use for cufflinks. Suggestions?


galaxy cufflinks • 1.3k views
4.4 years ago by
United States
Jennifer Hillman Jackson wrote:


First, I would try using the reference annotation "as a guide", not as an absolute. This is the third and last option for the parameter titled "Use Reference Annotation:" on the tool form (known as "-g" on the command line).
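For anyone running Cufflinks outside Galaxy, the three annotation choices can be sketched as a small helper that builds the command line. This is only a minimal sketch: the function name, the mode labels, and the file names are hypothetical, and the only Cufflinks options assumed are `-o` (output directory), `-G` (use the annotation as an absolute), and `-g` (use the annotation as a guide).

```python
def build_cufflinks_command(bam_path, gtf_path=None, mode="none",
                            out_dir="cufflinks_out"):
    """Return the cufflinks argument list for one annotation mode.

    mode is a hypothetical label, not a Cufflinks term:
      "none"  -> assemble without any reference annotation
      "only"  -> -G: quantify the annotated transcripts only
      "guide" -> -g: use the annotation as a guide, still allowing
                 novel transcripts to be assembled
    """
    flags = {"none": None, "only": "-G", "guide": "-g"}
    if mode not in flags:
        raise ValueError(f"unknown mode: {mode!r}")

    cmd = ["cufflinks", "-o", out_dir]
    flag = flags[mode]
    if flag is not None:
        if gtf_path is None:
            raise ValueError("an annotation file is required for this mode")
        cmd += [flag, gtf_path]
    cmd.append(bam_path)  # the aligned reads go last
    return cmd


# Hypothetical file names, for illustration only.
print(" ".join(build_cufflinks_command("accepted_hits.bam", "rast.gtf", "guide")))
```

The returned list can be passed to `subprocess.run` once Cufflinks is installed; building it separately makes it easy to inspect the command before anything is run.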

I haven't used Rast, but it is very important to double-check that the chromosome identifiers in the annotation file and in the reference genome match exactly. If you need to modify the reference genome, then you will likely need to remap with Tophat/Tophat2.
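The identifier check described above can be sketched in a few lines of Python. This is a rough sketch under stated assumptions, not a Galaxy tool: the function names are made up, the FASTA identifier is assumed to be the first whitespace-delimited word after ">", and the annotation is assumed to be tab-delimited GFF/GTF with the sequence name in the first column.

```python
def fasta_ids(lines):
    """Collect sequence identifiers (first word after '>') from FASTA lines."""
    ids = set()
    for line in lines:
        if line.startswith(">") and line[1:].strip():
            ids.add(line[1:].split()[0])
    return ids


def annotation_seqnames(lines):
    """Collect column-1 sequence names from GFF/GTF lines."""
    names = set()
    for line in lines:
        if line.startswith("##FASTA"):
            break  # a GFF3 file may carry embedded sequence after this marker
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        names.add(line.rstrip("\n").split("\t", 1)[0])
    return names


def compare_ids(fasta_lines, annotation_lines):
    """Return (names only in the annotation, names only in the genome)."""
    genome = fasta_ids(fasta_lines)
    annot = annotation_seqnames(annotation_lines)
    return sorted(annot - genome), sorted(genome - annot)


# Toy data, for illustration only.
fasta = [">contig_1 some description", "ACGT", ">contig_2", "GGCC"]
gtf = [
    'contig_1\tRAST\tCDS\t1\t4\t.\t+\t0\tgene_id "g1"; transcript_id "t1";',
    'Contig_3\tRAST\tCDS\t1\t4\t.\t+\t0\tgene_id "g2"; transcript_id "t2";',
]
only_in_annotation, only_in_genome = compare_ids(fasta, gtf)
print(only_in_annotation, only_in_genome)
```

Any name reported as annotation-only means those features cannot be placed on the genome that the reads were mapped against, which is one possible reason for very few transcripts coming out. With real files, pass open file handles instead of the toy lists.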

More about reference genomes is here, including Custom reference genomes, along with troubleshooting help for format. Much can be done within Galaxy to make adjustments.

For reference annotation in general, this wiki has descriptions and link-outs to specifications for common file formats.

I do not know what genome you are working with, but there are many sources for such data in curated format. Under the tool group "Get Data" you'll find the ones most commonly used with Galaxy, but data from any source can be imported/uploaded once it is in a standardized format. Not all sources will contain all of the attributes used by this tool suite (in particular, Cuffdiff uses tss_id and p_id). These can be important, and are another reason to use annotation as a guide rather than as truth.
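To see which of these attributes an annotation file actually carries, something like the following could be run on the GTF. It is a minimal sketch with hypothetical function names, assuming GTF-style attributes of the form `key "value";` in the ninth column; GFF3-style `key=value` attributes are not handled.

```python
import re

# Matches GTF-style attribute pairs such as: gene_id "g1";
ATTR_PATTERN = re.compile(r'(\w+)\s+"([^"]*)"')


def parse_gtf_attributes(field):
    """Turn the ninth GTF column into a dict of attribute name -> value."""
    return dict(ATTR_PATTERN.findall(field))


def count_missing_attributes(gtf_lines,
                             wanted=("gene_id", "transcript_id", "tss_id", "p_id")):
    """Count feature lines, and how many of them lack each wanted attribute."""
    missing = {name: 0 for name in wanted}
    total = 0
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        columns = line.rstrip("\n").split("\t")
        if len(columns) < 9:
            continue  # not a complete GTF record
        total += 1
        attrs = parse_gtf_attributes(columns[8])
        for name in wanted:
            if name not in attrs:
                missing[name] += 1
    return total, missing


# Toy records, for illustration only.
records = [
    'chr1\tsrc\texon\t1\t100\t.\t+\t.\tgene_id "g1"; transcript_id "t1"; tss_id "TSS1";',
    'chr1\tsrc\texon\t200\t300\t.\t+\t.\tgene_id "g1"; transcript_id "t2";',
]
total, missing = count_missing_attributes(records)
print(total, missing)
```

If tss_id or p_id is missing from every record, the annotation can still be used as a guide, but any Cuffdiff results that depend on those attributes should not be expected.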

Good luck with your research, Jen, Galaxy team


