low number of assembled transcripts for cufflinks with annotated genome


4.4 years ago by

United States

ybnrml072 • 0 wrote:

I am pretty new to bioinformatics and am having trouble assembling transcripts using Cufflinks with an annotated genome online. I am looking for differential gene expression from an RNA-Seq experiment. I created the annotated genome using RAST (http://rast.nmpdr.org/rast.cgi) and downloaded the files in both GFF and GTF format. At first I had trouble with duplicate IDs, so I deleted those using Excel and uploaded the files. When I run Cufflinks with the annotation, it outputs only 4-6 transcripts. If I do not use the annotated genome, I get thousands, but that makes downstream analysis much harder. Has anyone used RAST to annotate genomes in this way before, or does anyone have an idea of where I may have gone wrong?

Also, the genome I'm working with has been annotated in NCBI, but I had trouble getting that annotation into the correct format for Cufflinks. Suggestions?

Thanks!!

galaxy cufflinks • 1.3k views



4.4 years ago by

Jennifer Hillman Jackson ♦ 25k

United States

Jennifer Hillman Jackson ♦ 25k wrote:

Hello,

First, I would try using the reference annotation "as a guide" rather than as an absolute. This is the third and last option for the parameter labeled "Use Reference Annotation:" on the tool form (known as "-g" on the command line): http://cufflinks.cbcb.umd.edu/manual.html#cufflinks
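
For anyone running Cufflinks outside Galaxy, here is a minimal sketch of how the three annotation choices map to command-line flags. The helper name `build_cufflinks_cmd`, the mode labels, and the file names are made up for illustration; only `-g` (annotation as a guide), `-G` (annotated transcripts only), and `-o` (output directory) are actual Cufflinks options. The function only builds the command; it does not run it.

```python
import shlex


def build_cufflinks_cmd(bam_path, gtf_path=None, mode="guide", out_dir="cufflinks_out"):
    """Build a cufflinks argument list for one of three annotation modes.

    mode is a hypothetical label used only in this sketch:
      "none"   - assemble without any annotation
      "guide"  - -g: annotation guides assembly, novel transcripts still reported
      "strict" - -G: only the annotated transcripts are quantified
    """
    if mode not in ("none", "guide", "strict"):
        raise ValueError("unknown mode: %r" % mode)
    if mode != "none" and gtf_path is None:
        raise ValueError("mode %r needs an annotation file" % mode)

    cmd = ["cufflinks", "-o", out_dir]
    if mode == "guide":
        cmd += ["-g", gtf_path]
    elif mode == "strict":
        cmd += ["-G", gtf_path]
    cmd.append(bam_path)  # aligned reads (e.g. TopHat output) go last
    return cmd


# Show the command as it would be typed in a shell (file names are placeholders).
print(shlex.join(build_cufflinks_cmd("accepted_hits.bam", "annotation.gtf")))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` if Cufflinks is installed and on the PATH.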

I haven't used RAST, but it is very important to double-check that the chromosome identifiers in the annotation file and in the reference genome match exactly. If you need to modify the reference genome, you will likely need to remap with Tophat/Tophat2.
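
One way to run that identifier check outside Galaxy is sketched below. It assumes a plain FASTA reference and a tab-delimited GTF/GFF annotation; the function names and the sample records are invented for the example. It compares the FASTA sequence names (header text up to the first whitespace) with column 1 of the annotation and reports anything found on only one side.

```python
def fasta_ids(lines):
    """Sequence names from FASTA headers: the text after '>' up to the first whitespace."""
    return {
        line[1:].split()[0]
        for line in lines
        if line.startswith(">") and line[1:].strip()
    }


def annotation_seqids(lines):
    """Column 1 (sequence name) of every feature line in a GTF/GFF file."""
    ids = set()
    for line in lines:
        if line.startswith("##FASTA"):
            break  # GFF3 files may embed sequence after this marker
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        ids.add(line.rstrip("\n").split("\t", 1)[0])
    return ids


def compare_ids(fasta_lines, annotation_lines):
    """Return (names only in the annotation, names only in the genome), both sorted."""
    genome = fasta_ids(fasta_lines)
    annot = annotation_seqids(annotation_lines)
    return sorted(annot - genome), sorted(genome - annot)


# Tiny made-up example: "Chr2" differs from anything in the genome.
fasta = [">chr1 example contig\n", "ACGT\n", ">plasmid_A\n", "GGCC\n"]
gtf = [
    'chr1\tRAST\texon\t1\t100\t.\t+\t.\tgene_id "g1"; transcript_id "t1";\n',
    'Chr2\tRAST\texon\t5\t80\t.\t-\t.\tgene_id "g2"; transcript_id "t2";\n',
]
only_in_annotation, only_in_genome = compare_ids(fasta, gtf)
print("In annotation but not genome:", only_in_annotation)
print("In genome but not annotation:", only_in_genome)
```

With real files, pass open file handles instead of lists, for example `compare_ids(open("genome.fa"), open("annotation.gtf"))`. Features on a sequence name that is absent from the genome cannot be matched to any alignments, so names that show up only in the annotation are the ones to look at first.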

More about reference genomes is here, including custom reference genomes and troubleshooting help for format problems. Much can be done within Galaxy to make adjustments.
http://wiki.galaxyproject.org/Support#Reference_genomes
http://wiki.galaxyproject.org/Support#Custom_reference_genome

For reference annotation in general, this wiki has descriptions and link-outs to specifications for common file formats. http://wiki.galaxyproject.org/Learn/Datatypes

I do not know what genome you are working with, but there are many sources for such data in curated format. Under the tool group "Get Data" you'll find the ones most commonly used with Galaxy, but data from any source can be imported/uploaded once it is in a standardized format. Not all sources will contain all of the attributes used by this tool suite (in particular by Cuffdiff: tss_id and p_id). These can be important, and they are another reason to use the annotation as a guide rather than as the truth.
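
To see quickly whether an annotation file carries those attributes, something like the following sketch could be used. It assumes standard GTF attribute syntax (`key "value";` pairs in column 9); the function name and the sample lines are made up for illustration.

```python
import re

# Matches GTF-style attributes of the form: key "value"
ATTR_RE = re.compile(r'([A-Za-z_][\w.]*)\s+"([^"]*)"')


def attribute_coverage(gtf_lines, wanted=("gene_id", "transcript_id", "tss_id", "p_id")):
    """Count how many feature lines carry each of the wanted attributes.

    Returns (number of feature lines, {attribute name: count}).
    """
    total = 0
    counts = {name: 0 for name in wanted}
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete 9-column feature line
        total += 1
        keys = {key for key, _ in ATTR_RE.findall(fields[8])}
        for name in wanted:
            if name in keys:
                counts[name] += 1
    return total, counts


# Made-up records: only the first one has tss_id and p_id.
sample = [
    'chr1\tsrc\texon\t1\t100\t.\t+\t.\tgene_id "g1"; transcript_id "t1"; tss_id "TSS1"; p_id "P1";\n',
    'chr1\tsrc\texon\t200\t300\t.\t+\t.\tgene_id "g2"; transcript_id "t2";\n',
]
total, counts = attribute_coverage(sample)
for name, n in counts.items():
    print("%s: %d of %d feature lines" % (name, n, total))
```

A count of zero for tss_id or p_id means the file does not provide them.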

Good luck with your research, Jen, Galaxy team

