Question: Cufflinks error when trying to align against genome
3.5 years ago by
United States
bwb01220 wrote:

I am working with tophat2 and cufflinks to align RNA-seq reads against an assembled genome. The reads are from uninfected and infected Drosophila melanogaster larvae. I ran tophat2 on both read files, and then downloaded the assembled, annotated transcripts for chromosome 2L from the modENCODE fly database. When I then run cufflinks using one of the read files and chr2L as the reference annotation, I receive the following error message:

Error running cufflinks.
return code = 1
Command line:
cufflinks -q --no-update-check -I 300000 -F 0.100000 -j 0.150000 -p 8 -G /galaxy-repl/main/files/008/639/dataset_8639899.dat -u /galaxy-repl/main/files/008/639/dataset_8639595.dat 
[10:49:46] Loading reference annotation.
Error: duplicate GFF ID '176550' encountered!

Any idea what is causing this error, and how to fix it? I tried the same exact methods described above with a different file of RNA-seq reads and it worked perfectly...

Thank you in advance for any assistance. 

assembly error rna-seq cufflinks • 1.2k views
3.5 years ago by
United States
Jennifer Hillman Jackson23k wrote:


The Flybase reference annotation GFF3 dataset is known to contain a duplicated ID. This is on purpose, and necessary to model a particular piece of the data correctly, but it does not conform to strict GFF3 format, in which IDs may not be duplicated within any single GFF3 file. How each tool handles such data will differ, but this is known to cause a problem with the Tuxedo pipeline, producing the error you report.

The good news is that iGenomes has created a version of the dataset that will work with the tool. You can examine the differences between the files to note how the issue was resolved.

If this should come up again (this is not the only case, nor the only source, to present this issue for this tool set), the general solution is either to obtain a version that has been adjusted by the data or tool authors (or related parties, as in the case above), or to go in and either reassign a unique ID to the duplicates or remove a duplicate (sometimes there is just one extra line). Not ideal, but it is the only way forward if you want to use the dataset.
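
For the manual route, here is a minimal Python sketch of how the duplicates could be located and renamed. It assumes a tab-separated GFF3 file with nine columns and an `ID=` entry in the attributes column. The function names and the `.dupN` suffix are hypothetical choices for illustration, not part of Cufflinks or Galaxy. Note that it does not update any `Parent=` references that point at a renamed ID, so check those by hand.

```python
from collections import defaultdict


def get_id(line):
    """Return the ID attribute of a GFF3 feature line, or None if absent."""
    if line.startswith("#") or not line.strip():
        return None  # comment, directive, or blank line
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 9:
        return None  # not a complete feature line
    for attr in cols[8].split(";"):
        key, _, value = attr.strip().partition("=")
        if key == "ID":
            return value
    return None


def find_duplicate_ids(lines):
    """Map each ID that occurs more than once to its 1-based line numbers."""
    seen = defaultdict(list)
    for number, line in enumerate(lines, start=1):
        feature_id = get_id(line)
        if feature_id is not None:
            seen[feature_id].append(number)
    return {fid: nums for fid, nums in seen.items() if len(nums) > 1}


def make_ids_unique(lines):
    """Append a hypothetical '.dupN' suffix to the 2nd and later uses of an ID."""
    counts = defaultdict(int)
    fixed = []
    for line in lines:
        feature_id = get_id(line)
        if feature_id is not None:
            counts[feature_id] += 1
            if counts[feature_id] > 1:
                new_id = f"{feature_id}.dup{counts[feature_id] - 1}"
                cols = line.rstrip("\n").split("\t")
                attrs = [
                    f"ID={new_id}" if a.strip().partition("=")[0] == "ID" else a
                    for a in cols[8].split(";")
                ]
                cols[8] = ";".join(attrs)
                line = "\t".join(cols) + ("\n" if line.endswith("\n") else "")
        fixed.append(line)
    return fixed
```

To use it, read the annotation with `open(path).readlines()`, run `find_duplicate_ids` first to see which lines are affected, and decide per case whether renaming or deleting the extra line is more appropriate before writing the result back out and uploading it to Galaxy.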

Hope this helps. The Flybase support scientists can explain more about the reasons for the duplication of IDs, if you are curious (contact through their web site). Jen, Galaxy team
