Question: Cuffmerge/Cuffcompare Duplicate GFF ID errors
3.6 years ago, cora (United States) wrote:

On the public Galaxy server I ran TopHat with mm10 and then Cufflinks v2.1.1 (Galaxy v0.0.7) with the annotation file gencode.vM4.annotation.gtf from Gencode, because I want the most up-to-date lncRNA annotations. When I run Cuffcompare (Galaxy v0.0.6) or Cuffmerge v1.0.0 (Galaxy v0.0.6), I get the following errors.

Error running cuffmerge. 
[Tue Apr 28 06:53:42 2015] Beginning transcriptome assembly merge
-------------------------------------------

[Tue Apr 28 06:53:42 2015] Preparing output location cm_output/
[Tue Apr 28 06:55:08 2015] Converting GTF files to SAM
[06:55:09] Loading reference annotation.
Error: duplicate GFF ID 'ENSMUST00000105372.3' encountered!
	[FAILED]
Error: could not execute gtf_to_sam

I have also tried running Cuffmerge without the Reference annotation or Sequence Data, but I still get the errors, which makes me think the problem is in my Cufflinks files?

3.6 years ago, Jennifer Hillman Jackson (United States) wrote:

Hello,

If you ran Cufflinks with a reference annotation file that contained duplicate GFF IDs, an error like this one should have come up during those runs, since duplicate IDs are not accepted by any of the tools in this suite. But you used the annotation and the jobs were successful? Or did you not use the reference annotation at that stage?

Later, when running Cuffmerge, this error would only be expected if a reference annotation dataset with duplicate GFF IDs was included along with Cufflinks output that did not make use of the same annotation. So perhaps different errors are occurring when no reference annotation is used?

The root issue is the presence of duplicated GFF IDs in the annotation itself. These will need to be resolved before using the data: correct the file to remove the duplicates, but be aware that this involves scientific decision making and may not be what you want to do. Alternatively, you can use a different annotation, such as the one from iGenomes.
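
For anyone who wants to inspect the annotation before deciding how to correct it, here is a minimal Python sketch that lists transcript IDs that look duplicated. It is a heuristic only and is not the check Cufflinks itself performs: it assumes a tab-separated GTF with a transcript_id attribute in column 9, and it flags an ID when it has more than one "transcript" row or appears on more than one chromosome/strand. The function name find_suspect_transcript_ids is made up for illustration.

```python
import re
from collections import defaultdict

# Matches the transcript_id attribute in GTF column 9.
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def find_suspect_transcript_ids(lines):
    """Return {transcript_id: reason} for IDs that look duplicated.

    Heuristic only (an assumption, not the Cufflinks logic): an ID is
    suspect if it has several 'transcript' rows or is seen on more than
    one (chromosome, strand) pair.
    """
    transcript_rows = defaultdict(int)  # 'transcript' rows per ID
    locations = defaultdict(set)        # (seqname, strand) pairs per ID

    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a well-formed GTF row
        match = TRANSCRIPT_ID.search(fields[8])
        if not match:
            continue  # e.g. gene rows carry no transcript_id
        tid = match.group(1)
        locations[tid].add((fields[0], fields[6]))
        if fields[2] == "transcript":
            transcript_rows[tid] += 1

    suspects = {}
    for tid, locs in locations.items():
        if transcript_rows[tid] > 1:
            suspects[tid] = "multiple transcript rows"
        elif len(locs) > 1:
            suspects[tid] = "multiple chromosomes/strands"
    return suspects
```

To run it on a real file, pass an open file handle, for example `find_suspect_transcript_ids(open("gencode.vM4.annotation.gtf"))`, and check whether the ID named in the error message is among the results.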

Best, Jen, Galaxy team


I ran Cufflinks using the Gencode annotation GFF as a guide, and the jobs ran fine.

I chose the Gencode annotation file because it was the most up-to-date one I found for non-coding genes, compared to the iGenomes files. I found someone mentioning that, in their human data, lines with Selenocysteine were causing the duplicate GFF ID issue, so I have removed those lines from my GFF and am rerunning Cufflinks to see if this fixes the problem. Others have mentioned that updating Cufflinks to version 2.2.1 helps. Is there any plan to update the version on the public Galaxy server?
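
The Selenocysteine filtering described above could be sketched in Python roughly as follows. This assumes the lines in question are the rows whose feature type (column 3 of the GTF) is "Selenocysteine", as in Gencode GTF files; the helper name drop_feature and the output file name are hypothetical.

```python
def drop_feature(lines, feature="Selenocysteine"):
    """Yield GTF lines, skipping rows whose feature type (column 3) matches.

    Comment lines starting with '#' are passed through unchanged.
    """
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.split("\t")
        if len(fields) >= 3 and fields[2] == feature:
            continue  # drop the unwanted feature row
        yield line


def filter_gtf(in_path, out_path, feature="Selenocysteine"):
    """Write a copy of in_path to out_path without the given feature rows."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(drop_feature(src, feature))
```

For example, `filter_gtf("gencode.vM4.annotation.gtf", "gencode.vM4.noSec.gtf")` would write a filtered copy that can be uploaded to Galaxy in place of the original annotation. Whether this removes every duplicate ID depends on the file, so the result is worth checking before rerunning the whole pipeline.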

(Reply written 3.6 years ago by cora)