Question: CuffMerge Error: GFF Error: duplicate/invalid 'transcript' feature
Lauren Laboissonniere (Iowa State University), 2.6 years ago:

Hi all,

I have searched through all the threads on this topic and I have yet to find a reasonable explanation for why I continue to encounter this error. I have already run Cufflinks on my samples, but when I attempt to run Cuffmerge (cuffmerge -g MouseGTF.fa -s index_mouse.fa -p 8 assemblies.txt), I get the following error:

    GFF Error: duplicate/invalid 'transcript' feature ID=id1128281
    [FAILED] Error: could not execute cuffcompare

The reason I am so stumped is that I have the exact same genome and GFF files in another directory (for a separate project), and yesterday I successfully ran Cuffmerge on the samples in that directory. For some reason the command won't work in this directory. Can anyone explain why this might happen when the same files worked just a day ago?

Thanks so much!

Lauren

Tags: rna-seq, cuffmerge, cufflinks
Lauren Laboissonniere (Iowa State University), 2.6 years ago:

Jen,

Thank you for the quick response. I should mention that I am not a well-versed bioinformatician, so this entire pipeline is new to me and I'm really struggling with this hurdle. I did as you suggested and the job completed just fine. I also suspected that the GFF and genome files in the directory that failed might be the problem, so I used those with the transcript files from the successful run, and that job also completed successfully. This leads me to believe that the problem lies somewhere in the transcripts.gtf files generated by Cufflinks.

When I look at the first and last ten lines of each of my transcripts.gtf files, they look quite normal, and I did not receive any errors when running Cufflinks. Also, the error says there is a duplicate/invalid 'transcript' feature at id1128281, so I searched each of the files for this ID and found no hits. When I searched for this ID in my GFF file, I got:

    [laboiss@rit2 TrimmedReads]$ grep -w id1128281 MouseGFF.fa
    NT_114985.3 Gnomon C_gene_segment 138385 146840 . - . ID=id1128281;Parent=gene46614;Dbxref=GeneID:380792,IMGT/GENE-DB:IGHE,MGI:MGI:2685746;gbkey=C_region;gene=Ighe
    NT_114985.3 Gnomon exon 146514 146840 . - . ID=id1128282;Parent=id1128281;Dbxref=GeneID:380792,IMGT/GENE-DB:IGHE,MGI:MGI:2685746;gbkey=C_region;gene=Ighe
    NT_114985.3 Gnomon exon 142060 142332 . - . ID=id1128283;Parent=id1128281;Dbxref=GeneID:380792,IMGT/GENE-DB:IGHE,MGI:MGI:2685746;gbkey=C_region;gene=Ighe
    NT_114985.3 Gnomon exon 141198 141518 . - . ID=id1128284;Parent=id1128281;Dbxref=GeneID:380792,IMGT/GENE-DB:IGHE,MGI:MGI:2685746;gbkey=C_region;gene=Ighe
    NT_114985.3 Gnomon exon 140795 141115 . - . ID=id1128285;Parent=id1128281;Dbxref=GeneID:380792,IMGT/GENE-DB:IGHE,MGI:MGI:2685746;gbkey=C_region;gene=Ighe
    NT_114985.3 Gnomon exon 140384 140710 . - . ID=id1128286;Parent=id1128281;Dbxref=GeneID:380792,IMGT/GENE-DB:IGHE,MGI:MGI:2685746;gbkey=C_region;gene=Ighe
    NT_114985.3 Gnomon exon 138551 138684 . - . ID=id1128287;Parent=id1128281;Dbxref=GeneID:380792,IMGT/GENE-DB:IGHE,MGI:MGI:2685746;gbkey=C_region;gene=Ighe
    NT_114985.3 Gnomon exon 138385 138468 . - . ID=id1128288;Parent=id1128281;Dbxref=GeneID:380792,IMGT/GENE-DB:IGHE,MGI:MGI:2685746;gbkey=C_region;gene=Ighe
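The per-file search described above (looking for id1128281 in each transcripts.gtf) can be scripted. This is a minimal sketch, assuming assemblies.txt is the Cuffmerge manifest with one GTF path per line; search_assemblies is a hypothetical helper, not part of Cufflinks. It prints each file with the number of lines that mention the ID as a whole word.

```shell
# Sketch: for every GTF path listed in a Cuffmerge manifest (assumed to be
# one path per line), print the path and how many lines mention the ID.
# search_assemblies is a hypothetical helper name, not a Cufflinks tool.
search_assemblies() {
    local manifest="$1" feature_id="$2" gtf count
    while IFS= read -r gtf; do
        [ -z "$gtf" ] && continue    # skip blank lines in the manifest
        # -w: match the ID only as a whole word; -c: count matching lines.
        # grep exits 1 when nothing matches, so "|| true" keeps the "0".
        count=$(grep -c -w -- "$feature_id" "$gtf" || true)
        printf '%s\t%s\n' "$gtf" "$count"
    done < "$manifest"
}
```

Usage: `search_assemblies assemblies.txt id1128281`. A count of 0 for every file is what Lauren observed by hand.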

I'm not exactly sure what a 'duplicate' line looks like, and none of these lines contain the word 'transcript'. Do you have any suggestions for how I might handle this problem? Thanks again,

Lauren


Hello, this is the duplicated GFF ID, "ID=id1128281", and it is part of the annotation. One run uses that record and the other does not.

The problem is the content of the GFF annotation; annotation files from certain sources contain records like this. Either remove the duplicated lines or find another annotation source. You would not want to change the expression data itself.

The Select tool in Galaxy can operate like "grep -v" and remove these lines within Galaxy, or you can remove them on the command line yourself and then upload the modified file into Galaxy.
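For the command-line route, here is a minimal sketch of the "grep -v" approach. It writes a copy of the annotation without any line that mentions the given ID as a whole word, which drops the feature and the exon lines whose Parent points to it while leaving the original file untouched. drop_feature and the output file name are hypothetical. Lauren reports removing only the one defining line; a plain `grep -v 'ID=id1128281;'` without -w would do that instead.

```shell
# Sketch: copy a GFF file, omitting every line that mentions feature_id
# as a whole word. drop_feature is a hypothetical helper name.
drop_feature() {
    local feature_id="$1" in_gff="$2" out_gff="$3"
    # -v keeps only non-matching lines; -w stops the pattern from also
    # matching longer IDs that share the prefix (e.g. id11282810).
    # grep exits 1 when no lines are kept, which is not an I/O error,
    # so only exit status 2 and above is reported as failure.
    grep -v -w -- "$feature_id" "$in_gff" > "$out_gff" || [ $? -eq 1 ]
}
```

Usage: `drop_feature id1128281 MouseGFF.fa MouseGFF.fixed.gff`, then pass the new file to Cuffmerge with -g.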

(Reply written 2.6 years ago by Jennifer Hillman Jackson)

Thank you! I removed that one line and the command ran smoothly from there.

Lauren

(Reply written 2.6 years ago by Lauren Laboissonniere)

Great, glad that worked. It is only one gene, and it will most likely still show up in your results, just without annotation (this depends on settings). Losing that one annotation is probably worth it to be able to annotate the remainder. Take care, Jen

(Reply written 2.6 years ago by Jennifer Hillman Jackson)
Jennifer Hillman Jackson (United States), 2.6 years ago:

Hello,

I suspect that the issue has to do with the GFF content and the content of the data being processed.

The GFF file likely contains a duplicated ID, which is not allowed. The other job probably did not use that particular annotation record, so it processed successfully; this new job does use it.
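To check the duplicated-ID explanation directly, here is a minimal sketch that counts how many feature lines carry each ID= value in column 9 and prints those seen more than once. find_duplicate_ids is a hypothetical helper. One caveat: the GFF3 specification lets a single discontinuous feature (CDS is the usual case) span several lines with the same ID, so such hits are not necessarily errors and need to be checked by hand.

```shell
# Sketch: print "ID<TAB>count" for every ID= value that appears on more
# than one feature line of a tab-separated GFF3 file.
# find_duplicate_ids is a hypothetical helper name.
find_duplicate_ids() {
    awk -F'\t' '
        /^#/ { next }                        # skip comment and directive lines
        NF >= 9 {
            n = split($9, attrs, ";")        # column 9 holds key=value pairs
            for (i = 1; i <= n; i++)
                if (attrs[i] ~ /^ID=/) count[substr(attrs[i], 4)]++
        }
        END { for (id in count) if (count[id] > 1) print id "\t" count[id] }
    ' "$1"
}
```

Usage: `find_duplicate_ids MouseGFF.fa`. Output order is not guaranteed, because awk iterates over arrays in an unspecified order.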

If you have time and want to test this, re-run the prior Cuffmerge job and see whether the error comes up there. If it does, please submit the red error dataset as a bug report so that we can investigate. Be sure to leave the original (successful) Cuffmerge run, its inputs, and the new failed run undeleted. Then note these dataset numbers in the comments, along with a link to this Biostars post. This is the best way for us to compare the before and after reproducibly and track down what type of error this is.

Please be aware that jobs will queue longer than usual right now because of heavy load on the server/cluster: 24 hours, possibly longer. Just allow any new jobs started from now on to process. I'll watch for the bug report even if you are not able to send it in until next week (if the error comes up at all; hopefully not).

Thanks, Jen, Galaxy team
