Question: cuffmerge encounters errors in gtf/gff files
BC357 wrote (3.2 years ago):

Does anyone know how to fix the following cuffmerge errors?

Initially I thought the problem was that the "reference annotation" file was not recognized (the annotation in my history is a GFF: GCF_000001635.24_GRCm38.p4_genomic.gff). However, when I tried to switch the reference annotation to a GTF (Mus_musculus.GRCm38.81.gtf), the dropdown menu did not even list the GTF file.

People on other forums have removed the duplicate/invalid transcripts, but I do not know whether that is a real fix or only a workaround. Any ideas? Or where can I find more information about a fix? (The cuffmerge documentation at http://cole-trapnell-lab.github.io/cufflinks/cuffmerge/index.html/ does not seem to answer this.)


[Fri Sep 25 18:01:42 2015] Preparing output location cm_output/
[Fri Sep 25 18:02:08 2015] Converting GTF files to SAM
[18:02:08] Loading reference annotation.
GFF Error: duplicate/invalid 'transcript' feature ID=id1128281
	[FAILED]
Error: could not execute gtf_to_sam
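Before deciding whether removing transcripts is a real fix, it can help to list which transcript-level IDs actually repeat in the GFF3. The following is a minimal, hypothetical helper (not part of Cufflinks or Galaxy). The set of feature types it treats as "transcripts" is an assumption, so adjust it to match your file. Note that the GFF3 specification allows one feature to span several lines that share an ID, so a repeat is not automatically a file error; the log above only suggests that the Cufflinks GFF reader rejected this one.

```python
import sys
from collections import defaultdict

# Assumed set of transcript-level feature types; adjust for your annotation.
TRANSCRIPT_TYPES = {"transcript", "mRNA", "ncRNA", "lnc_RNA", "primary_transcript"}


def parse_gff3_attributes(column9):
    """Split a GFF3 attribute column ('ID=x;Parent=y') into a dict.

    Percent-encoded characters are left as-is in this sketch.
    """
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def find_duplicate_transcript_ids(lines):
    """Return {ID: [line numbers]} for transcript-level IDs seen more than once."""
    seen = defaultdict(list)
    for lineno, line in enumerate(lines, start=1):
        if line.startswith("##FASTA"):
            break  # embedded sequence section: no more feature lines
        if line.startswith("#") or not line.strip():
            continue  # skip comments, directives and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] not in TRANSCRIPT_TYPES:
            continue
        feature_id = parse_gff3_attributes(cols[8]).get("ID")
        if feature_id is not None:
            seen[feature_id].append(lineno)
    return {fid: nums for fid, nums in seen.items() if len(nums) > 1}


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        for fid, nums in find_duplicate_transcript_ids(handle).items():
            print(f"{fid}\tlines {','.join(map(str, nums))}")
```

Run as, for example, `python find_dup_ids.py GCF_000001635.24_GRCm38.p4_genomic.gff` (the script name is hypothetical). Looking at the reported lines for an ID such as id1128281 shows whether the repeats are split pieces of one transcript or genuinely conflicting records.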


Jennifer Hillman Jackson wrote (3.2 years ago):

Hello,

The question is a bit difficult to understand, but I'll start with the parts that have simple answers; you can then explain further if anything is left unresolved.

  1. If a dataset of a specific type is not recognized by a tool, its "datatype" needs to be assigned. This is explained here: Support#Tool_doesn.27t_recognize_dataset
  2. The tool error shows that at least one reference annotation file was attached (perhaps not the one you intended to use?). The duplicate ID problem is difficult to solve, and it must be resolved before the GFF3 dataset can be used with this tool set. Perhaps try the other annotation (the GTF) instead, if it still meets your research goals, or try the one from iGenomes.
  3. Please be aware that it is very important to use only one reference genome for each analysis. All data involved in the analysis must be based on this single reference genome, including any reference annotation. If you switch reference annotation datasets, you may need to remap against a different target reference genome so that everything matches exactly.
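A quick way to check point 3 is to compare the sequence (chromosome) names used in the annotation with those in the reference genome the reads were mapped to. Different providers name chromosomes differently: Ensembl GTFs use names like "1" and "MT", UCSC uses "chr1" and "chrM", and RefSeq GFF3 files use accessions beginning with NC_. The sketch below is a hypothetical helper that assumes you have the reference genome as a FASTA file; it only compares names, not sequence content.

```python
import sys


def annotation_seqids(lines):
    """Collect column-1 sequence names from a GFF/GTF file."""
    names = set()
    for line in lines:
        if line.startswith("##FASTA"):
            break  # GFF3 may embed sequences after this directive
        if line.startswith("#") or not line.strip():
            continue  # skip comments, directives and blank lines
        names.add(line.split("\t", 1)[0])
    return names


def fasta_seqids(lines):
    """Collect sequence names from FASTA headers (text up to the first whitespace)."""
    return {line[1:].split()[0] for line in lines
            if line.startswith(">") and line[1:].strip()}


def compare_seqids(annotation_names, genome_names):
    """Return annotation sequence names that are absent from the genome, sorted."""
    return sorted(annotation_names - genome_names)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical): python check_seqids.py annotation.gtf genome.fa
    with open(sys.argv[1]) as ann, open(sys.argv[2]) as fa:
        missing = compare_seqids(annotation_seqids(ann), fasta_seqids(fa))
    print("\n".join(missing) if missing else "All annotation seqids found in genome.")
```

If most names come back as missing, the annotation and the genome come from different naming schemes (or builds), and mixing them will not give correct results.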

Best, Jen, Galaxy team


Thank you very much for the suggestions. I will try uploading another GFF3 reference annotation to run my Cufflinks .gtf output against, and see whether the error goes away. If not, I will remap the reads from my initial BAM against a new reference genome.

Apologies for the poorly worded question; I am not well versed in RNA-seq terminology. Part of my initial question was about the differences between GFF, GFF3, and GTF. Although Cuffmerge is said to accept both .gff3 and .gtf files as reference annotations, it does not recognize the mouse .gtf file (Mus_musculus.GRCm38.81.gtf) in my history. While this is NOT related to the error message itself, I was curious why the mouse GTF is not offered as a usable reference annotation for cuffmerge.
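On the GFF vs GFF3 vs GTF question: all three share the same nine tab-separated columns, and the main visible difference is the ninth (attributes) column. GTF, a variant of GFF2, writes attributes as `key "value";` pairs and expects gene_id and transcript_id, while GFF3 writes `key=value` pairs and links features through ID and Parent. The sketch below parses both styles and guesses which one a line uses. It is a hypothetical illustration only; it is not how Galaxy's own datatype detection works, and in Galaxy a dataset appears in a tool's dropdown only if its assigned datatype is one the tool accepts.

```python
import re

# GTF pair: key "value";  (unquoted values such as `level 2;` are skipped here)
GTF_PAIR = re.compile(r'\s*([^\s;]+)\s+"([^"]*)"\s*;?')


def parse_gtf_attributes(column9):
    """Parse a GTF attribute column into a dict of key -> value."""
    return {key: value for key, value in GTF_PAIR.findall(column9)}


def parse_gff3_attributes(column9):
    """Parse a GFF3 attribute column (key=value;key=value) into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def guess_dialect(column9):
    """Heuristic guess based on the first attribute: 'gff3', 'gtf' or 'unknown'."""
    first = column9.strip().split(";")[0]
    if "=" in first:
        return "gff3"
    if GTF_PAIR.match(first):
        return "gtf"
    return "unknown"
```

For example, `guess_dialect('gene_id "g1"; transcript_id "t1";')` returns "gtf", while `guess_dialect("ID=rna0;Parent=gene0")` returns "gff3".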


Reply written 3.2 years ago by BC357