Question: Cufflinks generated duplicated features
claudiorivero92 wrote (23 months ago):

Hi everyone, I'm posting because I'm getting this error:

GFF Error: duplicate/invalid 'transcript' feature ID=PHASIBEAM10F001038T1 [FAILED]

I asked about this here before and was told to delete the duplicated entries, so I assumed they were in the GFF3 genome annotation file. When I reviewed the annotation, I found that the CDS features are duplicated. As a test I removed the CDS features, and that worked for a while, but when I ran Cuffdiff it returned empty values. So I concluded that the CDS features are needed for Cuffdiff to work. If the duplicates are not in my annotation, maybe they are in the Cufflinks output, so I downloaded one file (the assembled_transcripts GTF) and searched it for the IDs reported in the error. This is what I found:

[Image: duplicated features in the Cufflinks assembled_transcripts GTF]

As the image shows, there are two transcripts with the same ID (the two transcript features at the top):

transcript_id "PHASIBEAM10F001285T1"

But one has a gene_id and the other does not. In the GFF3 gene annotation there is only one transcript named PHASIBEAM10F001285T1, and it has 3 exons, like the transcript that lacks the gene_id. This happens with some transcripts but not with the rest, and I don't know why. I can't tell whether it's a Cufflinks bug, a wrong setting in my TopHat/Cufflinks protocol, or a problem with the annotation format. If anyone knows what is happening, please answer. Best regards, Claudio
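To see where an ID is actually repeated, it can help to scan both files. The following is a minimal Python sketch, not something posted in the original thread; the function names are hypothetical. repeated_gff3_ids lists every GFF3 ID that appears on more than one line, together with the feature types involved. Keep in mind that the GFF3 specification allows a discontinuous feature such as a CDS to be written as several lines sharing one ID, so repeated CDS IDs are not necessarily an error. duplicate_gtf_transcripts lists transcript IDs that have more than one "transcript" line in a Cufflinks GTF, along with each line's gene_id (None when missing or empty), which is the pattern visible in the image.

```python
import re
from collections import defaultdict

# GTF attributes look like: gene_id "G1"; transcript_id "T1";
GTF_ATTR = re.compile(r'(\w+)\s+"([^"]*)"')


def _columns(line):
    """Split a GFF/GTF line into its 9 tab-separated columns.

    Returns None for blank lines, comments and malformed lines.
    """
    if not line.strip() or line.startswith("#"):
        return None
    cols = line.rstrip("\n").split("\t")
    return cols if len(cols) == 9 else None


def repeated_gff3_ids(lines):
    """Map each GFF3 ID found on more than one line to the feature types of those lines."""
    seen = defaultdict(list)
    for line in lines:
        cols = _columns(line)
        if cols is None:
            continue
        # GFF3 attributes are key=value pairs separated by ';'
        attrs = {}
        for part in cols[8].split(";"):
            if "=" in part:
                key, value = part.split("=", 1)
                attrs[key.strip()] = value.strip()
        feature_id = attrs.get("ID")
        if feature_id is not None:
            seen[feature_id].append(cols[2])
    return {fid: types for fid, types in seen.items() if len(types) > 1}


def duplicate_gtf_transcripts(lines):
    """Map each transcript_id with more than one 'transcript' line to the gene_ids of those lines."""
    seen = defaultdict(list)
    for line in lines:
        cols = _columns(line)
        if cols is None or cols[2] != "transcript":
            continue  # only 'transcript' feature lines are compared
        attrs = dict(GTF_ATTR.findall(cols[8]))
        tid = attrs.get("transcript_id")
        if tid:
            # an empty or missing gene_id is reported as None
            seen[tid].append(attrs.get("gene_id") or None)
    return {tid: genes for tid, genes in seen.items() if len(genes) > 1}
```

Both functions accept any iterable of lines, so an open file handle works directly, for example duplicate_gtf_transcripts(open("assembled_transcripts.gtf")); the file name there is only a placeholder.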

Jennifer Hillman Jackson wrote (23 months ago):

Hello,

Editing GFF files with duplicated IDs can be tricky, and any annotation the removed lines provided is lost.

Did you remove lines so that just one copy of each duplicate remains (complete, with gene, transcript(s), and exon(s)), and use that file for all Cufflinks runs? Then run Cuffmerge on those Cufflinks GTF outputs together with the modified GFF reference annotation. The resulting Cuffmerge GTF can then be used as the GTF reference annotation for Cuffdiff. http://cole-trapnell-lab.github.io/cufflinks/manual/
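The Cufflinks, Cuffmerge, Cuffdiff sequence described above can be sketched as a list of command lines. This is a hypothetical Python helper that only builds the commands (it runs nothing), assuming a cleaned reference annotation and a mapping from condition labels to replicate BAM files; all paths and the work_dir layout are made up for illustration. It passes the reference to Cufflinks with -g (reference-guided assembly); -G is the alternative that quantifies only the reference transcripts. Check every option against the manual linked above before use.

```python
def build_pipeline(ref_annotation, samples, work_dir="cuff_work"):
    """Build (but do not run) the Cufflinks -> Cuffmerge -> Cuffdiff commands.

    samples: dict mapping a condition label to a list of replicate BAM paths.
    Returns (commands, assembly_gtfs); assembly_gtfs must be written one path
    per line to <work_dir>/assemblies.txt before Cuffmerge is run.
    """
    commands, assembly_gtfs = [], []
    for label, bams in samples.items():
        for i, bam in enumerate(bams, start=1):
            out_dir = f"{work_dir}/cufflinks_{label}_{i}"
            # one Cufflinks run per BAM, all guided by the same cleaned annotation
            commands.append(["cufflinks", "-g", ref_annotation, "-o", out_dir, bam])
            assembly_gtfs.append(f"{out_dir}/transcripts.gtf")

    # Cuffmerge combines the per-sample assemblies with the reference annotation
    merged_dir = f"{work_dir}/merged_asm"
    commands.append(["cuffmerge", "-g", ref_annotation, "-o", merged_dir,
                     f"{work_dir}/assemblies.txt"])

    # Cuffdiff uses merged.gtf as its annotation; replicates are comma-joined per condition
    labels = list(samples)
    commands.append(["cuffdiff", "-o", f"{work_dir}/diff_out", "-L", ",".join(labels),
                     f"{merged_dir}/merged.gtf"] + [",".join(samples[l]) for l in labels])
    return commands, assembly_gtfs
```

Building argument lists rather than shell strings keeps file names with spaces safe; shlex.join can turn each list into a printable command for review.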

If this still does not work, I suggest finding a reference annotation source that does not contain duplicated GFF IDs (or a GTF source).

Thanks, Jen, Galaxy team

claudiorivero92 wrote (8 months ago):

I solved the problem by running the GFF3 file through the gffread tool to convert it to GTF2. Cheers
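A minimal Python sketch of that fix, assuming gffread is installed and on the PATH; the helper names and file names are hypothetical. gffread's -T option writes GTF instead of GFF3 and -o names the output file. lines_missing_ids then reports any feature line in the converted GTF that still lacks a non-empty gene_id or transcript_id, which is a quick sanity check before handing the file to Cufflinks.

```python
import re
import shutil
import subprocess

# GTF attributes look like: gene_id "G1"; transcript_id "T1";
GTF_ATTR = re.compile(r'(\w+)\s+"([^"]*)"')


def gffread_to_gtf_cmd(gff3_path, gtf_path):
    """Command line for converting GFF3 to GTF with gffread (-T = GTF output)."""
    return ["gffread", gff3_path, "-T", "-o", gtf_path]


def convert(gff3_path, gtf_path):
    """Run gffread; raises if the tool is missing or exits with an error."""
    if shutil.which("gffread") is None:
        raise RuntimeError("gffread not found on PATH")
    subprocess.run(gffread_to_gtf_cmd(gff3_path, gtf_path), check=True)


def lines_missing_ids(lines):
    """Return 1-based line numbers of GTF feature lines lacking gene_id or transcript_id."""
    bad = []
    for number, line in enumerate(lines, start=1):
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a feature line
        attrs = dict(GTF_ATTR.findall(cols[8]))
        # an empty value such as gene_id "" counts as missing
        if not attrs.get("gene_id") or not attrs.get("transcript_id"):
            bad.append(number)
    return bad
```

For example, convert("annotation.gff3", "annotation.gtf") followed by lines_missing_ids(open("annotation.gtf")) should return an empty list if every converted line carries both IDs.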
