Question: Cufflinks duplicate GFF ID error - best course of action?
Suzanne Gomes (Canada) wrote, 3.3 years ago:

Hi, I have a question about the best way to handle genome annotations that contain duplicate GFF IDs. I first ran into this problem with the D. pseudoobscura FlyBase annotation. The annotation contained a lot of features from different sources, so I filtered it to keep only lines containing 'FlyBase' and 'gene'. That filtered file ran fine through the rest of the Tuxedo pipeline. However, I recently realized that it leaves an annotation with only whole-gene records and no intron/exon structure. Adding the intron/exon lines back into the annotation brings back the duplicate GFF ID error.
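For illustration, here is a minimal Python sketch of the filtering step described above. It assumes "lines with 'FlyBase' and 'gene'" means GFF3 column 2 (source) equals FlyBase and column 3 (type) equals gene; the toy annotation, scaffold name, and IDs are hypothetical. It shows why the intron/exon structure disappears: the mRNA and exon child lines are dropped along with everything else.

```python
def filter_gene_lines(lines):
    """Keep GFF3 feature lines whose source is FlyBase and whose type is gene."""
    kept = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a valid 9-column feature line
        if cols[1] == "FlyBase" and cols[2] == "gene":
            kept.append(line)
    return kept

# Hypothetical toy annotation: one gene, one transcript, two exons.
example = [
    "##gff-version 3\n",
    "scaffold_1\tFlyBase\tgene\t100\t900\t.\t+\t.\tID=geneA\n",
    "scaffold_1\tFlyBase\tmRNA\t100\t900\t.\t+\t.\tID=txA;Parent=geneA\n",
    "scaffold_1\tFlyBase\texon\t100\t300\t.\t+\t.\tParent=txA\n",
    "scaffold_1\tFlyBase\texon\t600\t900\t.\t+\t.\tParent=txA\n",
]

for line in filter_gene_lines(example):
    print(line, end="")  # only the gene line survives
```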

So my question is: is it worth re-running with the exon/intron structure added back in somehow? It sounds like it is possible to work around the GFF ID error, as mentioned here (Cufflinks error when trying to align against genome), though I'm not sure how hard that would be. And if I did, should I still include the whole genes, or only the introns/exons (or only the exons)?
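A quick way to see which IDs trigger the error is to count how often each ID attribute occurs. This is a sketch on a hypothetical toy annotation, not a FlyBase or Cufflinks tool. Note that GFF3 allows one discontinuous feature, such as a CDS, to span several lines that share an ID, which is one common source of repeated IDs.

```python
from collections import Counter

def duplicate_ids(lines):
    """Return {ID: count} for every ID value that appears on more than one GFF3 line."""
    counts = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a 9-column feature line
        # Column 9 holds semicolon-separated key=value attributes.
        for field in cols[8].split(";"):
            key, sep, value = field.strip().partition("=")
            if sep and key == "ID":
                counts[value] += 1
    return {feature_id: n for feature_id, n in counts.items() if n > 1}

# Hypothetical toy annotation: one CDS split over two lines that share an ID.
example = [
    "##gff-version 3\n",
    "scaffold_1\tFlyBase\tgene\t100\t900\t.\t+\t.\tID=geneA\n",
    "scaffold_1\tFlyBase\tmRNA\t100\t900\t.\t+\t.\tID=txA;Parent=geneA\n",
    "scaffold_1\tFlyBase\tCDS\t100\t300\t.\t+\t0\tID=cdsA;Parent=txA\n",
    "scaffold_1\tFlyBase\tCDS\t600\t900\t.\t+\t0\tID=cdsA;Parent=txA\n",
]

print(duplicate_ids(example))  # {'cdsA': 2}
```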

Looking at my data in Trackster, Cufflinks seems to have done a pretty good job of finding exon/intron boundaries on its own, and they match those on FlyBase well. But it sometimes lists the same gene under two different gene IDs in the output, with non-zero FPKMs for both: one is the gene with introns, the other has none and spans the whole gene. If reads are being split between two entries, I worry that this might affect my ability to call differential expression in some cases, especially for genes that already have few mapped reads.
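To list the cases described above, one could look for pairs of loci that overlap on the same scaffold and strand. This sketch assumes the loci have already been pulled out of the Cufflinks output into (id, scaffold, start, end, strand) tuples with 1-based inclusive coordinates; that record layout and the IDs are hypothetical.

```python
from itertools import combinations

def overlapping_pairs(loci):
    """Return (id_a, id_b) for loci on the same scaffold and strand whose spans overlap.

    Each locus is (locus_id, scaffold, start, end, strand), 1-based inclusive.
    """
    pairs = []
    for a, b in combinations(loci, 2):
        same_place = a[1] == b[1] and a[4] == b[4]
        # Two closed intervals overlap when each starts before the other ends.
        if same_place and a[2] <= b[3] and b[2] <= a[3]:
            pairs.append((a[0], b[0]))
    return pairs

# Hypothetical records: a spliced model and a whole-gene span at the same locus.
loci = [
    ("locus_1", "scaffold_1", 100, 900, "+"),
    ("locus_2", "scaffold_1", 100, 900, "+"),
    ("locus_3", "scaffold_1", 2000, 2500, "+"),
]

print(overlapping_pairs(loci))  # [('locus_1', 'locus_2')]
```

The all-pairs comparison is quadratic, which is fine for a spot check but would need a sort-and-sweep approach for a whole genome.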

Anyone else had this problem? If so, how did you address it?

Thanks

Suzanne

Jennifer Hillman Jackson (United States) wrote, 3.3 years ago:

Suzanne,

Glad you were able to locate the original reply. Those general instructions are still the best path when encountering duplicated GFF IDs, especially for genomes that have an iGenomes annotation. Unfortunately, D. pseudoobscura does not have one. Both gene- and transcript-level reference annotations provide benefits, but if an annotation is not available in the format the tool expects, it cannot be used as-is.

If I understood the explanation from the FlyBase team correctly, what appear to be duplicated genes are in fact valid and distinct genomic features. They are aware that this conflicts with the Tuxedo suite, but duplicated GFF IDs are the scientifically correct way to model the data. As far as I know, reads splitting between features labeled under the same GFF ID is expected.

You could modify the GFF IDs to allow the tools to accept the annotation file in full, but I still think that the FlyBase team would be the best direct source for recommendations. That said, others are still welcome to post comments/experiences! And if you wish to post any feedback you receive from them, and your solution, that would almost certainly aid others.
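If you do go the route of modifying the IDs, one minimal approach (a sketch assuming GFF3 input, shown on a hypothetical toy annotation) is to append a numeric suffix to every repeat of an ID after the first. Caveats: Parent attributes are not rewritten, renaming turns what GFF3 treats as one multi-line feature into separate features, a new ID such as cdsA_2 could in principle collide with an existing one, and whether Cufflinks accepts the result has not been tested here.

```python
def make_ids_unique(lines):
    """Append _2, _3, ... to the ID of each repeated occurrence after the first.

    Parent attributes are NOT rewritten, so children still point at the first ID.
    """
    seen = {}
    out = []
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(cols) < 9:
            out.append(line)  # pass directives, comments and malformed lines through
            continue
        fields = cols[8].split(";")
        for i, field in enumerate(fields):
            key, sep, value = field.partition("=")
            if sep and key.strip() == "ID":
                seen[value] = seen.get(value, 0) + 1
                if seen[value] > 1:
                    fields[i] = f"{key}={value}_{seen[value]}"
        cols[8] = ";".join(fields)
        out.append("\t".join(cols) + "\n")
    return out

# Hypothetical toy annotation: one CDS split over two lines that share an ID.
example = [
    "##gff-version 3\n",
    "scaffold_1\tFlyBase\tgene\t100\t900\t.\t+\t.\tID=geneA\n",
    "scaffold_1\tFlyBase\tmRNA\t100\t900\t.\t+\t.\tID=txA;Parent=geneA\n",
    "scaffold_1\tFlyBase\tCDS\t100\t300\t.\t+\t0\tID=cdsA;Parent=txA\n",
    "scaffold_1\tFlyBase\tCDS\t600\t900\t.\t+\t0\tID=cdsA;Parent=txA\n",
]

for line in make_ids_unique(example):
    print(line, end="")  # the second CDS line now carries ID=cdsA_2
```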

Best, Jen, Galaxy team

jogoodma wrote, 3.1 years ago:

Hi Suzanne,

FlyBase started distributing GTF-formatted files of our genome data in July, with our FB2014_04 release. This format tends to work much better with Cufflinks. Can you give that a try and let us know?

ftp://ftp.flybase.org/genomes/dpse/current/gtf
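For reference, GTF ties each exon line to its transcript and gene through gene_id and transcript_id attributes rather than through GFF3 ID/Parent links. A minimal sketch of grouping exons by those two attributes, using hypothetical toy records rather than the FlyBase file itself:

```python
import re
from collections import defaultdict

# Matches GTF attribute pairs of the form: key "value"
ATTR = re.compile(r'(\S+) "([^"]*)"')

def exons_by_transcript(lines):
    """Return {(gene_id, transcript_id): sorted [(start, end), ...]} for exon lines."""
    transcripts = defaultdict(list)
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "exon":
            continue  # only exon records are grouped here
        attrs = dict(ATTR.findall(cols[8]))
        key = (attrs.get("gene_id"), attrs.get("transcript_id"))
        transcripts[key].append((int(cols[3]), int(cols[4])))
    return {k: sorted(v) for k, v in transcripts.items()}

# Hypothetical toy GTF records for one two-exon transcript.
example = [
    'scaffold_1\tFlyBase\ttranscript\t100\t900\t.\t+\t.\tgene_id "geneA"; transcript_id "tx1";\n',
    'scaffold_1\tFlyBase\texon\t600\t900\t.\t+\t.\tgene_id "geneA"; transcript_id "tx1";\n',
    'scaffold_1\tFlyBase\texon\t100\t300\t.\t+\t.\tgene_id "geneA"; transcript_id "tx1";\n',
]

print(exons_by_transcript(example))  # {('geneA', 'tx1'): [(100, 300), (600, 900)]}
```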

 

Cheers,

Josh

FlyBase


Thanks Josh! I will add this information to our RNA-seq wiki help. Jen, Galaxy team

Suzanne Gomes (Canada) wrote, 2.8 years ago:

Hi Josh,

I tried your link above, but I just get a 'this webpage is not available' error. I also looked on the website, but all I can find are the GFF-formatted files. Also, do you guys have a GTF version of the D. willistoni annotation available?

Thanks

Suzanne
