Hi, I have a question about the best way to handle genome annotations that include duplicate GFF IDs. I first ran into this problem when running my data with the D. pseudoobscura FlyBase annotation. The annotation contains features from many different sources, so I filtered it to include only lines with 'FlyBase' and 'gene'. That version ran fine through the rest of the Tuxedo pipeline. However, I recently realized that the filtered annotation contains only whole genes, with no intron/exon structure. Adding the intron/exon lines back into the annotation produces the duplicate GFF ID error again.
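To make the filtering step concrete, here is a minimal Python sketch of the kind of filter I mean. It is not the exact command I ran. It assumes standard tab-delimited GFF columns (source in column 2, feature type in column 3); the function name `filter_gff` and the sample lines are made up for illustration.

```python
def filter_gff(lines, source="FlyBase", types=("gene",)):
    """Keep GFF feature lines whose source and feature type both match."""
    kept = []
    for line in lines:
        if line.startswith("#"):
            continue  # skip header and comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # skip malformed or non-feature lines
        if cols[1] == source and cols[2] in types:
            kept.append(line.rstrip("\n"))
    return kept


# Made-up sample lines, not taken from the real annotation.
sample = [
    "##gff-version 3",
    "chr2\tFlyBase\tgene\t100\t900\t.\t+\t.\tID=gene1",
    "chr2\tFlyBase\texon\t100\t300\t.\t+\t.\tID=exon1",
    "chr2\tblastx\tmatch\t100\t300\t.\t+\t.\tID=m1",
]

genes_only = filter_gff(sample)
genes_and_exons = filter_gff(sample, types=("gene", "exon"))
```

With `types=("gene",)` only the whole-gene line survives, which is how the exon structure is lost; passing `("gene", "exon")` keeps the exon lines as well.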
So my question is: is it worth re-running with the exon/intron structure added back in somehow? It sounds like it is possible to work around the GFF ID error, as mentioned here (Cufflinks error when trying to align against genome), though I'm not sure how hard that would be to do. And if I did, should I still include the whole genes, or only the introns and exons, or only the exons?
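In case it helps anyone reproduce the issue, this is a rough sketch of how the repeated IDs could be listed before trying any workaround. It assumes GFF3-style attributes in column 9 (`ID=...;key=value`); `duplicate_ids` and the sample lines are hypothetical, not from my data.

```python
from collections import Counter


def duplicate_ids(lines):
    """Return {ID: count} for every ID attribute that appears more than once."""
    counts = Counter()
    for line in lines:
        if line.startswith("#"):
            continue  # skip header and comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # skip malformed or non-feature lines
        # Column 9 holds semicolon-separated key=value attributes.
        for field in cols[8].split(";"):
            key, _, value = field.strip().partition("=")
            if key == "ID":
                counts[value] += 1
    return {gff_id: n for gff_id, n in counts.items() if n > 1}


# Made-up sample lines with one repeated ID.
sample = [
    "chr2\tFlyBase\tgene\t100\t900\t.\t+\t.\tID=gene1",
    "chr2\tFlyBase\texon\t100\t300\t.\t+\t.\tID=exon1",
    "chr2\tFlyBase\texon\t500\t900\t.\t+\t.\tID=exon1",
]

dups = duplicate_ids(sample)
```

Here `dups` maps each repeated ID to the number of lines that carry it, which at least shows which feature types are responsible for the error.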
Looking at my data in Trackster, it seems like Cufflinks has done a pretty good job of finding exon/intron boundaries (that match well with those on FlyBase) all on its own. But it sometimes lists the same gene under two different gene IDs in the output (with non-zero FPKMs for both): one that is the gene with introns, and one without introns (spanning the whole gene). If it is splitting the reads between the two entries, I worry that this might affect my ability to call differential expression in some cases (especially for genes that already have few reads mapping).
Has anyone else had this problem? If so, how did you address it?
Thanks
Suzanne