2.6 years ago by
Iowa State University
Jen,
Thank you for the quick response. I should let you know, I am not a well-versed bioinformaticist, so this entire pipeline is new to me and I'm really struggling with this hurdle. I did as you suggested and the job completed just fine. I also thought that perhaps the GFF and genome files in the directory that failed might be the problem, so I used those with the successful transcript files, and the job again completed successfully. That leads me to believe that my problem resides somewhere within the transcripts.gtf files generated by Cufflinks.
When I look at the first and last ten lines of each of my transcripts.gtf files, they look quite normal, and I did not receive any errors when running Cufflinks. Also, the error I am getting says there is a duplicate/invalid 'transcript' feature at id1128281. So I looked in each of the files for this ID, and none of them came back with a hit. When I looked for this ID in my GFF file, I got:
[laboiss@rit2 TrimmedReads]$ grep -w id1128281 MouseGFF.fa
NT_114985.3 Gnomon C_gene_segment 138385 146840 . - . ID=id1128281;Parent=gene46614;Dbxref=GeneID:380792,IMGT/GENE-DB:IGHE,MGI:MGI:2685746;gbkey=C_region;gene=Ighe
NT_114985.3 Gnomon exon 146514 146840 . - . ID=id1128282;Parent=id1128281;Dbxref=GeneID:380792,IMGT/GENE-DB:IGHE,MGI:MGI:2685746;gbkey=C_region;gene=Ighe
NT_114985.3 Gnomon exon 142060 142332 . - . ID=id1128283;Parent=id1128281;Dbxref=GeneID:380792,IMGT/GENE-DB:IGHE,MGI:MGI:2685746;gbkey=C_region;gene=Ighe
NT_114985.3 Gnomon exon 141198 141518 . - . ID=id1128284;Parent=id1128281;Dbxref=GeneID:380792,IMGT/GENE-DB:IGHE,MGI:MGI:2685746;gbkey=C_region;gene=Ighe
NT_114985.3 Gnomon exon 140795 141115 . - . ID=id1128285;Parent=id1128281;Dbxref=GeneID:380792,IMGT/GENE-DB:IGHE,MGI:MGI:2685746;gbkey=C_region;gene=Ighe
NT_114985.3 Gnomon exon 140384 140710 . - . ID=id1128286;Parent=id1128281;Dbxref=GeneID:380792,IMGT/GENE-DB:IGHE,MGI:MGI:2685746;gbkey=C_region;gene=Ighe
NT_114985.3 Gnomon exon 138551 138684 . - . ID=id1128287;Parent=id1128281;Dbxref=GeneID:380792,IMGT/GENE-DB:IGHE,MGI:MGI:2685746;gbkey=C_region;gene=Ighe
NT_114985.3 Gnomon exon 138385 138468 . - . ID=id1128288;Parent=id1128281;Dbxref=GeneID:380792,IMGT/GENE-DB:IGHE,MGI:MGI:2685746;gbkey=C_region;gene=Ighe
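If "duplicate" simply means that the same ID= value appears on more than one feature line, a rough check could look like the sketch below. This is only my guess at what the error means, not something the Cufflinks documentation confirmed for me. The function name find_duplicate_ids is made up for illustration, and the sketch assumes each feature sits on one line with an ID=...; attribute, as in the grep output above.

```python
import re
from collections import Counter

# Matches an "ID=" attribute that starts the attribute column or follows ';'.
# "Parent=id..." and "GeneID:..." are not matched (no literal "ID=").
ID_PATTERN = re.compile(r"(?:^|[\s;])ID=([^;\s]+)")


def find_duplicate_ids(lines):
    """Return {id: count} for every ID= value found on more than one line.

    Hypothetical helper: assumes one feature per line, skips blank lines
    and '#' comment/directive lines.
    """
    counts = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        match = ID_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return {feature_id: n for feature_id, n in counts.items() if n > 1}
```

Passing an open file handle for MouseGFF.fa to find_duplicate_ids would return a dictionary of any repeated IDs and how often each occurs. An empty dictionary would suggest the problem is something other than a literally repeated ID.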
I'm not exactly sure what a 'duplicate' line looks like, and none of these lines have 'transcript' anywhere in them. Do you have any suggestions about what I might do to handle this problem? Thanks again,
Lauren