Question: GFF Not Recognized In Cufflinks
Qian Dong • 50 wrote:
Dear Team,
I've been having a problem with Cufflinks regarding GFF files. I searched the mailing list first but could not find an answer. Could you help me look at this?
I downloaded my genome annotation GFF file from NCBI (I soon realized the NCBI format may be a problem) for my bacterial RNA-seq data analysis. My GFF file looks like the following:
##gff-version 3
#!gff-spec-version 1.20
#!processor NCBI annotwriter
##sequence-region NC_011420.2 1 4355543
##species http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=414684
NC_011420.2RefSeqregion14355543.+.ID=id0;Dbxref=taxon:414684;Is_circular=true;culture-collection=ATCC:51521;gb-synonym=Rhodocista centenaria SW;gbkey=Src;genome=chromosome;mol_type=genomic DNA;strain=SW%3B ATCC 51521
NC_011420.2RefSeqgene113343.+.ID=gene0;Name=RC1_0011;Dbxref=GeneID:7008893;gbkey=Gene;locus_tag=RC1_0011
NC_011420.2RefSeqCDS113343.+0ID=cds0;Name=YP_002296275.1;Parent=gene0;Note=Contains a type I secretion target ggxgxdxxx repeat %282 copies%29 domain%3B Contains a Cadherin domain%3B identified by match to protein family HMM PF02789;Dbxref=Genbank:YP_002296275.1,GeneID:7008893;gbkey=CDS;product=hypothetical protein;protein_id=YP_002296275.1;transl_table=11
I used this file with Cufflinks, but all the FPKM values are 0. I checked this link: http://cufflinks.cbcb.umd.edu/gff.html and thought that the problem might be that I don't have any mRNA features in my GFF file. Since I am dealing with a bacterial genome, no exon/intron or UTR information is needed. I therefore modified my GFF file, changing the gene feature to an mRNA feature, as follows:
##gff-version 3
#!gff-spec-version 1.20
#!processor NCBI annotwriter
##sequence-region NC_011420.2 1 4355543
##species http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=414684
NC_011420.2RefSeqregion14355543.+.ID=id0;Dbxref=taxon:414684;Is_circular=true;culture-collection=ATCC:51521;gb-synonym=Rhodocista centenaria SW;gbkey=Src;genome=chromosome;mol_type=genomic DNA;strain=SW%3B ATCC 51521
NC_011420.2RefSeqmRNA113343.+.ID=mRNA0;Name=RC1_0011;Dbxref=GeneID:7008893;gbkey=Gene;locus_tag=RC1_0011
NC_011420.2RefSeqCDS113343.+0ID=cds0;Name=YP_002296275.1;Parent=mRNA0;Note=Contains a type I secretion target ggxgxdxxx repeat %282 copies%29 domain%3B Contains a Cadherin domain%3B identified by match to protein family HMM PF02789;Dbxref=Genbank:YP_002296275.1,GeneID:7008893;gbkey=CDS;product=hypothetical protein;protein_id=YP_002296275.1;transl_table=11
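For reference, the gene-to-mRNA edit described above could be scripted roughly like this. It is only a sketch: it assumes a properly tab-delimited GFF3 file with nine columns per feature line, the function name `gene_to_mrna` and the sample coordinates are made up for illustration, and nothing here is known to make the file acceptable to Cufflinks.

```python
import re


def gene_to_mrna(lines):
    """Rename 'gene' features to 'mRNA' and re-point ID=/Parent= attributes.

    Comment/directive lines and lines without nine tab-separated
    columns are passed through unchanged.
    """
    out = []
    for line in lines:
        line = line.rstrip("\n")
        cols = line.split("\t")
        if line.startswith("#") or len(cols) != 9:
            out.append(line)
            continue
        if cols[2] == "gene":
            cols[2] = "mRNA"
        # ID=gene0 -> ID=mRNA0 and Parent=gene0 -> Parent=mRNA0
        cols[8] = re.sub(r"\b(ID|Parent)=gene(\d+)", r"\1=mRNA\2", cols[8])
        out.append("\t".join(cols))
    return out


# Hypothetical two-feature example; the coordinates are placeholders.
sample = [
    "##gff-version 3",
    "NC_011420.2\tRefSeq\tgene\t100\t400\t.\t+\t.\tID=gene0;Name=RC1_0011",
    "NC_011420.2\tRefSeq\tCDS\t100\t400\t.\t+\t0\tID=cds0;Parent=gene0",
]
for row in gene_to_mrna(sample):
    print(row)
```

Like the hand-edited file, this leaves other attributes such as gbkey=Gene untouched.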
I re-ran Cufflinks, but this time an error was reported. All I can tell from the report is that there was a segmentation fault; there are no further details. The report is as follows:
Error running cufflinks.
return code = 139
Command line:
cufflinks -q --no-update-check -I 100 -F 0.100000 -j 0.150000 -p 4 -G /galaxy/test_pool/pool5/files/000/327/dataset_327777.dat /galaxy/test_database/files/000/325/dataset_325086.dat
[19:41:41] Loading reference annotation.
Segmentation fault
cp: cannot stat `/galaxy/test_pool/pool3/tmp/job_working_directory/000/170/170197/global_model.txt': No such file or directory
cp: cannot stat `/galaxy/test_pool/pool3/tmp/job_working_directory/000/170/170197/isoforms.fpkm_tracking': No such file or directory
cp: cannot stat `/galaxy/test_pool/pool3/tmp/job_working_directory/000/170/170197/genes.fpkm_tracking': No such file or directory
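A note on the return code in this report: on Linux, a return code above 128 conventionally means the process was terminated by signal number (code - 128), so 139 corresponds to signal 11, which is consistent with the "Segmentation fault" line. A small sketch of that decoding, where `describe_return_code` is a hypothetical helper name and the convention is assumed to apply to how Galaxy reports the code:

```python
import signal


def describe_return_code(code):
    """Return the signal name encoded in a shell-style return code, if any."""
    if code > 128:
        try:
            # 128 + N is the usual shell convention for "killed by signal N".
            return signal.Signals(code - 128).name
        except ValueError:
            return None
    return None


print(describe_return_code(139))  # on Linux this prints SIGSEGV
```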
My questions are:
1. Is there any way to modify an NCBI bacterial genome annotation GFF file to make it usable with Cufflinks? Our genome annotation is only available from NCBI, not Ensembl or UCSC, so this is pretty much my only choice.
2. Should I proceed with modifying the GFF file, or should I convert it into GTF and use the GTF in Cufflinks instead?
I am a biochemist and really new to the computer world, so any advice will help!
Thanks a lot,
Qian
--
Qian Dong
Bauer Lab, MCBD
Simon Hall: 313-317
212 S. Hawthorne Dr.
Bloomington, IN 47405
Email:dong3@indiana.edu
Lab Phone:812-855-8443
modified 6.2 years ago by Jennifer Hillman Jackson ♦ 25k • written 6.2 years ago by Qian Dong • 50