Question: GFF Parsing With Biopython
5.6 years ago, Mic70 (Australia) wrote:
Hi,

I have the following GFF file from SNAP:

    X1  SNAP  Einit  2579  2712  -3.221   +  .  X1-snap.1
    X1  SNAP  Exon   2813  2945  4.836    +  .  X1-snap.1
    X1  SNAP  Eterm  3013  3033  10.467   +  .  X1-snap.1
    X1  SNAP  Esngl  3457  3702  -17.856  +  .  X1-snap.2
    X1  SNAP  Einit  4901  4974  -4.954   +  .  X1-snap.3
    X1  SNAP  Eterm  5021  5150  14.231   +  .  X1-snap.3
    X1  SNAP  Einit  6245  7325  -1.525   -  .  X1-snap.4
    X1  SNAP  Eterm  5974  6008  5.398    -  .  X1-snap.4

With the code below I have tried to parse the above GFF file:

    from BCBio import GFF
    from pprint import pprint
    from BCBio.GFF import GFFExaminer

    def retrieve_pred_genes_data():
        with open("test/X1_small.snap.gff") as sf:
            #examiner = GFFExaminer()
            #pprint(examiner.available_limits(sf))
            for rec in GFF.parse(sf):
                pprint(rec.id)
                pprint(rec.description)
                pprint(rec.name)
                pprint(rec.features)
                #pprint(rec.type) #'SeqRecord' object has no attribute
                #pprint(rec.ref) #'SeqRecord' object has no attribute
                #pprint(rec.ref_db) #'SeqRecord' object has no attribute
                #pprint(rec.location) #'SeqRecord' object has no attribute
                #pprint(rec.location_operator) #'SeqRecord' object has no attribute
                #pprint(rec.strand) #'SeqRecord' object has no attribute
                #pprint(rec.sub_features) #'SeqRecord' object has no attribute

    retrieve_pred_genes_data()

and got the following output:

    'X1'
    '<unknown description>'
    '<unknown name>'
    [SeqFeature(FeatureLocation(ExactPosition(2578), ExactPosition(2712), strand=1), type='Einit'),
     SeqFeature(FeatureLocation(ExactPosition(2812), ExactPosition(2945), strand=1), type='Exon'),
     SeqFeature(FeatureLocation(ExactPosition(3012), ExactPosition(3033), strand=1), type='Eterm'),
     SeqFeature(FeatureLocation(ExactPosition(3456), ExactPosition(3702), strand=1), type='Esngl'),
     SeqFeature(FeatureLocation(ExactPosition(4900), ExactPosition(4974), strand=1), type='Einit'),
     SeqFeature(FeatureLocation(ExactPosition(5020), ExactPosition(5150), strand=1), type='Eterm'),
     SeqFeature(FeatureLocation(ExactPosition(6160), ExactPosition(7325), strand=-1), type='Einit'),
     SeqFeature(FeatureLocation(ExactPosition(5973), ExactPosition(6008), strand=-1), type='Eterm')]

and with GFFExaminer I got this:

    {'gff_id': {('X1',): 8},
     'gff_source': {('SNAP',): 8},
     'gff_source_type': {('SNAP', 'Einit'): 3,
                         ('SNAP', 'Esngl'): 1,
                         ('SNAP', 'Eterm'): 3,
                         ('SNAP', 'Exon'): 1},
     'gff_type': {('Einit',): 3, ('Esngl',): 1, ('Eterm',): 3, ('Exon',): 1}}

I found these examples (https://github.com/patena/jonikaslab-mutant-pools/blob/master/notes_on_GFF_parsing.txt), but when I tried them I got these kinds of errors:

    #pprint(rec.type) #'SeqRecord' object has no attribute
    #pprint(rec.ref) #'SeqRecord' object has no attribute
    #pprint(rec.ref_db) #'SeqRecord' object has no attribute
    #pprint(rec.location) #'SeqRecord' object has no attribute
    #pprint(rec.location_operator) #'SeqRecord' object has no attribute
    #pprint(rec.strand) #'SeqRecord' object has no attribute
    #pprint(rec.sub_features) #'SeqRecord' object has no attribute

What did I do wrong, and how can I access all fields in the above GFF file?

Thank you in advance.

Mic
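
A possible reading of the errors above: `GFF.parse` yields one `SeqRecord` per sequence ID (here `X1`), and attributes such as `type` and `location` belong to the individual `SeqFeature` objects inside `rec.features`, not to the record itself. Looping over `rec.features` and reading `feature.type`, `feature.location` and `feature.qualifiers` should therefore get closer to the per-line fields. As a library-free cross-check, the sketch below splits the nine columns by hand using only the standard library. The `GffRow` name, its field labels, and the `parse_gff_line` helper are assumptions made for illustration and are not part of Biopython or BCBio.

```python
from collections import namedtuple

# Field labels are illustrative names for the nine GFF columns (an assumption,
# not an API of Biopython or BCBio).
GffRow = namedtuple(
    "GffRow", "seqid source type start end score strand phase group"
)


def parse_gff_line(line):
    """Split one whitespace-delimited GFF line into a GffRow."""
    seqid, source, ftype, start, end, score, strand, phase, group = line.split(None, 8)
    return GffRow(
        seqid,
        source,
        ftype,
        int(start),                               # GFF coordinates are 1-based
        int(end),
        None if score == "." else float(score),  # "." marks a missing value
        strand,
        None if phase == "." else int(phase),
        group.strip(),
    )


# First three lines of the file shown in the question.
sample = """\
X1 SNAP Einit 2579 2712 -3.221 + . X1-snap.1
X1 SNAP Exon 2813 2945 4.836 + . X1-snap.1
X1 SNAP Eterm 3013 3033 10.467 + . X1-snap.1
"""

# Skip blank lines and comment lines, parse everything else.
rows = [
    parse_gff_line(line)
    for line in sample.splitlines()
    if line.strip() and not line.startswith("#")
]

for row in rows:
    print(row.type, row.start, row.end, row.score, row.strand, row.group)
```

The parsed start values stay 1-based as in the file, whereas the `ExactPosition` starts in the Biopython output are 0-based (2578 for a GFF start of 2579), so the two differ by one.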