Question: GFF3 file (source: BRAD) incompatibilities
14 months ago by
dejong.grant0 wrote:

Hi there,

I've been trying to analyze Brassica napus transcriptomic data for isoform expression and the incidence of splicing events, which led me to use the Brassica Database (BRAD) GFF3 and FASTA files for my index generation (STAR).

After a few errors I managed to get my STAR run working, but subsequent software (e.g. rMATS) requires GTF files, and the BRAD GFF3 doesn't seem to be compatible with any GFF3-to-GTF software.

(I've used gffread and genometools so far).

Has anyone had similar problems with the formatting of these BRAD annotation files?

Example formatting:

chrC03 GazeA2 mRNA 28541218 28543845 572.4227 + .

chrC03 GazeA2 UTR 28543523 28543845 6.0158 + .

chrC03 GazeA2 CDS 28543454 28543522 29.9339 + 0 Parent=BnaC03g43490D;Name=BnaC03g43490D;Alias=GSBRNA2T00158351001

chrC03 GazeA2 CDS 28543158 28543369 27.5481 + 1 Parent=BnaC03g43490D;Name=BnaC03g43490D;Alias=GSBRNA2T00158351001

chrC03 GazeA2 CDS 28542958 28543060 27.3743 + 0 Parent=BnaC03g43490D;Name=BnaC03g43490D;Alias=GSBRNA2T00158351001

Columns 1-8 are mostly consistent with sample GFF3 files, but I've noticed a large gap in the mRNA row between the score and strand columns. The attribute column also differs from the samples, and I don't know whether this is an acceptable departure from the norm.
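One way to narrow down what the converters object to is a quick structural check. The sketch below is only an illustration, not a full validator: it assumes the file is tab-delimited, and the function name `check_gff3` is made up for this example. It flags rows that do not have the nine columns GFF3 expects, and Parent values that never appear as an ID anywhere in the file.

```python
def check_gff3(handle):
    """Return (line_number, message) pairs for rows that look out of spec."""
    seen_ids = set()
    parents = []   # (line_number, parent_id) pairs, resolved after the full pass
    problems = []
    for n, line in enumerate(handle, start=1):
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments/directives
        cols = line.split("\t")
        if len(cols) != 9:
            problems.append(
                (n, f"expected 9 tab-separated columns, found {len(cols)}")
            )
            continue
        # Column 9 holds semicolon-separated tag=value pairs.
        attrs = dict(p.split("=", 1) for p in cols[8].split(";") if "=" in p)
        if "ID" in attrs:
            seen_ids.add(attrs["ID"])
        # A feature may list several parents, separated by commas.
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                parents.append((n, parent))
    for n, parent in parents:
        if parent not in seen_ids:
            problems.append((n, f"Parent={parent} does not match any ID"))
    return problems
```

Calling it as `check_gff3(open("annotation.gff3"))` (the file name is a placeholder) returns a list that can be printed or counted. If the mRNA rows turn out to lack an attribute column, as the excerpt above suggests, both kinds of message would be expected.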

I managed to get around this problem in STAR with:

STAR --runMode genomeGenerate --genomeDir $1 --genomeFastaFiles $genfas --sjdbOverhang 99 --sjdbGTFfile $gff3 --sjdbGTFtagExonParentTranscript Parent --sjdbGTFfeatureExon CDS

This seems to be correct, and the subsequent mapping job was successful.
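The same idea (treat CDS rows as exons and take the transcript from Parent) could in principle be used to write a minimal GTF by hand. The sketch below is an untested workaround under strong assumptions: the function name `cds_rows_to_gtf` is made up, the Parent value is reused as both gene_id and transcript_id (so every gene ends up with a single transcript), and UTR rows are dropped because they carry no Parent in the excerpt above. Whether rMATS accepts the result, and whether the truncated exon boundaries are tolerable for splicing analysis, would need to be checked.

```python
def cds_rows_to_gtf(handle):
    """Convert CDS rows carrying a Parent attribute into GTF exon rows."""
    out = []
    for line in handle:
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue  # skip comments, short rows, and non-CDS features
        attrs = dict(p.split("=", 1) for p in cols[8].split(";") if "=" in p)
        parent = attrs.get("Parent")
        if parent is None:
            continue  # nothing to group this row by
        cols[2] = "exon"
        cols[7] = "."  # frame is left unset on the relabelled exon rows
        # Assumption: Parent doubles as gene_id and transcript_id.
        cols[8] = f'gene_id "{parent}"; transcript_id "{parent}";'
        out.append("\t".join(cols))
    return out
```

Writing the result out is then a matter of `"\n".join(cds_rows_to_gtf(open("annotation.gff3")))`, again with a placeholder file name.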

Does anyone have any ideas as to what could be causing this problem, and/or any potential solutions?

Thanks in advance, I've been really racking my brain.

14 months ago by
United States
Jennifer Hillman Jackson25k wrote:


The GFF3 data is out of specification. Details are in the same question posted here:

You might want to contact the data source to ask if they offer alternate versions of the data or what their recommendations are for using this data with other tools that are not web-based at their site. (I don't personally know and couldn't find different data with a quick browse).

Sorry we couldn't help more, Jen, Galaxy team
