Question: converting .gff file to .gtf
kudzu (United States) wrote 6 months ago:

I am using a Galaxy RNA-seq workflow. My annotation file is in .gff format, but the workflow requires the annotation in .gtf format. What is the easiest way to convert the file?

gff gff3 annotation gtf rna-seq • 890 views
Jennifer Hillman Jackson (United States) wrote 6 months ago:

Hello,

If the datatype is really just GFF, it will not contain the key attributes that a GTF does, so it can't be used. But I suspect the data is really GTF or GFF3, as actual GFF is a much older format specification and is rarely used anymore.

Support FAQ: https://galaxyproject.org/learn/

You need to find out the actual file format (GFF versus GTF versus GFF3) first. These are related but distinct datatypes. Some are interchangeable/convertible to one or more of the other two formats and some are not. Tools require the specific datatype(s) stated on tool forms to work correctly.

Why is a check necessary? Because neither the currently assigned datatype (in Galaxy, this can be assigned or modified by the user during or after Upload) nor an external file extension is always a reliable indicator of the exact type (both GTF and GFF3 files might have a .gff extension). Data providers and 3rd party wrapped tools also use/produce many variations and hybrids, complicating datatype assignment.
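If you want to check a file outside of Galaxy before uploading, a rough sketch in Python is below. It only looks at the 9th (attributes) column of the first feature lines and is a heuristic, not a validator. The function name `guess_annotation_format` and the `max_records` parameter are made up for this example, and hybrid files may still be misclassified.

```python
import re

# GTF attributes look like: gene_id "X"; transcript_id "Y";
GTF_ATTR = re.compile(r'\b(?:gene_id|transcript_id)\s+"[^"]*"')
# GFF3 attributes look like: ID=gene1;Name=foo
GFF3_ATTR = re.compile(r'(?:^|;)\s*[A-Za-z_][\w.-]*=')


def guess_annotation_format(lines, max_records=100):
    """Guess 'gtf', 'gff3', 'gff', or 'unknown' from an iterable of lines.

    Hypothetical helper: a heuristic based on the attributes column only.
    """
    votes = {"gtf": 0, "gff3": 0, "gff": 0}
    seen = 0
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith("##gff-version"):
            parts = line.split()
            # An explicit version 3 directive is treated as decisive.
            if len(parts) > 1 and parts[1].startswith("3"):
                return "gff3"
            continue
        if line.startswith("##FASTA") or line.startswith(">"):
            break  # embedded sequence section, no more features
        if not line.strip() or line.startswith("#"):
            continue  # blank line or header/comment
        fields = line.split("\t")
        if len(fields) < 8:
            continue  # not a feature line
        attrs = fields[8] if len(fields) > 8 else ""
        if GTF_ATTR.search(attrs):
            votes["gtf"] += 1
        elif GFF3_ATTR.search(attrs):
            votes["gff3"] += 1
        else:
            votes["gff"] += 1
        seen += 1
        if seen >= max_records:
            break
    if seen == 0:
        return "unknown"
    return max(votes, key=votes.get)
```

You would call it on an open file handle, for example `guess_annotation_format(open("annotation.gff"))`, where the filename is just a placeholder.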

  • The data may already be in GTF format but have header lines, which can trigger a GFF datatype assignment with the Upload tool. These can be removed if needed: some tools, but not all, will error or ignore annotation lines when a header is present. Use the Select tool, or remove the header lines before upload.
  • The data may already be in GFF3 format. If it is actually GFF3, use the tool gffread to convert GFF3 to GTF. Some GFF3 datasets include sequences in the lower section. These can cause problems with tools and generally make up the bulk of the data when present. Remove them before loading into Galaxy, or use a combination of tools from the group Text Manipulation.
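For cleaning a file before upload, here is a minimal Python sketch of the two removals described above: header lines and the trailing sequence section. The function names and file paths are placeholders for this example. It assumes the sequence section starts at a `##FASTA` line or at the first line beginning with `>`. Note that it drops every `#` line, including any `##gff-version` directive, which some tools might want to keep.

```python
def strip_headers_and_fasta(lines):
    """Yield only feature lines: skip '#' headers and blank lines,
    and stop at the embedded FASTA section if there is one."""
    for line in lines:
        if line.startswith("##FASTA") or line.startswith(">"):
            break  # everything from here on is sequence data
        if line.startswith("#") or not line.strip():
            continue  # header, comment, or blank line
        yield line


def clean_file(src_path, dst_path):
    """Write a cleaned copy of src_path to dst_path (paths are placeholders)."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.writelines(strip_headers_and_fasta(src))
```

If you run gffread on the command line instead of in Galaxy, its `-T` option writes GTF output, e.g. `gffread annotation.gff3 -T -o annotation.gtf` (filenames are placeholders).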

Please see item 2 in this prior Q&A: https://biostar.usegalaxy.org/p/28094/#28099. The other help/links are also good to review since you are also doing RNA-seq, to avoid job errors or unexpected results.

Thanks! Jen, Galaxy team
