I am using the Galaxy RNA-seq workflow. My annotation file is in .gff format, but the workflow needs the annotation in .gtf format. What is the easiest way to convert the file?
If the datatype is really just GFF, it would not contain key attributes that a GTF does, so it can't be used. But I suspect the data is really GTF or GFF3, as actual GFF is a much older format specification and not used much anymore.
Support FAQ: https://galaxyproject.org/learn/
- Common datatypes explained https://galaxyproject.org/learn/datatypes/
You need to find out the actual file format (GFF versus GTF versus GFF3) first. These are related but distinct datatypes. Some are interchangeable/convertible to one or more of the other two formats and some are not. Tools require the specific datatype(s) stated on tool forms to work correctly.
Why is a check necessary? Because neither the currently assigned datatype nor the external file extension is always a reliable indicator of the exact type. In Galaxy, the datatype can be assigned or modified by the user during or after Upload, and both GTF and GFF3 files might have a .gff file extension. Data providers and 3rd party wrapped tools also use/produce many variations and hybrids, complicating datatype assignment.
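If you want to check the file outside of Galaxy first, here is a rough Python sketch that guesses the format from the 9th (attribute) column. It is only a heuristic and not what Galaxy's Upload tool does internally: the function name `guess_annotation_format` and the `max_features` limit are made up for illustration, and it assumes GTF attributes look like `gene_id "..."` while GFF3 attributes look like `key=value`.

```python
import re

# GTF-style attributes: key, whitespace, quoted value (e.g. gene_id "g1";)
GTF_ATTR = re.compile(r'\b(gene_id|transcript_id)\s+"[^"]*"')
# GFF3-style attributes: key=value pairs separated by semicolons (e.g. ID=g1;Name=x)
GFF3_ATTR = re.compile(r'(^|;)\s*[A-Za-z_][\w.]*=')


def guess_annotation_format(lines, max_features=100):
    """Return "GTF", "GFF3", or "unknown" based on header and attribute style."""
    seen = 0
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##gff-version"):
            parts = line.split()
            if len(parts) > 1 and parts[1].startswith("3"):
                return "GFF3"
            continue
        if line.startswith("##FASTA"):
            break  # embedded sequence section, no more features
        if not line.strip() or line.startswith("#"):
            continue  # other header/comment lines
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a well-formed feature line
        attrs = cols[8]
        if GTF_ATTR.search(attrs):
            return "GTF"
        if GFF3_ATTR.search(attrs):
            return "GFF3"
        seen += 1
        if seen >= max_features:
            break
    return "unknown"
```

You could call it as `guess_annotation_format(open("annotation.gff"))`, where `annotation.gff` is a placeholder for your own file. A result of "unknown" suggests older GFF or a hybrid, which is worth inspecting by eye.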
- The data may already be in GTF format but have header lines, which can trigger a GFF datatype assignment with the Upload tool. The header lines can be removed if needed, since some tools (but not all) will error or ignore annotation lines when a header is present. Use the Select tool to remove them, or remove them before upload.
- The data may already be in GFF3 format. If it is actually GFF3, use the tool gffread to convert GFF3 to GTF. Some GFF3 datasets include sequences in the lower section. These can cause problems with tools and are generally the bulk of the data when present. Remove these before loading into Galaxy, or use a combination of tools from the group Text Manipulation.
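For the "remove before upload" route, a minimal Python sketch of the cleanup could look like the one below. The function name `clean_annotation` and the `drop_headers` switch are hypothetical, not part of any Galaxy tool. It assumes header/comment lines start with `#` and that the sequence section starts at a `##FASTA` line (or, in some files, directly at the first `>` line).

```python
def clean_annotation(lines, drop_headers=True):
    """Yield annotation lines, skipping any embedded sequence section.

    drop_headers=True also removes '#' header/comment lines;
    set it to False to keep them and only cut the sequences.
    """
    for line in lines:
        # Everything from the sequence section onward is discarded.
        if line.startswith("##FASTA") or line.startswith(">"):
            break
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("#") and drop_headers:
            continue  # skip header/comment lines
        yield line


def clean_file(src_path, dst_path, drop_headers=True):
    """Write a cleaned copy of src_path to dst_path (paths are your own)."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.writelines(clean_annotation(src, drop_headers=drop_headers))
```

For a GTF with a troublesome header, the default `drop_headers=True` is probably what you want. For GFF3, you may prefer `drop_headers=False` so the `##gff-version 3` line is kept and only the sequences are cut before running gffread.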
Please see item 2 in this prior Q&A: https://biostar.usegalaxy.org/p/28094/#28099. The other help/links are also good to review since you are also doing RNA-seq, to avoid job errors or unexpected results.
Thanks! Jen, Galaxy team