Question: Modifying GFF files
cjain0 wrote (12 months ago):


I am working with a GFF file that is missing 5' UTR annotations. I thought I might be able to add UTR data using the tools on Galaxy as follows:

1) Convert the existing GFF file to Excel format (.xls) using the pencil icon in Galaxy.
2) Download the Excel file and make changes.
3) Upload the modified Excel file and convert it back to GFF using the pencil icon tool.
4) Run HTSeq on the RNA-Seq data and the new GFF file to get counts for each UTR.
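As an alternative to the Excel round trip in steps 1 to 3, here is a minimal Python sketch that appends 5' UTR records to a GFF3 file as plain text. It assumes you already have the UTR coordinates from your own data; the input tuple layout, the helper names `utr_lines` and `append_utrs`, the "manual" source label and the "<parent>.utr5" ID scheme are all hypothetical choices for illustration.

```python
# Minimal sketch: append five_prime_UTR records to a GFF3 file as plain text,
# avoiding a spreadsheet round trip. The input tuple layout, the "manual"
# source label and the "<parent>.utr5" ID scheme are hypothetical choices.

def utr_lines(rows, source="manual"):
    """Yield one GFF3 line per (seqid, start, end, strand, parent_id) tuple."""
    for seqid, start, end, strand, parent in rows:
        start, end = int(start), int(end)
        if not 1 <= start <= end:
            raise ValueError(f"bad coordinates for {parent}: {start}-{end}")
        if strand not in ("+", "-"):
            raise ValueError(f"UTR strand must be '+' or '-', got {strand!r}")
        attributes = f"ID={parent}.utr5;Parent={parent}"
        # Nine tab-separated columns: seqid, source, type, start, end,
        # score, strand, phase, attributes (score and phase left as ".").
        yield "\t".join([seqid, source, "five_prime_UTR", str(start), str(end),
                         ".", strand, ".", attributes])


def append_utrs(gff_text, rows):
    """Return gff_text plus the new UTR lines, with Unix line endings only."""
    lines = gff_text.splitlines()  # splitlines() also drops CRLF endings
    lines.extend(utr_lines(rows))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    original = "##gff-version 3\nchr1\tsrc\tmRNA\t100\t900\t.\t+\t.\tID=mRNA1\n"
    print(append_utrs(original, [("chr1", 100, 149, "+", "mRNA1")]), end="")
```

If the file ends with a `##FASTA` section, the new lines would have to go before it instead. Also note that htseq-count by default counts `exon` features grouped by the `gene_id` attribute, so UTR records like these would not be counted unless it is told otherwise, e.g. `-t five_prime_UTR -i ID` (or the equivalent settings in the Galaxy tool form).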

However, when I did this, I got no output. Can anyone suggest what I might have done wrong, or how I can modify GFF files using simple tools that do not require bioinformatics expertise?


Chaitanya Jain

Jennifer Hillman Jackson (United States) wrote (12 months ago):


There was probably a format problem introduced when the additional lines were added, when existing lines were changed, or during the round trip through Excel (which is notoriously problematic for manipulating and producing plain-text output). Even one hidden character (for example, a soft return) will cause problems in most bioinformatics pipelines and tools, Galaxy or not.
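To make "hidden characters" concrete, here is a minimal Python sketch that reports a few characters that commonly appear after a spreadsheet round trip, plus a helper that normalizes line endings. The function names are hypothetical and the list of checks is illustrative, not exhaustive.

```python
# Minimal sketch: flag characters that often sneak into text files after a
# spreadsheet round trip. The checks chosen here are illustrative, not a
# complete list of everything that can break a GFF parser.

def find_hidden_characters(text):
    """Return (line_number, problem) pairs for suspicious characters."""
    problems = []
    for number, line in enumerate(text.split("\n"), start=1):
        if "\r" in line:
            problems.append((number, "carriage return"))
        if "\u00a0" in line:
            problems.append((number, "non-breaking space"))
        if "\ufeff" in line:
            problems.append((number, "byte order mark"))
        # Tab is the only control character a GFF data line should contain;
        # carriage returns are already reported above, so skip them here.
        if any(ord(ch) < 32 and ch not in "\t\r" for ch in line):
            problems.append((number, "other control character"))
    return problems


def clean_line_endings(text):
    """Convert Windows (CRLF) and old Mac (CR) line endings to Unix (LF)
    and drop byte order marks."""
    return text.replace("\r\n", "\n").replace("\r", "\n").replace("\ufeff", "")


if __name__ == "__main__":
    sample = "chr1\tsrc\tgene\t1\t10\t.\t+\t.\tID=a\r\n"
    print(find_hidden_characters(sample))
    print(repr(clean_line_endings(sample)))
```

Running the check on both the original and the modified file is a quick way to see whether the edit cycle introduced anything new.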

See the tools in the Text Manipulation group, and the related tool groups immediately below it in the tool panel, for data rearrangement options in Galaxy. Many are based on common command-line tools presented through a graphical interface that most people can use even with little prior experience. There will always be corner-case manipulations these tools cannot handle, but they are certainly worth a try, and they are among the original core features that made Galaxy unique. Tools used in combination to achieve complex manipulations, just as one would chain commands with pipes in a shell, are very empowering for non-technical bioinformatics researchers.
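The idea of chaining small single-purpose steps, as with shell pipes, can be sketched in plain Python. The step functions below are hypothetical stand-ins, loosely analogous to line-selection, filtering and sorting steps; they are not Galaxy's actual tool implementations.

```python
# Minimal sketch of the "small tools chained like a shell pipe" idea in
# plain Python. Each step takes lines in and hands lines on to the next.

def drop_comments(lines):
    """Skip blank lines and lines starting with '#'."""
    return (line for line in lines if line.strip() and not line.startswith("#"))


def keep_type(lines, feature_type):
    """Keep lines whose column 3 (feature type) equals feature_type."""
    for line in lines:
        fields = line.split("\t")
        if len(fields) > 2 and fields[2] == feature_type:
            yield line


def sort_by_position(lines):
    """Sort by seqid (column 1), then numerically by start (column 4)."""
    return sorted(lines, key=lambda line: (line.split("\t")[0], int(line.split("\t")[3])))


if __name__ == "__main__":
    gff = [
        "##gff-version 3",
        "chr2\tsrc\tgene\t50\t90\t.\t+\t.\tID=b",
        "chr1\tsrc\tgene\t300\t400\t.\t-\t.\tID=c",
        "chr1\tsrc\texon\t10\t20\t.\t+\t.\tParent=x",
        "chr1\tsrc\tgene\t5\t60\t.\t+\t.\tID=a",
    ]
    for line in sort_by_position(keep_type(drop_comments(gff), "gene")):
        print(line)
```

Because each step only transforms lines, steps can be reordered or swapped independently, which is the same property that makes combining Galaxy's text tools effective.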


  • Make sure that the final dataset meets the specifications of the datatype and is assigned that datatype.
  • Don't rule out other potential factors, such as a reference genome mismatch, until each is confirmed to be a non-factor. Test the original GFF to confirm that the baseline dataset works.
  • Consider finding another data source that includes the missing annotation.
  • Search online for format validators for the datatype. Many are web-based and quick to use.
  • Consider reaching out to a technical colleague, fellow student, or co-worker who understands Unix tools and data manipulation. What you are attempting is non-trivial, and the solution will be specific to your situation.
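For checking that a dataset "meets the specifications of the datatype", here is a minimal Python sketch of a GFF3 column check based on the nine-column layout in the GFF3 specification. The function name `validate_gff3` is hypothetical, and the checks are deliberately partial: ID/Parent relationships, attribute escaping and sequence-region bounds are not tested, so a dedicated validator is still worth running.

```python
# Minimal sketch of a GFF3 column check. Not a complete validator.

VALID_STRANDS = {"+", "-", ".", "?"}
VALID_PHASES = {"0", "1", "2"}


def validate_gff3(text):
    """Return (line_number, message) pairs for problems found."""
    errors = []
    for number, line in enumerate(text.split("\n"), start=1):
        if line.startswith("##FASTA"):
            break  # everything after this directive is sequence data
        if line.endswith("\r"):
            errors.append((number, "line ends with a carriage return"))
            line = line.rstrip("\r")
        if not line.strip() or line.startswith("#"):
            continue  # blank lines, comments and directives
        fields = line.split("\t")
        if len(fields) != 9:
            errors.append((number, f"expected 9 tab-separated columns, found {len(fields)}"))
            continue
        _seqid, _source, ftype, start, end, score, strand, phase, attrs = fields
        if not (start.isdecimal() and end.isdecimal()) or not 1 <= int(start) <= int(end):
            errors.append((number, "start/end must be integers with 1 <= start <= end"))
        if score != ".":
            try:
                float(score)
            except ValueError:
                errors.append((number, f"score must be '.' or a number, got {score!r}"))
        if strand not in VALID_STRANDS:
            errors.append((number, f"invalid strand {strand!r}"))
        if ftype == "CDS" and phase not in VALID_PHASES:
            errors.append((number, "CDS features need a phase of 0, 1 or 2"))
        elif ftype != "CDS" and phase not in VALID_PHASES | {"."}:
            errors.append((number, f"invalid phase {phase!r}"))
        pairs = [p for p in attrs.strip(";").split(";") if p]
        if attrs != "." and (not pairs or not all("=" in p for p in pairs)):
            errors.append((number, "attributes must be tag=value pairs separated by ';'"))
    return errors
```

Running it on the original, unmodified GFF first gives a baseline, so any new errors in the edited file point at the edit itself.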


Thanks, Jen, Galaxy team


