Question: Annotation file formats .gff3 to .gtf conversion
15 months ago, devbt15 wrote:

Dear all,

I used the Galaxy tool to convert a .gff3 annotation file to .gtf format. Going through the .gtf file, I discovered a few entries where a transcript ID is present but no corresponding gene ID or gene name is available.

I am getting an error when running DESeq2 analysis using this .gtf annotation file: 'genes$gene_id' must be a character vector (or factor) with no NAs

I wonder if it is due to missing gene IDs. Please help.

Regards, Das.
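
One way to test the suspicion above is to scan the converted file for records that have a transcript_id but no gene_id. The following is only a minimal sketch: it assumes standard GTF attribute syntax (key "value"; pairs in column 9), and the function name and example records are made up for illustration.

```python
import re

# Standard GTF attribute syntax: key "value";
GENE_ID = re.compile(r'gene_id "([^"]*)"')
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]*)"')

def transcripts_missing_gene_id(gtf_lines):
    """Return transcript IDs whose GTF records lack a non-empty gene_id."""
    missing = []
    seen = set()
    for line in gtf_lines:
        # Skip blank lines and comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        attrs = fields[8]
        tx = TRANSCRIPT_ID.search(attrs)
        gene = GENE_ID.search(attrs)
        if tx and (gene is None or gene.group(1) == ""):
            if tx.group(1) not in seen:
                seen.add(tx.group(1))
                missing.append(tx.group(1))
    return missing

# Hypothetical records, for illustration only.
example = [
    'chr1\tsrc\texon\t1\t100\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n',
    'chr1\tsrc\texon\t200\t300\t.\t+\t.\ttranscript_id "T2";\n',
]
print(transcripts_missing_gene_id(example))
```

If this returns any IDs for the real file, those transcripts would end up without a gene in a transcript-to-gene table built from it, which is consistent with the "no NAs" complaint in the error.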

rna-seq galaxy • 802 views
15 months ago, devbt15 wrote:

Dear Jen, I could not send the link to galaxy-bugs@lists.galaxyproject.org, so I created a new history containing the .gff3 file in question and shared it so that it is accessible to all. Meanwhile, in my workflow I used the .gtf file converted with Galaxy (position information, IDs, feature type, etc.) for Salmon, and for DESeq2 I used a tabular file mapping transcript IDs to gene IDs, which I extracted from this .gtf file. In addition, I had to remove the extra annotation from the headers of the reference transcriptome FASTA file so that each record contains only ">" followed by the Lotus japonicus transcript ID and then the cDNA sequence (which worked nicely). Thank you. Das.
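
For anyone repeating these two steps outside Galaxy, they could be sketched roughly as follows. This is not the exact procedure used in the shared history: the function names are hypothetical, the GTF is assumed to use standard key "value"; attributes, and the transcript ID is assumed to be the first whitespace-delimited token of each FASTA header.

```python
import re

GENE_ID = re.compile(r'gene_id "([^"]+)"')
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')

def tx2gene_from_gtf(gtf_lines):
    """Build a transcript ID -> gene ID mapping from GTF lines.

    Records lacking either ID are skipped, so the mapping has no empty values.
    """
    mapping = {}
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue
        tx = TRANSCRIPT_ID.search(fields[8])
        gene = GENE_ID.search(fields[8])
        if tx and gene:
            mapping[tx.group(1)] = gene.group(1)
    return mapping

def strip_fasta_descriptions(fasta_lines):
    """Yield FASTA lines with each header reduced to '>' plus its first token.

    Assumption: the first token of the header is the transcript ID.
    """
    for line in fasta_lines:
        if line.startswith(">"):
            parts = line[1:].split()
            yield ">" + (parts[0] if parts else "") + "\n"
        else:
            yield line

# Hypothetical inputs, for illustration only.
gtf = [
    'chr1\tsrc\texon\t1\t100\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n',
    'chr1\tsrc\texon\t200\t300\t.\t+\t.\ttranscript_id "T2";\n',
]
fasta = [">T1 gene=G1 some description\n", "ACGTACGT\n"]

# Write the mapping as a two-column tabular file (transcript, gene).
for tx, gene in sorted(tx2gene_from_gtf(gtf).items()):
    print(f"{tx}\t{gene}")
print("".join(strip_fasta_descriptions(fasta)), end="")
```

Keeping the FASTA headers and the mapping table on the same bare transcript IDs is what lets the quantification output and the transcript-to-gene table line up.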


Glad this worked out. I just reviewed your history, and what you have done is the correct way forward.

I also made a copy of the history containing the gff3 dataset. Thanks for doing this. You can unshare it now if you want.

Thanks, Jen, Galaxy team

Reply written 15 months ago by Jennifer Hillman Jackson.

15 months ago, Jennifer Hillman Jackson (United States) wrote:

Hello,

This has come up before. The solution was to filter the input GFF3 dataset first, retaining only the lines whose "type" (column 3) carries the attributes a complete GTF dataset would have. The "Select" and/or "Filter" tools can be used for this.
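
Outside Galaxy, the same kind of pre-filtering could be sketched in a few lines of Python. This is only an illustration of filtering on the feature type column, not what the Galaxy tools do internally; the function name is hypothetical, and the default set of types to keep (mRNA, exon, CDS) is an assumption that should be adjusted to whatever types in your GFF3 actually carry the needed attributes.

```python
def filter_gff3_by_type(gff3_lines, keep_types=("mRNA", "exon", "CDS")):
    """Keep directive/comment lines plus records whose type (column 3) is wanted.

    keep_types is an assumed default; choose the types that fit your file.
    """
    wanted = set(keep_types)
    kept = []
    for line in gff3_lines:
        if line.startswith("#"):
            kept.append(line)  # preserve headers such as ##gff-version 3
            continue
        fields = line.rstrip("\n").split("\t")
        # GFF3 records have 9 tab-separated columns; column 3 is the type.
        if len(fields) == 9 and fields[2] in wanted:
            kept.append(line)
    return kept

# Hypothetical records, for illustration only.
gff3 = [
    "##gff-version 3\n",
    "chr1\tsrc\tregion\t1\t5000\t.\t+\t.\tID=chr1\n",
    "chr1\tsrc\tmRNA\t1\t300\t.\t+\t.\tID=T1;Parent=G1\n",
    "chr1\tsrc\texon\t1\t100\t.\t+\t.\tID=T1.e1;Parent=T1\n",
]
print("".join(filter_gff3_by_type(gff3)), end="")
```

The filtered output can then be passed to the GFF3-to-GTF conversion in place of the original dataset.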

The wrapper for the conversion tool could probably be improved to do this filtering (or at least provide the option to not include certain lines). If you want to share your GFF3 as an example, that would be helpful. I'll compare it against others that had this issue and create an enhancement ticket.

Share by one of these methods:

  • Send in a bug report from any dataset in the history that contains this dataset, and note in the bug report comments that you are sharing a GFF3 dataset that does not convert to GTF well. Note the GFF3 dataset number. Leaving the GTF conversion that produced the poor content in the history would be good, but is not necessary (unless you cannot figure out how to modify the GFF3 and need help with that).

  • Generate a share link to your history and send it to galaxy-bugs@lists.galaxyproject.org. It could be a new history into which you copy just this dataset, unless you need more help, in which case send it from the history with all the other jobs in it. Include the same information as above in the email text.

Thanks! Jen, Galaxy team
