Hi
Is there a version of the Ensembl GTF that has both a complete attributes column [with actual gene_id, gene_name, etc., rather than just the transcript_id repeated] AND UCSC-style names in the seqname column [i.e., chr1 rather than just 1]?
Hello,
Not that I am aware of. But you could create such a file using iGenomes content as a base. The key will be to map the Ensembl identifiers to the UCSC identifiers and then swap those into the GTF file. Just adding "chr" to the start of identifiers works well for some chromosomes, but not all. For example, the mitochondrial genome is "MT" in Ensembl but "chrM" in UCSC.
I came across a git repo last week where the chromosome identifiers from multiple genome sources are mapped to each other. It looks very useful to me, but it is brand-new, so use it with caution and sanity check the results. I starred it and have been following the progress and updates. The mappings are tabular files, so they can be loaded and used within Galaxy easily. If you have questions about the content, the repository owner would be the best contact.
http://github.com/dpryan79/ChromosomeMappings
Tools in the group "Text manipulation", along with the tool "Join two Datasets side by side on a specified field", will do the transformation (although you could also do this on the command line, then upload the result to Galaxy).
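If you go the command-line route, the swap can be sketched in a few lines of Python. This is only a rough illustration, not a tested pipeline: it assumes the mapping file is a two-column, tab-separated table (Ensembl name, then UCSC name), and the function names load_mapping and remap_seqnames are made up for this example. Features on sequences with no UCSC equivalent are dropped here, which is one possible choice; you may prefer to keep them unchanged.

```python
import io


def load_mapping(handle):
    """Read a two-column, tab-separated table: source name, target name."""
    mapping = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        # Skip rows that have no target name (assumed to mean "no equivalent").
        if len(fields) >= 2 and fields[0] and fields[1]:
            mapping[fields[0]] = fields[1]
    return mapping


def remap_seqnames(gtf_handle, mapping):
    """Yield GTF lines with column 1 renamed; drop features on unmapped sequences."""
    for line in gtf_handle:
        if line.startswith("#"):
            # Header/comment lines pass through untouched.
            yield line
            continue
        seqname, sep, rest = line.partition("\t")
        if seqname in mapping:
            yield mapping[seqname] + sep + rest


# Tiny in-memory example; with real data, open the mapping file and the GTF instead.
mapping = load_mapping(io.StringIO("1\tchr1\nMT\tchrM\n"))
gtf = io.StringIO(
    '1\tensembl\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n'
    'MT\tensembl\texon\t10\t50\t.\t+\t.\tgene_id "G2"; transcript_id "T2";\n'
)
for out_line in remap_seqnames(gtf, mapping):
    print(out_line, end="")
```

Only the first column is touched, so the attributes column (gene_id, gene_name, and so on) comes through exactly as it was in the input GTF.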
Hopefully this helps! Jen, Galaxy team