Question: Galaxy GEMINI database version == GEMINI release?
akotlar wrote (6 weeks ago):

GEMINI hardcodes its annotation database versions in its build procedure, and GEMINI on Galaxy seems to conform to this. What is slightly confusing is that the database build date shown in the dropdown for GEMINI load is later than the GEMINI release (0.8.1). However, the underlying data appears to belong to that GEMINI release, so I assume the date simply reflects when this GEMINI build was incorporated into Galaxy.

Jennifer Hillman Jackson (United States) wrote (6 weeks ago):


I confirmed that the date associated with the GEMINI indexes is the date the data was retrieved.

The retrieval uses the same methods as the command-line installation, except that they are wrapped in a Galaxy data manager that does some extra formatting and manipulation of the metadata and internal database links. The name given to the index is custom. Whatever version of the data was current at that time is what was indexed and included in Galaxy. I don't see a way to view the timestamps on the original source files, and I don't think the tool authors intended them to be exposed.

Hope that helps! Jen, Galaxy team


Thanks Jen,

When you say that the name given is custom, are you saying that 0.8.1 doesn't actually correspond to GEMINI 0.8.1?

Also, since each version of GEMINI seems to hardcode the version of each database, and since you state that Galaxy didn't modify the build scripts, it seems that Galaxy must have retrieved whatever annotation source versions were listed in the script. For instance, line 17 of the GEMINI 0.8.1 source reads: tabix dbsnp.b141.20140813.vcf.gz. This strongly suggests that dbSNP 141 was used, correct?
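The inference above (reading the dbSNP build straight off the filename) can be sketched in a few lines of Python. This is a minimal sketch, not part of GEMINI or Galaxy: the naming pattern dbsnp.b<build>.<YYYYMMDD>.vcf.gz is an assumption generalized from the single filename quoted above, and treating the trailing eight digits as a date stamp is likewise an assumption. The helper name parse_dbsnp_filename is hypothetical.

```python
import re
from datetime import date

# Hypothetical pattern, generalized from the one example
# "dbsnp.b141.20140813.vcf.gz": dbsnp.b<build>.<YYYYMMDD>.vcf.gz
DBSNP_NAME = re.compile(r"^dbsnp\.b(\d+)\.(\d{4})(\d{2})(\d{2})\.vcf\.gz$")


def parse_dbsnp_filename(name: str) -> tuple[int, date]:
    """Return (dbSNP build number, assumed date stamp) parsed from a filename."""
    match = DBSNP_NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognized dbSNP filename: {name!r}")
    build = int(match.group(1))
    # Assumption: the eight trailing digits are a YYYYMMDD date stamp.
    year, month, day = (int(g) for g in match.group(2, 3, 4))
    return build, date(year, month, day)


if __name__ == "__main__":
    build, stamp = parse_dbsnp_filename("dbsnp.b141.20140813.vcf.gz")
    print(f"dbSNP build {build}, file dated {stamp.isoformat()}")
```

Run against the filename from the 0.8.1 script, this prints "dbSNP build 141, file dated 2014-08-13". A filename that does not fit the assumed pattern raises ValueError rather than guessing a version.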

Reply written 6 weeks ago by akotlar

Yes, you have mapped the data correctly.

By custom name I mean that whoever generated the index can assign whatever name they want. Ideally, the name describes the source and version, but for this particular tool's indexes the download date was used instead. We are discussing how to label data better (choosing the "best" label can be non-trivial in some cases).

The issue comes down to two factors: 1) some external datasets (genomes and reference data) need extremely long labels to fully describe their source and version, and 2) changing a label once it has been published causes problems, such as label confusion, for people already using that data. But we will figure something out. Individual genomes already have a full label (source/version) available; the list of genomes in the Upload tool is an example.

Thanks for bringing this up. For this tool in particular, we will include the version in the labels of indexes created in the future, and we are considering changing the current index label to include the version as well.

Reply written 6 weeks ago by Jennifer Hillman Jackson

Thank you! I can't emphasize strongly enough how reassuring it is to find that a tool has engaged maintainers. I really appreciate your time and the Galaxy team's time.

Reply written 6 weeks ago by akotlar


Powered by Biostar version 16.09