The source for many genomes is explicitly listed in the "Builds" list, along with the database name and either the common name or the genus and species. If a build uses a public name as its "dbkey", its content is identical to the build from that source. The "Builds" list can be found in two places:
- The Upload tool form
- The "Edit Attributes" form reached by clicking on the pencil icon for any dataset
Some entries may contain only the common build name, but that happens when the source is a public repository with well-known identifiers. (Not that everyone will recognize them instantly, but they can be googled.) For example, "hg38" is a UCSC (http://genome.ucsc.edu) build. These external data providers publish, as part of their releases, extensive information about how each build was constructed, when, and from which sources.
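If you would rather look up a build label outside the Upload or Edit Attributes forms, here is a minimal Python sketch. It assumes the server exposes the same build list through a `GET /api/genomes` endpoint that returns a JSON list of `[label, dbkey]` pairs; please check that against your Galaxy version. The helper names `fetch_builds` and `describe_build`, and the two sample entries, are hypothetical and for illustration only.

```python
import json
import urllib.request

# Illustrative entries only, shaped like the (label, dbkey) pairs a Galaxy
# server is assumed to return; real labels come from the server itself.
EXAMPLE_BUILDS = [
    ("Human Dec. 2013 (GRCh38/hg38) (hg38)", "hg38"),
    ("Mouse Dec. 2011 (GRCm38/mm10) (mm10)", "mm10"),
]


def fetch_builds(base_url):
    """Fetch (label, dbkey) pairs from a Galaxy server.

    Assumption: GET <base_url>/api/genomes returns a JSON list of
    [label, dbkey] pairs. Verify this against your Galaxy version.
    """
    url = base_url.rstrip("/") + "/api/genomes"
    with urllib.request.urlopen(url) as resp:
        return [(label, dbkey) for label, dbkey in json.load(resp)]


def describe_build(builds, dbkey):
    """Return the full label listed for a dbkey, or None if it is not listed."""
    for label, key in builds:
        if key == dbkey:
            return label
    return None


if __name__ == "__main__":
    # Offline demo on the sample data; swap in fetch_builds("https://...")
    # to query a real server.
    print(describe_build(EXAMPLE_BUILDS, "hg38"))
```

The full label (e.g. the "GRCh38/hg38" part) is usually what you need to search the provider's own documentation for how the build was made.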
If a genome is loaded into an instance through a Data Manager, more information about its processing is available. However, this is a relatively new tool set, and the data on Main includes both legacy and new genomes. Still, we are looking into using this tool set for newly added genomes, and into exposing the source and data processing details in a way that is easy for users to access (directly in the Galaxy UI, or in an external but related location, such as our wiki).
More about reference genomes is here in our wiki:
If there are any other genomes whose source you are unable to identify, please feel free to ask. Consider posting each one as a separate question, so that they can be tagged and easily found by others.
Thanks! Jen, Galaxy team