Genome build accession for Galaxy built-in genomes

Question: Genome build accession for Galaxy built-in genomes

3.7 years ago by

jirin • 30

United States

jirin • 30 wrote:

How can I find the exact genome build (NCBI accession number, such as GCA_000001405.17) for the built-in genomes on the Galaxy Main web site? When I choose an aligner (for example, Bowtie2), only the genome version is listed, such as "Human (Homo sapiens) (b38): hg38".

How important is an exact match between the genome build (not just the genome version) used as the reference for alignment and the genome build of the GTF/GFF3 file used downstream of the alignment?

Thank you,

Jiri

alignment • 1.2k views

modified 3.7 years ago by Jennifer Hillman Jackson ♦ 25k • written 3.7 years ago by jirin • 30

3.7 years ago by

Jennifer Hillman Jackson ♦ 25k

United States

Jennifer Hillman Jackson ♦ 25k wrote:

Hello,

The source for many genomes is listed explicitly in the "Builds" list, along with the database name and either the common name or the genus and species. If the build has a public name as its "dbkey", its content is identical to the build from that source. The list can be found in two places:

The Upload tool form
The "Edit Attributes" form reached by clicking on the pencil icon for any dataset

Some entries contain only the common build name, but only when the source is a public repository with well-known identifiers. (Not everyone will recognize them instantly, but they can be looked up with a web search.) For example, "hg38" is a UCSC (http://genome.ucsc.edu) build. These external data providers publish, as part of their releases, extensive information about how and when the build was constructed, along with the source details.
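As a rough illustration of looking up a build's source programmatically, here is a minimal Python sketch. It is not part of Galaxy. It assumes that UCSC's public REST API at https://api.genome.ucsc.edu/list/ucscGenomes is reachable and returns a "ucscGenomes" mapping with a "sourceName" field per build, and that this field mentions the assembly accession; both assumptions should be checked against UCSC's own documentation. The helper names extract_accession and fetch_source_name are made up for this example.

```python
import json
import re
import urllib.request

# Matches GenBank (GCA_) and RefSeq (GCF_) assembly accessions,
# for example GCA_000001405.17.
ACCESSION_RE = re.compile(r"\bGC[AF]_\d{9}\.\d+\b")


def extract_accession(text):
    """Return the first assembly accession found in text, or None."""
    match = ACCESSION_RE.search(text)
    return match.group(0) if match else None


def fetch_source_name(dbkey, url="https://api.genome.ucsc.edu/list/ucscGenomes"):
    """Fetch UCSC's source description for a build (needs network access).

    Assumption: the JSON response has a "ucscGenomes" mapping keyed by
    dbkey, and each entry carries a "sourceName" string.
    """
    with urllib.request.urlopen(url) as response:
        genomes = json.load(response)["ucscGenomes"]
    return genomes[dbkey]["sourceName"]
```

With network access, something like extract_accession(fetch_source_name("hg38")) should return the accession if UCSC lists one for that build, and None otherwise. The result is UCSC's statement about its own build, so it is only as reliable as the dbkey match described above.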

If a genome is loaded into an instance through a Data Manager, more information about the processing is available. However, Data Managers are a relatively new tool set, and the data on Main contains both legacy and new genomes. We are looking into ways of using this tool set for newly added genomes and of exposing the source and data processing details in a way that is easy for users to access (directly in the Galaxy UI, or in an external but related location such as our wiki).

More about reference genomes is in our wiki:
http://wiki.galaxyproject.org/Support#Reference_genomes

If there are any other genomes whose source you are unable to identify, please feel free to ask. It is best to post each one as a separate question, so that they can be tagged and easily found by others.

Thanks! Jen, Galaxy team
