Question: Genome build accession for Galaxy built-in genomes
3.7 years ago by
jirin30
United States
jirin30 wrote:

How can I find the exact genome build (NCBI accession number, such as GCA_000001405.17) for the built-in genomes on the Galaxy Main web site? When I choose an aligner (for example, Bowtie2), only the genome version is listed, for example "Human (Homo sapiens) (b38): hg38".

How important is an exact match between the genome build (not just the genome version) used as the alignment reference and the genome build of the GTF/GFF3 file used downstream of the alignment?

Thank you,

Jiri

3.7 years ago by
United States
Jennifer Hillman Jackson25k wrote:

Hello,

The source for many genomes is listed explicitly in the "Builds" list, along with the database name and either the common name or the genus and species. If the build has a public name as its "dbkey", its content is identical to the build from that source. The list can be found in two places:

  • The Upload tool form
  • The "Edit Attributes" form reached by clicking on the pencil icon for any dataset

Some entries contain only the common build name; this is the case when the source is a public repository with well-known identifiers. (Not everyone will recognize them instantly, but they can be looked up online.) For example, "hg38" is a UCSC (http://genome.ucsc.edu) build. These external data providers publish, as part of their releases, extensive information about how and when each build was constructed, along with the source details.
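
To illustrate going from a common build name to an accession, here is a minimal sketch in Python. The table is a hand-written assumption covering only two UCSC names and the GenBank assembly accessions they are commonly documented as being based on; it is not something Galaxy provides, and the function names are hypothetical. Please confirm any entry against the provider's own page for the build.

```python
# Hand-maintained, illustrative table (an assumption, not a Galaxy resource):
# UCSC database name -> (assembly name, GenBank assembly accession).
UCSC_TO_GENBANK = {
    "hg19": ("GRCh37", "GCA_000001405.1"),
    "hg38": ("GRCh38", "GCA_000001405.15"),
}


def accession_for(dbkey):
    """Return the accession recorded in the table for a dbkey, or None if unknown."""
    entry = UCSC_TO_GENBANK.get(dbkey)
    return entry[1] if entry else None


def matches_dbkey(dbkey, accession):
    """True only when the accession, including its version suffix, equals the table entry.

    A different version suffix means a different release, so it is reported
    as a mismatch and should be checked by hand.
    """
    return accession_for(dbkey) == accession


if __name__ == "__main__":
    print(accession_for("hg38"))
    print(matches_dbkey("hg38", "GCA_000001405.17"))
```

With this table, the accession in the question (GCA_000001405.17) is reported as not identical to the one recorded for hg38, which is a prompt to read the release notes rather than a verdict on compatibility.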

If a genome is loaded into an instance through a Data Manager, more information about the processing is available. However, this is a relatively new tool set, and the data on Main contains both legacy and new genomes. We are looking into ways of using this tool set for newly added genomes and of exposing the source and data-processing details where users can easily reach them (directly in the Galaxy UI, or in an external but related location such as our wiki).

More about reference genomes is here in our wiki:
http://wiki.galaxyproject.org/Support#Reference_genomes
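
On the second part of the question, one rough first check is whether the sequence names in the GTF/GFF3 file also occur in the reference. The Python sketch below assumes the reference names are available as a two-column, tab-separated "name, length" listing; the function names are hypothetical and this is not a tool Galaxy runs. Matching names do not prove that the coordinates come from the same build, so treat a clean result as necessary, not sufficient.

```python
def gtf_seqnames(gtf_lines):
    """Collect the first column (sequence name) of each GTF/GFF3 feature line."""
    names = set()
    for line in gtf_lines:
        # Skip blank lines and comment/directive lines such as '##gff-version 3'.
        if not line.strip() or line.startswith("#"):
            continue
        names.add(line.split("\t", 1)[0])
    return names


def reference_names(len_lines):
    """Collect names from a 'name<TAB>length' listing (assumed input format)."""
    return {line.split("\t", 1)[0].strip() for line in len_lines if line.strip()}


def missing_from_reference(gtf_lines, len_lines):
    """Sorted sequence names used in the annotation but absent from the reference."""
    return sorted(gtf_seqnames(gtf_lines) - reference_names(len_lines))


if __name__ == "__main__":
    gtf = [
        "#!genome-build example\n",
        "chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id \"g1\";\n",
        "1\tsrc\texon\t300\t400\t.\t+\t.\tgene_id \"g2\";\n",
    ]
    ref = ["chr1\t1000\n", "chr2\t900\n"]
    print(missing_from_reference(gtf, ref))
```

Any name reported here (for example "1" when the reference uses "chr1") points to a naming or build mismatch worth resolving before running downstream tools.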

If there are any other genomes whose source you are unable to identify, please feel free to ask. Perhaps post each one as a separate question, so that they can be tagged and easily found by others.

Thanks! Jen, Galaxy team
