ANNOVAR v0.2 on local server has no reference databases

10 months ago by

mcrabtree • 10 wrote:

Hello,

I recently used "ANNOVAR Annotate VCF with functional information using ANNOVAR (Galaxy Version 0.2)" on usegalaxy.org using the following settings:

For gene annotations, I entered: refGene, gencodev19
For annotation regions, I entered: GenomicsSuperDups, phastConsElements46way
For annotation databases, I entered: 1000g2012_apr_all, avsift, snp137nonflagged, esp6500si_all, snp137, cosmic67
For output data type, I selected tabular.

These settings successfully produced an annotated VCF file. However, when I transferred this tool to our local server running Galaxy, it no longer works, saying that there are "no options available" for these databases, and it doesn't allow me to type them in. I am still able to run the tool 'successfully' (i.e. producing a green entry on the history), but all that is returned is an empty VCF file with no annotations. Why is this happening? I looked into ways to download those databases onto our server, but was unable to figure out how to do that, nor do I know if that is even the right approach to solving this problem. Any advice is appreciated.

Thanks, Matthew

reference data annovar local index • 668 views

modified 10 months ago • written 10 months ago by mcrabtree • 10

10 months ago by

Jennifer Hillman Jackson ♦ 25k wrote:

Hello,

The reference data for ANNOVAR needs to be configured on your local Galaxy server.

Instructions for setting up the reference data for this particular tool are in the tool repository's readme file here: https://toolshed.g2.bx.psu.edu/view/devteam/table_annovar/08b003ee9db7

Many tools have a simpler method for setting up reference data and genomes: Data Managers (https://galaxyproject.org/admin/tools/data-managers/). At a minimum, you probably want to fetch the target reference genome that the ANNOVAR annotation is based on and then build the SAM/Picard indexes for it. Search the Tool Shed (https://toolshed.g2.bx.psu.edu/) for "Data Managers" to see which tools have this type of data installer.

No actual genomes or indexes are in the Galaxy distribution itself, just the labels ("database") to help keep nomenclature standardized across instances for common genomes (but that content is optional - you could remove it all and start over with custom labels, or just add in new ones).

Hope that helps!

ADD COMMENT • link written 10 months ago by Jennifer Hillman Jackson ♦ 25k

Thanks for your response. I have made some progress, but I'm not quite there yet. Here is where things stand:

I successfully downloaded some databases using annotate_variation.pl -downdb -buildver <build> [-webfrom annovar] <database> <humandb>. I also located the tool-data/annovar.loc file and updated it as follows:

hg19 hg19 hg19 [Human Feb. 2009 (GRCh37/hg19)] /export/annovar/annovar /export/annovar/annovar/humandb

snp137 snp137 snp137 /export/annovar/annovar /export/annobar/annovar/humandb

Then I restarted the Docker container and checked the Galaxy user interface to see the changes. In the tool called "ANNOVAR Annotate a file using ANNOVAR (Galaxy Version 2016march)", under the "Reference" drop-down, I can now see both new entries: hg19 [Human Feb. 2009 (GRCh37/hg19)] and snp137. This is great. However, in the tool called "ANNOVAR Annotate VCF with functional information using ANNOVAR (Galaxy Version 0.2)", "no options are available" is still shown under each of the three sections (Gene Annotations, Annotation Regions, Annotation Databases).

What do I need to do in order for the databases that I download to be visible to the VCF annotation tool?

Thanks,

Matthew

ADD REPLY • link written 10 months ago by mcrabtree • 10

Have you installed the hg19 genome with the Fetch fasta Data Manager?

ADD REPLY • link written 10 months ago by Jennifer Hillman Jackson ♦ 25k

Yes, I have fetched the FASTA file for GRCh37, which retrieved 3,348 sequences. Beyond that, I'm not sure how to use that file, or if I need to reference it somewhere.

For the annovar.loc file, I have entered the following:

hg19 hg19 hg19 [Human Feb. 2009 (GRCh37/hg19)] /export/annovar/annovar /export/annovar/annovar/humandb

refGene hg19 gene_ann /export/annovar/annovar /export/annovar/annovar/humandb

gencodedeV19 hg19 gene_ann /export/annovar/annovar /export/annovar/annovar/humandb

genomicSuperDups hg19 region /export/annovar/annovar /export/annovar/annovar/humandb

phastConsElements46way hg19 region /export/annovar/annovar /export/annovar/annovar/humandb

1000g2012apr_all hg19 filter /export/annovar/annovar /export/annovar/annovar/humandb

avsift hg19 filter /export/annovar/annovar /export/annovar/annovar/humandb

snp137NonFlagged hg19 filter /export/annovar/annovar /export/annovar/annovar/humandb

esp6500si_all hg19 filter /export/annovar/annovar /export/annovar/annovar/humandb

snp137 hg19 filter /export/annovar/annovar /export/annovar/annovar/humandb

cosmic64 hg19 filter /export/annovar/annovar /export/annovar/annovar/humandb

I am still not able to see the gene annotations, annotation regions, or annotation databases in the local Galaxy user interface.

ADD REPLY • link modified 10 months ago • written 10 months ago by mcrabtree • 10

Did you fetch the genome directly from UCSC and give it the database name hg19?

The data that tool sets up is contained in other .loc files (and data tables). The database names need to match up exactly or the links between the genome and annotation files won't work.

In the admin section, click the Local data link. The contents of all .loc files/data tables are listed there. You should find hg19 in the all_fasta and fasta_indexes tables. What are the contents for hg19 (just to double check)?

Other troubleshooting for data created with a Data Manager (DM) often involves checking the following:

Extra spaces or tabs or blank lines in the .loc file. You want each column of data to be separated by a single tab.
Restart Galaxy after any changes.
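
The whitespace checks above could be automated with something like the following. This is only a minimal sketch, assuming a tab-separated .loc file with "#" comment lines. The function name check_loc_lines and its expected_columns parameter are hypothetical helpers for illustration, not part of Galaxy, and the sample lines are made up.

```python
def check_loc_lines(lines, expected_columns=None):
    """Return (line_number, message) pairs for suspicious .loc lines."""
    problems = []
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if line.startswith("#"):
            continue  # comment lines are ignored
        if not line.strip():
            problems.append((number, "blank line"))
            continue
        if line != line.strip():
            problems.append((number, "leading or trailing whitespace"))
        fields = line.split("\t")
        if len(fields) == 1:
            problems.append(
                (number, "no tab found; columns may be separated by spaces")
            )
        if any(field == "" for field in fields):
            problems.append((number, "empty column (doubled tab?)"))
        if expected_columns is not None and len(fields) != expected_columns:
            problems.append(
                (number, f"expected {expected_columns} columns, found {len(fields)}")
            )
    return problems


# Illustrative input only: the second entry uses spaces instead of tabs.
sample = [
    "#value\tdbkey\ttype\tpath\n",
    "refGene\thg19\tgene_ann\t/export/annovar/annovar/humandb\n",
    "snp137 hg19 filter /export/annovar/annovar/humandb\n",
    "\n",
]
for number, message in check_loc_lines(sample):
    print(f"line {number}: {message}")
```

To check a real file, pass an open file handle (for example, open("tool-data/annovar.loc")) instead of the sample list. The expected column count depends on the tool's data table definition, so it is left optional here.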

Let us know, thanks!

ADD REPLY • link written 10 months ago by Jennifer Hillman Jackson ♦ 25k

I fetched the genome from NCBI directly using the "Retrieve FASTA from NCBI (Galaxy Version 1.1.0)" tool. Here's what I see under admin --> local data --> all fasta:

[value] [dbkey] [name] [path]

[hg19] [hg19] [Human (Homo sapiens) (b37): hg19] [/galaxy/data/hg19/seq/hg19.fa]

Regarding the annovar VCF annotation databases, I noticed that there were multiple annovar.loc files on our server, and after identifying the red herring files and updating the appropriate annovar.loc, I'm pleased to say that the annotation databases are now visible on the annovar VCF tool! (I was, however, unable to download gencodeV19 and phastConsElements46way... will try again later).

The issue that I'm running into now is that the convert2annovar.pl command is not found. The exact error message is:

Log: tool progress

/export/galaxy-central/database/job_working_directory/000/97/tool_script.sh: line 9: convert2annovar.pl: command not found

/export/galaxy-central/database/job_working_directory/000/97/tool_script.sh: line 9: table_an

The tool's readme file says to "add the annovar scripts convert2annovar.pl and table_annovar.pl to your Galaxy user's path". I am currently trying to figure out how to do this.
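
As a first diagnostic, a small script can report whether the two scripts are findable on the PATH of whichever user runs it. This is a rough sketch and not taken from the readme: find_scripts is a hypothetical helper name, and /export/annovar/annovar is just the ANNOVAR directory from my annovar.loc entries. It would need to be run as the Galaxy user, in the same environment (here, inside the Docker container) that Galaxy jobs run in, for the result to be meaningful.

```python
import os
import shutil


def find_scripts(names, extra_dirs=()):
    """Map each script name to its resolved path, or None if not found.

    extra_dirs are searched before the directories already on PATH, which
    shows what would happen if those directories were prepended to PATH.
    """
    parts = [*extra_dirs, *os.environ.get("PATH", "").split(os.pathsep)]
    search_path = os.pathsep.join(part for part in parts if part)
    return {name: shutil.which(name, path=search_path) for name in names}


scripts = ["convert2annovar.pl", "table_annovar.pl"]

# What the current PATH resolves to on its own.
print(find_scripts(scripts))

# What it would resolve to with the ANNOVAR directory added (assumed location).
print(find_scripts(scripts, extra_dirs=["/export/annovar/annovar"]))
```

A result of None for a script means it is either missing from the searched directories or not marked executable, since shutil.which only returns executable files.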

ADD REPLY • link modified 10 months ago • written 10 months ago by mcrabtree • 10

I am pretty sure this genome should be fetched directly from UCSC in order to match the ANNOVAR version of hg19. Otherwise, there can be chromosome naming mismatch problems.

This FAQ is for making sure that inputs (on the tool form) match each other and the indexed fasta genome databases on a server, but the main points apply for your use case. https://galaxyproject.org/support/chrom-identifiers/

In short, compare the chromosome identifiers in the genome index (the fasta file) with those in the annovar reference data.

The identifiers should be formatted like chr1, chr2, chr3, .. chrX, chrY, chrM
Many other sources leave off the "chr", creating a mismatch 1, 2, 3, .. X, Y, M (or MT)
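
The comparison described above could be sketched roughly as follows. This is an illustration under assumptions, not an official Galaxy or ANNOVAR utility: the function names are hypothetical, and which column holds the chromosome name in an annotation file varies by database, so the column argument has to be set per file.

```python
def fasta_identifiers(lines):
    """Collect sequence identifiers (text after '>' up to the first whitespace)."""
    names = set()
    for line in lines:
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:
                names.add(parts[0])
    return names


def column_identifiers(lines, column=0):
    """Collect the values of one tab-separated column, skipping '#' and blank lines."""
    names = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) > column:
            names.add(fields[column])
    return names


def strip_chr(name):
    """Drop a leading 'chr' so that 'chr1' and '1' compare equal."""
    return name[3:] if name.startswith("chr") else name


def compare_identifiers(genome_names, annotation_names):
    """Summarize how two sets of chromosome identifiers overlap."""
    genome_bare = {strip_chr(name) for name in genome_names}
    only_in_annotation = annotation_names - genome_names
    return {
        "shared": genome_names & annotation_names,
        "only_in_genome": genome_names - annotation_names,
        "only_in_annotation": only_in_annotation,
        "differ_only_by_chr_prefix": {
            name for name in only_in_annotation if strip_chr(name) in genome_bare
        },
    }


# Illustrative data only: a genome without "chr" and annotation with "chr".
genome = fasta_identifiers([">1 example header\n", "ACGT\n", ">2\n", ">MT\n"])
annotation = column_identifiers(["chr1\t100\t200\n", "chr2\t5\t6\n", "chrM\t1\t2\n"])
for key, names in compare_identifiers(genome, annotation).items():
    print(key, sorted(names))
```

For real data, pass open file handles for the genome fasta and one of the downloaded annotation files. An empty "shared" set together with a populated "differ_only_by_chr_prefix" set would point to the "chr" mismatch described above. Names such as M versus MT differ by more than the prefix and would show up only in the "only_in_..." sets.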

ADD REPLY • link modified 9 months ago • written 9 months ago by Jennifer Hillman Jackson ♦ 25k
