Genome source from history in SnpEff

k2low • 0 wrote (2.8 years ago):

Hello, I am using SnpEff in Galaxy to add variant information to my VCF file. I have two questions about the "Genome source" option in SnpEff.

1. I successfully downloaded the GRCh38.76 database for SnpEff to my history using SnpEff Download (the dataset is green in my history) and tried to use it in SnpEff. I expected the database in my history to show up when I selected "Reference genome from your history" under Genome source, but the field still showed "No snpeffdb dataset available". Could you tell me what I am doing wrong?

2. I was able to run SnpEff using "Named on demand" instead of "Reference genome from your history". What is the advantage of downloading the database to the history?

Thanks in advance!

Kei

snpeff galaxy • 935 views

modified 2.8 years ago by Jennifer Hillman Jackson ♦ 25k • written 2.8 years ago by k2low • 0

Jennifer Hillman Jackson ♦ 25k (United States) wrote (2.8 years ago):

Hello,

The option to download the genome into the history is intended to reduce load: download it once, then reuse it. However, this usage seems to be problematic (feedback from our team is pending). I suspect a dbkey mismatch between the inputs, meaning the "dbkey" (database) assigned to the SnpEff genome does not match the one assigned to the other input. SnpEff genomes include incremental versions in the key, while native genomes at http://usegalaxy.org do not, and are often named differently (using UCSC identifiers, etc.).

It may be possible to create a custom reference genome "build" that uses exactly the same dbkey as SnpEff (the database attribute), which would allow this option to work, but I have not tested that and it seems tedious for large genomes. It could also trigger a memory problem, since that custom reference genome would need to be used for all steps in the analysis (not just the SnpEff annotation step).
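To illustrate the suspected mismatch, here is a minimal Python sketch. It is not Galaxy's or SnpEff's actual code: the helper names are hypothetical, and it assumes that the two keys must be identical strings for the history option to be offered. GRCh38.76 is the database from the question, and hg38 is used as an example of a UCSC-style identifier.

```python
def dbkeys_match(snpeff_genome: str, reference_dbkey: str) -> bool:
    """Return True only when the two keys are identical strings.

    Assumption: matching is an exact string comparison. The real rule
    used by the tool wrapper has not been confirmed.
    """
    return snpeff_genome == reference_dbkey


def strip_version(snpeff_genome: str) -> str:
    """Drop a trailing numeric '.NN' version from a SnpEff genome name."""
    base, sep, suffix = snpeff_genome.rpartition(".")
    # Only strip when the part after the last dot is purely numeric.
    return base if sep and suffix.isdigit() else snpeff_genome


if __name__ == "__main__":
    # The SnpEff key carries a version; the UCSC-style key does not.
    print(dbkeys_match("GRCh38.76", "hg38"))
    # Even without the version, the naming schemes still differ.
    print(strip_version("GRCh38.76"), "vs", "hg38")
```

Under that assumption, removing the version suffix alone would not be enough, because the two naming schemes differ as well. This is why a custom build with an identical dbkey would be needed.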

Using the "Named on demand" option does seem to function in small tests, although I should mention that, in many cases, certain options on the tool form are passed to the command line in a deprecated format. This issue can be tracked here (and may not be the root issue): https://github.com/jennaj/support-known-issues/wiki

I suggest using the "Named on demand" option when working on http://usegalaxy.org. If you are working on your own local or cloud Galaxy instance, the native genome indexes could be created so that their dbkey matches the SnpEff genome dbkeys, and then tested. Problems can be reported to the tool authors through the Tool Shed (http://usegalaxy.org/toolshed) or on GitHub.

If our team has more feedback, we will post an update.

Sorry for the confusion in usage, Jen, Galaxy team
