Question: error using variant recalibrator in galaxy
roisinmcallister (United Kingdom) wrote, 3.8 years ago:

Tool name: Variant Recalibrator
Tool version: 0.0.4
Tool ID: toolshed.g2.bx.psu.edu/repos/devteam/variant_recalibrator/gatk_variant_recalibrator/0.0.4
ToolShed URL: https://toolshed.g2.bx.psu.edu/view/devteam/variant_recalibrator

I have been trying to implement the GATK Best Practices protocol (Geraldine A. Van der Auwera et al., Curr Protoc Bioinformatics 11(1110): 11.10.1–11.10.33, doi:10.1002/0471250953.bi1110s43) on the main Galaxy server, since I am more comfortable using Galaxy.

I got as far as producing a raw SNP VCF (raw_SNP.vcf) from Unified Genotyper (HaplotypeCaller is not available in Galaxy), using the realigned and BQSR-recalibrated BAM. The annotations used in the subsequent Variant Recalibration instructions are included in raw_SNP.vcf. However, when I try to use Variant Recalibrator, I consistently get the following error:

An error occurred with this dataset:

Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/galaxy-repl/main/scratch [Sun Feb 01 14:43:22 CST 2015] net.sf.picard.sam.CreateSequenceDictionary REFERENCE=/galaxy-repl/main/scratch/tmp-gatk-74BJCX/gatk_input.fasta OUTPUT=/galaxy-repl/main/scratch/tmp-gatk-7

I am using the GATK ucsc.hg19.fasta (from Galaxy shared data), my input raw_SNP.vcf, and the resource files described below.

I have tried using hapmap_3.3.hg19.vcf, 1000gomni2.5hg19.vcf and dbSNP_135.hg19.vcf (each one both from the GATK Galaxy shared data and from the GATK bundle on their FTP site), as well as 1000g-phase1.snps.highconfidence.hg19.vcf (from the GATK bundle on the FTP site). I have tried entering the known, training and truth details explicitly, and also leaving them unset.

I have checked the following annotations (which are present ONLY in the raw_SNP.vcf file, not in the ROD files, although I don't see how I could annotate those without their respective BAM files):

DP

QD

FS

MQRankSum

ReadPosRankSum

I have left the GATK and analysis tabs at "Basic". The only difference I can see is that the paper suggests a percentBad of 0.01 while the Galaxy default is 0.03 (I have tried both, though).

Any help would be appreciated.

DoireRosie

 

Jennifer Hillman Jackson (United States) wrote, 3.8 years ago:

Hello,

If you are running the tool on Main, are you also using the hg19 genome as a Custom Reference Genome/Build with the tool? That would mean the form has the option "Choose the source for the reference list" set to "History", following these instructions:
http://wiki.galaxyproject.org/Support#Custom_reference_genome

If not, then there could be a genomic mismatch problem with the inputs. This tool functions with the 1000 genomes build "1h_g1k_b37" using default settings on the public Main Galaxy instance at http://usegalaxy.org.
http://wiki.galaxyproject.org/Support#Reference_genomes
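
One quick way to look for the kind of genomic mismatch described above is to compare the sequence names in the reference FASTA with the chromosome names used in a VCF. The following is only a minimal sketch, assuming plain-text (uncompressed) files; the function names are made up for illustration and it checks names only, not sequence lengths or contig order.

```python
import io


def fasta_contigs(handle):
    """Collect sequence names from FASTA header lines (first word after '>')."""
    names = set()
    for line in handle:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names


def vcf_contigs(handle):
    """Collect the CHROM column values from the data lines of a VCF."""
    names = set()
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue  # skip header lines and blank lines
        names.add(line.split("\t", 1)[0])
    return names


def unmatched_contigs(fasta_handle, vcf_handle):
    """Return VCF chromosome names that do not occur in the FASTA, sorted."""
    return sorted(vcf_contigs(vcf_handle) - fasta_contigs(fasta_handle))


if __name__ == "__main__":
    # Tiny in-memory stand-ins for real files (hypothetical data).
    fasta = io.StringIO(">chr1 example\nACGT\n>chr2\nTTGA\n")
    vcf = io.StringIO("##fileformat=VCFv4.1\n1\t100\t.\tA\tG\n2\t200\t.\tC\tT\n")
    print(unmatched_contigs(fasta, vcf))
```

With real data you would pass open file handles instead of the in-memory examples. An empty result means every chromosome name in the VCF also appears in the FASTA; anything listed (for instance "1" where the reference has "chr1") points to a build or naming mismatch.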

You can also check the end of the error message - GATK tools report the exact problem in most cases, or an ancillary problem that will lead to the root problem.
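
If the error text is long, the relevant line can be pulled out by searching for the "ERROR MESSAGE:" marker that GATK writes near the end of its output. This is a rough sketch only: the helper name and the sample log are invented for illustration, and it assumes the full error text has been copied out of the dataset's error report.

```python
def gatk_error_messages(log_text):
    """Return the text following each 'ERROR MESSAGE:' marker in a log."""
    marker = "ERROR MESSAGE:"
    messages = []
    for line in log_text.splitlines():
        pos = line.find(marker)
        if pos != -1:
            messages.append(line[pos + len(marker):].strip())
    return messages


if __name__ == "__main__":
    # Hypothetical log text; a real one would be pasted from the error report.
    sample_log = (
        "Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/tmp\n"
        "INFO  some progress line\n"
        "##### ERROR MESSAGE: example description of the problem\n"
    )
    for message in gatk_error_messages(sample_log):
        print(message)
```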

Thanks, Jen, Galaxy team

 

 

roisinmcallister (United Kingdom) wrote, 3.8 years ago:

Thanks Jennifer

Sorry, yes, I should have said that I am using hg19 as a Custom Reference (as per the Galaxy instructions in the link above). I have used that reference in all the previous steps (alignment in BWA, duplicate removal in Picard, realignment and BQSR, and SNP calling with UG), all of which have been problem-free.

I can see that, as I suspected, the training files are not annotated:

ERROR MESSAGE: Bad input: Values for DepthOfCoverage annotation not detected for ANY training variant in the input callset. VariantAnnotator may be used to add these annotations

But I can't see how I can add all of the annotations listed above without having the BAM files from which each reference .vcf was derived, and I can't locate such files.

DoireRosie

Jennifer Hillman Jackson replied, 3.8 years ago:

The .vcf dataset does need to be annotated using the source BAM file for certain functions of this tool. Double-check that the attributes you are choosing on the form match those in the available public .vcf dataset(s). If they do not, you will not be able to use them; you can only work with what is provided. To start, try leaving Depth of Coverage out of the analysis (uncheck it on the tool form) and review the result.
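
To see which annotations a given .vcf actually carries, one option is to count how many records have each key in the INFO column. The sketch below is an illustration under some assumptions: the file is plain-text VCF, the function names are made up, and the keys shown (DP, QD, FS, MQRankSum, ReadPosRankSum) are the ones listed earlier in this thread; how the names on the Galaxy form map to INFO keys is not confirmed here.

```python
import io


def info_keys(record_line):
    """Return the set of INFO keys present on one VCF data line."""
    fields = record_line.rstrip("\n").split("\t")
    if len(fields) < 8 or fields[7] == ".":
        return set()  # no INFO column, or INFO is empty
    return {entry.split("=", 1)[0] for entry in fields[7].split(";") if entry}


def annotation_counts(handle, wanted):
    """Count, for each wanted key, the records whose INFO column contains it."""
    counts = {name: 0 for name in wanted}
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue  # skip header lines and blank lines
        present = info_keys(line)
        for name in wanted:
            if name in present:
                counts[name] += 1
    return counts


if __name__ == "__main__":
    wanted = ["DP", "QD", "FS", "MQRankSum", "ReadPosRankSum"]
    # Hypothetical two-record VCF used only to show the call.
    vcf = io.StringIO(
        "##fileformat=VCFv4.1\n"
        "chr1\t100\t.\tA\tG\t50\tPASS\tDP=10;QD=12.5;FS=0.0\n"
        "chr1\t200\t.\tC\tT\t40\tPASS\tDP=8;MQRankSum=0.1\n"
    )
    print(annotation_counts(vcf, wanted))
```

A key with a count of zero in a training or truth file is a candidate to uncheck on the tool form.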

Best, Jen, Galaxy team
