Question: Whole Exome Sequencing
mia.khanii (United States) wrote, 3.7 years ago:

I recently joined a project that is trying to find novel mutations in certain diseases by comparing genomic data from diseased tissue with data from blood. After sending our samples off for sequencing, we received BAM files that were already aligned to hg19. I am trying to follow this protocol from a previous study that was conducted with the same kind of data:

Pre-variant processing:

  1. Paired-end reads were aligned to hg19 with gapped alignment using BWA (Burrows–Wheeler Aligner).
  2. Poorly aligned or poorly mapped reads were filtered out with Samtools, and the SAM files were converted to BAM.
  3. PCR duplicates were marked and removed with the Picard package.
  4. Indel realignment with known sites and base quality score recalibration were performed with GATK (Genome Analysis Toolkit), in line with current best practices for variant detection from next-generation sequencing data, to produce variant-caller-ready reads in BAM format.
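For a BAM that is already aligned to hg19, steps 2 to 4 above could be sketched as the command lines below. This is a minimal Python sketch, not the original study's code, and it rests on assumptions: GATK 3.x (the last major version that still has RealignerTargetCreator and IndelRealigner), picard.jar and GenomeAnalysisTK.jar in the working directory, a hypothetical MAPQ cutoff of 20, and hypothetical file names for the known-indel and dbSNP VCFs. The reference FASTA has to be the same build as the BAM and needs a .fai index (samtools faidx) and a .dict file (Picard CreateSequenceDictionary).

```python
import shlex
from typing import List


def preprocess_commands(bam: str, ref: str, known_indels: str, dbsnp: str,
                        prefix: str, min_mapq: int = 20) -> List[List[str]]:
    """Build (but do not run) one argv list per pre-variant step.

    Assumed, hypothetical values: jar names, min_mapq=20, output file names.
    """
    filtered = f"{prefix}.filtered.bam"
    dedup = f"{prefix}.dedup.bam"
    intervals = f"{prefix}.intervals"
    realigned = f"{prefix}.realigned.bam"
    table = f"{prefix}.recal.table"
    final = f"{prefix}.recal.bam"
    gatk = ["java", "-jar", "GenomeAnalysisTK.jar"]  # GATK 3.x jar (assumed name)
    return [
        # Step 2: keep mapped reads (-F 4) with MAPQ >= min_mapq, write BAM (-b).
        ["samtools", "view", "-b", "-F", "4", "-q", str(min_mapq),
         "-o", filtered, bam],
        # Step 3: mark and remove PCR duplicates; CREATE_INDEX writes a .bai for GATK.
        ["java", "-jar", "picard.jar", "MarkDuplicates", f"I={filtered}",
         f"O={dedup}", f"M={prefix}.dup_metrics.txt",
         "REMOVE_DUPLICATES=true", "CREATE_INDEX=true"],
        # Step 4a: find intervals that need local realignment around indels.
        gatk + ["-T", "RealignerTargetCreator", "-R", ref, "-I", dedup,
                "-known", known_indels, "-o", intervals],
        # Step 4b: realign the reads inside those intervals.
        gatk + ["-T", "IndelRealigner", "-R", ref, "-I", dedup,
                "-targetIntervals", intervals, "-known", known_indels,
                "-o", realigned],
        # Step 4c: model base-quality errors, masking known variant sites.
        gatk + ["-T", "BaseRecalibrator", "-R", ref, "-I", realigned,
                "-knownSites", dbsnp, "-knownSites", known_indels, "-o", table],
        # Step 4d: apply the recalibration and write the caller-ready BAM.
        gatk + ["-T", "PrintReads", "-R", ref, "-I", realigned,
                "-BQSR", table, "-o", final],
    ]


if __name__ == "__main__":
    # Print the commands for a hypothetical tumor BAM instead of running them.
    for argv in preprocess_commands("tumor.bam", "hg19.fa", "known_indels.vcf",
                                    "dbsnp.vcf", "tumor"):
        print(shlex.join(argv))
```

The same steps would be run on both the tumor and the normal BAM, since the callers in the next section take the pair as input.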

Variant calling: identification of somatic substitutions and short indels

  1. Somatic single-nucleotide variants were called with MuTect (beta).
  2. Single-nucleotide variants reported in dbSNP129 (the last accepted “pure” version of dbSNP) were removed, unless they were also present in COSMICv56 (Forbes et al., 2008).
  3. Somatic short indels were called with the Somatic Indel Detector walker that is part of the GATK package. Both of these programs take paired tumor-normal BAMs as input.
  4. The single-nucleotide variants and indels were annotated with Oncotator, a rapid and accurate web-based annotation tool.
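The dbSNP/COSMIC rule in step 2 is a simple set operation: drop a call if it appears in dbSNP129, unless it also appears in COSMIC. Below is a minimal Python sketch, assuming variants are matched on exact (chromosome, position, REF, ALT) and that the dbSNP and COSMIC files are plain VCF text; the original study may have matched differently (for example on position only), and the function names are hypothetical.

```python
from typing import Iterable, List, Set, Tuple

Variant = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)


def vcf_keys(lines: Iterable[str]) -> Set[Variant]:
    """Collect (chrom, pos, ref, alt) keys from VCF text, one per ALT allele."""
    keys: Set[Variant] = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        for allele in alt.split(","):  # multi-allelic sites list ALTs with commas
            keys.add((chrom, int(pos), ref, allele))
    return keys


def filter_germline_snvs(calls: Iterable[Variant], dbsnp: Set[Variant],
                         cosmic: Set[Variant]) -> List[Variant]:
    """Keep a call unless it is in dbSNP and not rescued by COSMIC."""
    return [v for v in calls if v not in dbsnp or v in cosmic]
```

A call that is only in dbSNP is removed, a call in both dbSNP and COSMIC is kept, and a call in neither is kept.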

I'm unable to complete step 4 of the pre-variant processing (indel realignment and base quality score recalibration with GATK on Galaxy). I can't get the tool to use a reference genome, even though I've tried uploading Hg19.fasta. Can anyone help me figure out what's going wrong? Thanks!

Jennifer Hillman Jackson (United States) wrote, 3.7 years ago:


GATK tools on the public Main Galaxy instance are natively indexed against the same human reference genome that is released with the GATK bundle: the 1000 Genomes version "hg_g1k_b37". To use any other reference genome, such as hg19, set it up as a Custom Reference Genome.
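One practical consequence: hg19 names chromosomes "chr1", "chr2" and so on, with "chrM" for mitochondria, whereas the b37 / 1000 Genomes build uses "1", "2" and "MT", so a BAM aligned to hg19 does not match the b37 index. A quick check is to compare the @SQ names in the BAM header (from `samtools view -H`) with the first column of the reference's .fai index. This is a minimal Python sketch with hypothetical function names; it compares names only, and GATK can also reject a reference whose contig lengths or order differ from the BAM's.

```python
from typing import Iterable, List, Tuple


def bam_contigs(header_lines: Iterable[str]) -> List[str]:
    """Contig names from the SN: tag of @SQ lines in a SAM/BAM header."""
    names: List[str] = []
    for line in header_lines:
        if line.startswith("@SQ"):
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SN:"):
                    names.append(field[3:])
    return names


def fai_contigs(fai_lines: Iterable[str]) -> List[str]:
    """Contig names from the first column of a samtools faidx .fai file."""
    return [line.split("\t")[0] for line in fai_lines if line.strip()]


def compare_contigs(bam_names: List[str],
                    ref_names: List[str]) -> Tuple[List[str], List[str]]:
    """Return (names only in the BAM, names only in the reference), order kept."""
    ref_set, bam_set = set(ref_names), set(bam_names)
    return ([n for n in bam_names if n not in ref_set],
            [n for n in ref_names if n not in bam_set])
```

If the first list is non-empty (for example every "chr"-prefixed name), the BAM and the reference are from different builds or naming conventions, and the BAM needs a matching reference rather than the built-in index.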

Please be aware that running GATK tools against hg19 on the public Main instance has, in the past, sometimes required more data processing resources than the instance provides. If this happens, the best option is to move to a local or cloud Galaxy instance where you can dedicate more resources.

Section 2.14 = Custom Reference Genomes and Builds
Section 2.8.1-4 = Job failures and reasons, plus options for going forward

Thanks, Jen, Galaxy team 

marcocassone wrote, 2.2 years ago:

Excuse me, I am new here and am just starting to use bioinformatics tools for WGS and WES. How can I filter variants based on their allele frequency in the 1000 Genomes Project?
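One common approach is to keep only variants that are rare in the population. A minimal Python sketch, assuming the VCF has already been annotated with 1000 Genomes allele frequencies in an INFO key (called `AF` here, a hypothetical choice, since the key depends on the annotation tool) and a hypothetical rare-variant cutoff of 1%. Variants with no frequency annotation are kept, because they may be absent from 1000 Genomes.

```python
from typing import Dict, Iterable, Iterator, Optional


def parse_info(info: str) -> Dict[str, Optional[str]]:
    """Split a VCF INFO column into key -> value; flag keys map to None."""
    out: Dict[str, Optional[str]] = {}
    if info == ".":
        return out
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        out[key] = value if sep else None
    return out


def rare_variants(lines: Iterable[str], af_key: str = "AF",
                  max_af: float = 0.01) -> Iterator[str]:
    """Yield header lines plus records whose annotated AF is below max_af."""
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("#"):
            yield line  # keep the VCF header intact
            continue
        fields = line.rstrip("\n").split("\t")
        value = parse_info(fields[7]).get(af_key)  # INFO is column 8
        if value is None:
            yield line  # no frequency annotation: possibly novel, keep it
            continue
        # Multi-allelic sites list one AF per ALT; use the highest one.
        afs = [float(x) for x in value.split(",") if x != "."]
        if not afs or max(afs) < max_af:
            yield line
```

Writing the yielded lines back out gives a filtered VCF with the original header.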


Please ask a new question :)

(Reply by Bjoern Gruening, 2.2 years ago)

