Whole Exome Sequencing

3.7 years ago by

United States

mia.khanii • 10 wrote:

I recently joined a project that aims to find novel mutations in certain diseases by comparing genomic data from diseased tissue and blood. After sending our samples off for sequencing, we received BAM files that are already aligned to hg19. I am trying to follow this protocol from a previous study that worked with the same type of data:

Prevariant processing:

1. Paired end reads were gap aligned to hg19 using BWA (Burrows–Wheeler Aligner).
2. Poorly aligned/mapped reads were filtered away with Samtools; SAM to BAM conversion was done.
3. PCR duplicates were marked and removed with the Picard package.
4. Indel realignment with known sites and base quality score recalibration were performed with GATK (Genome Analysis Toolkit), in line with current best practices in the next-generation sequencing field for variant detection, to produce variant-caller ready reads in BAM format.
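
For readers working outside Galaxy, step 4 could be sketched roughly as follows. This is only an illustration that assumes a GATK 3.x-style command line; the quoted protocol does not state which GATK version was used. Every file name here (hg19.fasta, sample.bam, known_indels.vcf, dbsnp.vcf, GenomeAnalysisTK.jar) is a hypothetical placeholder. The script only builds and prints the commands; it does not run them.

```python
from typing import List


def gatk3_realign_and_recalibrate(
    reference: str,
    bam: str,
    known_indels: str,
    known_sites: str,
    gatk_jar: str = "GenomeAnalysisTK.jar",  # hypothetical path to a GATK 3.x jar
) -> List[List[str]]:
    """Build (not run) the four GATK 3.x-style commands for step 4."""
    base = ["java", "-jar", gatk_jar]
    targets = "realign.intervals"   # intermediate file names are placeholders
    realigned = "realigned.bam"
    table = "recal.table"
    return [
        # 1) Find intervals that may need local realignment around indels.
        base + ["-T", "RealignerTargetCreator", "-R", reference, "-I", bam,
                "-known", known_indels, "-o", targets],
        # 2) Realign reads inside those intervals.
        base + ["-T", "IndelRealigner", "-R", reference, "-I", bam,
                "-targetIntervals", targets, "-known", known_indels,
                "-o", realigned],
        # 3) Model systematic base quality errors using known variant sites.
        base + ["-T", "BaseRecalibrator", "-R", reference, "-I", realigned,
                "-knownSites", known_sites, "-o", table],
        # 4) Write a new BAM with recalibrated base qualities.
        base + ["-T", "PrintReads", "-R", reference, "-I", realigned,
                "-BQSR", table, "-o", "recalibrated.bam"],
    ]


if __name__ == "__main__":
    for cmd in gatk3_realign_and_recalibrate(
        "hg19.fasta", "sample.bam", "known_indels.vcf", "dbsnp.vcf"
    ):
        print(" ".join(cmd))
```

Note that every one of these commands takes the reference with `-R`, and GATK expects that FASTA to come with a `.fai` index and a `.dict` sequence dictionary, so a reference problem blocks the whole step.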

Variant calling: identification of somatic substitutions and short indels

1. Somatic single-nucleotide variants were called with MuTect (beta) (https://confluence.broadinstitute.org/display/CGATools/MuTect).
2. Single-nucleotide variants reported in dbSNP129 (the last accepted “pure” version of dbSNP) were removed, unless they were also present in COSMICv56 (Forbes et al., 2008).
3. Somatic short indels were called with the Somatic Indel Detector walker that is part of the GATK package (https://confluence.broadinstitute.org/display/CGATools/Indelocator). Both these programs take in paired tumor-normal BAMs as input.
4. The single-nucleotide variants and indels were annotated with Oncotator (http://www.broadinstitute.org/oncotator/), a rapid and accurate web-based annotation tool.
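
The dbSNP/COSMIC rule in the variant-calling steps is a simple set filter: drop a call if it is in dbSNP, unless it is also in COSMIC. A minimal sketch, assuming variants are compared by exact chromosome, position, reference allele and alternate allele (the protocol does not say how matches were defined), with made-up toy variants:

```python
from typing import Iterable, List, NamedTuple, Set


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


def drop_dbsnp_unless_cosmic(
    calls: Iterable[Variant],
    dbsnp: Set[Variant],
    cosmic: Set[Variant],
) -> List[Variant]:
    """Keep a call if it is absent from dbSNP, or present in COSMIC."""
    return [v for v in calls if v not in dbsnp or v in cosmic]


if __name__ == "__main__":
    # Toy data for illustration only; not real database entries.
    novel = Variant("chr1", 100, "A", "T")
    common = Variant("chr2", 200, "G", "C")
    rescued = Variant("chr3", 300, "C", "T")

    kept = drop_dbsnp_unless_cosmic(
        [novel, common, rescued],
        dbsnp={common, rescued},
        cosmic={rescued},
    )
    for v in kept:
        print(v)
```

In this toy run, `novel` is kept because it is not in dbSNP, `common` is dropped, and `rescued` is kept because it appears in both sets.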

I'm unable to complete step 4 of the prevariant processing, the indel realignment with GATK on Galaxy. I can't get the tool to use a reference genome, even though I've tried uploading Hg19.fasta. Can anyone help me figure out what's going wrong? Thanks!

bam • 1.7k views

modified 2.2 years ago by marcocassone • 0 • written 3.7 years ago by mia.khanii • 10

3.7 years ago by

Jennifer Hillman Jackson ♦ 25k

United States

Jennifer Hillman Jackson ♦ 25k wrote:

Hello,

GATK tools on the public Main Galaxy instance are indexed natively using the same human reference genome released with the GATK-bundle: 1000 Genomes version "hg_g1k_b37". To use other reference genomes, create a Custom Reference genome.
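
One quick way to see why a BAM aligned to hg19 may not fit the built-in b37 index is to compare contig names: UCSC hg19 names chromosomes chr1, chr2, and so on, while b37 uses 1, 2, and so on. The sketch below is a rough illustration, not a Galaxy or GATK feature. It assumes you have the BAM header as text (for example from `samtools view -H`) and the header lines of the reference FASTA; the function names and toy inputs are hypothetical.

```python
from typing import Iterable, List


def contigs_from_sam_header(header_text: str) -> List[str]:
    """Collect the SN: value of every @SQ line in a SAM/BAM text header."""
    names = []
    for line in header_text.splitlines():
        if line.startswith("@SQ"):
            for field in line.split("\t")[1:]:
                if field.startswith("SN:"):
                    names.append(field[3:])
    return names


def contigs_from_fasta(lines: Iterable[str]) -> List[str]:
    """Collect the first word of every FASTA header line ('>name ...')."""
    names = []
    for line in lines:
        if line.startswith(">"):
            words = line[1:].split()
            if words:
                names.append(words[0])
    return names


def missing_contigs(bam_contigs: List[str], ref_contigs: List[str]) -> List[str]:
    """Contigs named in the BAM header that the reference does not contain."""
    ref = set(ref_contigs)
    return [c for c in bam_contigs if c not in ref]


if __name__ == "__main__":
    # Toy header in hg19 style and toy FASTA headers in b37 style.
    header = "@HD\tVN:1.0\n@SQ\tSN:chr1\tLN:249250621\n@SQ\tSN:chr2\tLN:243199373"
    fasta = [">1 example", "ACGT", ">2 example", "ACGT"]
    print(missing_contigs(contigs_from_sam_header(header), contigs_from_fasta(fasta)))
```

If this list is not empty, the BAM and the reference do not describe the same set of sequences, and a Custom Reference genome that matches the BAM is the safer choice.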

Please be aware that using hg19 has been known in the past to sometimes exceed the data processing resources available to GATK tools on the public Main instance at http://usegalaxy.org. If this should occur, moving to a local or cloud Galaxy instance where you can dedicate more resources is best.

http://wiki.galaxyproject.org/Support

Section 2.14 = Custom Reference Genomes and Builds
Section 2.8.1-4 = Job failures and reasons, plus options for going forward

Thanks, Jen, Galaxy team

2.2 years ago by

marcocassone • 0

marcocassone • 0 wrote:

Excuse me, I am new here and am just starting to use bioinformatics tools for WGS and WES. How can I filter variants based on their allele frequency in 1000 Genomes?

Please ask a new question :)

written 2.2 years ago by Bjoern Gruening ♦ 5.1k
