Question: GWAS: Fastq to BAM to VCF
2.7 years ago by
nvlinh.dth10 wrote:

I'm analyzing NGS data in the following steps, starting from the raw FASTQ reads (QC, FASTQ Groomer, FASTQ Trimmer):

  1. Convert to Fastqsanger file

  2. Map to reference (Map with BWA for Illumina)

  3. Convert SAM file to BAM

  4. Mark duplicates (Picard tools) or make a VCF file.

  5. Analysis with GATK tools such as indel realignment, base recalibration, variant calling, variant filtering, and annotation (SnpEff)

I have a problem at step 5: I don't know how to do it. Could you please help me?

gwas variant vcf fastq bam • 2.4k views
Modified 2.7 years ago by Jennifer Hillman Jackson • written 2.7 years ago by nvlinh.dth
2.7 years ago by
Guy Reeves1.0k wrote:

Hi, I would avoid using GATK unless you have very particular reasons to do so, particularly given the relatively old versions available on Galaxy.
This graphic gives a nice idea of what else is possible on Galaxy: "A: Galaxy for GWAS, any tools avaliable". These other approaches are much faster to run and to set up, which is particularly true for beginners. Personally, I use FreeBayes. Cheers, Guy
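For anyone working outside Galaxy, the route from aligned reads to a VCF with FreeBayes can be sketched on the command line roughly as follows. This is only a sketch under assumptions: paired-end Illumina reads, a reference already indexed with `bwa index`, a `picard` wrapper script on the PATH (as conda installs usually provide), and hypothetical file names derived from a sample name. The helper `build_pipeline` is an illustration of my own, not part of any tool; it only builds the commands and does not run them.

```python
import shlex


def build_pipeline(ref, fq1, fq2, sample, threads=4):
    """Return one argv list per step, in order. Nothing is executed here.

    All output file names are hypothetical and derived from `sample`.
    """
    sam = f"{sample}.sam"
    sorted_bam = f"{sample}.sorted.bam"
    dedup_bam = f"{sample}.dedup.bam"
    metrics = f"{sample}.dup_metrics.txt"
    vcf = f"{sample}.vcf"
    # bwa expects a literal backslash-t in the read group string, not a tab.
    read_group = f"@RG\\tID:{sample}\\tSM:{sample}\\tPL:ILLUMINA"
    return [
        # Align paired-end reads and write SAM.
        ["bwa", "mem", "-t", str(threads), "-R", read_group, "-o", sam, ref, fq1, fq2],
        # Coordinate-sort and convert SAM to BAM.
        ["samtools", "sort", "-o", sorted_bam, sam],
        # Flag PCR/optical duplicates.
        ["picard", "MarkDuplicates", f"I={sorted_bam}", f"O={dedup_bam}", f"M={metrics}"],
        # Index the final BAM so the caller can read it by region.
        ["samtools", "index", dedup_bam],
        # Call variants and write a VCF.
        ["freebayes", "-f", ref, "-v", vcf, dedup_bam],
    ]


if __name__ == "__main__":
    for cmd in build_pipeline("ref.fa", "reads_1.fastq", "reads_2.fastq", "sample1"):
        print(shlex.join(cmd))
```

Once the tools are installed, each list can be passed in order to `subprocess.run(cmd, check=True)`. Filtering and annotation (for example with SnpEff) would follow on the resulting VCF.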

2.7 years ago by
United States
Jennifer Hillman Jackson25k wrote:


The GATK tools on Galaxy Main have been deprecated. Most should still work, but if server issues come up, we will not be fixing them going forward. I suggest reviewing the best-practice GATK protocols on the Broad web site and picking the right path for your analysis goals.

I should let you know that the GATK tools on Main are indexed for the human genome hg_g1k_v37, while SnpEff is indexed on Main for the human genome hg19. These are incompatible, and using hg19 as a custom reference genome with the GATK tools will exceed compute resources for certain tools.
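One visible symptom of this kind of mismatch is chromosome naming: hg_g1k_v37 (b37) names chromosomes `1`, `2`, ..., `MT`, while UCSC hg19 uses `chr1`, `chr2`, ..., `chrM`. A quick check along the following lines can catch it before a job is launched. This is a minimal sketch: the function names are hypothetical, and matching names alone do not prove that two builds are identical.

```python
def contig_names_from_fai(fai_text):
    """Extract contig names from the text of a FASTA .fai index.

    The name is the first tab-separated column of each line.
    """
    return [line.split("\t")[0] for line in fai_text.splitlines() if line.strip()]


def naming_style(contigs):
    """Classify contig names as 'chr-prefixed', 'unprefixed', or 'mixed'."""
    if not contigs:
        raise ValueError("no contig names given")
    prefixed = [name.startswith("chr") for name in contigs]
    if all(prefixed):
        return "chr-prefixed"
    if not any(prefixed):
        return "unprefixed"
    return "mixed"


def names_look_compatible(contigs_a, contigs_b):
    """True only if both sets use the same style and share at least one name."""
    same_style = naming_style(contigs_a) == naming_style(contigs_b)
    shared = set(contigs_a) & set(contigs_b)
    return same_style and bool(shared)


if __name__ == "__main__":
    b37_like = ["1", "2", "MT"]            # naming used by hg_g1k_v37
    hg19_like = ["chr1", "chr2", "chrM"]   # naming used by UCSC hg19
    print(naming_style(b37_like), naming_style(hg19_like))
    print(names_look_compatible(b37_like, hg19_like))
```

Contig names for a real reference can be read from the `.fai` file that `samtools faidx` writes next to the FASTA and passed through `contig_names_from_fai`.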

If you need or want them, the newer versions of the GATK pipeline tools are available in the Tool Shed for use in a local or cloud Galaxy. Please note that there is no data manager, as these tools are not fully supported due to licensing restrictions, so the reference data must be installed manually (using the GATK resource bundle). SnpEff is also available in the Tool Shed, and a data manager is available for that tool.

There are other tool options for variant calling on Main, please see:


The second includes some tutorials for GATK, but many of these are somewhat older or may use newer versions of the tools that are not found on Main. Because of this, I am not sure how many are still valid for use on Main. Others cover variant analysis with different tools, many of which are on Main; the rest would be in the Tool Shed.

Hopefully this helps, Jen, Galaxy team
