Question: BAM header using BWA-MEM in galaxy compared with GATK resource bundle (hg38) - deprecated tools in Galaxy
8 months ago by Joan Gibert (Barcelona/PRBB), who wrote:

Hi!

I am trying to run some analyses with GATK in order to identify mutations. Long story short, I ran BWA-MEM against hg38 in Galaxy, and now I'm using the hg38 provided in the GATK resource bundle (https://software.broadinstitute.org/gatk/download/bundle). The problem is that the header of the BAM files generated by BWA-MEM in Galaxy is different from the header of the GATK reference genome FASTA (Homo_sapiens_assembly38.fasta).

Is there any way to solve this without remapping against the GATK reference genome? Could I somehow edit the BAM header to match the GATK one?

Thanks! Joan

alignment bwa snp gatk deprecated • 391 views
8 months ago by Jennifer Hillman Jackson (United States), who wrote:

Hello,

All GATK tools hosted at public servers and available in the Tool Shed have been deprecated. Details: https://biostar.usegalaxy.org/p/26856/

The genome version in the GATK resource bundle differs from the hg38 version from UCSC. The chromosome identifiers are the primary difference; sort order is another. Both must match, or GATK tools will report errors.
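
One way to see both kinds of difference is to compare the @SQ lines of the BAM header with those of the reference's sequence dictionary. The following is a minimal Python sketch, not a supported tool. It assumes you have already exported the BAM header as text (for example with `samtools view -H`) and that you have the reference's .dict file, which uses the same @SQ line format. The function names `parse_sq` and `compare_contigs` are made up for illustration.

```python
def parse_sq(header_text):
    """Return [(name, length), ...] from the @SQ lines of SAM-header-style text."""
    contigs = []
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Each field after the record type looks like TAG:value (e.g. SN:chr1).
        tags = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        contigs.append((tags["SN"], int(tags["LN"])))
    return contigs


def compare_contigs(bam_sq, ref_sq):
    """Summarise name, length, and order differences between two contig lists."""
    bam_names = [name for name, _ in bam_sq]
    ref_names = [name for name, _ in ref_sq]
    bam_set = set(bam_names)
    ref_len = dict(ref_sq)
    return {
        "only_in_bam": [n for n in bam_names if n not in ref_len],
        "only_in_ref": [n for n in ref_names if n not in bam_set],
        "length_mismatch": [n for n, ln in bam_sq if n in ref_len and ref_len[n] != ln],
        # Order is compared only over the contigs present in both lists.
        "same_order": [n for n in bam_names if n in ref_len]
        == [n for n in ref_names if n in bam_set],
    }


if __name__ == "__main__":
    # Toy headers with made-up lengths, only to show the shape of the output.
    bam_header = "@HD\tVN:1.0\n@SQ\tSN:chr1\tLN:100\n@SQ\tSN:chrM\tLN:50\n"
    ref_dict = "@SQ\tSN:chrM\tLN:50\n@SQ\tSN:chr1\tLN:100\n@SQ\tSN:chrX\tLN:70\n"
    print(compare_contigs(parse_sq(bam_header), parse_sq(ref_dict)))
```

If `only_in_bam` is non-empty or `length_mismatch` lists anything, renaming alone is unlikely to be enough, since the underlying sequences may differ.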

Best option:

Other options (you'll have to decide which to try and work out the details - none are recommended/supported - but this is your choice):

  • Map in Galaxy against the hg_g1k_v37 human genome still available at https://usegalaxy.org. This is a match for the GATK bundle based hg19 (version 1.4). Data could then be partially processed in Galaxy (mapping). For GATK steps, avoid the Galaxy wrapped GATK tools - some will fail and all are based on earlier versions (1.4 and 2.x, where 3.x is the current GATK release).
  • Use the GATK version of the hg38 genome from the GATK bundle for both mapping and GATK (both run on the command line).
  • I would NOT suggest using the GATK resource bundle's genome in Galaxy as a custom genome. It is too large and will cause other problems (memory failures for jobs).
  • You could try to modify the BAM chromosome names/headers, but this doesn't always work. See this tutorial (a bit outdated since the tools are no longer supported/deprecated, but the help for formatting is current): https://biostar.usegalaxy.org/p/14777/
  • And this is the help for chromosome naming mismatches, which includes advice for modifying inputs to match up: https://galaxyproject.org/support/chrom-identifiers/. The sort order will still be important to get right if you use GATK tools on the command line downstream, and just adding "chr" to the identifiers (header and data lines) is not enough. You can compare these yourself to see the differences. Note that chrM maps to MT, and "supercontigs" versus "haplotypes/unknown", so both will need more complex mapping/renaming/sorting.
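
As a rough illustration of the renaming idea, the sketch below rewrites reference names in SAM-format text using a lookup table. It is a minimal example under assumptions: the `name_map` contents are hypothetical (only the chrM to MT pair comes from the note above), the function name `rename_sam_line` is made up, and optional tags that embed reference names (BWA-MEM writes SA and XA tags) are not handled. It also does nothing about sort order, which would still need to be fixed separately.

```python
def rename_sam_line(line, name_map):
    """Rename reference names in one SAM line (header or alignment record)."""
    fields = line.rstrip("\n").split("\t")
    if line.startswith("@"):
        # Header: only @SQ lines carry reference names, in the SN: field.
        if fields[0] == "@SQ":
            fields = [
                "SN:" + name_map.get(f[3:], f[3:]) if f.startswith("SN:") else f
                for f in fields
            ]
    else:
        # Alignment record: column 3 is RNAME and column 7 is RNEXT.
        # "*" (no reference) and "=" (same as RNAME) are left untouched.
        for col in (2, 6):
            if len(fields) > col and fields[col] not in ("*", "="):
                fields[col] = name_map.get(fields[col], fields[col])
    return "\t".join(fields)


if __name__ == "__main__":
    # Hypothetical mapping; a real one would need an entry for every contig.
    name_map = {"chrM": "MT", "chr1": "1"}
    for sam_line in (
        "@SQ\tSN:chrM\tLN:100",
        "read1\t0\tchr1\t10\t60\t4M\t=\t20\t0\tACGT\tIIII",
    ):
        print(rename_sam_line(sam_line, name_map))
```

Names missing from the mapping pass through unchanged, so it is worth checking the output for leftover identifiers before handing the file to GATK.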

Hope that helps with the alternatives to try out! Jen, Galaxy team


Hi Jen!

Thank you for the great answer! I suspected that changing the header would be a difficult task. I'm in a hurry right now, so I guess I will skip the steps that require header comparisons.

It's a pity that nobody has scripted anything for changing headers/names between these different hg38 reference genomes; I guess it is not a big issue. Just curious, why are they so different? Do they produce different results?

Thanks! Joan

Reply written 8 months ago by Joan Gibert