All GATK tools hosted at public servers and available in the Tool Shed have been deprecated. Details: https://biostar.usegalaxy.org/p/26856/
The genome version in the GATK resource bundle differs from the hg38 version from UCSC. The chromosome identifiers are the primary difference; sort order is another. Both must match what GATK expects, or the GATK tools will report errors.
Other options (you'll have to decide which to try and work out the details - none are recommended/supported - but this is your choice):
- Map in Galaxy against the hg_g1k_v37 human genome still available at https://usegalaxy.org. This is a match for the GATK bundle based hg19 (version 1.4). Data could then be partially processed in Galaxy (mapping). For GATK steps, avoid the Galaxy wrapped GATK tools - some will fail and all are based on earlier versions (1.4 and 2.x, where 3.x is the current GATK release).
- Use the GATK version of the hg38 genome from the GATK bundle for mapping/GATK (both run on the command line).
- I would NOT suggest using the GATK resource bundle's genome in Galaxy as a custom genome. It is too large and will cause other problems (memory failures for jobs).
- You could try to modify the BAM chromosome names/headers, but this doesn't always work. See this tutorial (a bit outdated since the tools are no longer supported/deprecated, but the help for formatting is current): https://biostar.usegalaxy.org/p/14777/
- And this is the help for chromosome naming mismatches, which includes advice for modifying inputs to match up: https://galaxyproject.org/support/chrom-identifiers/ The sort order will still be important to get right if using GATK tools on the command line downstream, and just adding "chr" to the identifiers (header and data lines) is not enough. You can compare these yourself to see the differences - note that chrM maps to MT, and that "supercontigs" are named differently than "haplotypes/unknown", so both will need more complex mapping/renaming/sorting.
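If you want to compare the identifiers yourself, here is a minimal Python sketch. It assumes you have the sequence names from each genome in file order, for example from the first tab-separated column of a FASTA index (.fai) file. The function names and the toy lengths are made up for illustration; this is not part of Galaxy or GATK.

```python
def read_fai_names(lines):
    """Return sequence names, in file order, from .fai-style lines.

    Assumes the name is the first tab-separated column; blank lines are skipped.
    """
    return [line.split("\t", 1)[0].strip() for line in lines if line.strip()]


def compare_identifiers(names_a, names_b):
    """Report names unique to each list and whether shared names share an order."""
    set_a, set_b = set(names_a), set(names_b)
    shared_a = [n for n in names_a if n in set_b]
    shared_b = [n for n in names_b if n in set_a]
    return {
        "only_in_a": [n for n in names_a if n not in set_b],
        "only_in_b": [n for n in names_b if n not in set_a],
        "same_order": shared_a == shared_b,
    }


# Toy data only: the lengths here are placeholders, not real chromosome sizes.
ucsc_style = read_fai_names(["chrM\t100\n", "chr1\t1000\n", "chr2\t900\n"])
bundle_style = read_fai_names(["1\t1000\n", "2\t900\n", "MT\t100\n"])

# Naive fix: just prepend "chr" to the bundle-style names.
naive = ["chr" + name for name in bundle_style]

report = compare_identifiers(ucsc_style, naive)
print(report)
```

With the toy data, the naive rename still leaves chrM on one side and chrMT on the other, and the shared names appear in a different order, which is the kind of mismatch GATK complains about. For real data, you would pass the lines of your own index files (opened with `open(...)`) instead of the toy lists.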
Hope that helps with the alternatives to try out! Jen, Galaxy team