Dear all, I have a VCF file generated with HaplotypeCaller in GATK. The chromosome names in it look very different from what I expected, for example gi|996703411|ref|NW_015379183.1|. Could somebody please help me fix this, or replace them with normal chromosome names and positions? It is rice genome data aligned to the IRGSP 1.0 reference genome.
Hello,
All GATK tool wrappers, whether installed from the Tool Shed into a local Galaxy or hosted on a public Galaxy server, are considered deprecated.
That said, the issue comes from a mismatch between the inputs. The exact same reference genome must be used throughout the analysis, and the format of the custom genome is very important. The identifiers must be the same across all inputs (genome, annotation, mapping results, etc.) and, especially for GATK, in a specific order. It is much easier to format the custom genome FASTA correctly from the start, so it can be used for mapping and all later steps without needing to "fix" anything.
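If it helps to see where the mismatch is, here is a rough Python sketch (not a Galaxy or GATK tool) that pulls the sequence names out of a FASTA and out of a VCF's ##contig header lines, then reports which names are missing and whether the shared names appear in the same order. The function names are hypothetical, and it assumes contig lines look like ##contig=<ID=...,length=...>.

```python
def fasta_names(lines):
    """Sequence names from FASTA header lines (first word after '>')."""
    return [l[1:].split()[0] for l in lines if l.startswith(">") and l[1:].split()]


def vcf_contig_names(lines):
    """IDs from ##contig=<ID=...> lines, stopping at the #CHROM header."""
    names = []
    for l in lines:
        if l.startswith("##contig=<"):
            inner = l.strip()[len("##contig=<"):-1]  # drop prefix and trailing '>'
            for field in inner.split(","):
                key, _, value = field.partition("=")
                if key == "ID":
                    names.append(value)
                    break
        elif l.startswith("#CHROM"):
            break
    return names


def check_names(reference, other):
    """Names in `other` absent from `reference`, and whether shared names share order."""
    ref_set, other_set = set(reference), set(other)
    missing = [n for n in other if n not in ref_set]
    ref_order = [n for n in reference if n in other_set]
    other_order = [n for n in other if n in ref_set]
    return {"missing": missing, "same_order": ref_order == other_order}
```

For example, comparing a FASTA with >NW_015379183.1 against a VCF whose contig is gi|996703411|ref|NW_015379183.1| would list the gi|... name as missing.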
You probably want this type of header to match other data inputs:
>NW_015379183.1
Instead of this:
>gi|996703411|ref|NW_015379183.1|
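A minimal Python sketch of that header change, assuming NCBI-style pipe-delimited IDs where the accession follows a "ref" field. Any description text after the first space is dropped, and headers without a "ref" field are left as they are. The script and function names are hypothetical.

```python
import sys


def simplify_header(line: str) -> str:
    """'>gi|996703411|ref|NW_015379183.1| ...' -> '>NW_015379183.1'.

    Non-header lines and headers without a 'ref' field are returned
    unchanged (minus the trailing newline).
    """
    if not line.startswith(">"):
        return line.rstrip("\n")
    parts = line[1:].split()
    fields = parts[0].split("|") if parts else []
    if "ref" in fields:
        i = fields.index("ref")
        if i + 1 < len(fields) and fields[i + 1]:
            return ">" + fields[i + 1]
    return line.rstrip("\n")


def rewrite_fasta(src_path: str, dst_path: str) -> None:
    """Copy a FASTA file, simplifying every header line."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            dst.write(simplify_header(line) + "\n")


if __name__ == "__main__" and len(sys.argv) == 3:
    # usage: python simplify_headers.py input.fa output.fa
    rewrite_fasta(sys.argv[1], sys.argv[2])
```

After changing the headers, the reads need to be mapped again against the new FASTA, and the .fai index and .dict sequence dictionary that GATK uses should be regenerated so every input shares the same names.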
Help:
- Tutorial: Fasta Format, Custom Genomes, and GATK Chromosome ordering https://biostar.usegalaxy.org/p/14777/
- See Chrom mismatch and Custom genome FAQs here: https://galaxyproject.org/support/#getting-inputs-right
Thanks, Jen, Galaxy team