Question: Unified Genotyper error
6 weeks ago by
viktoria.evdokimova30 wrote:

Hello, everyone: I am building a workflow for SNV calling on whole-genome sequences, following the GATK best practices. I am new to the field and have run into a problem. Unified Genotyper fails when I use an input file created in Galaxy after the mapping to reference, duplicate marking, and indel realignment steps. The file is a BAM sorted by coordinate. The error description left me clueless because there is too much information in the debugger window. Could anyone guide me through troubleshooting and fixing the issue? Thank you in advance. PS: I am reading the materials listed in the debugger output, but it is still not clear to me what to do.

ERROR A USER ERROR has occurred (version exported):
ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
ERROR Please do not post this error to the GATK forum
ERROR
ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
ERROR Visit our wiki for extensive documentation http://www.broadinstitute.org/gsa/wiki
ERROR Visit our forum to view answers to commonly asked questions http://getsatisfaction.com/gsa
ERROR
ERROR MESSAGE: Lexicographically sorted human genome sequence detected in reads.
ERROR For safety's sake the GATK requires human contigs in karyotypic order: 1, 2, ..., 10, 11, ..., 20, 21, 22, X, Y with M either leading or trailing these contigs.
ERROR This is because all distributed GATK resources are sorted in karyotypic order, and your processing will fail when you need to use these files.
ERROR You can use the ReorderSam utility to fix this problem:
6 weeks ago by
United States
Jennifer Hillman Jackson23k wrote:

Hello,

As described in the error message and in the GATK website resources, chromosome order is important. This is true whether you work in Galaxy or on the command line. My guess is that some of your inputs follow the ordering guidelines but others do not. All inputs must be sorted the same way: this includes the reference genome used in alignment and other steps, reference annotation data, and aligned data (BAMs).
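
To make the difference between the two orderings concrete, here is a minimal Python sketch. It is only an illustration, not part of GATK or Galaxy: the function name `karyotypic_key` and the sample contig list are made up for this example, and it places M/MT last (the error message says M may lead or trail).

```python
def karyotypic_key(contig):
    """Sort key for karyotypic order: 1..22, then X, Y, then M/MT.

    Hypothetical helper for illustration. An optional 'chr' prefix is
    ignored, and M/MT is placed after Y (an assumption; the GATK message
    allows M to lead or trail).
    """
    name = contig[3:] if contig.lower().startswith("chr") else contig
    if name.isdigit():
        return (0, int(name), "")          # autosomes in numeric order
    special = {"X": 1, "Y": 2, "M": 3, "MT": 3}
    if name.upper() in special:
        return (special[name.upper()], 0, "")
    return (4, 0, name)                    # any other contig goes last


contigs = ["1", "10", "11", "2", "X", "Y", "MT"]

# Plain string sorting gives the lexicographic order GATK complains about.
lexicographic = sorted(contigs)

# Sorting with the key gives the karyotypic order GATK expects.
karyotypic = sorted(contigs, key=karyotypic_key)

print("lexicographic:", lexicographic)
print("karyotypic:   ", karyotypic)
```

If the contig names in your BAM header come out in the first order rather than the second, that is the condition the error message describes.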

FAQs:

Small warning: the Galaxy-wrapped GATK tools are at least one version old (in the Tool Shed) and even older in the deprecated set hosted at https://usegalaxy.org. These are not recommended. Changes in licensing may mean that a future decision to update the tools is possible, but nothing is certain. For now, see the alternative variant analysis tools in the tutorials.

Galaxy tutorials:

Thanks! Jen, Galaxy team

6 weeks ago by
viktoria.evdokimova30 wrote:

Thank you for the prompt reply. I see what you are talking about. Please correct me if I am wrong: I should sort in chromosome order not only the files I am analyzing but all files used in the tool, including reference sequences in FASTA and BED formats. Did I get it right?


Yes, all should be the same. Consistent sort order in all inputs is required. You may need to recreate your custom genome, then start over from mapping.

This situation is somewhat common when going through these tools for the first time. The prepped data from GATK is already formatted this way but doesn't cover all genomes.
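
One way to check for a mismatch before rerunning anything is to compare the contig order in the reference index with the order in the BAM header. The sketch below is a rough illustration under stated assumptions: it works on text you have already obtained (the contents of a FASTA `.fai` index, and the header text printed by `samtools view -H`), the function names are hypothetical, and the sample strings are illustrative values rather than output from a real run.

```python
def contigs_from_fai(text):
    """Contig names from FASTA index (.fai) text: first tab-separated column."""
    return [line.split("\t")[0] for line in text.splitlines() if line.strip()]


def contigs_from_sam_header(text):
    """Contig names from SAM/BAM header text: the SN: field of each @SQ line."""
    names = []
    for line in text.splitlines():
        if line.startswith("@SQ"):
            for field in line.split("\t")[1:]:
                if field.startswith("SN:"):
                    names.append(field[3:])
    return names


def shared_order_matches(a, b):
    """True if the contigs present in both lists appear in the same order."""
    in_a, in_b = set(a), set(b)
    return [c for c in a if c in in_b] == [c for c in b if c in in_a]


# Illustrative inputs (assumed values, not taken from the original post).
fai_text = "1\t249250621\t52\t60\t61\n2\t243199373\t253404903\t60\t61\n10\t135534747\t500657651\t60\t61\n"
header_text = (
    "@HD\tVN:1.4\tSO:coordinate\n"
    "@SQ\tSN:1\tLN:249250621\n"
    "@SQ\tSN:10\tLN:135534747\n"
    "@SQ\tSN:2\tLN:243199373\n"
)

reference_contigs = contigs_from_fai(fai_text)
bam_contigs = contigs_from_sam_header(header_text)
consistent = shared_order_matches(reference_contigs, bam_contigs)
print("reference:", reference_contigs)
print("BAM:      ", bam_contigs)
print("consistent order:", consistent)
```

Here the reference lists 1, 2, 10 while the BAM header lists 1, 10, 2, so the check reports an inconsistency. The same comparison can be applied to the contig column of a BED or other annotation file.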

written 6 weeks ago by Jennifer Hillman Jackson23k

Great, thank you very much!

written 6 weeks ago by viktoria.evdokimova30
6 weeks ago by
viktoria.evdokimova30 wrote:

In addition, thank you for the tip on alternative tools. I am going to try them.
