Question: GATK indel realigner using custom reference
laura wrote, 3.7 years ago:

I've been using some SAM files (converted to BAM) and some partial genome assemblies (custom reference sequences) to investigate variants in bulk sequences. GATK's rmdup and Realigner Target Creator had no problem with the files, but when I passed the same files, together with the Realigner Target Creator output, to Indel Realigner, I got:

ERROR MESSAGE: Badly formed genome loc: Parameters to GenomeLocParser are incorrect:Unknown contig . . . 

Additional output:

[bam_header_read] EOF marker is absent. The input is probably truncated.

The problem appears to be with the final contig in the reference sequences.

How do I correct this problem?

Thanks,

Laura

Jennifer Hillman Jackson (United States) wrote, 3.7 years ago:

Hello,

Correcting format or loading problems will probably resolve the issue; for any tool, most errors trace back to one of the causes below. The public Main Galaxy server at http://usegalaxy.org did have some downtime much earlier today, possibly while your job was running. If that is where you are working, consider a straight re-run to rule out a server-related job failure.

Troubleshooting help for custom reference genomes is in our wiki. I am not certain from this error whether that is where the problem lies (GATK errors can arise for a variety of reasons), but double-checking the FASTA format is a good place to start.
http://wiki.galaxyproject.org/Support#Custom_reference_genome
http://wiki.galaxyproject.org/Learn/CustomGenomes#Troubleshooting
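
As a rough illustration of what "double-checking the FASTA format" can look like outside Galaxy, here is a minimal Python sketch. The function name `check_fasta` and the particular checks (missing final newline, blank lines, whitespace in title lines, duplicate identifiers) are assumptions about common problems, not a list taken from the wiki pages above.

```python
from collections import Counter


def check_fasta(text):
    """Return a list of human-readable problems found in FASTA text.

    The checks are illustrative assumptions, not an official rule set.
    """
    problems = []
    if not text.endswith("\n"):
        problems.append("file does not end with a newline")
    lines = text.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # drop the empty piece after the final newline
    names = []
    for number, line in enumerate(lines, start=1):
        if line.strip() == "":
            problems.append(f"line {number}: blank line")
        elif line.startswith(">"):
            title = line[1:]
            # Flags spaces, tabs, and stray carriage returns in the title.
            if title != title.strip() or " " in title or "\t" in title:
                problems.append(f"line {number}: whitespace in title line")
            parts = title.split()
            names.append(parts[0] if parts else "")
        elif not names:
            problems.append(f"line {number}: sequence before first '>' line")
    for name, count in Counter(names).items():
        if count > 1:
            problems.append(f"identifier {name!r} appears {count} times")
    return problems
```

Running it on the text of the custom reference and getting an empty list back does not prove the file is acceptable to GATK, but a non-empty list points at something worth fixing first. A missing newline after the last record is one thing that would affect only the final contig.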

The BAM EOF errors can reflect a problem with incomplete data loading (but not always). Still, if you are incorporating a newly uploaded file, use FTP and confirm a successful load.
http://wiki.galaxyproject.org/Support#Loading_data
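
For a local copy of the BAM file, the "EOF marker is absent" warning can be checked directly. A complete BAM file ends with a fixed 28-byte empty BGZF block, so a truncated upload or download usually lacks it. The sketch below assumes only that the file is on local disk; the function name `has_bgzf_eof` is made up for this example.

```python
# The 28-byte empty BGZF block that terminates a complete BAM file.
BGZF_EOF = bytes.fromhex(
    "1f8b0804" "00000000" "00ff0600" "42430200"
    "1b000300" "00000000" "00000000"
)


def has_bgzf_eof(path):
    """Return True if the file at `path` ends with the BGZF EOF block."""
    with open(path, "rb") as handle:
        handle.seek(0, 2)  # jump to the end to learn the file size
        size = handle.tell()
        if size < len(BGZF_EOF):
            return False
        handle.seek(size - len(BGZF_EOF))
        return handle.read() == BGZF_EOF
```

A `False` result suggests the file was cut short somewhere along the way and should be regenerated or re-uploaded. A `True` result only says the end of the file is intact, not that everything before it is valid.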

And just in case this is a factor, make certain that all input data matches the genome used. In particular, the chromosome identifiers must match exactly for tools to function correctly (this applies to any tool that uses a reference genome, whether custom or native to the instance). Odd errors or results can be produced when there is a mismatch.
http://wiki.galaxyproject.org/Support#Reference_genomes
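
An "Unknown contig" error is what one would expect when a name in the alignment header has no exact match in the reference. The sketch below compares the two as plain text, assuming you have the FASTA contents and the SAM header (for example from `samtools view -H`) as strings. All function names here are hypothetical helpers, not part of any tool.

```python
def fasta_names(fasta_text):
    """Identifiers from '>' lines: the first whitespace-delimited token."""
    names = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:
                names.append(parts[0])
    return names


def sam_header_names(header_text):
    """Sequence names (SN fields) from @SQ lines of a SAM header."""
    names = []
    for line in header_text.splitlines():
        if line.startswith("@SQ"):
            for field in line.split("\t")[1:]:
                if field.startswith("SN:"):
                    names.append(field[3:])
    return names


def unknown_contigs(fasta_text, header_text):
    """Header contigs with no exactly matching identifier in the FASTA."""
    known = set(fasta_names(fasta_text))
    return [name for name in sam_header_names(header_text) if name not in known]
```

The comparison is deliberately exact and case-sensitive, so `chr2` and `2` are reported as different contigs.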

Best, Jen, Galaxy team

laura wrote, 3.7 years ago:

Thanks, Jen. I vaguely remembered that the reference needed something done to it, but was perplexed that the problem only surfaced with the downstream tool. I'll try sorting.

Laura


Jennifer Hillman Jackson replied, 3.7 years ago:

Hi Laura, Sorting would be important. You may know this, but just in case: when using GATK, follow their tool-specific sort practices described in the FAQ: http://www.broadinstitute.org/gatk/guide/article?id=1204
The GATK forum is linked from the GATK tool forms, but I'll add a new comment to our wiki as well.
Jen, Galaxy team
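
To make the sorting point concrete: a quick way to see whether sorting is the issue is to compare the contig order in the reference with the order in the BAM header. This sketch assumes you already have both lists of names; the function name is hypothetical, and it checks only relative order, so GATK's own validation may be stricter.

```python
def same_relative_order(reference_names, bam_names):
    """True if contigs present in both lists appear in the same order."""
    shared = set(reference_names) & set(bam_names)
    ref_order = [name for name in reference_names if name in shared]
    bam_order = [name for name in bam_names if name in shared]
    return ref_order == bam_order
```

For example, a header sorted lexically (`c1, c10, c2`) against a reference ordered `c1, c2, c10` would return `False`.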

 

vignes.gangboard wrote, 20 months ago:

Local realignment has two steps:
1. Determining (small) suspicious intervals which are likely in need of realignment (see the RealignerTargetCreator tool)
2. Running the realigner over those intervals
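
Outside Galaxy, the two steps above (target creation, then realignment) correspond to two GATK invocations. The sketch below only builds the argument lists, assuming a GATK 3.x command line; the function name and every file name passed to it are placeholders, and Galaxy users do not need this because the tool forms wrap the same calls.

```python
def realignment_commands(gatk_jar, reference, bam, intervals, realigned_bam):
    """Build the two GATK 3.x command lines for local indel realignment."""
    base = ["java", "-jar", gatk_jar]
    # Step 1: find intervals that likely need realignment.
    create_targets = base + [
        "-T", "RealignerTargetCreator",
        "-R", reference,
        "-I", bam,
        "-o", intervals,
    ]
    # Step 2: realign reads over those intervals.
    realign = base + [
        "-T", "IndelRealigner",
        "-R", reference,
        "-I", bam,
        "-targetIntervals", intervals,
        "-o", realigned_bam,
    ]
    return [create_targets, realign]
```

Each list can be passed to `subprocess.run(command, check=True)` in order. Both steps take the same reference and BAM, which is why a contig problem in either file can stay hidden in step 1 and still stop step 2.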
