BWA-MEM Fatal error: Matched on ERROR

10 months ago by

Taiwan

mychung • 0 wrote:

Hi, I was aligning a 52.7 GB trimmed, paired fastq dataset from WGS with BWA-MEM, using the following settings:

Will you select a reference genome from your history or use a built-in index? cached
Using reference genome hg19
Single or Paired-end reads paired_iv
Select fastq dataset 1: F30_S11_L001_R1_001 (paired) trimmed (paired).fastq
Enter mean, standard deviation, max, and min for insert lengths. 475
Set read groups information? set_picard
Auto-assign true
Auto-assign true
Auto-assign true
Platform/technology used to produce the reads (PL) ILLUMINA.....

The run aborted about 4 hours later and showed the following error message:

Fatal error: Matched on ERROR:
[main] Version: 0.7.15-r1140
[main] CMD: bwa mem -t 10 -v 1 -p -I 475 -R @RG\tID:F30_S11_L001_R1_001__paired__trimmed__paired_.fastq\tSM:F30_S11_L001_R1_001__paired__trimmed__paired_.fastq\tPL:ILLUMINA\tLB:F30_S11_L001_R1_0

What could be the source of the error? Please help. Thank you!

software error alignment • 442 views

modified 10 months ago by Jennifer Hillman Jackson ♦ 25k • written 10 months ago by mychung • 0

10 months ago by

Jennifer Hillman Jackson ♦ 25k

United States

Jennifer Hillman Jackson ♦ 25k wrote:

Hello,

Very few tools accept paired-end interlaced fastq datasets. I now see a successful BWA-MEM run in your histories, but a downstream tool then failed with an error about an incomplete BAM result dataset. There could be a bug somewhere; we are running some tests and will report back with a ticket if that turns out to be true.

It would be easier to use two distinct paired-end datasets as input. All tools that accept paired-end input can work with data in that format. Specifically, use two individual datasets or two collections (one containing the forward reads, the other the reverse reads).
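For readers running BWA-MEM outside Galaxy, the difference between the two input styles can be sketched as a small command builder. This is an illustration only: the function name `bwa_mem_command` and the file names in the usage note are hypothetical, and only the `-t`, `-R` and `-p` options that appear in the command line quoted in the question are used. The function builds the argument list and does not run anything.

```python
def bwa_mem_command(reference, reads1, reads2=None, threads=10, read_group=None):
    """Build an argument list for `bwa mem` without executing it.

    If reads2 is given, the forward and reverse reads are passed as two files.
    If reads2 is None, reads1 is assumed to be interleaved and -p is added.
    """
    cmd = ["bwa", "mem", "-t", str(threads)]
    if read_group is not None:
        # Read group header line, e.g. "@RG\\tID:sample\\tSM:sample\\tPL:ILLUMINA"
        cmd += ["-R", read_group]
    if reads2 is None:
        # Single interleaved input: -p tells bwa mem the pairs alternate in one file.
        cmd += ["-p", reference, reads1]
    else:
        # Two distinct inputs: forward reads first, reverse reads second.
        cmd += [reference, reads1, reads2]
    return cmd
```

For example, `bwa_mem_command("hg19.fa", "R1.fastq", "R2.fastq")` yields the two-file form, while `bwa_mem_command("hg19.fa", "interleaved.fastq")` yields the `-p` form used in the failed job.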

FAQs:

How to de-interlace a fastq dataset: https://galaxyproject.org/support >> https://galaxyproject.org/support/ncbi-sra-fastq/#interlaced-forward-and-reverse-reads
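Outside Galaxy, de-interlacing can be sketched in a few lines of Python. This is a minimal illustration, not the Galaxy tool's implementation: it assumes every record is exactly four lines and that records strictly alternate forward, reverse, forward, reverse. The function name and file paths are hypothetical.

```python
import itertools


def deinterlace(interleaved_path, forward_path, reverse_path):
    """Split an interleaved FASTQ file into forward and reverse files.

    Assumes four-line records that alternate forward/reverse.
    Returns the number of read pairs written.
    """
    pairs = 0
    with open(interleaved_path) as src, \
            open(forward_path, "w") as fwd, \
            open(reverse_path, "w") as rev:
        while True:
            # Read one forward record (4 lines), then one reverse record.
            record1 = list(itertools.islice(src, 4))
            if not record1:
                break  # clean end of file
            record2 = list(itertools.islice(src, 4))
            if len(record1) != 4 or len(record2) != 4:
                raise ValueError("incomplete record or unpaired read at end of file")
            fwd.writelines(record1)
            rev.writelines(record2)
            pairs += 1
    return pairs
```

For a 52.7 GB input this streams record by record, so memory use stays small, but it performs no check that the two reads in a pair actually share a name.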

Hope that helps! Jen, Galaxy team

written 10 months ago by Jennifer Hillman Jackson ♦ 25k

Small update.

I took a closer look at the failed Picard job that ran after the last successful BWA-MEM job. The error message (click on the bug icon to review it) includes a warning about a conflict between the length of the input sequences and a tool setting. Please review it for details about how to take action. The setting should be adjusted (or the warning ignored, your choice) given your sequence data content, whether the data is entered as interlaced or as two distinct inputs.
I also checked the BAM dataset produced by BWA-MEM. It is intact. The warning about a truncated file is just a warning; it appears because the job failed before processing for the Picard Mark Duplicates tool completed. It can be ignored and is not the reason for the failure.
Even though these Picard tool forms offer the option to input unsorted data and have the tool do the sorting, this can create a very large job that often fails (likely the root cause of your failure). Sorting first is strongly recommended when using any Picard or SAMTools tool. Coordinate-order sort is expected unless the downstream tool states on the tool form (top entry area or lower help section) that queryname-order sort should be used.

I would suggest sorting the BAM dataset, adjusting the settings for Mark Duplicates, and trying a rerun if you still want to use interlaced fastq inputs. I didn't find any problems with either of these tools using that type of input when given a sorted BAM dataset.

FAQ for sorting: https://galaxyproject.org/support/ >> https://galaxyproject.org/support/sort-your-inputs/
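To illustrate the difference between the two sort orders mentioned above, here is a toy sketch on made-up records. The record tuples, read names and reference order are assumptions for illustration; real tools sort binary BAM records, and their queryname collation rules can differ from the plain string comparison used here.

```python
# Toy alignment records: (read name, reference name, 1-based position).
records = [
    ("readB", "chr2", 500),
    ("readA", "chr1", 900),
    ("readC", "chr1", 100),
]

# Reference order as it would be listed in the file header (assumed here).
reference_order = {"chr1": 0, "chr2": 1}


def coordinate_sorted(recs):
    """Order by reference (header order), then by position on that reference."""
    return sorted(recs, key=lambda r: (reference_order[r[1]], r[2]))


def queryname_sorted(recs):
    """Order by read name, using plain string comparison as a simplification."""
    return sorted(recs, key=lambda r: r[0])


print([r[0] for r in coordinate_sorted(records)])
print([r[0] for r in queryname_sorted(records)])
```

Coordinate order groups reads by where they align, which is what most downstream tools expect; queryname order keeps the two mates of a pair adjacent instead.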

Hope that helps!

modified 10 months ago • written 10 months ago by Jennifer Hillman Jackson ♦ 25k

Hi, Jen

Thank you very much for the support and suggestions.

It's good to know that the BAM files are intact. As suggested, I sorted a BAM file before running MarkDuplicatesWithMateCigar. Unfortunately, it failed again. The error message is as follows:

[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_index_core] truncated file? Continue anyway. (-4)
Fatal error: Exit code 1 ()
Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/galaxy-repl/main/jobdir/018/057/18057243/_job_tmp -Xmx7
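For context on the "EOF marker is absent" message: the SAM/BAM specification defines a fixed 28-byte empty BGZF block that is written at the end of a complete BAM file. A minimal sketch of checking a downloaded file for that marker follows. The function name is hypothetical, and the check only tests for the trailing marker; it does not validate the rest of the file.

```python
import os

# The 28-byte empty BGZF block that terminates a complete BAM file.
BGZF_EOF = bytes.fromhex(
    "1f8b0804" "00000000" "00ff" "0600" "4243" "0200"
    "1b00" "0300" "00000000" "00000000"
)


def has_bgzf_eof(path):
    """Return True if the file ends with the BGZF end-of-file marker."""
    if os.path.getsize(path) < len(BGZF_EOF):
        return False
    with open(path, "rb") as fh:
        # Jump to the last 28 bytes and compare them with the marker.
        fh.seek(-len(BGZF_EOF), os.SEEK_END)
        return fh.read() == BGZF_EOF
```

A file without the marker may have been cut short while it was being written or copied, which is what the warning text suggests.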

Meanwhile, the Picard MarkDuplicates tool ran on the unsorted BAM file and completed successfully. Could the error be related to the mate CIGAR handling or to some other parameter that differs between these two tools?

Your assistance is greatly appreciated!

written 10 months ago by mychung • 0

Hi Jen,

I noticed that the MarkDuplicates failure on the sorted BAM may be due to errors in the merged BAM, and it occurs regardless of the validation stringency (Lenient or Silent). The following is an example of stderr under Lenient stringency.

How should SAM validation errors like this be fixed?

Please advise.

Thank you!

Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/galaxy-repl/main/jobdir/018/197/18197809/_job_tmp -Xmx7g -Xms256m
Ignoring SAM validation error: ERROR: Record 65859373, Read name HISEQ:364:C4D2RACXX:6:1111:14675:30986, bin field of BAM record does not equal value computed based on alignment start and end, and length of sequence to which read is aligned
Ignoring SAM validation error: ERROR: Record 97631064, Read name HISEQ:364:C4D2RACXX:6:2104:6692:19211, bin field of BAM record does not equal value computed based on alignment start and end, and length of sequence to which read is aligned
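The validation message refers to the bin field stored in each BAM record. The SAM/BAM specification defines the expected value with a `reg2bin` calculation over the alignment's 0-based start and exclusive end; a Python transcription is sketched below for illustration. It only shows how the expected value is derived and is not a fix for the records reported above. The sample coordinates in the usage note are made up.

```python
def reg2bin(beg, end):
    """Compute the BAM bin for a 0-based, half-open interval [beg, end).

    Transcribed from the reg2bin function in the SAM/BAM specification.
    """
    end -= 1  # work with the last covered base
    if beg >> 14 == end >> 14:
        return 4681 + (beg >> 14)  # ((1 << 15) - 1) // 7, 16 kbp bins
    if beg >> 17 == end >> 17:
        return 585 + (beg >> 17)   # ((1 << 12) - 1) // 7, 128 kbp bins
    if beg >> 20 == end >> 20:
        return 73 + (beg >> 20)    # ((1 << 9) - 1) // 7, 1 Mbp bins
    if beg >> 23 == end >> 23:
        return 9 + (beg >> 23)     # ((1 << 6) - 1) // 7, 8 Mbp bins
    if beg >> 26 == end >> 26:
        return 1 + (beg >> 26)     # ((1 << 3) - 1) // 7, 64 Mbp bins
    return 0                       # interval spans the top-level bin
```

For example, a read covering positions 0 to 99 falls entirely inside the first 16 kbp bin, so `reg2bin(0, 100)` gives 4681, while an alignment that crosses a 16 kbp boundary moves up to a larger bin. A validator reports the error above when the stored bin differs from this computed value.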

written 10 months ago by mychung • 0
