Question: 'print reads' throwing 'eof marker is absent' on my BAM file
PaulW wrote, 9 months ago:

Yes, I've read many of the posts out there about 'BAM EOF absent', but I can't find a resolution.

I'm working through a DNA analysis. Everything was going fine until I tried to run Print Reads, which throws:

[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] invalid BAM binary header (this is not a BAM file).
[bam_index_core] Invalid BAM header.
[bam_index_build2] fail to index the BAM file.

I've tried many variations including:

- replacing the BAM header with one from an earlier step in the history
- sorting the BAM into coordinate order
- converting to SAM
- inspecting the header, which looks OK as far as I can tell

It seems like only print reads complains about the BAM file.

Header follows. Any suggestions?


@HD VN:1.4 GO:none SO:coordinate

@SQ SN:chrM LN:16571 UR:file:/mnt/galaxy/tmp/job_working_directory/092/92145/working/localref.fa M5:d2ed829b8a1628d16cbeee88e88e39eb

..etc..

@SQ SN:chrUn_gl000249 LN:38502 M5:1d78abec37c15fe29a275eb08d5af236 UR:file:/mnt/galaxy/tmp/job_working_directory/092/92145/working/localref.fa

@RG ID:ID1 LB:LB1 PL:ILLUMINA SM:SM1 PU:PU1

@PG ID:bwa PN:bwa CL:/home/gpladmin/bfx/resources/tools/bwa/bwa mem -M -t 16 -R @RG\tID:WS\tSM:06135135\tLB:NEO1\tPL:illumina\tPU:AUCY8 /home/gpladmin/bfx/resources/fasta/hg19.fa /home/gpladmin/data/uploads/AGRF_CAGRF11631_AUCY8/06135135_AUCY8_AGGCAGAA-TATCCTCT_L001_R1.fastq.gz /home/gpladmin/data/uploads/AGRF_CAGRF11631_AUCY8/06135135_AUCY8_AGGCAGAA-TATCCTCT_L001_R2.fastq.gz VN:0.7.15-r1142-dirty

@PG ID:MarkDuplicates VN:1.136(f187319bf8bbde56892d5b5a1ce3fc0529b71a49_1436805856) CL:picard.sam.markduplicates.MarkDuplicates INPUT=[/mnt/galaxy/files/000/144/dataset_144572.dat] OUTPUT=/mnt/galaxy/files/000/144/dataset_144575.dat METRICS_FILE=/mnt/galaxy/files/000/144/dataset_144574.dat REMOVE_DUPLICATES=true ASSUME_SORTED=false DUPLICATE_SCORING_STRATEGY=SUM_OF_BASE_QUALITIES READ_NAME_REGEX=[a-zA-Z0-9]+:[0-9]:([0-9]+):([0-9]+):([0-9]+).*. OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 VERBOSITY=ERROR QUIET=true VALIDATION_STRINGENCY=LENIENT MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000 MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=8000 SORTING_COLLECTION_SIZE_RATIO=0.25 PROGRAM_RECORD_ID=MarkDuplicates PROGRAM_GROUP_NAME=MarkDuplicates COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json PN:MarkDuplicates PP:bwa

M00123:81:000000000-AUCY8:1:1109:19271:19160 163 chrM 707 60 150M = 723 166 CCCCATTCCAGTGAGTTCACCCTCTAAATCACCACGATCAAAAGGGACAAGCATCAAGCACGCAGCAATGCAGCTCAAAACGCTTAGCCTAGCCACACCCCCACGGGAAACAGCAGTGATTAACCTTTAGCAATAAACGAAAGTTTAACT CCDCDFFFFFFFGGGGGGGGGGHHHHHHHHHHHHHGGHHGHHGHGGHGGGHHHHHHHGHHHFGGGGHHHHHHHHHHHHHHHGGGGGHHHHHHHHHHHGGGGGGGGGGGGHHHHHHHHHBHHHHHHHHHHHHHHHHHHHGGGGGGHHHHHH MC:Z:150M MD:Z:4G145 PG:Z:MarkDuplicates RG:Z:ID1 NM:i:1 MQ:i:60 UQ:i:35 AS:i:145

Devon Ryan wrote, 9 months ago:

I suggest clicking on the "bug report" button on one of the history items. The most likely cause of this is that the underlying tool is returning a broken file. The error message you're seeing is probably from Galaxy attempting to index it.

What are you trying to have print reads do? There are likely other tools that won't produce this problem.
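If you can download the dataset from your history, one way to check whether the file itself is broken is to look for the two things the error message complains about: the BAM magic bytes at the start and the empty BGZF block that the SAM/BAM specification defines as the end-of-file marker. The following is only a minimal sketch of that check, not what samtools or Galaxy run internally; the function names and the file name in the usage note are made up for illustration.

```python
import gzip
import os
import zlib

# The 28-byte empty BGZF block that the SAM/BAM specification defines
# as the end-of-file marker of a complete BAM file.
BGZF_EOF = bytes.fromhex(
    "1f8b0804" "00000000" "00ff0600" "42430200"
    "1b000300" "00000000" "00000000"
)


def has_bgzf_eof(path):
    """Return True if the file ends with the BGZF EOF marker block."""
    if os.path.getsize(path) < len(BGZF_EOF):
        return False
    with open(path, "rb") as fh:
        fh.seek(-len(BGZF_EOF), os.SEEK_END)
        return fh.read() == BGZF_EOF


def has_bam_magic(path):
    """Return True if the decompressed stream starts with b'BAM\\x01'."""
    try:
        # BGZF is a series of gzip members, so the gzip module can read it.
        with gzip.open(path, "rb") as fh:
            return fh.read(4) == b"BAM\x01"
    except (OSError, EOFError, zlib.error):
        # Not gzip data at all, or truncated/corrupted before the magic bytes.
        return False
```

For example, `has_bam_magic("dataset.bam")` and `has_bgzf_eof("dataset.bam")` (with a hypothetical file name) should both return True for an intact BAM. If the second returns False, the file was most likely cut off while being written.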


Yes, I was wondering if it was a spurious error.

I'm trying to apply the recalibration table I created with BaseRecalibrator to my BAM file.
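For anyone following along outside Galaxy: in GATK 3 this step is a PrintReads call that takes the recalibration table through the -BQSR argument. Below is a rough sketch that only assembles the command line and does not run anything. The helper name, the jar location and all file names are placeholders, not values from my history.

```python
import shlex


def build_print_reads_cmd(reference, input_bam, recal_table, output_bam,
                          gatk_jar="GenomeAnalysisTK.jar"):
    """Assemble a GATK 3 PrintReads command that applies a BQSR table.

    All paths are placeholders supplied by the caller; nothing is executed.
    """
    return [
        "java", "-jar", gatk_jar,
        "-T", "PrintReads",    # GATK 3 tool (walker) selection
        "-R", reference,       # reference FASTA
        "-I", input_bam,       # input BAM
        "-BQSR", recal_table,  # table produced by BaseRecalibrator
        "-o", output_bam,      # recalibrated output BAM
    ]


cmd = build_print_reads_cmd("reference.fa", "input.bam",
                            "recal.table", "recalibrated.bam")
print(shlex.join(cmd))
```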

Reply written 9 months ago by PaulW

The GATK tools available in Galaxy are not best practice, and I would suggest that you not use them without a good reason. You'd be better served by FreeBayes or samtools/bcftools.

Reply written 9 months ago by Devon Ryan

Indeed, "best practice" would be to use GATK 3.7. Galaxy servers seem to be dropping support for GATK. It's unclear whether this is driven by the Galaxy team or the Broad Institute, but it's been written that there are licensing issues.

The Institut Curie Galaxy server offers GATK 3. But there's one input screen to cover all the tools so some parameters are unavailable. When I tried to use it I got a spurious error in a tool (not Print Reads) referring to a dataset I have never used.

Perhaps you're right that Galaxy is a poor workflow environment for running GATK.

Reply written 9 months ago (modified 9 months ago) by PaulW