Yes, I've read many of the posts out there about 'BAM EOF absent', but I can't find a resolution.
I'm working through a DNA analysis. Everything goes fine until I try to run print reads, which throws:
[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] invalid BAM binary header (this is not a BAM file).
[bam_index_core] Invalid BAM header.
[bam_index_build2] fail to index the BAM file.
I've tried many variations including:
-replacing the BAM header with one from an earlier step in the history
-sorting the BAM into coordinate order
-converting to a SAM
-inspecting the header - looks OK as far as I can tell
It seems like only print reads complains about the BAM file.
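For anyone who wants to reproduce the two checks the error message refers to, here is a minimal sketch, assuming the dataset can be downloaded and inspected locally. It tests whether the file ends with the 28-byte BGZF EOF block described in the SAM/BAM specification, and whether the decompressed stream starts with the BAM magic bytes. The function names `has_bgzf_eof` and `has_bam_magic` are made up for illustration and are not part of any tool.

```python
import gzip
import os
import zlib

# 28-byte empty BGZF block that marks the end of a complete BAM file
# (byte values taken from the SAM/BAM format specification).
BGZF_EOF = bytes.fromhex(
    "1f8b0804" "00000000" "00ff0600" "42430200"
    "1b000300" "00000000" "00000000"
)


def has_bgzf_eof(path):
    """Return True if the file ends with the BGZF EOF marker block."""
    if os.path.getsize(path) < len(BGZF_EOF):
        return False
    with open(path, "rb") as fh:
        fh.seek(-len(BGZF_EOF), os.SEEK_END)
        return fh.read() == BGZF_EOF


def has_bam_magic(path):
    """Return True if the decompressed data starts with b'BAM\\x01'.

    BGZF is a series of gzip members, so the standard gzip module can
    read the start of the stream. A SAM text file or other non-gzip
    input raises an error, which is reported here as False.
    """
    try:
        with gzip.open(path, "rb") as fh:
            return fh.read(4) == b"BAM\x01"
    except (OSError, EOFError, zlib.error):
        return False


# Hypothetical usage (the path is a placeholder, not a real dataset):
# print(has_bgzf_eof("dataset.bam"), has_bam_magic("dataset.bam"))
```

If the first check fails but the second passes, the file was probably truncated or written without the trailing block. If the second fails, the dataset is likely not BGZF-compressed BAM at all (for example, SAM text stored under a BAM datatype).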
Header follows. Any suggestions?
@HD VN:1.4 GO:none SO:coordinate
@SQ SN:chrM LN:16571 UR:file:/mnt/galaxy/tmp/job_working_directory/092/92145/working/localref.fa M5:d2ed829b8a1628d16cbeee88e88e39eb
..etc..
@SQ SN:chrUn_gl000249 LN:38502 M5:1d78abec37c15fe29a275eb08d5af236 UR:file:/mnt/galaxy/tmp/job_working_directory/092/92145/working/localref.fa
@RG ID:ID1 LB:LB1 PL:ILLUMINA SM:SM1 PU:PU1
@PG ID:bwa PN:bwa CL:/home/gpladmin/bfx/resources/tools/bwa/bwa mem -M -t 16 -R @RG\tID:WS\tSM:06135135\tLB:NEO1\tPL:illumina\tPU:AUCY8 /home/gpladmin/bfx/resources/fasta/hg19.fa /home/gpladmin/data/uploads/AGRF_CAGRF11631_AUCY8/06135135_AUCY8_AGGCAGAA-TATCCTCT_L001_R1.fastq.gz /home/gpladmin/data/uploads/AGRF_CAGRF11631_AUCY8/06135135_AUCY8_AGGCAGAA-TATCCTCT_L001_R2.fastq.gz VN:0.7.15-r1142-dirty
@PG ID:MarkDuplicates VN:1.136(f187319bf8bbde56892d5b5a1ce3fc0529b71a49_1436805856) CL:picard.sam.markduplicates.MarkDuplicates INPUT=[/mnt/galaxy/files/000/144/dataset_144572.dat] OUTPUT=/mnt/galaxy/files/000/144/dataset_144575.dat METRICS_FILE=/mnt/galaxy/files/000/144/dataset_144574.dat REMOVE_DUPLICATES=true ASSUME_SORTED=false DUPLICATE_SCORING_STRATEGY=SUM_OF_BASE_QUALITIES READ_NAME_REGEX=[a-zA-Z0-9]+:[0-9]:([0-9]+):([0-9]+):([0-9]+).*. OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 VERBOSITY=ERROR QUIET=true VALIDATION_STRINGENCY=LENIENT MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000 MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=8000 SORTING_COLLECTION_SIZE_RATIO=0.25 PROGRAM_RECORD_ID=MarkDuplicates PROGRAM_GROUP_NAME=MarkDuplicates COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json PN:MarkDuplicates PP:bwa
M00123:81:000000000-AUCY8:1:1109:19271:19160 163 chrM 707 60 150M = 723 166 CCCCATTCCAGTGAGTTCACCCTCTAAATCACCACGATCAAAAGGGACAAGCATCAAGCACGCAGCAATGCAGCTCAAAACGCTTAGCCTAGCCACACCCCCACGGGAAACAGCAGTGATTAACCTTTAGCAATAAACGAAAGTTTAACT CCDCDFFFFFFFGGGGGGGGGGHHHHHHHHHHHHHGGHHGHHGHGGHGGGHHHHHHHGHHHFGGGGHHHHHHHHHHHHHHHGGGGGHHHHHHHHHHHGGGGGGGGGGGGHHHHHHHHHBHHHHHHHHHHHHHHHHHHHGGGGGGHHHHHH MC:Z:150M MD:Z:4G145 PG:Z:MarkDuplicates RG:Z:ID1 NM:i:1 MQ:i:60 UQ:i:35 AS:i:145