Question: TopHat: Error: Couldn't build bowtie index with err = 1

3.4 years ago, madkisson (United States) wrote:

Hi

I'm trying to run TopHat on a CloudMan Galaxy instance and I keep getting the error below. I've run TopHat on this FASTQ file several times before with great results. The only thing that has changed is that I'm now using a mouse mm10 FASTA file (one that has been used elsewhere with success) for the genome alignment, rather than the built-in Galaxy mouse mm10 genome.

Here's the error report:

Fatal error: Tool execution failed
Building a SMALL index

[2015-07-24 17:25:11] Beginning TopHat run (v2.0.14)
-----------------------------------------------
[2015-07-24 17:25:11] Checking for Bowtie
    Bowtie version:  2.2.5.0
[2015-07-24 17:25:12] Checking for Bowtie index files (genome)..
[2015-07-24 17:25:12] Checking for reference FASTA file
[2015-07-24 17:25:12] Generating SAM header for genome
[2015-07-24 17:26:36] Reading known junctions from GTF file
[2015-07-24 17:26:56] Preparing reads
  left reads: min. length=50, max. length=50, 40744020 kept reads (126243 discarded)
[2015-07-24 17:36:05] Building transcriptome data files ./tophat_out/tmp/dataset_5000
[2015-07-24 17:36:27] Building Bowtie index from dataset_5000.fa
[FAILED]
Error: Couldn't build bowtie index with err = 1
[bam_header_read] bgzf_check_EOF: Invalid argument
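
In the log above, the step that fails is the index build for the transcriptome file (dataset_5000.fa), which TopHat derives from the GTF and the reference FASTA. One check worth running when a custom FASTA is swapped in is whether its sequence names match the names in the first column of the GTF. This is only a candidate cause and an assumption here; nothing in the log confirms it. The sketch below is a minimal Python helper for that comparison; the function names are hypothetical and not part of TopHat or Galaxy.

```python
def fasta_names(lines):
    """Collect sequence names: the first word after '>' on each FASTA header line."""
    names = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names


def gtf_names(lines):
    """Collect sequence names from column 1 of a GTF, skipping comments and blank lines."""
    names = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        names.add(line.split("\t", 1)[0].strip())
    return names


def compare_names(fasta_lines, gtf_lines):
    """Report which sequence names are shared and which appear on one side only."""
    fa = fasta_names(fasta_lines)
    gtf = gtf_names(gtf_lines)
    return {"shared": fa & gtf, "gtf_only": gtf - fa, "fasta_only": fa - gtf}


# Tiny in-memory demo with made-up records (not taken from the poster's files).
demo_fasta = [">chr1 mouse\n", "ACGT\n", ">chr2\n", "GGCC\n"]
demo_gtf = [
    "#example annotation\n",
    '1\tsrc\texon\t1\t4\t.\t+\t.\tgene_id "g1";\n',
    'chr2\tsrc\texon\t1\t4\t.\t+\t.\tgene_id "g2";\n',
]
report = compare_names(demo_fasta, demo_gtf)
for key in ("shared", "gtf_only", "fasta_only"):
    print(key, sorted(report[key]))
```

For real files, open file handles can be passed in place of the lists, since both helpers only iterate over lines. A large `gtf_only` set would mean many annotated transcripts have no matching sequence in the FASTA.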

The tool produced the following additional output:

Settings:
  Output files: "genome.*.bt2"
  Line rate: 6 (line is 64 bytes)
  Lines per side: 1 (side is 64 bytes)
  Offset rate: 4 (one in 16)
  FTable chars: 10
  Strings: unpacked
  Max bucket size: default
  Max bucket size, sqrt multiplier: default
  Max bucket size, len divisor: 4
  Difference-cover sample period: 1024
  Endianness: little
  Actual local endianness: little
  Sanity checking: disabled
  Assertions: disabled
  Random seed: 0
  Sizeofs: void*:8, int:4, long:8, size_t:8
Input files DNA, FASTA:
  /mnt/galaxy/files/005/dataset_5100.dat
Reading reference sizes
  Time reading reference sizes: 00:01:22
Calculating joined length
Writing header
Reserving space for joined string
Joining reference sequences
  Time to join reference sequences: 00:00:29
bmax according to bmaxDivN setting: 663195875
Using parameters --bmax 497396907 --dcv 1024
  Doing ahead-of-time memory usage test
  Passed!  Constructing with these parameters: --bmax 497396907 --dcv 1024
Constructing suffix-array element generator
Building DifferenceCoverSample
  Building sPrime
  Building sPrimeOrder
  V-Sorting samples
  V-Sorting samples time: 00:01:54
  Allocating rank array
  Ranking v-sort output
  Ranking v-sort output time: 00:00:29
  Invoking Larsson-Sadakane on ranks
  Invoking Larsson-Sadakane on ranks time: 00:00:54
  Sanity-checking and returning
Building samples
Reserving space for 12 sample suffixes
Generating random suffixes
QSorting 12 sample offsets, eliminating duplicates
QSorting sample offsets, eliminating duplicates time: 00:00:00
Multikey QSorting 12 samples
  (Using difference cover)
  Multikey QSorting samples time: 00:00:00
Calculating bucket sizes
  Binary sorting into buckets
  10%
  20%
  30%
  40%
  50%
  60%
  70%
  80%
  90%
  100%
  Binary sorting into buckets time: 00:02:00
Splitting and merging
  Splitting and merging time: 00:00:00
Split 1, merged 6; iterating...
  Binary sorting into buckets
  10%
  20%
  30%
  40%
  50%
  60%
  70%
  80%
  90%
  100%
  Binary sorting into buckets time: 00:01:43
Splitting and merging
  Splitting and merging time: 00:00:00
Avg bucket size: 3.78969e+08 (target: 497396906)
Converting suffix-array elements to index image
Allocating ftab, absorbFtab
Entering Ebwt loop
Getting block 1 of 7
  Reserving size (497396907) for bucket
  Calculating Z arrays
  Calculating Z arrays time: 00:00:00
  Entering block accumulator loop:
  10%
  20%
  30%
  40%
  50%
  60%
  70%
  80%
  90%
  100%
  Block accumulator loop time: 00:00:37
  Sorting block of length 470172614
  (Using difference cover)
  Sorting block time: 00:09:28
Returning block of 470172615
Getting block 2 of 7
  Reserving size (497396907) for bucket
  Calculating Z arrays
  Calculating Z arrays time: 00:00:00
  Entering block accumulator loop:
  10%
  20%
  30%
  40%
  50%
  60%
  70%
  80%
  90%
  100%
  Block accumulator loop time: 00:00:44
  Sorting block of length 392875285
  (Using difference cover)
  Sorting block time: 00:07:56
Returning block of 392875286
Getting block 3 of 7
  Reserving size (497396907) for bucket
  Calculating Z arrays
  Calculating Z arrays time: 00:00:00
  Entering block accumulator loop:
  10%
  20%
  30%
  40%
  50%
  60%
  70%
  80%
  90%
  100%
  Block accumulator loop time: 00:00:43
  Sorting block of length 287831635
  (Using difference cover)
  Sorting block time: 00:05:43
Returning block of 287831636
Getting block 4 of 7
  Reserving size (497396907) for bucket
  Calculating Z arrays
  Calculating Z arrays time: 00:00:00
  Entering block accumulator loop:
  10%
  20%
  30%
  40%
  50%
  60%
  70%
  80%
  90%
  100%
  Block accumulator loop time: 00:00:52
  Sorting block of length 267502683
  (Using difference cover)
  Sorting block time: 00:05:19
Returning block of 267502684
Getting block 5 of 7
  Reserving size (497396907) for bucket
  Calculating Z arrays
  Calculating Z arrays time: 00:00:00
  Entering block accumulator loop:
  10%
  20%
  30%
  40%
  50%
  60%
  70%
  80%
  90%
  100%
  Block accumulator loop time: 00:00:48
  Sorting block of length 429791782
  (Using difference cover)
  Sorting block time: 00:08:39
Returning block of 429791783
Getting block 6 of 7
  Reserving size (497396907) for bucket
  Calculating Z arrays
  Calculating Z arrays time: 00:00:00
  Entering block accumulator loop:
  10%
  20%
  30%
  40%
  50%
  60%
  70%
  80%
  90%
  100%
  Block accumulator loop time: 00:00:51
  Sorting block of length 482842074
  (Using difference cover)
  Sorting block time: 00:09:57
Returning block of 482842075
Getting block 7 of 7
  Reserving size (497396907) for bucket
  Calculating Z arrays
  Calculating Z arrays time: 00:00:00
  Entering block accumulator loop:
  10%
  20%
  30%
  40%
  50%
  60%
  70%
  80%
  90%
  100%
  Block accumulator loop time: 00:00:36
  Sorting block of length 321767421
  (Using difference cover)
  Sorting block time: 00:06:37
Returning block of 321767422
Exited Ebwt loop
fchr[A]: 0
fchr[C]: 773280124
fchr[G]: 1325927941
fchr[T]: 1878618059
fchr[$]: 2652783500
Exiting Ebwt::buildToDisk()
Returning from initFromVector
Wrote 888467894 bytes to primary EBWT file: genome.1.bt2
Wrote 663195880 bytes to secondary EBWT file: genome.2.bt2
Re-opening _in1 and _in2 as input streams
Returning from Ebwt constructor
Headers:
    len: 2652783500
    bwtLen: 2652783501
    sz: 663195875
    bwtSz: 663195876
    lineRate: 6
    offRate: 4
    offMask: 0xfffffff0
    ftabChars: 10
    eftabLen: 20
    eftabSz: 80
    ftabLen: 1048577
    ftabSz: 4194308
    offsLen: 165798969
    offsSz: 663195876
    lineSz: 64
    sideSz: 64
    sideBwtSz: 48
    sideBwtLen: 192
    numSides: 13816581
    numLines: 13816581
    ebwtTotLen: 884261184
    ebwtTotSz: 884261184
    color: 0
    reverse: 0
Total time for call to driver() for forward index: 01:18:04
Reading reference sizes
  Time reading reference sizes: 00:00:27
Calculating joined length
Writing header
Reserving space for joined string
Joining reference sequences
  Time to join reference sequences: 00:00:29
  Time to reverse reference sequence: 00:00:04
bmax according to bmaxDivN setting: 663195875
Using parameters --bmax 497396907 --dcv 1024
  Doing ahead-of-time memory usage test
  Passed!  Constructing with these parameters: --bmax 497396907 --dcv 1024
Constructing suffix-array element generator
Building DifferenceCoverSample
  Building sPrime
  Building sPrimeOrder
  V-Sorting samples
  V-Sorting samples time: 00:01:55
  Allocating rank array
  Ranking v-sort output
  Ranking v-sort output time: 00:00:29
  Invoking Larsson-Sadakane on ranks
  Invoking Larsson-Sadakane on ranks time: 00:00:53
  Sanity-checking and returning
Building samples
Reserving space for 12 sample suffixes
Generating random suffixes
QSorting 12 sample offsets, eliminating duplicates
QSorting sample offsets, eliminating duplicates time: 00:00:00
Multikey QSorting 12 samples
  (Using difference cover)
  Multikey QSorting samples time: 00:00:00
Calculating bucket sizes
  Binary sorting into buckets
  10%
  20%
  30%
  40%
  50%
  60%
  70%
  80%
  90%
  100%
  Binary sorting into buckets time: 00:01:55
Splitting and merging
  Splitting and merging time: 00:00:00
Split 1, merged 6; iterating...
  Binary sorting into buckets
  10%
  20%
  30%
  40%
  50%
  60%
  70%
  80%
  90%
  100%
  Binary sorting into buckets time: 00:01:43
Splitting and merging
  Splitting and merging time: 00:00:00
Avg bucket size: 3.31598e+08 (target: 497396906)
Converting suffix-array elements to index image
Allocating ftab, absorbFtab
Entering Ebwt loop
Getting block 1 of 8
  Reserving size (497396907) for bucket
  Calculating Z arrays
  Calculating Z arrays time: 00:00:00
  Entering block accumulator loop:
  10%
  20%
  30%
  40%
  50%
  60%
  70%
  80%
  90%
  100%
  Block accumulator loop time: 00:00:35
  Sorting block of length 481244811
  (Using difference cover)
  Sorting block time: 00:09:55
Returning block of 481244812
Getting block 2 of 8
  Reserving size (497396907) for bucket
  Calculating Z arrays
  Calculating Z arrays time: 00:00:00
  Entering block accumulator loop:
  10%
  20%
  30%
  40%
  50%
  60%
  70%
  80%
  90%
  100%
  Block accumulator loop time: 00:00:43
  Sorting block of length 401432159
  (Using difference cover)
  Sorting block time: 00:08:10
Returning block of 401432160
Getting block 3 of 8
  Reserving size (497396907) for bucket
  Calculating Z arrays
  Calculating Z arrays time: 00:00:00
  Entering block accumulator loop:
  10%
  20%
  30%
  40%
  50%
  60%
  70%
  80%
  90%
  100%
  Block accumulator loop time: 00:00:45
  Sorting block of length 299524672
  (Using difference cover)
  Sorting block time: 00:06:00
Returning block of 299524673
Getting block 4 of 8
  Reserving size (497396907) for bucket
  Calculating Z arrays
  Calculating Z arrays time: 00:00:00
  Entering block accumulator loop:
  10%
  20%
  30%
  40%
  50%
  60%
  70%
  80%
  90%
  100%
  Block accumulator loop time: 00:00:50
  Sorting block of length 371622119
  (Using difference cover)
  Sorting block time: 00:07:31
Returning block of 371622120
Getting block 5 of 8
  Reserving size (497396907) for bucket
  Calculating Z arrays
  Calculating Z arrays time: 00:00:00
  Entering block accumulator loop:
  10%
  20%
  30%
  40%
  50%
  60%
  70%
  80%
  90%
  100%
  Block accumulator loop time: 00:00:46
  Sorting block of length 192869528
  (Using difference cover)
  Sorting block time: 00:03:46
Returning block of 192869529
Getting block 6 of 8
  Reserving size (497396907) for bucket
  Calculating Z arrays
  Calculating Z arrays time: 00:00:00
  Entering block accumulator loop:
  10%
  20%
  30%
  40%
  50%
  60%
  70%
  80%
  90%
  100%
  Block accumulator loop time: 00:00:51
  Sorting block of length 363313281
  (Using difference cover)
  Sorting block time: 00:07:26
Returning block of 363313282
Getting block 7 of 8
  Reserving size (497396907) for bucket
  Calculating Z arrays
  Calculating Z arrays time: 00:00:00
  Entering block accumulator loop:
  10%
  20%
  30%
  40%
  50%
  60%
  70%
  80%
  90%
  100%
  Block accumulator loop time: 00:00:47
  Sorting block of length 286950766
  (Using difference cover)
  Sorting block time: 00:05:43
Returning block of 286950767
Getting block 8 of 8
  Reserving size (497396907) for bucket
  Calculating Z arrays
  Calculating Z arrays time: 00:00:00
  Entering block accumulator loop:
  10%
  20%
  30%
  40%
  50%
  60%
  70%
  80%
  90%
  100%
  Block accumulator loop time: 00:00:33
  Sorting block of length 255826157
  (Using difference cover)
  Sorting block time: 00:05:11
Returning block of 255826158
Exited Ebwt loop
fchr[A]: 0
fchr[C]: 773280124
fchr[G]: 1325927941
fchr[T]: 1878618059
fchr[$]: 2652783500
Exiting Ebwt::buildToDisk()
Returning from initFromVector
Wrote 888467894 bytes to primary EBWT file: genome.rev.1.bt2
Wrote 663195880 bytes to secondary EBWT file: genome.rev.2.bt2
Re-opening _in1 and _in2 as input streams
Returning from Ebwt constructor
Headers:
    len: 2652783500
    bwtLen: 2652783501
    sz: 663195875
    bwtSz: 663195876
    lineRate: 6
    offRate: 4
    offMask: 0xfffffff0
    ftabChars: 10
    eftabLen: 20
    eftabSz: 80
    ftabLen: 1048577
    ftabSz: 4194308
    offsLen: 165798969
    offsSz: 663195876
    lineSz: 64
    sideSz: 64
    sideBwtSz: 48
    sideBwtLen: 192
    numSides: 13816581
    numLines: 13816581
    ebwtTotLen: 884261184
    ebwtTotSz: 884261184
    color: 0
    reverse: 1
Total time for backward call to driver() for mirror index: 01:17:50
[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] invalid BAM binary header (this is not a BAM file).
[bam_index_core] Invalid BAM header.[bam_index_build2] fail to index the BAM file.
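
The three `bam_*` messages above say that the file handed to the BAM reader has no end-of-file marker and no valid BAM header. That would fit an empty or truncated output left behind after the earlier failure, though this is an assumption rather than something the log states. A BAM file normally ends with a fixed 28-byte empty BGZF block, as defined in the SAM/BAM specification. The sketch below checks for that marker in Python; the helper name `has_bgzf_eof` is hypothetical.

```python
import os

# The 28-byte empty BGZF block that terminates a well-formed BAM file.
BGZF_EOF = bytes.fromhex(
    "1f8b0804 00000000 00ff 0600 4243 0200 1b00 0300 00000000 00000000"
)


def has_bgzf_eof(path):
    """Return True if the file at `path` ends with the BGZF end-of-file block."""
    size = os.path.getsize(path)
    if size < len(BGZF_EOF):
        # Too short to hold the marker (for example, a zero-byte output file).
        return False
    with open(path, "rb") as handle:
        handle.seek(-len(BGZF_EOF), os.SEEK_END)
        return handle.read() == BGZF_EOF
```

A zero-byte file fails this check, so it also catches output that was never written. The check does not validate the BAM header itself, only the trailing marker.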

Thank you

tophat rnaseq fasta • 3.1k views

modified 3.3 years ago by Jennifer Hillman Jackson ♦ 25k • written 3.4 years ago by madkisson • 30

[2015-07-29 00:02:10] Beginning TopHat run (v2.0.14)