Question: TopHat: Error: Couldn't build bowtie index with err = 1
madkisson (United States) wrote, 3.4 years ago:

Hi

I'm trying to run TopHat on a Cloudman Galaxy instance and I keep getting the error below. I've run TopHat on this FASTQ file several times before with great results. The ONLY thing that has changed is that I'm trying to use a mouse mm10 FASTA file [one that has been used elsewhere with success] for the genome alignment rather than the built-in Galaxy mouse mm10 genome.

Here's the error report:

Fatal error: Tool execution failed
Building a SMALL index

[2015-07-24 17:25:11] Beginning TopHat run (v2.0.14)
-----------------------------------------------
[2015-07-24 17:25:11] Checking for Bowtie
    Bowtie version:  2.2.5.0
[2015-07-24 17:25:12] Checking for Bowtie index files (genome)..
[2015-07-24 17:25:12] Checking for reference FASTA file
[2015-07-24 17:25:12] Generating SAM header for genome
[2015-07-24 17:26:36] Reading known junctions from GTF file
[2015-07-24 17:26:56] Preparing reads
  left reads: min. length=50, max. length=50, 40744020 kept reads (126243 discarded)
[2015-07-24 17:36:05] Building transcriptome data files ./tophat_out/tmp/dataset_5000
[2015-07-24 17:36:27] Building Bowtie index from dataset_5000.fa
[FAILED]
Error: Couldn't build bowtie index with err = 1
[bam_header_read] bgzf_check_EOF: Invalid argument

The tool produced the following additional output:

Settings:
  Output files: "genome.*.bt2"
  Line rate: 6 (line is 64 bytes)
  Lines per side: 1 (side is 64 bytes)
  Offset rate: 4 (one in 16)
  FTable chars: 10
  Strings: unpacked
  Max bucket size: default
  Max bucket size, sqrt multiplier: default
  Max bucket size, len divisor: 4
  Difference-cover sample period: 1024
  Endianness: little
  Actual local endianness: little
  Sanity checking: disabled
  Assertions: disabled
  Random seed: 0
  Sizeofs: void*:8, int:4, long:8, size_t:8
Input files DNA, FASTA:
  /mnt/galaxy/files/005/dataset_5100.dat
Reading reference sizes
  Time reading reference sizes: 00:01:22
Calculating joined length
Writing header
Reserving space for joined string
Joining reference sequences
  Time to join reference sequences: 00:00:29
bmax according to bmaxDivN setting: 663195875
Using parameters --bmax 497396907 --dcv 1024
  Doing ahead-of-time memory usage test
  Passed!  Constructing with these parameters: --bmax 497396907 --dcv 1024
Constructing suffix-array element generator
Building DifferenceCoverSample
  Building sPrime
  Building sPrimeOrder
  V-Sorting samples
  V-Sorting samples time: 00:01:54
  Allocating rank array
  Ranking v-sort output
  Ranking v-sort output time: 00:00:29
  Invoking Larsson-Sadakane on ranks
  Invoking Larsson-Sadakane on ranks time: 00:00:54
  Sanity-checking and returning
Building samples
Reserving space for 12 sample suffixes
Generating random suffixes
QSorting 12 sample offsets, eliminating duplicates
QSorting sample offsets, eliminating duplicates time: 00:00:00
Multikey QSorting 12 samples
  (Using difference cover)
  Multikey QSorting samples time: 00:00:00
Calculating bucket sizes
  Binary sorting into buckets
  10%
  20%
  30%
  40%
  50%
  60%
  70%
  80%
  90%
  100%
  Binary sorting into buckets time: 00:02:00
Splitting and merging
  Splitting and merging time: 00:00:00
Split 1, merged 6; iterating...
  Binary sorting into buckets
  10%
  20%
  30%
  40%
  50%
  60%
  70%
  80%
  90%
  100%
  Binary sorting into buckets time: 00:01:43
Splitting and merging
  Splitting and merging time: 00:00:00
Avg bucket size: 3.78969e+08 (target: 497396906)
Converting suffix-array elements to index image
Allocating ftab, absorbFtab
Entering Ebwt loop
Getting block 1 of 7
  Reserving size (497396907) for bucket
  Calculating Z arrays
  Calculating Z arrays time: 00:00:00
  Entering block accumulator loop:
  10%
  20%
  30%
  40%
  50%
  60%
  70%
  80%
  90%
  100%
  Block accumulator loop time: 00:00:37
  Sorting block of length 470172614
  (Using difference cover)
  Sorting block time: 00:09:28
Returning block of 470172615
Getting block 2 of 7
  Reserving size (497396907) for bucket
  Calculating Z arrays
  Calculating Z arrays time: 00:00:00
  Entering block accumulator loop:
  10%
  20%
  30%
  40%
  50%
  60%
  70%
  80%
  90%
  100%
  Block accumulator loop time: 00:00:44
  Sorting block of length 392875285
  (Using difference cover)
  Sorting block time: 00:07:56
Returning block of 392875286
Getting block 3 of 7
  Reserving size (497396907) for bucket
  Calculating Z arrays
  Calculating Z arrays time: 00:00:00
  Entering block accumulator loop:
  10%
  20%
  30%
  40%
  50%
  60%
  70%
  80%
  90%
  100%
  Block accumulator loop time: 00:00:43
  Sorting block of length 287831635
  (Using difference cover)
  Sorting block time: 00:05:43
Returning block of 287831636
Getting block 4 of 7
  Reserving size (497396907) for bucket
  Calculating Z arrays
  Calculating Z arrays time: 00:00:00
  Entering block accumulator loop:
  10%
  20%
  30%
  40%
  50%
  60%
  70%
  80%
  90%
  100%
  Block accumulator loop time: 00:00:52
  Sorting block of length 267502683
  (Using difference cover)
  Sorting block time: 00:05:19
Returning block of 267502684
Getting block 5 of 7
  Reserving size (497396907) for bucket
  Calculating Z arrays
  Calculating Z arrays time: 00:00:00
  Entering block accumulator loop:
  10%
  20%
  30%
  40%
  50%
  60%
  70%
  80%
  90%
  100%
  Block accumulator loop time: 00:00:48
  Sorting block of length 429791782
  (Using difference cover)
  Sorting block time: 00:08:39
Returning block of 429791783
Getting block 6 of 7
  Reserving size (497396907) for bucket
  Calculating Z arrays
  Calculating Z arrays time: 00:00:00
  Entering block accumulator loop:
  10%
  20%
  30%
  40%
  50%
  60%
  70%
  80%
  90%
  100%
  Block accumulator loop time: 00:00:51
  Sorting block of length 482842074
  (Using difference cover)
  Sorting block time: 00:09:57
Returning block of 482842075
Getting block 7 of 7
  Reserving size (497396907) for bucket
  Calculating Z arrays
  Calculating Z arrays time: 00:00:00
  Entering block accumulator loop:
  10%
  20%
  30%
  40%
  50%
  60%
  70%
  80%
  90%
  100%
  Block accumulator loop time: 00:00:36
  Sorting block of length 321767421
  (Using difference cover)
  Sorting block time: 00:06:37
Returning block of 321767422
Exited Ebwt loop
fchr[A]: 0
fchr[C]: 773280124
fchr[G]: 1325927941
fchr[T]: 1878618059
fchr[$]: 2652783500
Exiting Ebwt::buildToDisk()
Returning from initFromVector
Wrote 888467894 bytes to primary EBWT file: genome.1.bt2
Wrote 663195880 bytes to secondary EBWT file: genome.2.bt2
Re-opening _in1 and _in2 as input streams
Returning from Ebwt constructor
Headers:
    len: 2652783500
    bwtLen: 2652783501
    sz: 663195875
    bwtSz: 663195876
    lineRate: 6
    offRate: 4
    offMask: 0xfffffff0
    ftabChars: 10
    eftabLen: 20
    eftabSz: 80
    ftabLen: 1048577
    ftabSz: 4194308
    offsLen: 165798969
    offsSz: 663195876
    lineSz: 64
    sideSz: 64
    sideBwtSz: 48
    sideBwtLen: 192
    numSides: 13816581
    numLines: 13816581
    ebwtTotLen: 884261184
    ebwtTotSz: 884261184
    color: 0
    reverse: 0
Total time for call to driver() for forward index: 01:18:04
Reading reference sizes
  Time reading reference sizes: 00:00:27
Calculating joined length
Writing header
Reserving space for joined string
Joining reference sequences
  Time to join reference sequences: 00:00:29
  Time to reverse reference sequence: 00:00:04
bmax according to bmaxDivN setting: 663195875
Using parameters --bmax 497396907 --dcv 1024
  Doing ahead-of-time memory usage test
  Passed!  Constructing with these parameters: --bmax 497396907 --dcv 1024
Constructing suffix-array element generator
Building DifferenceCoverSample
  Building sPrime
  Building sPrimeOrder
  V-Sorting samples
  V-Sorting samples time: 00:01:55
  Allocating rank array
  Ranking v-sort output
  Ranking v-sort output time: 00:00:29
  Invoking Larsson-Sadakane on ranks
  Invoking Larsson-Sadakane on ranks time: 00:00:53
  Sanity-checking and returning
Building samples
Reserving space for 12 sample suffixes
Generating random suffixes
QSorting 12 sample offsets, eliminating duplicates
QSorting sample offsets, eliminating duplicates time: 00:00:00
Multikey QSorting 12 samples
  (Using difference cover)
  Multikey QSorting samples time: 00:00:00
Calculating bucket sizes
  Binary sorting into buckets
  10%
  20%
  30%
  40%
  50%
  60%
  70%
  80%
  90%
  100%
  Binary sorting into buckets time: 00:01:55
Splitting and merging
  Splitting and merging time: 00:00:00
Split 1, merged 6; iterating...
  Binary sorting into buckets
  10%
  20%
  30%
  40%
  50%
  60%
  70%
  80%
  90%
  100%
  Binary sorting into buckets time: 00:01:43
Splitting and merging
  Splitting and merging time: 00:00:00
Avg bucket size: 3.31598e+08 (target: 497396906)
Converting suffix-array elements to index image
Allocating ftab, absorbFtab
Entering Ebwt loop
Getting block 1 of 8
  Reserving size (497396907) for bucket
  Calculating Z arrays
  Calculating Z arrays time: 00:00:00
  Entering block accumulator loop:
  10%
  20%
  30%
  40%
  50%
  60%
  70%
  80%
  90%
  100%
  Block accumulator loop time: 00:00:35
  Sorting block of length 481244811
  (Using difference cover)
  Sorting block time: 00:09:55
Returning block of 481244812
Getting block 2 of 8
  Reserving size (497396907) for bucket
  Calculating Z arrays
  Calculating Z arrays time: 00:00:00
  Entering block accumulator loop:
  10%
  20%
  30%
  40%
  50%
  60%
  70%
  80%
  90%
  100%
  Block accumulator loop time: 00:00:43
  Sorting block of length 401432159
  (Using difference cover)
  Sorting block time: 00:08:10
Returning block of 401432160
Getting block 3 of 8
  Reserving size (497396907) for bucket
  Calculating Z arrays
  Calculating Z arrays time: 00:00:00
  Entering block accumulator loop:
  10%
  20%
  30%
  40%
  50%
  60%
  70%
  80%
  90%
  100%
  Block accumulator loop time: 00:00:45
  Sorting block of length 299524672
  (Using difference cover)
  Sorting block time: 00:06:00
Returning block of 299524673
Getting block 4 of 8
  Reserving size (497396907) for bucket
  Calculating Z arrays
  Calculating Z arrays time: 00:00:00
  Entering block accumulator loop:
  10%
  20%
  30%
  40%
  50%
  60%
  70%
  80%
  90%
  100%
  Block accumulator loop time: 00:00:50
  Sorting block of length 371622119
  (Using difference cover)
  Sorting block time: 00:07:31
Returning block of 371622120
Getting block 5 of 8
  Reserving size (497396907) for bucket
  Calculating Z arrays
  Calculating Z arrays time: 00:00:00
  Entering block accumulator loop:
  10%
  20%
  30%
  40%
  50%
  60%
  70%
  80%
  90%
  100%
  Block accumulator loop time: 00:00:46
  Sorting block of length 192869528
  (Using difference cover)
  Sorting block time: 00:03:46
Returning block of 192869529
Getting block 6 of 8
  Reserving size (497396907) for bucket
  Calculating Z arrays
  Calculating Z arrays time: 00:00:00
  Entering block accumulator loop:
  10%
  20%
  30%
  40%
  50%
  60%
  70%
  80%
  90%
  100%
  Block accumulator loop time: 00:00:51
  Sorting block of length 363313281
  (Using difference cover)
  Sorting block time: 00:07:26
Returning block of 363313282
Getting block 7 of 8
  Reserving size (497396907) for bucket
  Calculating Z arrays
  Calculating Z arrays time: 00:00:00
  Entering block accumulator loop:
  10%
  20%
  30%
  40%
  50%
  60%
  70%
  80%
  90%
  100%
  Block accumulator loop time: 00:00:47
  Sorting block of length 286950766
  (Using difference cover)
  Sorting block time: 00:05:43
Returning block of 286950767
Getting block 8 of 8
  Reserving size (497396907) for bucket
  Calculating Z arrays
  Calculating Z arrays time: 00:00:00
  Entering block accumulator loop:
  10%
  20%
  30%
  40%
  50%
  60%
  70%
  80%
  90%
  100%
  Block accumulator loop time: 00:00:33
  Sorting block of length 255826157
  (Using difference cover)
  Sorting block time: 00:05:11
Returning block of 255826158
Exited Ebwt loop
fchr[A]: 0
fchr[C]: 773280124
fchr[G]: 1325927941
fchr[T]: 1878618059
fchr[$]: 2652783500
Exiting Ebwt::buildToDisk()
Returning from initFromVector
Wrote 888467894 bytes to primary EBWT file: genome.rev.1.bt2
Wrote 663195880 bytes to secondary EBWT file: genome.rev.2.bt2
Re-opening _in1 and _in2 as input streams
Returning from Ebwt constructor
Headers:
    len: 2652783500
    bwtLen: 2652783501
    sz: 663195875
    bwtSz: 663195876
    lineRate: 6
    offRate: 4
    offMask: 0xfffffff0
    ftabChars: 10
    eftabLen: 20
    eftabSz: 80
    ftabLen: 1048577
    ftabSz: 4194308
    offsLen: 165798969
    offsSz: 663195876
    lineSz: 64
    sideSz: 64
    sideBwtSz: 48
    sideBwtLen: 192
    numSides: 13816581
    numLines: 13816581
    ebwtTotLen: 884261184
    ebwtTotSz: 884261184
    color: 0
    reverse: 1
Total time for backward call to driver() for mirror index: 01:17:50
[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] invalid BAM binary header (this is not a BAM file).
[bam_index_core] Invalid BAM header.[bam_index_build2] fail to index the BAM file.

Thank you

 

Tags: tophat, rnaseq, fasta
Jennifer Hillman Jackson (United States) wrote, 3.3 years ago:

Hello,

Using a large genome as a Custom reference genome consumes significant memory. If the FASTA file format is OK and you have run this analysis elsewhere before, that points to a potential memory issue. (I have also seen this error with highly fragmented genomes, and it had the same root cause: not enough memory.)
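
A rough way to check both conditions mentioned above (whether the FASTA file is well formed, and how fragmented it is) is to summarize the file before handing it to the tool. The sketch below is only an illustration and not part of Galaxy or TopHat: `summarize_fasta` and the path in the usage comment are hypothetical names, and it assumes an uncompressed, plain-text FASTA file.

```python
import io
from typing import Dict, TextIO

# Assumption: a plain DNA alphabet. Other characters (for example IUPAC
# ambiguity codes) are only counted, not treated as errors.
EXPECTED_CHARS = set("ACGTNacgtn")


def summarize_fasta(handle: TextIO) -> Dict[str, object]:
    """Count sequences, bases, duplicate names, and unexpected characters."""
    seen = set()
    duplicates = set()
    n_sequences = 0
    total_bases = 0
    unexpected = 0
    bases_before_first_header = False

    for raw in handle:
        line = raw.rstrip("\r\n")
        if not line:
            continue
        if line.startswith(">"):
            # The sequence name is the first whitespace-delimited token.
            fields = line[1:].split()
            name = fields[0] if fields else ""
            if name in seen:
                duplicates.add(name)
            seen.add(name)
            n_sequences += 1
        else:
            if n_sequences == 0:
                # Sequence data before any ">" header means a malformed file.
                bases_before_first_header = True
            total_bases += len(line)
            unexpected += sum(1 for ch in line if ch not in EXPECTED_CHARS)

    return {
        "sequences": n_sequences,
        "total_bases": total_bases,
        "duplicate_names": sorted(duplicates),
        "unexpected_chars": unexpected,
        "bases_before_first_header": bases_before_first_header,
    }


if __name__ == "__main__":
    # Tiny in-memory example; for a real file use something like
    # open("custom_mm10.fa") (hypothetical path).
    demo = io.StringIO(">chr1 example\nACGT\nNNAC\n>chr2\nacgR\n")
    print(summarize_fasta(demo))
```

A very high sequence count relative to the total number of bases would suggest a fragmented reference, and duplicate names or data before the first header would suggest a formatting problem.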

Please consider either installing mm10 as a native reference genome (using Data Managers) or scaling up the memory for your Galaxy Cloudman.

Hopefully this helps! Jen, Galaxy team


Hi Jennifer

That didn't seem to work and the error looks pretty much the same. Here's what I had:

Master node type: c3.2xlarge
Additional node type: m2.2xlarge

I watched as things ran for about 30 minutes. Neither node maxed out at any time during the processing that did happen.
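
Watching a node monitor can miss a short-lived memory spike, so it may help to record the peak memory of the failing step directly. The sketch below is a minimal illustration under assumptions: it presumes shell access to the worker node, a Unix system (Python's `resource` module is not available on Windows), and that the step can be reproduced from the command line. `run_and_report_peak` is a hypothetical helper name.

```python
import resource
import subprocess
import sys
from typing import List, Tuple


def run_and_report_peak(cmd: List[str]) -> Tuple[int, int]:
    """Run cmd to completion and return (exit_code, peak_rss).

    peak_rss is ru_maxrss for RUSAGE_CHILDREN: the largest resident set
    size among child processes that have terminated and been waited for.
    The unit is kilobytes on Linux and bytes on macOS.
    """
    completed = subprocess.run(cmd)
    peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return completed.returncode, peak


if __name__ == "__main__":
    # Demo with a child process that allocates roughly 50 MB.
    code, peak = run_and_report_peak(
        [sys.executable, "-c", "x = bytearray(50_000_000)"]
    )
    print("exit code:", code, "peak RSS (platform units):", peak)
```

For example, `run_and_report_peak(["bowtie2-build", "dataset_5000.fa", "dataset_5000"])` would wrap a manual rebuild of the index named in the error report; the file names are taken from the log and the working directory is an assumption. A negative exit code means the child was terminated by a signal.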

Here's the main error again:

Fatal error: Tool execution failed
Building a SMALL index

[2015-07-29 00:02:10] Beginning TopHat run (v2.0.14)
[2015-07-29 00:02:10] Checking for Bowtie
    Bowtie version:  2.2.5.0
[2015-07-29 00:02:10] Checking for Bowtie index files (genome)..
[2015-07-29 00:02:10] Checking for reference FASTA file
[2015-07-29 00:02:10] Generating SAM header for genome
[2015-07-29 00:02:13] Reading known junctions from GTF file
[2015-07-29 00:02:28] Preparing reads
  left reads: min. length=50, max. length=50, 40744020 kept reads (126243 discarded)
[2015-07-29 00:09:37] Building transcriptome data files ./tophat_out/tmp/dataset_5000
[2015-07-29 00:09:58] Building Bowtie index from dataset_5000.fa
[FAILED]
Error: Couldn't build bowtie index with err = 1
[bam_header_read] bgzf_check_EOF: Invalid argument

Can you think of another reason why the Bowtie index build would fail?

Thanks, Michael

Reply written 3.3 years ago by madkisson

I should clear something up as well: I'm using this mm10 FASTA file so that I can compare my analysis to the one EBI produced from the same data. We're seeing some differences in output, and we want to make sure that all parameters and inputs are exactly the same. I have no idea what the built-in Galaxy mm10 genome is [version, updates, etc.], so I can't control for differences unless I use the FASTA file. So I really do need to get this to work, at least for now. The built-in genome has worked great for me in the past, and I will probably continue using it once I've finished these comparisons.
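
One way to control for reference differences, if both genomes can be obtained as FASTA files, is to compare them sequence by sequence. The following is a sketch under assumptions rather than an established Galaxy or EBI procedure: `sequence_digests` and `compare_references` are hypothetical helper names, both inputs are assumed to be uncompressed FASTA, and case is ignored so that soft-masked and unmasked copies of the same sequence compare as equal.

```python
import hashlib
import io
from typing import Dict, List, TextIO, Tuple

Digest = Tuple[int, str]  # (sequence length, MD5 of the uppercased sequence)


def sequence_digests(handle: TextIO) -> Dict[str, Digest]:
    """Map each sequence name to its length and MD5 checksum."""
    digests: Dict[str, Digest] = {}
    name = None
    length = 0
    md5 = hashlib.md5()

    for raw in handle:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                digests[name] = (length, md5.hexdigest())
            # The sequence name is the first whitespace-delimited token.
            fields = line[1:].split()
            name = fields[0] if fields else ""
            length = 0
            md5 = hashlib.md5()
        elif name is not None:
            seq = line.upper()
            length += len(seq)
            md5.update(seq.encode("utf-8"))

    if name is not None:
        digests[name] = (length, md5.hexdigest())
    return digests


def compare_references(
    a: Dict[str, Digest], b: Dict[str, Digest]
) -> Tuple[List[str], List[str], List[str]]:
    """Return names only in a, names only in b, and shared names that differ."""
    only_a = sorted(set(a) - set(b))
    only_b = sorted(set(b) - set(a))
    differing = sorted(n for n in set(a) & set(b) if a[n] != b[n])
    return only_a, only_b, differing


if __name__ == "__main__":
    # In-memory stand-ins for the two reference files.
    ref_a = sequence_digests(io.StringIO(">chr1\nACGT\n>chr2\nGGCC\n"))
    ref_b = sequence_digests(io.StringIO(">chr1\nac\ngt\n>chr2\nGGCA\n>chrM\nAT\n"))
    print(compare_references(ref_a, ref_b))
```

Names that appear in only one file, or shared names with different lengths or checksums, would point to the two analyses having used different references.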

Reply written 3.3 years ago by madkisson