I want to perform a DEXSeq analysis of alternative splicing, which requires first mapping the RNA-seq data to a reference genome (the zebrafish genome in my case). I want to use Galaxy to do the TopHat2 mapping against the zebrafish genome downloaded from ftp://ftp.ensembl.org/pub/release-75/fasta/danio_rerio/dna/ . There are about 80 small files in that Ensembl folder. I downloaded all of them and concatenated them on Linux, then uploaded the result to Galaxy as a FASTA file and ran TopHat2 on it. However, the run failed with the following error:
Warning: Encountered reference sequence with only gaps
Error: Reference sequence has more than 2^32-1 characters! Please divide the
reference into batches or chunks of about 3.6 billion characters or less each
and index each independently.
Error: Encountered internal Bowtie 2 exception (#1)
Command: bowtie2-build /galaxy/run/prod/database/files/097/dataset_97644.dat genome
Deleting "genome.3.bt2" file written during aborted indexing attempt.
Deleting "genome.4.bt2" file written during aborted indexing attempt.
[2014-06-05 10:28:33] Beginning TopHat run (v2.0.1)
-----------------------------------------------
[2014-06-05 10:28:33] Checking for Bowtie
Traceback (most recent call last):
File "/apps/tuxedo/tophat/2.0.1/bin/tophat2", line 3901, in <module>
sys.exit(main())
File "/apps/tuxedo/tophat/2.0.1/bin/tophat2", line 3706, in main
check_bowtie(params)
File "/apps/tuxedo/tophat/2.0.1/bin/tophat2", line 1381, in check_bowtie
bowtie_version = get_bowtie_version(params.bowtie2)
File "/apps/tuxedo/tophat/2.0.1/bin/tophat2", line 1264, in get_bowtie_version
bowtie_version = [int(x) for x in ver_numbers[:3]] + [int(ver_numbers[3][4:])]
IndexError: list index out of range
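For reference, the limit named in the error can be checked directly against the concatenated file. The following is a minimal sketch, not part of the Galaxy or TopHat tooling: it sums the sequence lengths in a FASTA file, compares the total with 2^32-1, and lists any record names that appear more than once. The function names `fasta_records` and `summarize` are made up for this illustration, and the idea that the concatenated file might contain the same sequences several times is only an assumption that this check would confirm or rule out.

```python
import sys
from collections import Counter

# Limit quoted in the bowtie2-build error message above.
BOWTIE2_LIMIT = 2**32 - 1


def fasta_records(lines):
    """Return a list of (record_name, sequence_length) from FASTA lines."""
    records = []
    name, length = None, 0
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if name is not None:
                records.append((name, length))
            header = line[1:].split()
            name, length = (header[0] if header else ""), 0
        elif line and name is not None:
            length += len(line)
    if name is not None:
        records.append((name, length))
    return records


def summarize(records):
    """Return (total_length, sorted list of names that occur more than once)."""
    total = sum(length for _, length in records)
    counts = Counter(name for name, _ in records)
    duplicates = sorted(n for n, c in counts.items() if c > 1)
    return total, duplicates


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): python check_fasta.py genome.fa
    with open(sys.argv[1]) as handle:
        total, duplicates = summarize(fasta_records(handle))
    print("total characters:", total)
    print("over bowtie2-build limit:", total > BOWTIE2_LIMIT)
    print("record names seen more than once:", duplicates)
```

If the script reports duplicated record names, the concatenated file holds some sequences more than once, which would inflate the total well beyond the size of the genome itself.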
What can I do? I would prefer to keep using the Ensembl genome assembly, because I need the Ensembl transcriptome for annotation later. Thank you, and I look forward to your answers!
