I want to run a DEXSeq analysis of alternative splicing, which first requires mapping the RNA-seq data to a reference genome (the zebrafish genome in my case). I want to use Galaxy to run the TopHat2 mapping against the zebrafish genome downloaded from ftp://ftp.ensembl.org/pub/release-75/fasta/danio_rerio/dna/ . There are about 80 small files in the Ensembl folder. I downloaded them and concatenated them on Linux, uploaded the result to Galaxy as a FASTA file, and used TopHat2 for mapping. However, an error occurred:
Warning: Encountered reference sequence with only gaps
Error: Reference sequence has more than 2^32-1 characters! Please divide the reference into batches or chunks of about 3.6 billion characters or less each and index each independently.
Error: Encountered internal Bowtie 2 exception (#1)
Command: bowtie2-build /galaxy/run/prod/database/files/097/dataset_97644.dat genome
Deleting "genome.3.bt2" file written during aborted indexing attempt.
Deleting "genome.4.bt2" file written during aborted indexing attempt.

[2014-06-05 10:28:33] Beginning TopHat run (v2.0.1)
-----------------------------------------------
[2014-06-05 10:28:33] Checking for Bowtie
Traceback (most recent call last):
  File "/apps/tuxedo/tophat/2.0.1/bin/tophat2", line 3901, in <module>
    sys.exit(main())
  File "/apps/tuxedo/tophat/2.0.1/bin/tophat2", line 3706, in main
    check_bowtie(params)
  File "/apps/tuxedo/tophat/2.0.1/bin/tophat2", line 1381, in check_bowtie
    bowtie_version = get_bowtie_version(params.bowtie2)
  File "/apps/tuxedo/tophat/2.0.1/bin/tophat2", line 1264, in get_bowtie_version
    bowtie_version = [int(x) for x in ver_numbers[:3]] + [int(ver_numbers[3][4:])]
IndexError: list index out of range
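The first error is about the total size of the reference, so one way to check the concatenated file before uploading it might be to count its sequence characters and compare the total with the 2^32-1 limit quoted in the message. The following is only a minimal sketch: the function names (`fasta_stats`, `report`) and the tiny in-memory FASTA are made up for illustration, and the duplicate-name check rests on the assumption that a repeated sequence name means the same chromosome was concatenated more than once (for example, if the folder holds several variants of each sequence).

```python
import io
from collections import Counter

# Limit quoted in the bowtie2-build error message above.
REFERENCE_CHAR_LIMIT = 2**32 - 1


def fasta_stats(handle):
    """Return (total sequence characters, Counter of sequence names)."""
    total = 0
    names = Counter()
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The sequence name is the first word after '>'.
            names[line[1:].split()[0]] += 1
        else:
            total += len(line)
    return total, names


def report(handle):
    """Summarise size and repeated names for a FASTA file handle."""
    total, names = fasta_stats(handle)
    duplicated = sorted(name for name, count in names.items() if count > 1)
    return {
        "total": total,
        "exceeds_limit": total > REFERENCE_CHAR_LIMIT,
        "duplicated_names": duplicated,
    }


if __name__ == "__main__":
    # Hypothetical miniature input: chromosome "1" appears twice.
    demo = io.StringIO(
        ">1 dna:chromosome\nACGT\nNNNN\n"
        ">1 dna_rm:chromosome\nACGT\nNNNN\n"
        ">2 dna:chromosome\nAC\n"
    )
    print(report(demo))
```

For a real file, the same `report` function could be called on a handle from `open(...)` in text mode. If names turn up more than once, the concatenation probably contains redundant copies of the genome.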
What can I do? I would prefer to use the Ensembl genome assembly because I need to use the Ensembl transcriptome for annotation later. Thank you, and I look forward to your answers!