Hi, I am a graduate student who began RNA-seq data analysis recently.
I am having trouble testing an RNA-seq analysis with the data set provided with a Nature Protocols paper (Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Trapnell et al.).
In the TopHat step, I chose to supply my own junctions and used the annotation from the Ensembl fruit fly gene set (in GTF format; ftp://ftp.ensembl.org/pub/release-75/gtf/drosophila_melanogaster/Drosophila_melanogaster.BDGP5.75.gtf.gz). At first it seemed to run fine, but after a couple of minutes an error message showed up:
Fatal error: Tool execution failed
[2014-06-17 06:29:59] Beginning TopHat run (v2.0.9)
[2014-06-17 06:29:59] Checking for Bowtie
    Bowtie version: 22.214.171.124
[2014-06-17 06:29:59] Checking for Samtools
    Samtools version: 0.1.18.0
[2014-06-17 06:29:59] Checking for Bowtie index files (genome)..
[2014-06-17 06:29:59] Checking for reference FASTA file
[2014-06-17 06:29:59] Generating SAM header for /galaxy/data/dm3/bowtie2_index/dm3
    format: fastq
    quality scale: phred33 (default)
[2014-06-17 06:30:01] Reading known junctions from GTF file
[2014-06-17 06:30:05] Preparing reads
    left reads: min. length=75, max. length=75, 11607353 kept reads (0 discarded)
    right reads: min. length=75, max. length=75, 11607353 kept reads (0 discarded)
[2014-06-17 06:32:35] Building transcriptome data files..
[2014-06-17 06:32:39] Building Bowtie index from dataset_8400771.fa
[FAILED] Error: Couldn't build bowtie index with err = 1
While trying to find a solution to this problem, I came across this comment:
The gtf and the reference fasta files identifiers must be the same. Consider to update the chromosome/contig names in all your annotation files
(gtf, gff, dbsnp vcf, etc)
I do not fully understand this, but it seems there is a mismatch between my gene annotation and the reference genome in Galaxy. Could you help me with this matter? Was this error caused by uploading the wrong gene data set?
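
In case it helps with diagnosis, here is a minimal Python sketch of how the identifier check described in that comment could be done: it collects the sequence names from the first column of a GTF file and from the header lines of a FASTA file, then reports which names appear in only one of them. The function names and the tiny example records are made up for illustration. The example assumes the mismatch looks like Ensembl-style names (2L, X) in the GTF versus UCSC-style names (chr2L, chrX) in the dm3 reference, which I have not confirmed for the Galaxy files.

```python
def gtf_seqnames(lines):
    """Collect sequence names (first tab-separated column) from GTF lines."""
    names = set()
    for line in lines:
        line = line.strip()
        # Skip blank lines and comment/header lines.
        if not line or line.startswith("#"):
            continue
        names.add(line.split("\t", 1)[0])
    return names


def fasta_seqnames(lines):
    """Collect sequence names (first word after '>') from FASTA header lines."""
    names = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names


def compare_seqnames(gtf_lines, fasta_lines):
    """Report names shared by both inputs and names unique to each."""
    gtf = gtf_seqnames(gtf_lines)
    fasta = fasta_seqnames(fasta_lines)
    return {
        "shared": sorted(gtf & fasta),
        "only_in_gtf": sorted(gtf - fasta),
        "only_in_fasta": sorted(fasta - gtf),
    }


# Hypothetical records, not taken from the real files.
example_gtf = [
    '2L\tprotein_coding\texon\t100\t200\t.\t+\t.\tgene_id "gene_A";',
    'X\tprotein_coding\texon\t300\t400\t.\t-\t.\tgene_id "gene_B";',
]
example_fasta = [">chr2L", "ACGT", ">chrX", "ACGT"]

report = compare_seqnames(example_gtf, example_fasta)
print(report)
```

For real files, the same functions should work on open file handles, for example `gzip.open(path, "rt")` for the compressed GTF. If "shared" comes back empty, the annotation and the reference use different naming schemes, and one of them would need to be renamed to match the other.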