Question: Running TopHat2 with a GTF file
3.6 years ago by
United States
Hi everyone,


I am having trouble running TopHat 2. I added the mouse (mm10) reference genome FASTA file and the corresponding GTF file from UCSC to my history in Galaxy. When I look closer at the two files, both are from Ensembl and use the same notation (chr1). However, when I run TopHat 2, I get this error: Couldn't build bowtie index with err = 1.

The first line of each file looks like this:


>mm10_ensGene_ENSMUST00000086465 range=chr1:134199223-134235431 5'pad=0 3'pad=0 strand=- repeatMasking=none


chr1 mm10_ensGene stop_codon 134202951 134202953 0.000000 - . gene_id "ENSMUST00000086465"; transcript_id "ENSMUST00000086465";

3.6 years ago by
United States
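The reference genome (custom?) uses Ensembl-style sequence identifiers: its sequence names are built from Ensembl transcript IDs such as ENSMUST00000086465. The reference annotation, however, uses UCSC chromosome identifiers (chr1): it is based on mm10, although the track contents come from Ensembl. The sequence names in the FASTA file and the chromosome names in the GTF file must match exactly.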
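
One way to check the match before running TopHat is to compare the names directly. The Python sketch below is a hypothetical helper, not part of Galaxy or TopHat: the function names and file paths are made up for illustration, and it assumes (as is common for aligners) that a FASTA sequence name is the header text after ">" up to the first whitespace. It lists every GTF chromosome name (first column) that has no matching FASTA sequence.

```python
def fasta_names(lines):
    """Collect sequence names: header text after '>' up to the first whitespace (assumed naming rule)."""
    names = set()
    for line in lines:
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:
                names.add(parts[0])
    return names


def gtf_seqnames(lines):
    """Collect the first column (seqname) of every non-comment, non-blank GTF line."""
    names = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        names.add(line.split()[0])  # GTF is tab-separated; seqname has no spaces
    return names


def missing_seqnames(fasta_lines, gtf_lines):
    """GTF chromosome names that do not appear as FASTA sequence names, sorted."""
    return sorted(gtf_seqnames(gtf_lines) - fasta_names(fasta_lines))


def check_files(fasta_path, gtf_path):
    """Hypothetical convenience wrapper that reads both files line by line."""
    with open(fasta_path) as fa, open(gtf_path) as gtf:
        return missing_seqnames(fa, gtf)


if __name__ == "__main__":
    # The two sample lines from the question (GTF restored to tab-separated columns).
    fasta = [">mm10_ensGene_ENSMUST00000086465 range=chr1:134199223-134235431 "
             "5'pad=0 3'pad=0 strand=- repeatMasking=none"]
    gtf = ['chr1\tmm10_ensGene\tstop_codon\t134202951\t134202953\t0.000000\t-\t.\t'
           'gene_id "ENSMUST00000086465"; transcript_id "ENSMUST00000086465";']
    print(missing_seqnames(fasta, gtf))  # ['chr1']
```

An empty list means every annotated chromosome has a sequence in the FASTA file. With the two lines from the question, "chr1" is reported because the FASTA sequence is named after a transcript, not a chromosome.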

The error points to a formatting problem in the FASTA file. For more about custom reference genomes, see Section 2.14.
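
Before indexing a custom FASTA, a quick local sanity check can catch some common problems. The sketch below is a hypothetical helper and is not Bowtie's own validation. It only looks for a few assumed issues (sequence lines before the first header, headers without a name, duplicate names, records with no sequence, and characters outside the IUPAC nucleotide codes), so a clean result does not guarantee that bowtie-build will succeed.

```python
import re

# Assumption: only IUPAC nucleotide letters are acceptable on sequence lines.
VALID = re.compile(r"[ACGTUNRYSWKMBDHV]+", re.IGNORECASE)


def fasta_problems(lines):
    """Return human-readable descriptions of suspicious lines in a FASTA file."""
    problems = []
    seen = set()
    current = None   # name of the record being read; None before the first header
    has_seq = False  # whether the current record has any sequence lines
    for n, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if line.startswith(">"):
            if current is not None and not has_seq:
                problems.append(f"record {current!r} has no sequence")
            parts = line[1:].split()
            if not parts:
                problems.append(f"line {n}: header without a name")
                current = ""
            else:
                current = parts[0]
                if current in seen:
                    problems.append(f"line {n}: duplicate name {current!r}")
                seen.add(current)
            has_seq = False
        elif not line.strip():
            continue  # blank lines are skipped in this sketch
        elif current is None:
            problems.append(f"line {n}: sequence before the first header")
        else:
            has_seq = True
            if not VALID.fullmatch(line.strip()):
                problems.append(f"line {n}: unexpected characters")
    if current is None:
        problems.append("no FASTA header found")
    elif not has_seq:
        problems.append(f"record {current!r} has no sequence")
    return problems


if __name__ == "__main__":
    sample = [">chr1", "ACGTN", ">chr1", "ACGX"]
    for problem in fasta_problems(sample):
        print(problem)
    # line 3: duplicate name 'chr1'
    # line 4: unexpected characters
```

To check a real file, pass an open file handle, for example `fasta_problems(open("genome.fa"))`, where "genome.fa" is a placeholder path.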

I would suggest getting mm10 from the UCSC downloads area and indexing it on your server (local?). There is a data manager in the Tool Shed that handles both genome retrieval and index creation.

Best, Jen, Galaxy team




