Question: GFF format not working with tophat
13 days ago by
sarde2790 wrote:

I am working with the pepper genome, Capsicum annuum Zunla variety. I was aligning RNA-seq reads to this genome with the following command:

tophat2 -p 6 --min-intron-length 40 --bowtie-n --no-novel-juncs --b2-very-sensitive -G /Volumes/Pepper_Tosh/TOPHAT/CaZL1.gff BOWTIE2_INDEXES/capsicumgenome2 PT_8h_5_S30_combined_R1_001_trimmed.fastq

But somehow I am getting the following error:

[2017-08-03 15:37:14] Beginning TopHat run (v2.0.14)

[2017-08-03 15:37:14] Checking for Bowtie

Bowtie version:

[2017-08-03 15:37:14] Checking for Bowtie index files (genome)..

[2017-08-03 15:37:14] Checking for reference FASTA file

[2017-08-03 15:37:14] Generating SAM header for BOWTIE2_INDEXES/capsicumgenome2

[2017-08-03 15:37:19] Reading known junctions from GTF file

[2017-08-03 15:37:20] Preparing reads

left reads: min. length=12, max. length=111, 19832519 kept reads (5687 discarded)

Warning: short reads (<20bp) will make TopHat quite slow and take large amount of memory because they are likely to be mapped in too many places

[2017-08-03 15:45:29] Building transcriptome data files ./tophat_out/tmp/CaZL1

[2017-08-03 15:46:02] Building Bowtie index from CaZL1.fa


Error: Couldn't build bowtie index with err = 1

I think the GFF file for my genome is not suitable for this run. The GFF file I used was downloaded from the link below, and the file name is Capsicum.annuum.L_Zunla-1_v2.0_genes.gff.gz

Can someone please help me figure out exactly what is going wrong?

Thank you very much in advance.

gff tophat bowtie rna-seq
9 days ago by
Jennifer Hillman Jackson wrote:


The error is at the custom genome indexing step. Verify and, if needed, adjust the format of the custom genome (CG) used for mapping. While doing that, also check for a reference genome mismatch (after reformatting the CG) between the chromosome identifiers in the GFF and those in the CG. All inputs must match. Finally, sorting BAMs, and sometimes the other inputs (GTF, etc.), can help avoid resource problems, so try sorting if you run into more problems after fixing the above.
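As a rough way to check for the identifier mismatch described above, the following is a minimal Python sketch, not a Galaxy or TopHat tool. It collects the sequence names from the FASTA header lines and from the first column of the GFF, then reports names that appear in one file but not the other. The function names (`fasta_ids`, `gff_seqids`, `compare_ids`) are made up for this illustration, and it assumes the first whitespace-delimited word of each FASTA header is the identifier the aligner will use.

```python
def fasta_ids(lines):
    """Collect sequence identifiers from FASTA header lines.

    Assumption: the identifier is the first word after '>'.
    """
    ids = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.add(fields[0])
    return ids


def gff_seqids(lines):
    """Collect the values of column 1 (seqid) from GFF/GTF feature lines."""
    ids = set()
    for line in lines:
        # An embedded '##FASTA' section marks the end of the annotation lines.
        if line.startswith("##FASTA"):
            break
        # Skip comments, directives, and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        ids.add(line.split("\t", 1)[0].strip())
    return ids


def compare_ids(fasta_lines, gff_lines):
    """Return (names only in the GFF, names only in the FASTA), both sorted."""
    fa = fasta_ids(fasta_lines)
    gff = gff_seqids(gff_lines)
    return sorted(gff - fa), sorted(fa - gff)
```

Both arguments can be open file handles, for example `compare_ids(open(genome_fasta), open(gff_file))` with your own paths. If the first list is non-empty, the GFF refers to sequences that the genome does not contain under that exact name (differences in case or prefixes such as `chr` count as mismatches).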

How-to is in these FAQs:

Also, TopHat has been deprecated. Please try using HISAT or RNASTAR instead.

Tutorials that include updated RNA-seq protocols:

Thanks, Jen, Galaxy team

