Question: GFF format not working with tophat
16 months ago, sarde279 wrote:

I am working with the pepper genome (Capsicum annuum, Zunla variety). I was aligning RNA-seq reads to this genome with the following command:

tophat2 -p 6 --min-intron-length 40 --bowtie-n --no-novel-juncs --b2-very-sensitive -G /Volumes/Pepper_Tosh/TOPHAT/CaZL1.gff BOWTIE2_INDEXES/capsicumgenome2 PT_8h_5_S30_combined_R1_001_trimmed.fastq

But I am getting the following error:

[2017-08-03 15:37:14] Beginning TopHat run (v2.0.14)

[2017-08-03 15:37:14] Checking for Bowtie

Bowtie version:

[2017-08-03 15:37:14] Checking for Bowtie index files (genome)..

[2017-08-03 15:37:14] Checking for reference FASTA file

[2017-08-03 15:37:14] Generating SAM header for BOWTIE2_INDEXES/capsicumgenome2

[2017-08-03 15:37:19] Reading known junctions from GTF file

[2017-08-03 15:37:20] Preparing reads

left reads: min. length=12, max. length=111, 19832519 kept reads (5687 discarded)

Warning: short reads (<20bp) will make TopHat quite slow and take large amount of memory because they are likely to be mapped in too many places

[2017-08-03 15:45:29] Building transcriptome data files ./tophat_out/tmp/CaZL1

[2017-08-03 15:46:02] Building Bowtie index from CaZL1.fa


Error: Couldn't build bowtie index with err = 1

I suspect the GFF file for my genome is not suitable for this run. I downloaded it from the link below; the file name is Capsicum.annuum.L_Zunla-1_v2.0_genes.gff.gz

Can someone please help me figure out exactly what is going wrong?

Thank you very much in advance.

gff tophat bowtie rna-seq • 826 views
15 months ago, Jennifer Hillman Jackson (United States) wrote:


The error occurs at the custom genome indexing step. Verify and, if needed, adjust the format of the custom genome (CG) used for mapping. While doing that, also check for a reference genome mismatch between the chromosome identifiers in the GFF and those in the CG (after reformatting the CG). All inputs must match. Finally, sorting BAMs, and sometimes the other inputs (GTF, etc.), can help avoid resource problems, so try sorting if you run into more problems after fixing the above.
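For the identifier check described above, a minimal sketch in Python may help. It assumes uncompressed, plain-text FASTA and GFF inputs and compares the first word of each FASTA header with the first column of the GFF. The function names and the tiny in-memory demo data are illustrative assumptions, not part of TopHat, Bowtie, or Galaxy.

```python
import io


def fasta_ids(handle):
    """Collect sequence identifiers: the first word after '>' on each header line."""
    ids = set()
    for line in handle:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.add(fields[0])
    return ids


def gff_seqids(handle):
    """Collect the values of column 1 (seqid) from a GFF/GTF stream."""
    ids = set()
    for line in handle:
        if line.startswith("##FASTA"):
            break  # an embedded FASTA section follows; no more feature lines
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives, and comments
        ids.add(line.split("\t", 1)[0].strip())
    return ids


def compare_ids(fasta_handle, gff_handle):
    """Return (seqids only in the GFF, seqids only in the genome), both sorted."""
    genome = fasta_ids(fasta_handle)
    annotation = gff_seqids(gff_handle)
    return sorted(annotation - genome), sorted(genome - annotation)


# Made-up demo data: 'chr03' appears in the GFF but has no matching FASTA header.
demo_fasta = io.StringIO(">Chr01 length=100\nACGT\n>Chr02\nACGT\n")
demo_gff = io.StringIO(
    "##gff-version 3\n"
    "Chr01\t.\tgene\t1\t4\t.\t+\t.\tID=g1\n"
    "chr03\t.\tgene\t1\t4\t.\t+\t.\tID=g2\n"
)
missing, unannotated = compare_ids(demo_fasta, demo_gff)
print("In GFF but not in genome:", missing)
print("In genome but not in GFF:", unannotated)
```

To run this on real data, open the genome FASTA that was used to build the index and the GFF with `open()` and pass the two file handles to `compare_ids`. Any identifier reported as present only in the GFF is a candidate for the mismatch problem.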

How-to is in these FAQs:

Also, TopHat has been deprecated. Please try using HISAT or RNA STAR instead.

Tutorials that include updated RNA-seq protocols:

Thanks, Jen, Galaxy team
