Question: GFF format not working with tophat
sarde279 wrote, 13 days ago:

I am working with the pepper genome, Capsicum annuum, Zunla variety. I was aligning the RNA-seq reads to this genome with the following command:

tophat2 -p 6 --min-intron-length 40 --bowtie-n --no-novel-juncs --b2-very-sensitive -G /Volumes/Pepper_Tosh/TOPHAT/CaZL1.gff BOWTIE2_INDEXES/capsicumgenome2 PT_8h_5_S30_combined_R1_001_trimmed.fastq

But somehow I am getting the following error:

[2017-08-03 15:37:14] Beginning TopHat run (v2.0.14)

[2017-08-03 15:37:14] Checking for Bowtie

Bowtie version:

[2017-08-03 15:37:14] Checking for Bowtie index files (genome)..

[2017-08-03 15:37:14] Checking for reference FASTA file

[2017-08-03 15:37:14] Generating SAM header for BOWTIE2_INDEXES/capsicumgenome2

[2017-08-03 15:37:19] Reading known junctions from GTF file

[2017-08-03 15:37:20] Preparing reads

left reads: min. length=12, max. length=111, 19832519 kept reads (5687 discarded)

Warning: short reads (<20bp) will make TopHat quite slow and take large amount of memory because they are likely to be mapped in too many places

[2017-08-03 15:45:29] Building transcriptome data files ./tophat_out/tmp/CaZL1

[2017-08-03 15:46:02] Building Bowtie index from CaZL1.fa


Error: Couldn't build bowtie index with err = 1

I think the GFF file for my genome is not suitable for this run. The GFF file I used was downloaded from the link below, and the file name is Capsicum.annuum.L_Zunla-1_v2.0_genes.gff.gz

Can someone please help me figure out what exactly is going wrong?

Thank you very much in advance.

Tags: gff, tophat, bowtie, rna-seq
Jennifer Hillman Jackson (United States) wrote, 9 days ago:


The error occurs at the custom genome indexing step. Verify and, if needed, adjust the format of the custom genome (CG) used for mapping. While doing that, also check for a reference genome mismatch between the chromosome identifiers in the GFF and those in the custom genome (after reformatting it). All inputs must match. Finally, sorting BAMs, and sometimes the other inputs (GTF, etc.), can help avoid resource problems, so try sorting if you run into more problems after fixing the above.
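
For the identifier check, a minimal sketch along these lines may help. It is not a Galaxy or TopHat tool, only an illustration: it assumes the genome is a plain or gzipped FASTA file and the annotation is a tab-delimited GFF/GTF file, and the function names (`fasta_ids`, `gff_seqids`, `compare_ids`) are hypothetical. It collects the first word of each FASTA header line and the first column of each annotation line, then reports identifiers that appear in only one of the two files.

```python
import gzip
import sys


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def fasta_ids(path):
    """Collect sequence identifiers: the first word after '>' on header lines."""
    ids = set()
    with open_text(path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    ids.add(fields[0])
    return ids


def gff_seqids(path):
    """Collect the values found in column 1 (seqid) of a GFF/GTF file."""
    ids = set()
    with open_text(path) as handle:
        for line in handle:
            if line.startswith("##FASTA"):
                break  # GFF3 files may embed sequence data after this directive
            if line.startswith("#") or not line.strip():
                continue  # skip comments, directives, and blank lines
            ids.add(line.split("\t", 1)[0].strip())
    return ids


def compare_ids(fasta_path, gff_path):
    """Report identifiers shared by both files and those unique to each."""
    genome = fasta_ids(fasta_path)
    annotation = gff_seqids(gff_path)
    return {
        "shared": sorted(genome & annotation),
        "only_in_gff": sorted(annotation - genome),
        "only_in_fasta": sorted(genome - annotation),
    }


# Assumed usage: python check_ids.py genome.fa annotation.gff
if __name__ == "__main__" and len(sys.argv) == 3:
    report = compare_ids(sys.argv[1], sys.argv[2])
    for key, values in report.items():
        print(f"{key}: {len(values)}")
        for value in values[:10]:  # show at most ten examples per group
            print(f"  {value}")
```

If `only_in_gff` is not empty, the annotation refers to sequences that the genome file does not contain under the same name, which is the kind of mismatch described above. An empty `shared` list usually points to a naming difference across the whole file, such as a prefix present in one file and absent in the other.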

How-to is in these FAQs:

Also, TopHat has been deprecated. Please try using HISAT or RNASTAR instead.

Tutorials that include updated RNA-seq protocols:

Thanks, Jen, Galaxy team
