GFF format not working with tophat

16 months ago by

sarde279 • 0 wrote:

I am working with the pepper genome, Capsicum annuum, Zunla variety (http://peppersequence.genomics.cn/page/species/download.jsp). I was aligning RNA-seq reads to this genome with the following command:

tophat2 -p 6 --min-intron-length 40 --bowtie-n --no-novel-juncs --b2-very-sensitive -G /Volumes/Pepper_Tosh/TOPHAT/CaZL1.gff BOWTIE2_INDEXES/capsicumgenome2 PT_8h_5_S30_combined_R1_001_trimmed.fastq

However, I am getting the following error:

[2017-08-03 15:37:14] Beginning TopHat run (v2.0.14)

[2017-08-03 15:37:14] Checking for Bowtie

Bowtie version: 2.2.3.0

[2017-08-03 15:37:14] Checking for Bowtie index files (genome)..

[2017-08-03 15:37:14] Checking for reference FASTA file

[2017-08-03 15:37:14] Generating SAM header for BOWTIE2_INDEXES/capsicumgenome2

[2017-08-03 15:37:19] Reading known junctions from GTF file

[2017-08-03 15:37:20] Preparing reads

left reads: min. length=12, max. length=111, 19832519 kept reads (5687 discarded)

Warning: short reads (<20bp) will make TopHat quite slow and take large amount of memory because they are likely to be mapped in too many places

[2017-08-03 15:45:29] Building transcriptome data files ./tophat_out/tmp/CaZL1

[2017-08-03 15:46:02] Building Bowtie index from CaZL1.fa

[FAILED]

Error: Couldn't build bowtie index with err = 1

I suspect the GFF file for my genome is not suitable for this run. I downloaded it from the link below:

http://peppersequence.genomics.cn/page/species/download.jsp (file name: Capsicum.annuum.L_Zunla-1_v2.0_genes.gff.gz)

Can someone please help me figure out exactly what is going wrong?

Thank you very much in advance.

gff tophat bowtie rna-seq • 826 views

15 months ago by

Jennifer Hillman Jackson ♦ 25k (United States) wrote:

Hello,

The error occurs at the custom genome indexing step. Verify and, if needed, adjust the format of the custom genome (CG) used for mapping. After reformatting the CG, also check for a reference genome mismatch: the chromosome identifiers in the GFF and in the CG must match exactly. Finally, sorting BAMs, and sometimes the other inputs (GTF, etc.), can help avoid resource problems, so try sorting if you run into more problems after fixing the above.
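As a rough illustration of the identifier check, here is a minimal Python sketch that compares the sequence names in a genome FASTA against column 1 of a GFF. The function names are hypothetical and not part of TopHat or Galaxy, and it assumes uncompressed, tab-delimited input, so the .gff.gz download would need to be unzipped first.

```python
import sys


def fasta_ids(lines):
    """Collect FASTA identifiers: the first whitespace-delimited token after '>'."""
    ids = set()
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                ids.add(tokens[0])
    return ids


def gff_seqids(lines):
    """Collect the sequence names found in column 1 of a GFF/GTF file."""
    ids = set()
    for line in lines:
        if line.startswith("##FASTA"):
            break  # an embedded sequence section follows; no more features
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments, and directives
        ids.add(line.split("\t", 1)[0].strip())
    return ids


def compare_ids(fasta_lines, gff_lines):
    """Return (names only in the GFF, names only in the FASTA), both sorted."""
    genome = fasta_ids(fasta_lines)
    annotation = gff_seqids(gff_lines)
    return sorted(annotation - genome), sorted(genome - annotation)


# Hypothetical usage: python check_ids.py genome.fa annotation.gff
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as fa, open(sys.argv[2]) as gff:
        only_gff, only_fasta = compare_ids(fa, gff)
    print("In GFF but not in genome:", only_gff)
    print("In genome but not in GFF:", only_fasta)
```

Any name reported as being in the GFF but not in the genome points to a mismatch of the kind described above. Names that appear only in the genome are usually harmless, since not every sequence has to carry annotation.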

How-to is in these FAQs: https://galaxyproject.org/support/#getting-inputs-right-
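One formatting adjustment that is often worth trying for a custom genome is trimming each FASTA title line down to the identifier alone and dropping blank lines. The Python sketch below shows that idea under the assumption that extra description text or blank lines are the issue. The function name is hypothetical, and this is an illustration rather than the exact procedure from the FAQs.

```python
import sys


def clean_fasta(lines):
    """Yield FASTA lines with title lines cut to the identifier and blank lines dropped."""
    for line in lines:
        line = line.rstrip("\r\n")  # also removes Windows-style line endings
        if not line.strip():
            continue  # drop blank lines
        if line.startswith(">"):
            tokens = line[1:].split()
            # Keep only the first token, e.g. ">Chr01 description" becomes ">Chr01"
            yield ">" + (tokens[0] if tokens else "")
        else:
            yield line.strip()


# Hypothetical usage: python clean_fasta.py genome.fa > genome.clean.fa
if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1]) as handle:
        for cleaned in clean_fasta(handle):
            print(cleaned)
```

After cleaning, the genome index would need to be rebuilt from the new FASTA so that the index and the annotation refer to the same identifiers.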

Also, TopHat has been deprecated. Please try using HISAT or RNASTAR instead.

Tutorials that include updated RNA-seq protocols: https://galaxyproject.org/learn/

Thanks, Jen, Galaxy team
