Question: TopHat error only on one dataset?
2.6 years ago by
United States
Brian Griffiths wrote:

Hi all,

I've run my RNA-Seq data through tophat for the rn5 genome and it worked as expected.

I wanted to compare it to some old data, and so I tried realigning to the rn4 genome. However, one of my samples failed. I tried rerunning it, and it failed again. I reset the computer, started galaxy and ran it without touching anything else and it failed again.

It's the same sample (I’ve imported it from shared data) that I ran last time. All 11 other samples worked for the rn5 alignment.

The error doesn't seem to be in the analysis, but in the reporting. The error I get in galaxy is:

[...] [2016-05-06 05:24:38] Indexing splices

Building a SMALL index

[2016-05-06 05:25:57] Mapping left_kept_reads_seg1 to genome segment_juncs with Bowtie2 (1/4)

[2016-05-06 05:30:45] Mapping left_kept_reads_seg2 to genome segment_juncs with Bowtie2 (2/4)

[2016-05-06 05:34:57] Mapping left_kept_reads_seg3 to genome segment_juncs with Bowtie2 (3/4)

[2016-05-06 05:39:13] Mapping left_kept_reads_seg4 to genome segment_juncs with Bowtie2 (4/4)

[2016-05-06 05:44:50] Joining segment hits

[2016-05-06 05:53:49] Mapping right_kept_reads_seg1 to genome segment_juncs with Bowtie2 (1/4)

[2016-05-06 05:57:45] Mapping right_kept_reads_seg2 to genome segment_juncs with Bowtie2 (2/4)

[2016-05-06 06:01:53] Mapping right_kept_reads_seg3 to genome segment_juncs with Bowtie2 (3/4)

[2016-05-06 06:06:12] Mapping right_kept_reads_seg4 to genome segment_juncs with Bowtie2 (4/4)

[2016-05-06 06:12:44] Joining segment hits

[2016-05-06 06:22:49] Reporting output tracks

[FAILED]

Error running /home/user/galaxy/toolbox/tophat/2.0.14/iuc/package_tophat_2_0_14/536f7bb5616d/bin/tophat_reports --min-anchor 8 --splice-mismatches 0 --min-report-intron 70 --max-report-intron 500000 --min-isoform-fraction 0.15 --output-dir ./tophat_out/ --max-multihits 20 --max-seg-multihits 40 --segment-length 25 --segment-mismatches 2 --min-closure-exon 100 --min-closure-intron 50 --max-closure-intron 5000 --min-coverage-intron 50 --max-coverage-intron 20000 --min-segment-intron 50 --max-segment-intron 500000 --read-mismatches 2 --read-gap-length 2 --read-edit-dist 2 --read-realign-edit-dist 1000 --max-insertion-length 3 --max-deletion-length 3 -z gzip --inner-dist-mean 300 --inner-dist-std-dev 20 --no-closure-search --no-coverage-search --no-microexon-search --library-type fr-unstranded --sam-header ./tophat_out/tmp/rn4_genome.bwt.samheader.sam --report-mixed-alignments --samtools=/home/user/galaxy/toolbox/tophat/2.0.14/iuc/package_tophat_2_0_14/536f7bb5616d/bin/samtools_0.1.18 --bowtie2-max-penalty 6 --bowtie2-min-penalty 2 --bowtie2-penalty-for-N 1 --bowtie2-read-gap-open 5 --bowtie2-read-gap-cont 3 --bowtie2-ref-gap-open 5 --bowtie2-ref-gap-cont 3 /home/user/galaxy/tool-data/rn4/bowtie2_index/rn4/rn4.fa ./tophat_out/junctions.bed ./tophat_out/insertions.bed ./tophat_out/deletions.bed ./tophat_out/fusions.out ./tophat_out/tmp/accepted_hits ./tophat_out/tmp/left_kept_reads.mapped.bam,./tophat_out/tmp/left_kept_reads.candidates.bam ./tophat_out/tmp/left_kept_reads.bam ./tophat_out/tmp/right_kept_reads.mapped.bam,./tophat_out/tmp/right_kept_reads.candidates.bam ./tophat_out/tmp/right_kept_reads.bam

Loaded 201494 junctions

Which doesn't really tell me much. In the terminal it shows:

galaxy.jobs.runners.local DEBUG 2016-05-06 06:51:42,725 execution finished: /media/user/Storage/galaxy/job_working_directory/000/533/galaxy_533.sh

galaxy.jobs.output_checker INFO 2016-05-06 06:51:42,767 Job 533: Fatal error: Tool execution failed

galaxy.jobs DEBUG 2016-05-06 06:51:42,807 setting dataset state to ERROR

galaxy.jobs DEBUG 2016-05-06 06:51:42,825 setting dataset state to ERROR

galaxy.jobs DEBUG 2016-05-06 06:51:42,840 setting dataset state to ERROR

galaxy.jobs DEBUG 2016-05-06 06:51:42,854 setting dataset state to ERROR

galaxy.jobs DEBUG 2016-05-06 06:51:42,867 setting dataset state to ERROR

galaxy.jobs INFO 2016-05-06 06:51:43,010 Collecting job metrics for <galaxy.model.Job object at 0x7f8974703150>

galaxy.jobs DEBUG 2016-05-06 06:51:43,023 job 533 ended (finish() executed in (297.702 ms))

galaxy.datatypes.metadata DEBUG 2016-05-06 06:51:43,024 Cleaning up external metadata files

galaxy.datatypes.metadata DEBUG 2016-05-06 06:51:43,053 Failed to cleanup MetadataTempFile temp files from /media/user/Storage/galaxy/job_working_directory/000/533/metadata_out_HistoryDatasetAssociation_1874_EMhqTG: No JSON object could be decoded

I have more than enough RAM (125 GB), more than enough HD space (500 GB) and nothing else is going on during this time. If anybody has any insight, or additional places I could look to hunt down this error, I'd appreciate it greatly!

Thanks!


This is a local instance, so doesn't have anything to do with usegalaxy.org's recent problems.

written 2.6 years ago by Brian Griffiths
2.6 years ago by
ablanchetcohen wrote:

I had the same issue running TopHat from the command line. I just updated TopHat to the latest version, 2.1.1, and the problem disappeared. Incidentally, the TopHat developers have now themselves declared TopHat obsolete, so any issue with TopHat is now irrelevant.
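If it helps anyone check which side of that upgrade they are on, here is a minimal sketch for comparing a reported version against 2.1.1. It assumes the version text looks something like "TopHat v2.0.14" (the 2.0.14 comes from the tool path in the question; the exact string your install reports may differ). The function names are made up for illustration.

```python
import re


def parse_tophat_version(text):
    """Extract (major, minor, patch) from text such as 'TopHat v2.0.14'.

    The input format is an assumption; adjust the pattern if your
    install reports its version differently.
    """
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("no x.y.z version found in: %r" % text)
    return tuple(int(part) for part in match.groups())


def is_at_least(text, minimum=(2, 1, 1)):
    """True if the version found in `text` is >= `minimum`."""
    return parse_tophat_version(text) >= minimum


if __name__ == "__main__":
    for reported in ("TopHat v2.0.14", "TopHat v2.1.1"):
        print(reported, "->", is_at_least(reported))
```

Comparing integer tuples rather than strings matters here: as plain strings, "2.0.14" and "2.1.1" would be compared character by character, which is easy to get wrong once a component has two digits.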


Thanks for the feedback. I've made a request to upgrade TopHat, either as part of the change linked above or as a distinct point change.

HISAT2 is available at http://usegalaxy.org as an alternative.

written 2.6 years ago by Jennifer Hillman Jackson

Update: I took a very large dataset that produced this exact same error in TopHat and ran it through HISAT2 instead. Success!

TopHat will be updated to v2.1.0 shortly. Once it is promoted to http://usegalaxy.org, I will test whether that wrapped version also avoids the issue. The results of that test will be posted to this ticket before it is closed: https://github.com/galaxyproject/tools-devteam/issues/365

written 2.6 years ago by Jennifer Hillman Jackson
2.6 years ago by
United States
Jennifer Hillman Jackson wrote:

Hello,

This error comes up every so often with certain input combinations. The exact cause is not known, but we are fairly certain at this point that the issue is with Tophat itself.

Anecdotally, the problem seems to be related to the identification of a very large number of splices during processing (due to a fragmented reference genome, deep coverage by reads, etc). With many tests, I have been unable to resolve the issue for test cases by adjusting a variety of parameters.
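For anyone who wants to see whether a failing sample really does stand out by splice count, a rough sketch follows. It assumes each sample's TopHat output has been saved as text and contains a line like the "Loaded 201494 junctions" line in the question; the sample names and the second count in the example are hypothetical, and the function names are made up for illustration.

```python
import re

# Matches lines such as "Loaded 201494 junctions" (as printed in the question's log).
JUNCTION_RE = re.compile(r"Loaded\s+(\d+)\s+junctions")


def junction_count(log_text):
    """Return the junction count reported in a log, or None if no such line exists."""
    match = JUNCTION_RE.search(log_text)
    return int(match.group(1)) if match else None


def rank_samples(logs):
    """Sort a {sample_name: log_text} mapping by junction count, largest first.

    Samples whose log has no junction line are left out of the result.
    """
    counts = {name: junction_count(text) for name, text in logs.items()}
    found = {name: count for name, count in counts.items() if count is not None}
    return sorted(found.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical sample names; only the 201494 figure comes from the thread.
    example_logs = {
        "failing_sample": "[FAILED]\nLoaded 201494 junctions\n",
        "other_sample": "Loaded 150000 junctions\n",
    }
    for name, count in rank_samples(example_logs):
        print(name, count)
```

This only ranks samples by the reported count. It does not establish a threshold at which TopHat fails, since the exact cause is not known.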

As an example, here is one false lead toward solving the issue: a form issue was found, but fixing it did not resolve this specific problem (more testing is still ongoing). See https://github.com/galaxyproject/tools-devteam/issues/365

If anyone knows of a solution, please comment. Adjusting the allocated number of threads (as previously posted as a solution in another post) was not found to be effective.

Thanks for posting about the problem, Jen, Galaxy team
