Question: Tophat jobs exceeding maximum allowed job run time
vieraldo wrote, 4.2 years ago:

Hello,

I have a question regarding the mapping of RNA-seq data. I am hitting a wall because some of my jobs keep getting terminated due to run time.

“This job was terminated because it ran longer than the maximum allowed job run time.”

I am using the default settings of Tophat (for Illumina) and the FastQ files are in the 4-7 GB range. I get this error after about 48 hours of job time. I wonder if there is any way to circumvent this (such as extending the allowed run time) that does not involve installing Galaxy locally.

Thanks for your help!

Karel

rna-seq tophat galaxy

Nate Coraor wrote, 4.2 years ago:

Hi Karel,

As you've probably noticed from the wait times for your Tophat jobs, there is a lot of contention for CPU time in the queue we have reserved for these types of jobs. Increasing the walltime is unfortunately not possible without significantly impacting the experience of other Galaxy users.

That said, there may be parameter changes you can make that will decrease the runtime; hopefully someone with a bit more knowledge of the science will reply to help with that.

--nate

Jennifer Hillman Jackson replied, 4.2 years ago:

Hi Karel,

If you want to send in one of the failed runs as a bug report, including a link to this Biostar post in the comments, I can take a look at the params/inputs and offer advice. Please try to leave at least one complete analysis thread undeleted until I reply (I will do my best to be quick).

Thanks, Jen, Galaxy team

vieraldo wrote, 4.2 years ago:

Hello Jennifer,

Thanks for offering to help us. I sent you the bug report; it's from a Galaxy account belonging to one of my colleagues (account email: s.stefanovic@amc.uva.nl). It's actually her data I am processing (or rather trying to process).

Thanks, 

Karel 

 

Jennifer Hillman Jackson replied, 4.2 years ago:

Hi - thanks for sending in the bug report. It definitely helped to isolate a few problems.

The issue with the most recent Tophat errors has to do with the format of the input datasets: they are missing the first identifier line, so the files start off like this:

+
>=>;@@@=@6>?=@@>@2?@4@.>?@@2@@@638@@@=/<@@/<@@@68>
@547_171_4936
GACCCCGCAACAGAAAGAGGGGTTAATCGCGTCTAGGGTCTTAGGGGATG
+
36-*-*@@747><?00@?-*2:/4.3@--.@@5?.<8**--6*3**407*
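
If it helps to sanity-check a file locally before re-running anything, a minimal Python sketch of that first-record check could look like this; it is not a Galaxy tool, the file name is only a placeholder, and it just inspects the first four lines:

# Minimal sketch: check whether a FASTQ file begins with a complete
# four-line record (@identifier, sequence, +, quality).
# "reads.fastq" is a placeholder name.

def starts_on_record_boundary(path):
    with open(path) as handle:
        first_four = [handle.readline().rstrip("\n") for _ in range(4)]
    identifier, sequence, plus, quality = first_four
    return (identifier.startswith("@")
            and plus.startswith("+")
            and len(sequence) == len(quality) > 0)

if __name__ == "__main__":
    print(starts_on_record_boundary("reads.fastq"))  # placeholder file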

The one in the bug report, and the other right before it, have input sequences that are nearly all or close to half ambiguous base calls, like this:

@1_20_951
CCCNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
+
8@<!/!=>!@!@@!@!/?!@!@/!@!@!!@!@!!=!@!!2!@!!!!@!!!


These won't directly cause a failure (except perhaps to extend the run time, which is one of the errors you encountered), but the content will interfere with correct mapping because the effective read length is so short. If none of the reads map, or mapping is skewed toward only the reads above some arbitrary length, that will be a problem. Parameters in Tophat rely on length settings that match the data. In particular, the parameter "Minimum length of read segments:" has a default of "25", which requires that the mappable portion of each read is twice that number, so "50" bases. You used the default with sequences just at 50 bases, but the ambiguous content needs to be subtracted to obtain the real mappable length. The shortest you can probably go is somewhere around 30 - I don't know for certain what the lower limit will be before a time-out occurs on the public Main server. You'll have to test.
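
To make that arithmetic concrete, here is a rough Python sketch of the calculation, assuming the default segment length of 25 (so 2 x 25 = 50 required mappable bases); the read is the record quoted above and the function name is only illustrative:

# Rough sketch: count the non-ambiguous (non-N) bases in a read and
# compare against roughly twice Tophat's "Minimum length of read
# segments" value (default 25, so 50 bases).

SEGMENT_LENGTH = 25  # Tophat default

def mappable_length(sequence):
    return sum(1 for base in sequence.upper() if base != "N")

read = "CCCNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN"  # record quoted above
needed = 2 * SEGMENT_LENGTH
print(mappable_length(read), "mappable bases; at least", needed, "needed")
# Prints "3 mappable bases; at least 50 needed" for this read.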

Try using a clipper or trim tool, then filter for sequencing artifacts. The tool "FastQC" is a good one to get some basic statistics about sequence quality and length. You can find these tools in the group "NGS: QC and manipulation".
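
If you want a rough pre-filter outside Galaxy, a short Python sketch of that idea (dropping reads whose non-N length falls below a threshold) could look like the one below; the file names and the threshold of 30 are placeholders, and the Galaxy clip/trim/filter tools remain the recommended route:

# Sketch of a simple ambiguity/length filter: keep only reads whose
# non-N length meets a minimum. File names are placeholders; the Galaxy
# tools in "NGS: QC and manipulation" handle this more robustly.

MIN_MAPPABLE = 30  # rough lower bound discussed above

def fastq_records(path):
    """Yield (header, sequence, plus, quality) line tuples from a FASTQ file."""
    with open(path) as handle:
        while True:
            record = [handle.readline() for _ in range(4)]
            if not record[0]:
                break
            yield record

with open("filtered.fastq", "w") as out:
    for header, seq, plus, qual in fastq_records("reads.fastq"):
        if sum(1 for base in seq.strip().upper() if base != "N") >= MIN_MAPPABLE:
            out.writelines([header, seq, plus, qual])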

I see that you have other runs going, also at the defaults. The input dataset at first glance is 50 bases and seems OK. Run FastQC on it, then perform some QC as needed and adjust the Tophat parameters if you have problems or low mapping rates.

Hopefully this helps and thanks for your patience, Jen, Galaxy team
