Question: Galaxy PBS system
jasperkoehorst wrote, 3.8 years ago:

I am currently setting up a PBS system to work with Galaxy.

When I submit a job I see the following in the Galaxy debug output, and the job eventually ends in an error state within Galaxy. Since qstat reports that the job finished fine, I am not sure what is going wrong here.

Also, the log mentions /galaxy/galaxy-dist/database/job_working_directory/000/27, but when I look I only see /1 or /2, not /27...

Job output not returned by PBS: the output datasets were deleted while the job was running, the job was manually dequeued or there was a cluster error.

DEBUG 2015-02-16 07:29:46,527 (27) Working directory for job is: /galaxy/galaxy-dist/database/job_working_directory/000/27
DEBUG 2015-02-16 07:29:46,558 (27) Dispatching to pbs runner
DEBUG 2015-02-16 07:29:47,912 (27) Persisting job destination (destination id: batch)
INFO 2015-02-16 07:29:48,074 (27) Job dispatched
DEBUG 2015-02-16 07:29:48,901 (27) command is: python3.4 /galaxy/galaxy-dist/tools/vaap/SAPP/1_conversion/ '-input' '/galaxy/galaxy-dist/database/files/000/dataset_14.dat' -output '/galaxy/galaxy-dist/database/files/000/dataset_27.dat' -sourcedb "genbank" -format "genbank"; return_code=$?; cd /galaxy/galaxy-dist; /galaxy/galaxy-dist/ ./database/files /galaxy/galaxy-dist/database/job_working_directory/000/27 . /galaxy/galaxy-dist/config/galaxy.ini /galaxy/tmp/tmpLjCWqL /galaxy/galaxy-dist/database/job_working_directory/000/27/galaxy.json /galaxy/galaxy-dist/database/job_working_directory/000/27/metadata_in_HistoryDatasetAssociation_27_WAE9rW,/galaxy/galaxy-dist/database/job_working_directory/000/27/metadata_kwds_HistoryDatasetAssociation_27_JJRjQw,/galaxy/galaxy-dist/database/job_working_directory/000/27/metadata_out_HistoryDatasetAssociation_27_L1LeGv,/galaxy/galaxy-dist/database/job_working_directory/000/27/metadata_results_HistoryDatasetAssociation_27_sedwZX,,/galaxy/galaxy-dist/database/job_working_directory/000/27/metadata_override_HistoryDatasetAssociation_27_DaWNYB; sh -c "exit $return_code"
DEBUG 2015-02-16 07:29:48,938 (27) submitting file /galaxy/galaxy-dist/database/pbs/
DEBUG 2015-02-16 07:29:48,943 (27) queued in default queue as
DEBUG 2015-02-16 07:29:51,152 (27/ PBS job state changed from N to R
DEBUG 2015-02-16 07:30:38,774 (27/ PBS job state changed from R to C
DEBUG 2015-02-16 07:30:38,774 (27/ PBS job has completed successfully
WARNING 2015-02-16 07:30:38,775 Exit code was invalid. Using 0.
DEBUG 2015-02-16 07:30:38,900 setting dataset state to ERROR
DEBUG 2015-02-16 07:30:39,395 job 27 ended
galaxy.datatypes.metadata DEBUG 2015-02-16 07:30:39,395 Cleaning up external metadata files
galaxy.datatypes.metadata DEBUG 2015-02-16 07:30:39,437 Failed to cleanup MetadataTempFile temp files from /galaxy/galaxy-dist/database/job_working_directory/000/27/metadata_out_HistoryDatasetAssociation_27_L1LeGv: No JSON object could be decoded
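For context, "No JSON object could be decoded" is the message Python 2's json module raises when asked to parse an empty or truncated file, so this cleanup warning usually just means the metadata file was never written by the job. A minimal sketch of the failure mode (under Python 3 the wording differs, but the behavior is the same):

```python
import json

def try_decode(raw):
    """Return the decoded object, or None if the content is not valid JSON
    (e.g. an empty metadata file that the job never wrote)."""
    try:
        return json.loads(raw)
    except ValueError:  # json.JSONDecodeError subclasses ValueError
        return None

print(try_decode(''))              # empty file content -> None
print(try_decode('{"ok": true}'))  # valid metadata -> {'ok': True}
```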


Tags: grid, pbs
jmchilton (United States) answered, 3.8 years ago:

The reason you do not see the job_working_directory files is that Galaxy is deleting them - you can set cleanup_job = never in your galaxy.ini (or universe_wsgi.ini for older setups) to keep them around.
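As a galaxy.ini fragment, the setting described above would look like this (section name assumed from a stock galaxy-dist configuration):

```ini
[app:main]
# Keep job working directories after the job finishes, for debugging.
cleanup_job = never
```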

Beyond that - the version of the PBS library Galaxy currently leverages is known to fail with some newer variants of the DRM backend. Can you open the eggs.ini file in Galaxy's root directory and replace "pbs_python = 4.3.5" with "pbs_python = 4.4.0" and let me know if that fixes the problem?
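For reference, the suggested edit would look roughly like this in eggs.ini (the section name is an assumption based on a stock galaxy-dist checkout, where platform-dependent eggs live under [eggs:platform]):

```ini
[eggs:platform]
; Newer PBS/Torque backends need the updated pbs_python egg.
pbs_python = 4.4.0
```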


Indeed, I don't get the JSON error anymore, but jobs are still failing. pbs_python was already set to 4.4.0:

See ini below:

repository =
; these eggs must be scrambled for your local environment
no_auto = pbs_python

bx_python = 0.7.2
Cheetah = 2.2.2
MarkupSafe = 0.12
mercurial = 3.2.4
MySQL_python = 1.2.3c1
PyRods = 3.2.4
numpy = 1.6.0
pbs_python = 4.4.0
psycopg2 = 2.5.1
pycrypto = 2.5
pysam = 0.4.2
pysqlite = 2.5.6
python_lzo = 1.08_2.03_static
PyYAML = 3.10
guppy = 0.1.10
SQLAlchemy = 0.7.9
; msgpack_python = 0.2.4

— reply by jasperkoehorst

The next thing I would try is setting retry_job_output_collection to 4 instead of the default of 0 in galaxy.ini - this works around cases where Galaxy checks for job outputs too quickly and network file system caching becomes a problem.
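The suggestion above as a galaxy.ini fragment (section name assumed from a stock galaxy-dist configuration):

```ini
[app:main]
# Retry collecting job outputs up to 4 times before declaring failure,
# to ride out network file system attribute-caching delays.
retry_job_output_collection = 4
```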


— reply by jmchilton

I even set it to 40 and still no luck. However, when I allow PBS to run on the master node I get errors saying that Python libraries cannot be found. The Galaxy tools use python3.4, and the command shown in debug mode works perfectly from the command line; it also works when I modify job_conf.xml to use the local runner instead of pbs.


<?xml version="1.0"?>
<job_conf>
    <plugins>
        <!-- <plugin id="local" type="runner" load="" workers="4"/> -->
        <plugin id="pbs" type="runner" load=""/>
    </plugins>
    <handlers>
        <handler id="main"/>
    </handlers>
    <destinations default="batch">
        <!-- <destination id="local" runner="local"/> -->
        <destination id="batch" runner="pbs"/>
    </destinations>
</job_conf>

— reply by jasperkoehorst
Powered by Biostar version 16.09