Question: Galaxy PBS system
3.3 years ago, jasperkoehorst (Netherlands) wrote:

I am currently setting up a PBS system to work with Galaxy.

When I submit a job, I see the debug output below in the Galaxy log, and the job eventually ends with the error shown below in Galaxy. Since qstat reports that the jobs completed fine, I am not sure what is going wrong here.

The log also refers to /galaxy/galaxy-dist/database/job_working_directory/000/27, but when I look in that directory I only see /1 or /2 and not /27.


Job output not returned by PBS: the output datasets were deleted while the job was running, the job was manually dequeued or there was a cluster error.

galaxy.jobs DEBUG 2015-02-16 07:29:46,527 (27) Working directory for job is: /galaxy/galaxy-dist/database/job_working_directory/000/27
galaxy.jobs.handler DEBUG 2015-02-16 07:29:46,558 (27) Dispatching to pbs runner
galaxy.jobs DEBUG 2015-02-16 07:29:47,912 (27) Persisting job destination (destination id: batch)
galaxy.jobs.handler INFO 2015-02-16 07:29:48,074 (27) Job dispatched
galaxy.jobs.runners DEBUG 2015-02-16 07:29:48,901 (27) command is: python3.4 /galaxy/galaxy-dist/tools/vaap/SAPP/1_conversion/gbktordf.py '-input' '/galaxy/galaxy-dist/database/files/000/dataset_14.dat' -output '/galaxy/galaxy-dist/database/files/000/dataset_27.dat' -sourcedb "genbank" -format "genbank"; return_code=$?; cd /galaxy/galaxy-dist; /galaxy/galaxy-dist/set_metadata.sh ./database/files /galaxy/galaxy-dist/database/job_working_directory/000/27 . /galaxy/galaxy-dist/config/galaxy.ini /galaxy/tmp/tmpLjCWqL /galaxy/galaxy-dist/database/job_working_directory/000/27/galaxy.json /galaxy/galaxy-dist/database/job_working_directory/000/27/metadata_in_HistoryDatasetAssociation_27_WAE9rW,/galaxy/galaxy-dist/database/job_working_directory/000/27/metadata_kwds_HistoryDatasetAssociation_27_JJRjQw,/galaxy/galaxy-dist/database/job_working_directory/000/27/metadata_out_HistoryDatasetAssociation_27_L1LeGv,/galaxy/galaxy-dist/database/job_working_directory/000/27/metadata_results_HistoryDatasetAssociation_27_sedwZX,,/galaxy/galaxy-dist/database/job_working_directory/000/27/metadata_override_HistoryDatasetAssociation_27_DaWNYB; sh -c "exit $return_code"
galaxy.jobs.runners.pbs DEBUG 2015-02-16 07:29:48,938 (27) submitting file /galaxy/galaxy-dist/database/pbs/27.sh
galaxy.jobs.runners.pbs DEBUG 2015-02-16 07:29:48,943 (27) queued in default queue as 24.micro1.wurnet.nl
galaxy.jobs DEBUG 2015-02-16 07:29:48,984 (27) Persisting job destination (destination id: batch)
galaxy.jobs.runners.pbs DEBUG 2015-02-16 07:29:51,152 (27/24.micro1.wurnet.nl) PBS job state changed from N to R
galaxy.jobs.runners.pbs DEBUG 2015-02-16 07:30:38,774 (27/24.micro1.wurnet.nl) PBS job state changed from R to C
galaxy.jobs.runners.pbs DEBUG 2015-02-16 07:30:38,774 (27/24.micro1.wurnet.nl) PBS job has completed successfully
galaxy.jobs.runners.pbs WARNING 2015-02-16 07:30:38,775 Exit code  was invalid. Using 0.
galaxy.jobs DEBUG 2015-02-16 07:30:38,900 setting dataset state to ERROR
galaxy.jobs DEBUG 2015-02-16 07:30:39,395 job 27 ended
galaxy.datatypes.metadata DEBUG 2015-02-16 07:30:39,395 Cleaning up external metadata files
galaxy.datatypes.metadata DEBUG 2015-02-16 07:30:39,437 Failed to cleanup MetadataTempFile temp files from /galaxy/galaxy-dist/database/job_working_directory/000/27/metadata_out_HistoryDatasetAssociation_27_L1LeGv: No JSON object could be decoded

grid pbs • 1.4k views
3.2 years ago, jmchilton (United States) wrote:

The reason you do not see the job_working_directory files is that Galaxy deletes them when the job is cleaned up. To keep them, set cleanup_job = never in your galaxy.ini (or universe_wsgi.ini for older setups).
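To confirm that the setting was actually picked up from the file, it can help to read it back programmatically. The following is a minimal sketch, assuming the option sits in an `[app:main]` section as in the sample Galaxy configuration; the function name `get_cleanup_job` and the inline sample text are made up for illustration, and in practice the text would come from your own galaxy.ini.

```python
import configparser

# Assumption: Galaxy's main options live in an [app:main] section,
# as in the sample configuration file shipped with Galaxy.
SAMPLE_INI = """
[app:main]
# keep job working directories so they can be inspected after a failure
cleanup_job = never
"""


def get_cleanup_job(ini_text, section="app:main"):
    """Return the cleanup_job value from ini text, or None if it is not set."""
    # Interpolation is disabled so that literal '%' characters in other
    # options do not raise errors while parsing.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    return parser.get(section, "cleanup_job", fallback=None)


if __name__ == "__main__":
    print(get_cleanup_job(SAMPLE_INI))
```

If this returns None for your real file, the option is probably still commented out or sits in a different section.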

Beyond that, the version of the PBS library Galaxy currently uses is known to fail with some newer variants of the DRM backend. Can you open the eggs.ini file in Galaxy's root directory, replace "pbs_python = 4.3.5" with "pbs_python = 4.4.0", and let me know if that fixes the problem?


I indeed no longer get the JSON error, but jobs are still failing. pbs_python was already set to 4.4.0 in eggs.ini, see the file below:

[general]
repository = http://eggs.galaxyproject.org
; these eggs must be scrambled for your local environment
no_auto = pbs_python

[eggs:platform]
bx_python = 0.7.2
Cheetah = 2.2.2
MarkupSafe = 0.12
mercurial = 3.2.4
MySQL_python = 1.2.3c1
PyRods = 3.2.4
numpy = 1.6.0
pbs_python = 4.4.0
psycopg2 = 2.5.1
pycrypto = 2.5
pysam = 0.4.2
pysqlite = 2.5.6
python_lzo = 1.08_2.03_static
PyYAML = 3.10
guppy = 0.1.10
SQLAlchemy = 0.7.9
; msgpack_python = 0.2.4

written 3.2 years ago by jasperkoehorst

The next thing I would try is setting retry_job_output_collection to 4 instead of the default of 0 in galaxy.ini. This works around cases where Galaxy reacts to a finished job faster than the network file system makes the output files visible, so file system caching becomes a problem.
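The idea behind such a retry setting can be illustrated with a small polling loop. This is only a sketch of the general technique, not Galaxy's actual implementation; the function name `wait_for_output` and its `delay` parameter are hypothetical.

```python
import os
import time


def wait_for_output(path, retries=4, delay=1.0):
    """Check for `path` once, then up to `retries` more times.

    Returns True as soon as the path is visible, False if it never appears.
    On a network file system a file written on a compute node may take a
    moment to become visible on the submitting host.
    """
    for attempt in range(retries + 1):
        if os.path.exists(path):
            return True
        if attempt < retries:
            time.sleep(delay)  # give the file system cache time to refresh
    return False
```

With `retries=0` the file is checked exactly once, which mirrors the situation where a job finishes and its output is looked up immediately.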

 

written 3.2 years ago by jmchilton

I even set it to 40, still with no luck. However, when I allow PBS to run jobs on the master node, I get errors that Python libraries cannot be found. For the applications in Galaxy I use python3.4. The command shown in debug mode works perfectly from the command line, and it also works perfectly when I change job_conf.xml to use the local runner instead of pbs.

 

<?xml version="1.0"?>
<job_conf>
    <plugins>
        <!-- <plugin id="local" type="runner" load="galaxy.jobs.runners.local:LocalJobRunner" workers="4"/> -->
        <plugin id="pbs" type="runner" load="galaxy.jobs.runners.pbs:PBSJobRunner"/>
    </plugins>
    <handlers>
        <handler id="main"/>
    </handlers>
    <destinations default="batch">
        <!-- <destination id="local" runner="local"/> -->
        <destination id="batch" runner="pbs"/>
    </destinations>
</job_conf>
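Since the same command works from the command line and with the local runner but not under PBS, one way to narrow this down is to compare the Python environment seen in each case. The script below is a minimal diagnostic sketch, not part of Galaxy; the function name `describe_environment` is made up, and the assumption is that you run it once interactively and once as a PBS job, then compare the two outputs.

```python
import os
import sys


def describe_environment():
    """Collect the details that usually differ between a login shell and a batch job."""
    return {
        "executable": sys.executable,            # which interpreter is running
        "version": sys.version.split()[0],       # e.g. 3.4.x
        "path_env": os.environ.get("PATH", ""),
        "pythonpath_env": os.environ.get("PYTHONPATH", ""),
        "sys_path": list(sys.path),              # where imports are searched
    }


if __name__ == "__main__":
    # str.format is used instead of f-strings so the script also runs on Python 3.4.
    for key, value in sorted(describe_environment().items()):
        print("{}: {}".format(key, value))
```

A difference in `executable` or `sys_path` between the two runs would suggest that the batch job does not inherit the environment in which the libraries were installed.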

 

written 3.2 years ago by jasperkoehorst