Commit 52921cd

Copyedits for grammar and standardization

1 parent d2d6df6 commit 52921cd

1 file changed: +43 -46 lines changed

docs/development/tutorial_add_executor.rst
@@ -1,22 +1,22 @@
 Adding an executor
 ==================

-This tutorial will write an executor for PBSPro using PSI/J batch scheduler
+This tutorial will write an executor for PBSPro using the PSI/J batch scheduler
 executor interface.

 It should be useful when writing an executor for any HPC style scheduler
 that looks like SLURM or PBSPro.


-What is an executor and why might you want to add one?
+What Is an Executor and Why Might You Want to Add One?
 ------------------------------------------------------

 PSI/J provides a common interface for obtaining allocations on compute resources.

 Usually, those compute resources will already have some batch scheduler in place (for example, SLURM).

 A PSI/J executor is the code that tells the core of PSI/J how to interact with
-such a batch scheduler, so that it can provide a common interface to applications.
+such a batch scheduler so that it can provide a common interface to applications.

 A PSI/J executor needs to implement the abstract methods defined on the :class:`psij.job_executor.JobExecutor` base class.
 The documentation for that class has reference material for each of the methods that won't be repeated here.
@@ -25,11 +25,11 @@ For batch scheduler systems, the :class:`.BatchSchedulerExecutor` subclass provi
 This tutorial will focus on using BatchSchedulerExecutor as a base, rather than implementing JobExecutor directly.

 The batch scheduler executor is based around a model where interactions with a local resource manager happen via command line invocations.
-For example, with PBS, that `qsub` and `qstat` commands are used to submit a request and to see status.
+For example, with PBS, the `qsub` and `qstat` commands are used to submit a request and to see status.

-To use BatchSchedulerExecutor for a new local resource manager that uses this command line interface, subclass BatchSchedulerExecutor and add in code that understands how to form the command lines necessary to submit a request for an allocation and to get allocation status. This tutorial will do that for PBSPro.
+To use BatchSchedulerExecutor for a new local resource manager that uses this command line interface, subclass BatchSchedulerExecutor and add code that understands how to form the command lines necessary to submit a request for an allocation and to get allocation status. This tutorial will do that for PBSPro.

-First setup a directory structure::
+First set up a directory structure::

     mkdir project/
     cd project/
@@ -38,36 +38,36 @@ First setup a directory structure::

 We're going to create three source files in this directory structure:

-* ``psijpbs/pbspro.py`` - this will contain the bulk of the code
+* ``psijpbs/pbspro.py`` - This will contain the bulk of the code.

-* ``psijpbs/pbspro.mustace`` - this will contain a template for a PBS Pro job submission file
+* ``psijpbs/pbspro.mustache`` - This will contain a template for a PBS Pro job submission file.

-* ``psij-descriptors/pbspro_descriptor.py`` - this file tells the PSI/J core what this package implements.
+* ``psij-descriptors/pbspro_descriptor.py`` - This file tells the PSI/J core what this package implements.

 First, we'll build a skeleton that won't work, and see that it doesn't work in the test suite. Then we'll build up to the full functionality.

 Prerequisites:

-* you have the psij-python package installed already and are able to run whatever basic verification you think is necessary
+* You have the psij-python package installed already and are able to run whatever basic verification you think is necessary.

-* you are able to submit to PBS Pro on a local system
+* You are able to submit to PBS Pro on a local system.


-A not-implemented stub
+A Not-implemented Stub
 ----------------------

-Add the project directory to the python path directory::
+Add the project directory to the Python path::

     export PYTHONPATH=$(pwd):$PYTHONPATH

-Create a simple BatchSchedulerExecutor subclass that does nothing new, in `psijpbs/pbspro.py`::
+Create a simple BatchSchedulerExecutor subclass that does nothing new in ``psijpbs/pbspro.py``::

     from psij.executors.batch.batch_scheduler_executor import BatchSchedulerExecutor

     class PBSProJobExecutor(BatchSchedulerExecutor):
         pass

-and create a descriptor file to tell psi/j about this, ``psij-descriptors/pbspro.py``::
+and create a descriptor file to tell PSI/J about this, ``psij-descriptors/pbspro_descriptor.py``::

     from distutils.version import StrictVersion

@@ -85,24 +85,24 @@ Now, run the test suite. It should fail with an error reporting that the resourc

 That error message tells us what we need to implement. There are three broad pieces of functionality:

-* submitting a job::
+* Submitting a job::

     generate_submit_script
     get_submit_command
     job_id_from_submit_output

-* requesting job status::
+* Requesting job status::

     get_status_command
     parse_status_output

-* cancelling a job::
+* Cancelling a job::

     get_cancel_command
     process_cancel_command_output


-Let's implement all of these with stubs that return NotImplementedError that we will then flesh out::
+Let's implement all of these with stubs that raise a NotImplementedError, which we will then flesh out::

     class PBSProJobExecutor(BatchSchedulerExecutor):
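Taken together, the method names listed above can be collected into a standalone sketch of this stub layer. A placeholder base class stands in for psij's ``BatchSchedulerExecutor`` here, so only the shape of the stubs is shown, not the real inheritance:

```python
# Standalone sketch: every method named above raises NotImplementedError
# until we flesh it out. The placeholder base class stands in for
# psij.executors.batch.batch_scheduler_executor.BatchSchedulerExecutor.
class BatchSchedulerExecutor:
    pass

class PBSProJobExecutor(BatchSchedulerExecutor):
    # Submission
    def generate_submit_script(self, *args, **kwargs):
        raise NotImplementedError

    def get_submit_command(self, *args, **kwargs):
        raise NotImplementedError

    def job_id_from_submit_output(self, *args, **kwargs):
        raise NotImplementedError

    # Status
    def get_status_command(self, *args, **kwargs):
        raise NotImplementedError

    def parse_status_output(self, *args, **kwargs):
        raise NotImplementedError

    # Cancellation
    def get_cancel_command(self, *args, **kwargs):
        raise NotImplementedError

    def process_cancel_command_output(self, *args, **kwargs):
        raise NotImplementedError
```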

@@ -127,15 +127,13 @@ Let's implement all of these with stubs that return NotImplementedError that we
         def parse_status_output(*args, **kwargs):
             raise NotImplementedError

-Now running the same pytest command will give a different error - further along into attempting to submit a job:
-
-... ::
+Now running the same pytest command will give a different error, further along in attempting to submit a job::

     > assert config
     E       AssertionError


-This default BatchSchedulerExecutor code needs a configuration object, and none was supplied.
+This default BatchSchedulerExecutor code needs a configuration object and none was supplied.

 A configuration object can contain configuration specific to this particular executor. However,
 for now we are not going to specify a custom configuration object and instead will re-use
@@ -163,27 +161,27 @@ Running pytest again, we get as far as seeing PSI/J is trying to do submit-relat

     ../tutorial-play/psijpbs/pbspro.py:13: NotImplementedError

-Implementing job submission
+Implementing Job Submission
 ---------------------------

-To implement submission, we need to implement these three methods:
+To implement submission, we need to implement three methods:

 * :py:meth:`psij.executors.batch.batch_scheduler_executor.BatchSchedulerExecutor.generate_submit_script`
 * :py:meth:`psij.executors.batch.batch_scheduler_executor.BatchSchedulerExecutor.get_submit_command`
 * :py:meth:`psij.executors.batch.batch_scheduler_executor.BatchSchedulerExecutor.job_id_from_submit_output`

 You can read the docstrings for each of these methods for more information, but briefly the submission process is:

-``generate_submit_script`` should generate a submit script specific to the batch scheduler.
+1. ``generate_submit_script`` should generate a submit script specific to the batch scheduler.

-``get_submit_command`` should return the command line necessary to submit that script to the batch scheduler.
+2. ``get_submit_command`` should return the command line necessary to submit that script to the batch scheduler.

 The output of that command should be interpreted by ``job_id_from_submit_output`` to extract a batch scheduler specific job ID,
 which can be used later when cancelling a job or getting job status.

 So let's implement those.

-In line with other PSI/J executors, we're going to delegate script generation to a template based helper. So add a line to initialise a :py:class:`.TemplatedScriptGenerator` in the
+In line with other PSI/J executors, we're going to delegate script generation to a template-based helper. So add a line to initialize a :py:class:`.TemplatedScriptGenerator` in the
 executor initializer, pointing at a (as yet non-existent) template file, and replace ``generate_submit_script`` with a delegated call to `TemplatedScriptGenerator`::

     from pathlib import Path
@@ -272,17 +270,17 @@ In the PBS Pro case, as shown in the example above, that is pretty straightforwa
         return out.strip()

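That step can be checked in isolation. The sample output string below is hypothetical, modeled on the job ID format shown later in this tutorial:

```python
# Hypothetical qsub stdout: PBS Pro prints the new job's ID, typically
# "<sequence>.<server hostname>", followed by a newline.
sample_qsub_output = "2154.edtb-01.mcp.alcf.anl.gov\n"

def job_id_from_submit_output(out: str) -> str:
    # For PBS Pro, the stripped stdout is the native job ID.
    return out.strip()

native_id = job_id_from_submit_output(sample_qsub_output)
```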

-That's enough to get jobs submitted using PSI/J, but not enough to run the test suite. Instead, the test suite will appear to hang, because the PSI/J core code gets a bit upset by status monitoring methods raising NotImplementedError.
+That's enough to get jobs submitted using PSI/J, but not enough to run the test suite. Instead, the test suite will appear to hang, because the PSI/J core code gets a bit upset by status monitoring methods raising a NotImplementedError.


-Implementing status
+Implementing Status
 -------------------

-PSI/J needs to ask the batch scheduler for status about jobs that it has submitted. This can be done with ``BatchSchedulerExecutor`` by overriding these two methods, which we stubbed out as not-implemented earlier on:
+PSI/J needs to ask the batch scheduler for the status of jobs that it has submitted. This can be done with ``BatchSchedulerExecutor`` by overriding these two methods, which we stubbed out as not-implemented earlier on:

-* :py:meth:`.BatchSchedulerExecutor.get_status_command` - like ``get_submit_command``, this should return a batch scheduler specific commandline, this time to output job status.
+* :py:meth:`.BatchSchedulerExecutor.get_status_command` - Like ``get_submit_command``, this should return a batch scheduler specific command line, this time to output job status.

-* :py:meth:`.BatchSchedulerExecutor.parse_status_output` - this will interpret the output of the above status command, a bit like ``job_id_from_submit_output``.
+* :py:meth:`.BatchSchedulerExecutor.parse_status_output` - This will interpret the output of the above status command, a bit like ``job_id_from_submit_output``.

 Here's an implementation for ``get_status_command``::

@@ -296,7 +294,7 @@ This constructs a command line which looks something like this::

     qstat -f -F json -x 2154.edtb-01.mcp.alcf.anl.gov

-The parameters change the default behaviour of ``qstat`` to something more useful for parsing: ``-f`` asks for full output, with `-x` including information for completed jobs (which is normally suppressed) and ``-F json`` asking for the output to be formatted as JSON (rather than a default text tabular view).
+The parameters change the default behavior of ``qstat`` to something more useful for parsing: ``-f`` asks for full output, with ``-x`` including information for completed jobs (which is normally suppressed) and ``-F json`` asking for the output to be formatted as JSON (rather than a default text tabular view).
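This status plumbing can be sketched in isolation. The command list mirrors the ``qstat`` invocation shown above; the JSON shape and the state letters used here (``Q`` queued, ``R`` running, ``F`` finished) are assumptions based on typical PBS Pro output, and plain strings stand in for PSI/J's ``JobState`` values:

```python
import json

# Build the qstat command line shown above for a set of native job IDs.
def get_status_command(native_ids):
    return ['qstat', '-f', '-F', 'json', '-x'] + list(native_ids)

# Assumed mapping from PBS Pro job_state letters to PSI/J-like states.
_STATE_MAP = {'Q': 'QUEUED', 'R': 'ACTIVE', 'F': 'COMPLETED'}

def parse_status_output(out):
    data = json.loads(out)
    # qstat -F json nests per-job records under a top-level "Jobs" object.
    return {job_id: _STATE_MAP.get(rec.get('job_state'), 'UNKNOWN')
            for job_id, rec in data.get('Jobs', {}).items()}

sample = '{"Jobs": {"2154.edtb-01.mcp.alcf.anl.gov": {"job_state": "F"}}}'
```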

 This JSON output, which is passed to ``parse_status_output`` looks something like this (with a lot of detail removed)::

@@ -350,28 +348,28 @@ We still haven't implemented the cancel methods, though. That will be revealed b

     PYTHONPATH=$PWD/src:$PYTHONPATH pytest 'tests' --executors=pbspro

-which should give this error (amongst others -- this commandline formation is ugly and I'd like it to work more along the lines of `make test`)::
+which should give this error (amongst others -- this command line formation is ugly and I'd like it to work more along the lines of `make test`)::

     FAILED tests/test_executor.py::test_cancel[pbspro] - NotImplementedError

-Implementing cancel
+Implementing Cancel
 -------------------

 The two methods to implement for cancellation follow the same pattern as for submission and status:

-* :py:meth:`.BatchSchedulerExecutor.get_cancel_command` - this should form a command for cancelling a job.
-* :py:meth:`.BatchSchedulerExecutor.process_cancel_command_output` - this should interpret the output from the cancel command.
+* :py:meth:`.BatchSchedulerExecutor.get_cancel_command` - This should form a command for cancelling a job.
+* :py:meth:`.BatchSchedulerExecutor.process_cancel_command_output` - This should interpret the output from the cancel command.

-It looks like you don't actually need to implement process_cancel_command_output beyond the stub we already have, to make the abstract class mechanism happy. Maybe that's something that should change in psi/j?
+It looks like you don't actually need to implement ``process_cancel_command_output`` beyond the stub we already have, to make the abstract class mechanism happy. Maybe that's something that should change in PSI/J?

 Here's an implementation of `get_cancel_command`::

     def get_cancel_command(self, native_id: str) -> List[str]:
         return ['qdel', native_id]

-That's enough to tell PBS Pro how to cancel a job, but it isn't enough for PSI/J to know that a job was actually cancelled: the JobState from `parse_status_output` will still return a state of COMPLETED, when we actually want CANCELED. That's because the existing job marks a job as COMPLETED whenever it reaches PBS Pro state `F` - no matter how the job finished.
+That's enough to tell PBS Pro how to cancel a job, but it isn't enough for PSI/J to know that a job was actually cancelled: the JobState from `parse_status_output` will still return a state of COMPLETED, when we actually want CANCELED. That's because the existing code marks a job as COMPLETED whenever it reaches PBS Pro state `F`, no matter how the job finished.

-So here's an updated `parse_status_output` which checks the ``Exit_status`` field in the qstat JSON to see if it exited with status code 265 - that means that the job was killed with signal 9. and if so, marks the job as CANCELED instead of completed::
+So here's an updated `parse_status_output` which checks the ``Exit_status`` field in the qstat JSON to see if the job exited with status code 265, meaning it was killed with signal 9, and if so marks the job as CANCELED instead of COMPLETED::

     def parse_status_output(self, exit_code: int, out: str) -> Dict[str, JobStatus]:
         check_status_exit_code('qstat', exit_code, out)
@@ -399,19 +397,18 @@ This isn't necessarily the right thing to do: some PBS installs will use 128+9 =


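The exit-status arithmetic behind that check can be sketched on its own: on the installs assumed here, PBS reports 256 plus the signal number for a signal-killed job, so SIGKILL (signal 9) shows up as 265, while installs following the shell convention report 128 + 9 = 137. A hypothetical helper distinguishing the two final states:

```python
# Assumption: a finished ('F') job whose Exit_status encodes SIGKILL was
# cancelled; everything else is treated as a normal completion.
SIGKILL = 9
CANCEL_EXIT_STATUSES = {256 + SIGKILL, 128 + SIGKILL}  # {265, 137}

def final_state(exit_status):
    # Hypothetical reduction of the Exit_status field to a final job state.
    return 'CANCELED' if exit_status in CANCEL_EXIT_STATUSES else 'COMPLETED'
```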

-What's missing?
+What's Missing?
 ---------------

 The biggest thing that was omitted was in the mustache template. A :py:class:`psij.Job` object contains lots of options which could be transcribed into the template (otherwise they will be ignored). Have a look at the docstrings for ``Job`` and at other templates in the PSI/J source code for examples.

 The _STATE_MAP given here is also not exhaustive: if PBS Pro qstat returns a different state for a job than what is in it, this will break. So make sure you deal with all the states of your batch scheduler, not just a few that seem obvious.

-How to distribute your executor
+How to Distribute Your Executor
 -------------------------------

 If you want to share your executor with others, here are two ways:

-i) you can make a python package and distribute that as an add-on without needing to interact with the psi/j project
-
-ii) you can make a pull request against the psi/j repo
+1. You can make a Python package and distribute that as an add-on without needing to interact with the PSI/J project.

+2. You can make a pull request against the PSI/J repo.
