Commit 41c774c

response to review comments
1 parent b15369d commit 41c774c

2 files changed

+41
-62
lines changed

docs/getting_started.rst

Lines changed: 8 additions & 6 deletions
Original file line number | Diff line number | Diff line change
@@ -37,19 +37,21 @@ Overview
3737

3838
In PSI/J's terminology, a :class:`Job <psij.job.Job>` represents an executable
3939
plus a bunch of attributes. Static job attributes such as resource requirements
40-
are defined by the :class:`JobSpec <psij.job_spec.JobSpec>` at creation, dynamic
41-
job attributes such as the :class:`JobState <psij.job_state.JobState>` are
42-
updated by PSI/J at runtime.
40+
are defined by the :class:`JobSpec <psij.job_spec.JobSpec>` at
41+
creation. Dynamic job attributes such as the :class:`JobState
42+
<psij.job_state.JobState>` are modified by :class:`JobExecutors
43+
<psij.job_executor.JobExecutor>` as the :class:`Job <psij.job.Job>`
44+
progresses through its lifecycle.
4345

4446
A :class:`JobExecutor <psij.job_executor.JobExecutor>` represents a specific
4547
Resource Manager, e.g. Slurm, on which the Job is being executed. Generally,
4648
when jobs are submitted, they will be queued for a variable period of time,
4749
depending on how busy the target machine is. Once the Job is started, its
4850
executable is launched and runs to completion.
4951

50-
In PSI/J, a job is submitted by `JobExecutor.submit(Job)` which permanently
51-
binds the Job to that executor and submits it to the underlying resource
52-
manager.
52+
In PSI/J, a job is submitted by :meth:`JobExecutor.submit(Job)
53+
<psij.job_executor.JobExecutor.submit>` which permanently binds the Job to that
54+
executor and submits it to the underlying resource manager.
5355

5456

5557
Basic Usage

docs/user_guide.rst

Lines changed: 33 additions & 56 deletions
Original file line number | Diff line number | Diff line change
@@ -43,11 +43,13 @@ resource manager job. One :class:`Job <psij.job.Job>` instance might represent
4343
a Slurm job running on a LLNL cluster, another a Cobalt job running on ALCF's
4444
Theta, another a Flux job in the cloud, and so on.
4545

46-
A ``Job`` is described by an executable plus job attributes which specify how
47-
exactly the job is executed. Static job attributes such es resource requiremens
48-
are defined by the :class:`JobSpec <psij.job_spec.JobSpec>` at creation, dynamic
49-
job attributes such as the :class:`JobState <psij.job_state.JobState>` are
50-
updated by PSI/J at runtime.
46+
A :class:`Job <psij.job.Job>` is described by an executable plus job attributes which specify how
47+
exactly the job is executed. Static job attributes such as resource
48+
requirements are defined by the :class:`JobSpec <psij.job_spec.JobSpec>` at
49+
creation. Dynamic job attributes such as the :class:`JobState
50+
<psij.job_state.JobState>` are modified by :class:`JobExecutors
51+
<psij.job_executor.JobExecutor>` as the :class:`Job <psij.job.Job>`
52+
progresses through its lifecycle.
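
The split between static and dynamic attributes described in this paragraph can be illustrated with a toy model. This is only a sketch and not PSI/J code: ``ToyState``, ``ToyJob`` and ``ToyExecutor`` are hypothetical names, and the state sequence is a simplified assumption about how a job moves from creation to completion.

```python
from enum import Enum


class ToyState(Enum):
    """Hypothetical, simplified stand-in for a job state enumeration."""
    NEW = 0
    QUEUED = 1
    ACTIVE = 2
    COMPLETED = 3


class ToyJob:
    """Toy job: one static attribute and one dynamic attribute."""

    def __init__(self, executable):
        self.executable = executable  # static: fixed at creation
        self.state = ToyState.NEW     # dynamic: changed by the executor


class ToyExecutor:
    """Toy executor that moves a job through its lifecycle."""

    _order = [ToyState.NEW, ToyState.QUEUED, ToyState.ACTIVE,
              ToyState.COMPLETED]

    def submit(self, job):
        # Submission moves a new job into the queue.
        job.state = ToyState.QUEUED

    def advance(self, job):
        # Move the job to the next state unless it has already finished.
        index = self._order.index(job.state)
        if index < len(self._order) - 1:
            job.state = self._order[index + 1]


job = ToyJob('/bin/date')
executor = ToyExecutor()
executor.submit(job)
executor.advance(job)
executor.advance(job)
print(job.executable, job.state.name)
```

Only the executor writes to ``state`` in this sketch, while ``executable`` is never touched after construction, which mirrors the wording of the revised paragraph.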
5153

5254

5355
What is a JobExecutor?
@@ -75,7 +77,7 @@ PSI/J currently provides executors for the following backends:
7577
- `slurm` : `Slurm Scheduling System <https://slurm.schedmd.com/>`_
7678
- `lsf` : `IBM Spectrum LSF <https://www.ibm.com/docs/en/spectrum-lsf>`_
7779
- `pbspro`: `Altair's PBS-Professional <https://www.altair.com/pbs-professional>`_
78-
- `cobalt`: ALCF's Cobalt job scheduler
80+
- `cobalt`: `ALCF's Cobalt job scheduler <https://www.alcf.anl.gov/support/user-guides/theta/queueing-and-running-jobs/job-and-queue-scheduling/index.html>`_
7981

8082
We encourage the contribution of executors for additional backends - please
8183
reference the `developers documentation
@@ -143,12 +145,12 @@ numbers of jobs (tested with up to 64k jobs).
143145
Configuring your Job
144146
--------------------
145147

146-
In the example above, the `executable='/bin/date'` part tells PSI/J that we want
147-
the job to run the `/bin/date` command. But there are other parts to the job
148+
In the example above, the ``executable='/bin/date'`` part tells PSI/J that we want
149+
the job to run the ``/bin/date`` command. But there are other parts to the job
148150
which can be configured:
149151

150152
- arguments for the job executable
151-
- environment the job is running it
153+
- environment the job is running in
152154
- destination for standard output and error streams
153155
- resource requirements for the job's execution
154156
- accounting details to be used
@@ -161,8 +163,8 @@ Job Arguments
161163
^^^^^^^^^^^^^
162164

163165
The executable's command line arguments to be used for a job are specified as
164-
a list of strings in the arguments attribute of the `JobSpec` class. For
165-
example, our previous `/bin/date` job could be changed to request UTC time
166+
a list of strings in the arguments attribute of the ``JobSpec`` class. For
167+
example, our previous ``/bin/date`` job could be changed to request UTC time
166168
formatting:
167169

168170
.. rst-class:: executor-type-selector selector-mode-tabs
@@ -174,7 +176,7 @@ Slurm // Local // LSF // PBS // Cobalt
174176
from psij import Job, JobExecutor, JobSpec
175177
176178
ex = JobExecutor.get_instance('<&executor-type>')
177-
job = Job(JobSpec(executable='/bin/date', arguments=['-u']))
179+
job = Job(JobSpec(executable='/bin/date', arguments=['--utc', '--debug']))
178180
ex.submit(job)
179181
180182
Note: `JobSpec` attributes can also be added incrementally:
@@ -191,10 +193,10 @@ Note: `JobSpec` attributes can also be added incrementally:
191193
Job Environment
192194
^^^^^^^^^^^^^^^
193195

194-
The job environment is provided a environment variables to the executing job
195-
- the are the equivalent of `export FOO=bar` on the shell command line. Those
196-
environment variables are specified as a dictionary of string-type key/value
197-
pairs:
196+
The job environment sets the environment variables for a job before it is
197+
launched. This is the equivalent of exporting ``FOO=bar`` on the command line
198+
before running a command. These environment variables are specified as
199+
a dictionary of string key/value pairs:
198200

199201
.. code-block:: python
200202
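
As a standard-library analogy for the "dictionary of string key/value pairs" wording, the sketch below passes such a dictionary to a child process. It does not use PSI/J. The helper name ``run_with_env`` is hypothetical, and merging the dictionary into the parent environment is an assumption made for the illustration, not a statement about how PSI/J builds the job environment.

```python
import os
import subprocess
import sys


def run_with_env(extra_env):
    """Run a child process with extra variables and return its FOO value."""
    # Start from the current environment and add the string key/value pairs.
    env = dict(os.environ)
    env.update(extra_env)
    # The child is a Python one-liner that prints the variable it received.
    result = subprocess.run(
        [sys.executable, '-c', "import os; print(os.environ['FOO'])"],
        env=env, capture_output=True, text=True, check=True)
    return result.stdout.strip()


print(run_with_env({'FOO': 'bar'}))
```

The effect is comparable to running ``export FOO=bar`` in a shell before starting the command: the variable is set for the launched process only.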
@@ -209,11 +211,11 @@ initialization files (`e.g., ~/.bashrc`), including from any modules loaded in
209211
the default shell environment.
210212

211213

212-
Job StdIO
214+
Job Stdio
213215
^^^^^^^^^
214216

215217
Standard output and standard error streams of the job can be individually
216-
redirected to files by setting the `stdout_path` and `stderr_path` attributes:
218+
redirected to files by setting the ``stdout_path`` and ``stderr_path`` attributes:
217219

218220
.. code-block:: python
219221
@@ -225,29 +227,30 @@ redirected to files by setting the `stdout_path` and `stderr_path` attributes:
225227
spec.stderr_path = '/tmp/date.err'
226228
227229
The job's standard input stream can also be redirected to read from a file, by
228-
setting the `spec.stdin_path` attribute.
230+
setting the ``spec.stdin_path`` attribute.
229231

230232

231233
Job Resources
232234
^^^^^^^^^^^^^
233235

234236
A job submitted to a cluster is allocated a specific set of resources to run on.
235237
The amount and type of resources are defined by a resource specification
236-
`psij.ResourceSpec` which becomes a part of the job specification. The resource specification supports the following attributes:
238+
``ResourceSpec`` which becomes a part of the job specification. The resource
239+
specification supports the following attributes:
237240

238-
- `node_count`: allocate that number of compute nodes to the job. All
241+
- ``node_count``: allocate that number of compute nodes to the job. All
239242
cpu-cores and gpu-cores on the allocated node can be exclusively used by the
240243
submitted job.
241-
- `processes_per_node`: on the allocated nodes, execute that given number of
244+
- ``processes_per_node``: on the allocated nodes, execute that given number of
242245
processes.
243-
- `process_count`: the total number of processes (ranks) to be started
244-
- `cpu_cores_per_process`: the number of cpu cores allocated to each launched
246+
- ``process_count``: the total number of processes (ranks) to be started
247+
- ``cpu_cores_per_process``: the number of cpu cores allocated to each launched
245248
process. PSI/J uses the system definition of a cpu core which may refer to
246-
a physical cpu core or to a virtual cpu core, aka. hardware thread.
247-
- `gpu_cores_per_process`: the number of gpu cores allocated to each launched
249+
a physical cpu core or to a virtual cpu core, also known as a hardware thread.
250+
- ``gpu_cores_per_process``: the number of gpu cores allocated to each launched
248251
process. The system definition of a gpu core is used, but usually refers
249252
to a full physical GPU.
250-
- `exclusive_node_use`: When this boolean flag is set to `True`, then PSI/J
253+
- ``exclusive_node_use``: When this boolean flag is set to ``True``, then PSI/J
251254
will ensure that no other jobs, neither of the same user nor of other users
252255
of the same system, will run on any of the compute nodes on which processes
253256
for this job are launched.
@@ -258,7 +261,7 @@ launched on a single cpu core.
258261

259262
The user should also take care not to define contradictory statements. For
260263
example, the following specification cannot be enacted by PSI/J as the specified
261-
node count contradicts the value of `process_count / processes_per_node`:
264+
node count contradicts the value of ``process_count / processes_per_node``:
262265

263266
.. code-block:: python
264267
@@ -268,6 +271,7 @@ node count contradicts the value of `process_count / processes_per_node`:
268271
spec.executable = '/bin/stress'
269272
spec.resource_spec = ResourceSpec(node_count=2, processes_per_node=2,
270273
process_count=2)
274+
# the line above should raise a 'psij.InvalidJobException' exception
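
The contradiction in this example can be checked by hand: 2 processes at 2 per node fill 1 node, yet 2 nodes are requested. The following sketch reproduces that arithmetic without PSI/J. ``check_resources`` is a hypothetical helper, ``ValueError`` stands in for the library's own exception, and the strict equality rule is an assumption about what "contradictory" means here.

```python
def check_resources(node_count, processes_per_node, process_count):
    """Raise ValueError if the three values contradict each other."""
    # Assumed rule: every requested node carries exactly
    # processes_per_node processes, so the product must match.
    expected = node_count * processes_per_node
    if expected != process_count:
        raise ValueError(
            f'node_count={node_count} with '
            f'processes_per_node={processes_per_node} implies '
            f'{expected} processes, but process_count={process_count}')


# A consistent request passes silently.
check_resources(node_count=2, processes_per_node=2, process_count=4)

# The values from the documentation example are rejected.
try:
    check_resources(node_count=2, processes_per_node=2, process_count=2)
except ValueError as err:
    print('rejected:', err)
```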
271275
272276
273277
Processes versus ranks
@@ -295,7 +299,7 @@ Scheduling Information
295299
To specify resource-manager-specific information, like queues/partitions,
296300
runtime, and so on, create a :class:`JobAttributes
297301
<psij.job_attributes.JobAttributes>` and set it with ``JobSpec(...,
298-
attributes=my_job_attributes)``::
302+
attributes=my_job_attributes)``:
299303

300304
.. rst-class:: executor-type-selector selector-mode-tabs
301305

@@ -414,30 +418,3 @@ Slurm // Local // LSF // PBS // Cobalt
414418
415419
Status callbacks can also be set on individual jobs with
416420
:meth:`set_job_status_callback <psij.job.Job.set_job_status_callback>`.
417-
418-
419-
Running Psi/J at your site
420-
--------------------------
421-
422-
TODO: Pages should contain:
423-
424-
- A simple example ported to multiple sites showing how to configure PSI/J for
425-
each site with required configuration / attributes (with site-switcher?)
426-
(Each example should be in the test suite)
427-
- Common errors you might encounter
428-
- ‘If your site isn’t listed, please contact us to include it’
429-
430-
431-
Running at LLNL LC
432-
^^^^^^^^^^^^^^^^^^
433-
434-
Running at OLCF
435-
^^^^^^^^^^^^^^^
436-
437-
Running at NERSC
438-
^^^^^^^^^^^^^^^^
439-
440-
Running at ALCF
441-
^^^^^^^^^^^^^^^
442-
443-
