@@ -43,11 +43,13 @@ resource manager job. One :class:`Job <psij.job.Job>` instance might represent
4343a Slurm job running on a LLNL cluster, another a Cobalt job running on ALCF's
4444Theta, another a Flux job in the cloud, and so on.
4545
46- A ``Job`` is described by an executable plus job attributes which specify how
47- exactly the job is executed. Static job attributes such es resource requiremens
48- are defined by the :class:`JobSpec <psij.job_spec.JobSpec>` at creation, dynamic
49- job attributes such as the :class:`JobState <psij.job_state.JobState>` are
50- updated by PSI/J at runtime.
46+ A :class:`Job <psij.job.Job>` is described by an executable plus job attributes which specify how
47+ exactly the job is executed. Static job attributes such as resource
48+ requirements are defined by the :class:`JobSpec <psij.job_spec.JobSpec>` at
49+ creation. Dynamic job attributes such as the :class:`JobState
50+ <psij.job_state.JobState>` are modified by :class:`JobExecutors
51+ <psij.job_executor.JobExecutor>` as the :class:`Job <psij.job.Job>`
52+ progresses through its lifecycle.
5153
5254
5355What is a JobExecutor?
@@ -75,7 +77,7 @@ PSI/J currently provides executors for the following backends:
7577 - `slurm`: `Slurm Scheduling System <https://slurm.schedmd.com/>`_
7678 - `lsf`: `IBM Spectrum LSF <https://www.ibm.com/docs/en/spectrum-lsf>`_
7779 - `pbspro`: `Altair's PBS-Professional <https://www.altair.com/pbs-professional>`_
78- - `cobalt`: ALCF's Cobalt job scheduler
80+ - `cobalt`: `ALCF's Cobalt job scheduler <https://www.alcf.anl.gov/support/user-guides/theta/queueing-and-running-jobs/job-and-queue-scheduling/index.html>`_
7981
8082We encourage the contribution of executors for additional backends - please
8183reference the `developers documentation
@@ -143,12 +145,12 @@ numbers of jobs (tested with up to 64k jobs).
143145Configuring your Job
144146--------------------
145147
146- In the example above, the `executable='/bin/date'` part tells PSI/J that we want
147- the job to run the `/bin/date` command. But there are other parts to the job
148+ In the example above, the ``executable='/bin/date'`` part tells PSI/J that we want
149+ the job to run the ``/bin/date`` command. But there are other parts to the job
148150which can be configured:
149151
150152- arguments for the job executable
151- - environment the job is running it
153+ - environment the job is running in
152154- destination for standard output and error streams
153155- resource requirements for the job's execution
154156- accounting details to be used
@@ -161,8 +163,8 @@ Job Arguments
161163^^^^^^^^^^^^^
162164
163165The executable's command line arguments to be used for a job are specified as
164- a list of strings in the arguments attribute of the `JobSpec` class. For
165- example, our previous `/bin/date` job could be changed to request UTC time
166+ a list of strings in the arguments attribute of the ``JobSpec`` class. For
167+ example, our previous ``/bin/date`` job could be changed to request UTC time
166168formatting:
167169
168170.. rst-class:: executor-type-selector selector-mode-tabs
@@ -174,7 +176,7 @@ Slurm // Local // LSF // PBS // Cobalt
174176 from psij import Job, JobExecutor, JobSpec
175177
176178 ex = JobExecutor.get_instance('<&executor-type>')
177- job = Job(JobSpec(executable='/bin/date', arguments=['-u']))
179+ job = Job(JobSpec(executable='/bin/date', arguments=['--utc', '--debug']))
178180 ex.submit(job)
179181
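Each element of the arguments list is one argument. As a minimal sketch of how a shell-style command line maps onto an executable plus a list of strings, the following uses only the standard library's ``shlex`` module; ``split_command`` is a hypothetical helper for illustration and is not part of PSI/J:

```python
import shlex


def split_command(command_line):
    """Split a shell-style command line into (executable, arguments).

    Hypothetical helper, not part of PSI/J: it only illustrates the
    "one list element per argument" shape described in the text.
    """
    parts = shlex.split(command_line)
    if not parts:
        raise ValueError("empty command line")
    return parts[0], parts[1:]


# A quoted argument containing a space stays a single list element.
executable, arguments = split_command("/bin/date --utc '+%Y-%m-%d %H:%M'")
print(executable)
print(arguments)
```

The resulting ``executable`` and ``arguments`` values have the same shape as the ``executable`` string and ``arguments`` list shown in the example above.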
180182 Note: `JobSpec` attributes can also be added incrementally:
@@ -191,10 +193,10 @@ Note: `JobSpec` attributes can also be added incrementally:
191193 Job Environment
192194^^^^^^^^^^^^^^^
193195
194- The job environment is provided a environment variables to the executing job
195- - the are the equivalent of ` export FOO=bar ` on the shell command line. Those
196- environment variables are specified as a dictionary of string-type key/value
197- pairs:
196+ The job environment sets the environment variables for a job before it is
197+ launched. This is the equivalent of exporting ``FOO=bar`` on the command line
198+ before running a command. These environment variables are specified as
199+ a dictionary of string key/value pairs:
198200
199201.. code-block:: python
200202
@@ -209,11 +211,11 @@ initialization files (`e.g., ~/.bashrc`), including from any modules loaded in
209211the default shell environment.
210212
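As a plain-Python analogy for passing environment variables as a dictionary of string key/value pairs, the sketch below uses the standard library's ``subprocess`` module rather than PSI/J. It assumes only that a Python interpreter is available as the child process; the variable name ``FOO`` follows the text above:

```python
import os
import subprocess
import sys

# Start from the current environment and add one variable.
# Keys and values must both be strings.
env = dict(os.environ)
env["FOO"] = "bar"

# The child process sees FOO as if it had been exported before launch.
result = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['FOO'])"],
    env=env,
    capture_output=True,
    text=True,
    check=True,
)
child_value = result.stdout.strip()
print(child_value)
```

The parent's own environment is not modified by this: only the dictionary handed to the child carries the extra variable.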
211213
212- Job StdIO
214+ Job Stdio
213215^^^^^^^^^
214216
215217Standard output and standard error streams of the job can be individually
216- redirected to files by setting the `stdout_path` and `stderr_path` attributes:
218+ redirected to files by setting the ``stdout_path`` and ``stderr_path`` attributes:
217219
218220.. code-block:: python
219221
@@ -225,29 +227,30 @@ redirected to files by setting the `stdout_path` and `stderr_path` attributes:
225227 spec.stderr_path = '/tmp/date.err'
226228
227229 The job's standard input stream can also be redirected to read from a file, by
228- setting the `spec.stdin_path` attribute.
230+ setting the ``spec.stdin_path`` attribute.
229231
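Per-stream redirection can be illustrated without a scheduler. The following sketch is an analogy built on the standard library's ``subprocess`` module, not PSI/J code; the file names ``job.out`` and ``job.err`` are arbitrary choices for the example:

```python
import os
import subprocess
import sys
import tempfile

# Use a scratch directory so the example does not clobber existing files.
workdir = tempfile.mkdtemp()
out_path = os.path.join(workdir, "job.out")
err_path = os.path.join(workdir, "job.err")

# The child writes one line to each stream.
child_code = (
    "import sys; print('to stdout'); print('to stderr', file=sys.stderr)"
)

# Each stream is redirected to its own file, independently of the other.
with open(out_path, "w") as out, open(err_path, "w") as err:
    subprocess.run([sys.executable, "-c", child_code],
                   stdout=out, stderr=err, check=True)

with open(out_path) as f:
    out_text = f.read()
with open(err_path) as f:
    err_text = f.read()
print(out_text, err_text)
```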
230232
231233Job Resources
232234^^^^^^^^^^^^^
233235
234236A job submitted to a cluster is allocated a specific set of resources to run on.
235237The amount and type of resources are defined by a resource specification
236- `psij.ResourceSpec` which becomes a part of the job specification. The resource specification supports the following attributes:
238+ ``ResourceSpec`` which becomes a part of the job specification. The resource
239+ specification supports the following attributes:
237240
238- - `node_count`: allocate that number of compute nodes to the job. All
241+ - ``node_count``: allocate that number of compute nodes to the job. All
239242 cpu-cores and gpu-cores on the allocated node can be exclusively used by the
240243 submitted job.
241- - `processes_per_node`: on the allocated nodes, execute that given number of
244+ - ``processes_per_node``: on the allocated nodes, execute that given number of
242245 processes.
243- - `process_count`: the total number of processes (ranks) to be started
244- - `cpu_cores_per_process`: the number of cpu cores allocated to each launched
246+ - ``process_count``: the total number of processes (ranks) to be started
247+ - ``cpu_cores_per_process``: the number of cpu cores allocated to each launched
245248 process. PSI/J uses the system definition of a cpu core which may refer to
246- a physical cpu core or to a virtual cpu core, aka. hardware thread.
247- - `gpu_cores_per_process`: the number of gpu cores allocated to each launched
249+ a physical cpu core or to a virtual cpu core, also known as a hardware thread.
250+ - ``gpu_cores_per_process``: the number of gpu cores allocated to each launched
248251 process. The system definition of a gpu core is used, which usually refers
249252 to a full physical GPU.
250- - `exclusive_node_use`: When this boolean flag is set to `True`, then PSI/J
253+ - ``exclusive_node_use``: When this boolean flag is set to ``True``, then PSI/J
251254 will ensure that no other jobs, neither of the same user nor of other users
252255 of the same system, will run on any of the compute nodes on which processes
253256 for this job are launched.
@@ -258,7 +261,7 @@ launched on a single cpu core.
258261
259262The user should also take care not to define contradictory statements. For
260263example, the following specification cannot be enacted by PSI/J as the specified
261- node count contradicts the value of `process_count / processes_per_node`:
264+ node count contradicts the value of ``process_count / processes_per_node``:
262265
263266.. code-block:: python
264267
@@ -268,6 +271,7 @@ node count contradicts the value of `process_count / processes_per_node`:
268271 spec.executable = '/bin/stress'
269272 spec.resource_spec = ResourceSpec(node_count=2, processes_per_node=2,
270273 process_count=2)
274+ # the line above should raise a 'psij.InvalidJobException' exception
271275
272276
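The contradiction in the specification above is arithmetic: 2 nodes with 2 processes per node implies 4 processes, not 2. A minimal sketch of that check follows; ``is_consistent`` is a hypothetical helper written for this illustration and is not PSI/J's own validation code:

```python
from typing import Optional


def is_consistent(node_count: Optional[int],
                  processes_per_node: Optional[int],
                  process_count: Optional[int]) -> bool:
    """Return True unless the three values contradict each other.

    Hypothetical helper, not part of PSI/J. If any value is left
    unset (None), there is nothing to contradict.
    """
    if None in (node_count, processes_per_node, process_count):
        return True
    # Equivalent to node_count == process_count / processes_per_node,
    # written as a product to avoid division.
    return node_count * processes_per_node == process_count


print(is_consistent(2, 2, 2))     # the contradictory example from the text
print(is_consistent(2, 2, 4))     # 2 nodes x 2 processes per node
print(is_consistent(None, 2, 4))  # node count left unset
```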
273277 Processes versus ranks
@@ -295,7 +299,7 @@ Scheduling Information
295299To specify resource-manager-specific information, like queues/partitions,
296300runtime, and so on, create a :class:`JobAttributes
297301<psij.job_attributes.JobAttributes>` and set it with ``JobSpec(...,
298- attributes=my_job_attributes)``::
302+ attributes=my_job_attributes)``:
299303
300304.. rst-class:: executor-type-selector selector-mode-tabs
301305
@@ -414,30 +418,3 @@ Slurm // Local // LSF // PBS // Cobalt
414418
415419 Status callbacks can also be set on individual jobs with
:meth:`set_job_status_callback <psij.job.Job.set_job_status_callback>`.
417-
418-
419- Running Psi/J at your site
420- --------------------------
421-
422- TODO: Pages should contain:
423-
424- - A simple example ported to multiple sites showing how to configure PSI/J for
425- each site with required configuration / attributes (with site-switcher?)
426- (Each example should be in the test suite)
427- - Common errors you might encounter
428- - ‘If your site isn’t listed, please contact us to include it’
429-
430-
431- Running at LLNL LC
432- ^^^^^^^^^^^^^^^^^^
433-
434- Running at OLCF
435- ^^^^^^^^^^^^^^^
436-
437- Running at NERSC
438- ^^^^^^^^^^^^^^^^
439-
440- Running at ALCF
441- ^^^^^^^^^^^^^^^
442-
443-