Commit 9304c08

Consolidate task polling documentation and correct the info on regular polling (#556)
Co-authored-by: dpmatthews <[email protected]>
1 parent eb7691b commit 9304c08

2 files changed

+27
-75
lines changed

src/user-guide/running-workflows/tracking-task-state.rst

Lines changed: 0 additions & 50 deletions
Original file line number | Diff line number | Diff line change
@@ -1,40 +1,3 @@
1-
2-
.. _Task Job Polling:
3-
4-
Job Polling
5-
-----------
6-
7-
At any point after job submission, jobs can be *polled* to check that
8-
their true state matches what scheduler expects based on received job status
9-
messages or previous polls.
10-
11-
Polling may be necessary if, for example, a job gets killed by the
12-
untrappable SIGKILL signal (e.g. ``kill -9 PID``), or after a network
13-
outage that prevents job status messages getting back to the scheduler, or if
14-
the :term:`scheduler` itself was down when active jobs finished.
15-
16-
To poll a job the :term:`scheduler` interrogates the :term:`job runner`, and
17-
the ``job.status`` file of the task, on the job host. This information is
18-
enough to determine correct task status even if the job finished while the
19-
:term:`scheduler` was down or unreachable on the network.
20-
21-
.. seealso::
22-
- ``cylc poll --help``
23-
24-
25-
Routine Polling
26-
^^^^^^^^^^^^^^^
27-
28-
Jobs are automatically polled at certain times: once on job submission
29-
timeout; several times on exceeding the job execution time limit; and at
30-
workflow restart any tasks recorded as active are polled to find out what
31-
happened to them while the workflow was down.
32-
33-
Routine polling can also be configured as a way to track job status on platforms
34-
that do not allow routing back to the workflow host for task messaging by TCP
35-
or SSH. See :ref:`Polling To Track Job Status`.
36-
37-
381
.. _TaskComms:
392

403
Tracking Job Status
@@ -157,16 +120,3 @@ the last value, which is used repeatedly until the job is finished:
157120
[runtime]
158121
[[task]]
159122
platform = my_platform
160-
161-
162-
A list of intervals with optional multipliers can be used for both submission
163-
and execution polling, although a single value is probably sufficient for
164-
submission. If these items are not configured, default values from
165-
site and user global config will be used for
166-
:cylc:conf:
167-
`global.cylc[platforms][<platform name>]communication method = poll`.
168-
169-
Polling is not done by default under the other task communications methods, but
170-
it can be configured as well if you like.
171-
172-

src/user-guide/task-implementation/job-submission.rst

Lines changed: 27 additions & 25 deletions
Original file line numberDiff line numberDiff line change
@@ -174,19 +174,27 @@ The template's ``%(job)s`` will be substituted by the job file path.
174174
Job Polling
175175
-----------
176176

177-
For supported :term:`job runners <job runner>`, one-way polling can be used to determine actual
178-
job status: the :term:`scheduler` executes a process on the task host, by
179-
non-interactive ssh, to interrogate the batch queueing system there, and to
180-
read a *status file* that is automatically generated by the :term:`job script`
181-
as it runs.
177+
For supported :term:`job runners <job runner>`, jobs can be *polled* to
178+
check that their true state matches what the scheduler expects based on received
179+
job status messages or previous polls. The :term:`scheduler` executes a process
180+
on the task host, by non-interactive ssh, to interrogate the job runner, and to
181+
read the ``job.status`` file of the task which is automatically generated by the
182+
:term:`job script` as it runs.
182183

183184
Polling may be required to update the workflow state correctly after unusual
184-
events such as a machine being rebooted with tasks running on it, or network
185-
problems that prevent task messages from getting back to the workflow host.
185+
events such as
186+
187+
- a job gets killed by the untrappable SIGKILL signal (e.g. ``kill -9 PID``)
188+
- a machine being rebooted with tasks running on it
189+
- network problems prevent task messages from getting back to the workflow host
190+
- the :term:`scheduler` itself was down when active jobs finished
186191

187192
Tasks can be polled on demand by using the
188193
``cylc poll`` command.
189194

195+
.. seealso::
196+
- ``cylc poll --help``
197+
190198
Tasks are polled automatically, once, if they timeout while queueing in a
191199
job runner and submission timeout is set.
192200
(See :cylc:conf:`[runtime][<namespace>][events]`
@@ -196,34 +204,28 @@ Tasks are polled multiple times, where necessary, when they exceed their
196204
execution time limits. These are normally set with some initial delays to allow
197205
the job runners to kill the jobs.
198206
(See
199-
:cylc:conf:`execution time limit intervals <global.cylc[platforms][<platform name>]execution time limit polling intervals>`
207+
:cylc:conf:`execution time limit polling intervals <global.cylc[platforms][<platform name>]execution time limit polling intervals>`
200208
for how to configure the polling intervals).
201209

202210
Any tasks recorded in the *submitted* or *running* states at workflow
203211
restart are automatically polled to determine what happened to them while the
204212
workflow was down.
205213

206-
Regular polling can also be configured as a health check on tasks submitted to
207-
hosts that are known to be flaky, or as the sole method of determining task
208-
status on hosts that do not allow task messages to be routed back to the workflow
209-
host.
210-
211-
To use polling instead of task-to-workflow messaging set
212-
:cylc:conf:`global.cylc[platforms][<platform name>]communication method = poll`.
213-
214-
The default polling intervals can be overridden in the global configuration:
214+
By default, regular polling also takes place every 15 minutes whilst a job is
215+
submitted or running. The default polling intervals can be overridden in the
216+
global configuration:
215217

216-
* :cylc:conf:`submission polling intervals<global.cylc[platforms][<platform name>]submission polling intervals>`
217-
* :cylc:conf:`execution polling intervals<global.cylc[platforms][<platform name>]execution polling intervals>`
218+
* :cylc:conf:`global.cylc[platforms][<platform name>]submission polling intervals`
219+
* :cylc:conf:`global.cylc[platforms][<platform name>]execution polling intervals`
218220

219-
Or in workflow configurations (in which case polling will be done regardless
220-
of the communication method configured for the platform):
221+
The polling intervals can also be configured for individual tasks:
221222

222-
* :cylc:conf:`submission polling intervals<[runtime][<namespace>]submission polling intervals>`
223-
* :cylc:conf:`execution polling intervals<[runtime][<namespace>]execution polling intervals>`
223+
* :cylc:conf:`[runtime][<namespace>]submission polling intervals`
224+
* :cylc:conf:`[runtime][<namespace>]execution polling intervals`
224225

225-
Note that regular polling is not as efficient as task messaging in updating
226-
task status, and it should be used sparingly in large workflows.
226+
Polling can be used as the sole method of determining task status on hosts that
227+
do not allow task messages to be routed back to the workflow host.
228+
See :ref:`Polling To Track Job Status`.
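
As an illustration of the per-task settings named in this change, here is a
minimal ``flow.cylc`` sketch. The task name ``model`` and the interval values
are assumptions chosen for the example and do not come from the commit. The
sketch relies on the list-with-multipliers syntax described in the removed
text, where the last value is reused until the job finishes.

.. code-block:: cylc

   [runtime]
       [[model]]
           # "model" is a hypothetical task name.
           # Poll every minute while the job is submitted but not yet running.
           submission polling intervals = PT1M
           # Five polls one minute apart, then one poll every ten minutes.
           execution polling intervals = 5*PT1M, PT10M

Settings made here apply only to the named task. The equivalent
``global.cylc[platforms][<platform name>]`` items set the defaults for a
platform.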
227229

228230
.. note::
229231
