Commit b9131ef

Document final retry behaviour. (#571)
Document final retry behaviour
1 parent 8d63907 commit b9131ef

4 files changed: 54 additions, 9 deletions

src/tutorial/furthertopics/retries.rst

Lines changed: 19 additions & 8 deletions
@@ -8,14 +8,25 @@ failure in submission or execution.
 Purpose
 -------

-Retries can be useful for tasks that may occasionally fail due to external
-events, and are routinely fixable when they do - an example would be a task
-that is dependent on a system that experiences temporary outages.
-
-If a task fails, the Cylc retry mechanism can resubmit it after a
-pre-determined delay. An environment variable, ``$CYLC_TASK_TRY_NUMBER``
-is incremented and passed into the task - this means you can write your
-task script so that it changes behaviour accordingly.
+Retries can be useful for tasks that occasionally fail for known, fixable
+reasons. Cylc can rerun a failing job multiple times, with user-defined delays
+between tries.
+
+Tasks that fail because of temporary hardware or network outages may succeed if
+simply resubmitted after a delay. Others might succeed if configured differently
+on the retry.
+
+A job environment variable ``$CYLC_TASK_TRY_NUMBER`` increments with each try,
+to allow try-dependent behaviour in the task script.
+
+.. note::
+
+   Tasks only enter the ``submit-failed`` state if job submission fails with no
+   retries left. Otherwise they return to the waiting state, to wait on the
+   next try.
+
+   Tasks only enter the ``failed`` state if job execution fails with no retries
+   left. Otherwise they return to the waiting state, to wait on the next try.


 Example
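
Illustration (not part of the commit): the try-dependent behaviour described in the new text could be sketched in a task script as follows. The function name ``choose_mode`` and the mode names are hypothetical, and the sketch assumes that ``$CYLC_TASK_TRY_NUMBER`` is 1 on the first try.

```shell
# Hypothetical helper: choose a run mode from the current try number.
# Assumes CYLC_TASK_TRY_NUMBER is 1 on the first try; falls back to 1
# when the variable is unset (for example, when run outside Cylc).
choose_mode() {
    try="${CYLC_TASK_TRY_NUMBER:-1}"
    if [ "$try" -eq 1 ]; then
        echo "primary"     # first attempt: normal configuration
    else
        echo "fallback"    # later attempts: alternative configuration
    fi
}

echo "try ${CYLC_TASK_TRY_NUMBER:-1}: running in $(choose_mode) mode"
```

A real task script would replace the ``echo`` calls with whatever differs between the first attempt and a retry, such as an alternative data source.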

src/tutorial/runtime/runtime-configuration.rst

Lines changed: 11 additions & 1 deletion
@@ -160,7 +160,7 @@ Jobs can fail for several reasons:
 For example, setting ``execution retry delays = PT10M``
 will cause the job to retry every 10 minutes on execution failure.

-Use a multiplier to limit retries to a specific number:
+Use a multiplier to limit the number of retries:

 .. code-block:: cylc

@@ -176,6 +176,16 @@ Jobs can fail for several reasons:
 # then every 30 mins thereafter.
 submission retry delays = 2*PT10M, PT30M

+.. note::
+
+   Tasks only enter the ``submit-failed`` state if job submission fails with no
+   retries left. Otherwise they return to the waiting state, to wait on the
+   next try.
+
+   Tasks only enter the ``failed`` state if job execution fails with no retries
+   left. Otherwise they return to the waiting state, to wait on the next try.
+
+

 .. _tutorial.start_stop_restart:

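
Illustration (not part of the commit): the multiplier syntax in the hunk above can be read as shorthand for a repeated delay. The sketch below expands a value such as ``2*PT10M, PT30M`` into one delay per retry. The helper ``expand_retry_delays`` is hypothetical and is not Cylc's own parser; it assumes comma-separated items with no spaces around ``*``.

```shell
# Hypothetical helper: expand a retry delay list into one ISO 8601
# duration per line, one line per configured retry.
expand_retry_delays() {
    echo "$1" | tr ',' '\n' | while read -r item; do
        case "$item" in
            *'*'*)
                count="${item%%\**}"   # text before the "*": repeat count
                delay="${item#*\*}"    # text after the "*": the duration
                i=0
                while [ "$i" -lt "$count" ]; do
                    echo "$delay"
                    i=$((i + 1))
                done
                ;;
            ?*)
                echo "$item"           # plain duration, used once
                ;;
        esac
    done
}

# Prints PT10M, PT10M and PT30M on separate lines.
expand_retry_delays "2*PT10M, PT30M"
```

On this reading, the example value configures three retries in total. Once those are used up, a further failure puts the task into the ``submit-failed`` or ``failed`` state, as the added note describes.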

src/user-guide/running-workflows/retrying-tasks.rst

Lines changed: 12 additions & 0 deletions
@@ -11,6 +11,18 @@ state, with a new clock trigger to handle the configured retry delay.
 A task that is waiting on a retry will already have one or more failed jobs
 associated with it.

+
+.. note::
+
+   Tasks only enter the ``submit-failed`` state if job submission fails with no
+   retries left. Otherwise they return to the waiting state, to wait on the
+   next try.
+
+   Tasks only enter the ``failed`` state if job execution fails with no retries
+   left. Otherwise they return to the waiting state, to wait on the next try.
+
+
+
 Aborting a Retry Sequence
 -------------------------

src/user-guide/writing-workflows/runtime.rst

Lines changed: 12 additions & 0 deletions
@@ -504,6 +504,18 @@ Tasks can have a list of :term:`ISO8601 durations <ISO8601 duration>` as retry
 intervals. If the job fails the task will return to the ``waiting`` state
 with a clock-trigger configured with the next retry delay.

+
+.. note::
+
+   Tasks only enter the ``submit-failed`` state if job submission fails with no
+   retries left. Otherwise they return to the waiting state, to wait on the
+   next try.
+
+   Tasks only enter the ``failed`` state if job execution fails with no retries
+   left. Otherwise they return to the waiting state, to wait on the next try.
+
+
+
 In the following example, tasks ``bad`` and ``flaky`` each have 3 retries
 configured, with a 10 second delay between. On the final try, ``bad`` fails
 again and goes to the ``failed`` state, while ``flaky`` succeeds and triggers
