Commit ccb4dee

Add mechanism to control non-determinism in retries (#159)
* Add mechanism to control non-determinism in retries
  Signed-off-by: Lucas Caparelli <[email protected]>
* Make suggested changes
  Signed-off-by: Lucas Caparelli <[email protected]>
* Add Lucas Caparelli as contributor
  Signed-off-by: Lucas Caparelli <[email protected]>
* Add relative jitter support
  Signed-off-by: Lucas Caparelli <[email protected]>
1 parent 908bd23 commit ccb4dee

3 files changed: +30 -2 lines changed


community/contributors.md

Lines changed: 1 addition & 0 deletions
@@ -33,6 +33,7 @@ If you are participating insome way, please add your information via pull reques
 * Ruben Romero Montes
 * Tihomir Surdilovic
 * Ricardo Zanini
+* Lucas Caparelli
 
 * **Camunda**
 * Mauricio Salatino

schema/workflow.json

Lines changed: 6 additions & 0 deletions
@@ -339,6 +339,12 @@
 "minimum": 0,
 "minLength": 0,
 "description": "Maximum number of retry attempts. Value of 0 means no retries are performed"
+},
+"jitter": {
+"type": ["number","string"],
+"minimum": 0.0,
+"maximum": 1.0,
+"description": "If float type, maximum amount of random time added or subtracted from the delay between each retry relative to total delay (between 0.0 and 1.0). If string type, absolute maximum amount of random time added or subtracted from the delay between each retry (ISO 8601 duration format)"
 }
 },
 "required": [

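Review note on the schema change: in JSON Schema, `minimum` and `maximum` constrain only numeric instances, so the string form of `jitter` is not checked for ISO 8601 syntax by the schema itself. A minimal Python sketch of the check a runtime might add is shown here. The function name `is_valid_jitter` and the simplified duration pattern (days, hours, minutes, seconds only) are assumptions made for illustration and are not part of this commit.

```python
import re

# Simplified ISO 8601 duration pattern (assumption: only the PnDTnHnMnS
# components are accepted; years, months and weeks are left out).
_DURATION = re.compile(
    r"^P(?=\d|T\d)(?:(\d+)D)?"
    r"(?:T(?=\d)(?:(\d+)H)?(?:(\d+)M)?(?:(\d+(?:\.\d+)?)S)?)?$"
)


def is_valid_jitter(value):
    """Return True for a fraction in [0.0, 1.0] or a duration string."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool):
        return False
    if isinstance(value, (int, float)):
        # Relative jitter: mirrors "minimum": 0.0 and "maximum": 1.0.
        return 0.0 <= value <= 1.0
    if isinstance(value, str):
        # Absolute jitter: stricter than the schema, which accepts any string.
        return _DURATION.match(value) is not None
    return False
```

Under these assumptions, `0.3` and `"PT0.001S"` are accepted, while `1.5` and a free-form string such as `"soon"` are rejected.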
specification.md

Lines changed: 23 additions & 2 deletions
@@ -1332,6 +1332,7 @@ transition:
 | interval | Interval value for retry (ISO 8601 repeatable format). For example: "R5/PT15M" (Starting from now repeat 5 times with 15 minute intervals)| string | no |
 | multiplier | Multiplier value by which interval increases during each attempt (ISO 8601 time format). For example: "PT3S" meaning the second attempt interval is increased by 3 seconds, the third interval by 6 seconds and so on | string | no |
 | maxAttempts | Maximum number of retry attempts. Value of 0 means no retries are performed | string or integer | no |
+| jitter | If float type, maximum amount of random time added or subtracted from the delay between each retry relative to total delay (between 0.0 and 1.0). If string type, absolute maximum amount of random time added or subtracted from the delay between each retry (ISO 8601 duration format) | float or string | no |
 
 <details><summary><strong>Click to view example definition</strong></summary>
 <p>
@@ -1348,7 +1349,8 @@ transition:
 {
 "expression": "{{ $.errors[?(@.name == 'FunctionError')] }}",
 "interval": "PT2M",
-"maxAttempts": 3
+"maxAttempts": 3,
+"jitter": "PT0.001S"
 }
 ```
 
@@ -1359,6 +1361,7 @@ transition:
 expression: "{{ $.errors[?(@.name == 'FunctionError')] }}"
 interval: PT2M
 maxAttempts: 3
+jitter: PT0.001S
 ```
 
 </td>
@@ -1388,10 +1391,28 @@ To explain this better, let's say we have:
 
 which means that we will retry 4 times after waiting 1, 3 (1 + 2), 5 (1 + 2 + 2), and 7 (1 + 2 + 2 + 2) minutes.
 
-The maxAttempts property determines the maximum number of retry attempts allowed. If this property is set to 0 no retries are performed.
+The `maxAttempts` property determines the maximum number of retry attempts allowed. If this property is set to 0 no retries are performed.
 
 For more information, refer to the [Workflow Error Handling - Retrying](#workflow-retrying) section.
 
+The `jitter` property is important to prevent certain scenarios where clients
+are retrying in sync, possibly causing or contributing to a transient failure
+precisely because they're retrying at the same time. Adding a typically small,
+bounded random amount of time to the period between retries serves the purpose
+of attempting to prevent these retries from happening simultaneously, possibly
+reducing total time to complete requests and overall congestion. How this value
+is used in the exponential backoff algorithm is left up to implementations.
+
+`jitter` may be specified as a fraction (between 0.0 and 1.0) of the total delay.
+For example, if `interval` is 2 seconds, `multiplier` is 2 seconds and we're at
+the third attempt, there will be a delay of 6 seconds. If we set `jitter` to
+0.3, then a random amount of time between 0 and 1.8 seconds (`totalDelay * jitter == 6 * 0.3`)
+will be added or subtracted from the delay.
+
+Alternatively, `jitter` may be defined as an absolute value specified as an ISO
+8601 duration. This way, the maximum amount of random time added is fixed and
+will not increase as new attempts are made.
+
 #### Transition Definition
 
 | Parameter | Description | Type | Required |
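Review note on the `jitter` paragraphs in this hunk: the spec text leaves the exact use of `jitter` to implementations, so the following Python sketch is only one possible reading. It assumes the delay grows linearly, as in the spec's "1, 3, 5, 7 minutes" example, and that the random offset is drawn uniformly from `[-bound, +bound]`. The function names, the `relative` flag, and passing absolute jitter as already-parsed seconds are all hypothetical choices made for illustration.

```python
import random


def base_delay(interval_s, multiplier_s, attempt):
    """Delay in seconds before the given 1-based attempt, without jitter."""
    return interval_s + multiplier_s * (attempt - 1)


def max_jitter(delay_s, jitter, relative):
    """Largest offset in seconds that may be added or subtracted.

    relative=True treats jitter as a fraction of the total delay;
    relative=False treats it as an absolute number of seconds
    (assumed to be parsed from the ISO 8601 duration beforehand).
    """
    return delay_s * jitter if relative else jitter


def jittered_delay(delay_s, jitter, relative=True, rng=random):
    """Apply a uniformly distributed offset (an assumption, not spec text)."""
    bound = max_jitter(delay_s, jitter, relative)
    # Clamp at zero so a large jitter never yields a negative wait.
    return max(0.0, delay_s + rng.uniform(-bound, bound))
```

With `interval` of 2 seconds, `multiplier` of 2 seconds, and the third attempt, `base_delay(2, 2, 3)` gives 6 seconds, and a relative `jitter` of 0.3 keeps the final wait between 4.2 and 7.8 seconds. With an absolute jitter such as `PT0.001S` (0.001 seconds), the bound stays fixed no matter how large the delay becomes.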
