Commit 04323f9

Resiliency Policy Error Code Retries
Signed-off-by: Anton Troshin <[email protected]>
1 parent 8257391 commit 04323f9

2 files changed

+147
-2
lines changed

Lines changed: 145 additions & 0 deletions
@@ -0,0 +1,145 @@
# Resiliency Policy Error Code Retries

* Author(s): Anton Troshin (@antontroshin), Taction (@taction)
* Updated: 2024-09-18

## Overview

This is a design proposal to extend Dapr Resiliency Policy Retries so that a retry policy can be enforced only on specific response error codes.
It focuses only on the `retries` (https://docs.dapr.io/operations/resiliency/policies/#retries) part of the policy.

## Background

In some applications, error codes indicate a business error, and retrying the operation may be unnecessary or even undesirable.
Customizing retry behavior allows more granular handling of error codes to suit each use case.
Currently, all errors are retried when the policy is applied.
Some errors are not retryable, and subsequent calls will return the same error. Skipping these retries reduces the overall number of requests, traffic, and errors.

## Related Items

https://github.com/dapr/dapr/issues/6683
https://github.com/dapr/dapr/issues/6428
https://github.com/dapr/dapr/issues/7697

PR:
https://github.com/dapr/dapr/pull/7132

Docs:
https://github.com/dapr/docs/issues/4254
https://github.com/dapr/docs/issues/3859

## Expectations and alternatives

* What is in scope for this proposal?
  - HTTP and gRPC Service Invocation, direct and proxied
  - Bindings
  - Pub/Sub

## Implementation Details

### Design

Add a new object field, `matching`, to the `retries` policy spec to allow the user to specify the error codes that should be retried.
The object has separate fields for HTTP and gRPC status codes. The new fields are optional; when a field is not set, the existing behavior applies, which is to retry on all errors.

### Example 1:
In this example, the retry policy will retry **_only_** on HTTP 500, the HTTP error range 502-504 (inclusive), and the gRPC error range 2-4 (inclusive).
All other errors will not be retried.

```yaml
apiVersion: dapr.io/v1alpha1
kind: Resiliency
metadata:
  name: myresiliency
scopes:
  - app1
spec:
  policies:
    retries:
      pubsubRetry:
        policy: constant
        duration: 5s
        maxRetries: 10
        matching:
          httpStatusCodes: "500,502-504"
          gRPCStatusCodes: "2-4"
```

### Example 2:
In this example, the retry policy will retry **_only_** on the gRPC error range 1-15 (inclusive).
Because `httpStatusCodes` is not set, the policy does not restrict HTTP errors; they are retried according to the default behavior, which is to retry on all errors.

```yaml
apiVersion: dapr.io/v1alpha1
kind: Resiliency
metadata:
  name: myresiliency
scopes:
  - app1
spec:
  policies:
    retries:
      pubsubRetry:
        policy: constant
        duration: 5s
        maxRetries: 10
        matching:
          gRPCStatusCodes: "1-15"
```

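The retry decision implied by the two examples can be summarized in a small sketch. This is an illustration written in Python, not Dapr's implementation; the names `RetryMatching` and `should_retry` are hypothetical, and the sketch assumes each field has already been parsed into either `None` (field not set) or a set of codes.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class RetryMatching:
    """Parsed form of the proposed `matching` object (hypothetical model)."""
    # None means the field was not set, so the default (retry on all errors) applies.
    http_status_codes: Optional[Set[int]] = None
    grpc_status_codes: Optional[Set[int]] = None


def should_retry(matching: RetryMatching, protocol: str, code: int) -> bool:
    """Return True if an error with this code should be retried."""
    if protocol == "http":
        allowed = matching.http_status_codes
    elif protocol == "grpc":
        allowed = matching.grpc_status_codes
    else:
        raise ValueError(f"unknown protocol: {protocol}")
    # An unset field keeps the existing behavior: every error is retried.
    if allowed is None:
        return True
    return code in allowed


# Example 1: httpStatusCodes "500,502-504", gRPCStatusCodes "2-4"
example1 = RetryMatching({500, 502, 503, 504}, {2, 3, 4})
# Example 2: only gRPCStatusCodes "1-15" is set
example2 = RetryMatching(grpc_status_codes=set(range(1, 16)))
```

With these values, `should_retry(example1, "http", 501)` is false, while `should_retry(example2, "http", 404)` is true because the HTTP field is unset.
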
### Acceptable Values
The acceptable values are the same as the ones defined in the [HTTP Status Codes](https://developer.mozilla.org/en-US/docs/Web/HTTP/Status) and [gRPC Status Codes](https://grpc.io/docs/guides/status-codes/) documentation.

- HTTP: from 100 to 599
- gRPC: from 1 to 16

### Setting Format
Both the `httpStatusCodes` and `gRPCStatusCodes` fields are optional strings. Each can be set to a comma-separated list of error codes and/or ranges of error codes.
A range must be in the format `<start>-<end>` (inclusive). A range with more than one dash is not allowed.

### Parsing the configuration

The configuration values will first be parsed as comma-separated lists.
Each entry in the list will then be parsed as a single error code or a range of error codes.
For invalid entries, the error will be logged when the policy is first loaded and the entry will be ignored. An invalid entry will not fail the entire policy or the application start.

Example:

```yaml
apiVersion: dapr.io/v1alpha1
kind: Resiliency
metadata:
  name: myresiliency
scopes:
  - app1
spec:
  policies:
    retries:
      pubsubRetry:
        policy: constant
        duration: 5s
        maxRetries: 10
        matching:
          httpStatusCodes: "500,502-504,15,404-405-500,-1,0,"
```
The steps to parse the configuration are:
1. Split the `httpStatusCodes` configuration string `"500,502-504,15,404-405-500,-1,0,"` on the comma character, ignoring empty strings. This results in the following list: `["500", "502-504", "15", "404-405-500", "-1", "0"]`.
2. For each entry in the list, parse it as a single error code or a range of error codes.
3. If the entry is a single error code, add it to the list of error codes to retry.
4. If the entry is a range of error codes, add all the error codes in the range to the list of error codes to retry. Codes are validated against the acceptable values for the relevant field (HTTP or gRPC).

Applied to the example above:
- `500` is a **valid** code for HTTP
- `502-504` is a **valid** range of codes for HTTP
- `15` is an **invalid** code for HTTP; the error is logged and the entry is ignored
- `404-405-500` is an **invalid** range because it contains more than one dash; the error is logged and the entry is ignored
- `-1` is an **invalid** code for HTTP; the error is logged and the entry is ignored
- `0` is an **invalid** code for HTTP; the error is logged and the entry is ignored

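The parsing steps can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the Dapr implementation: the function name `parse_status_codes` and the constants `HTTP_BOUNDS` and `GRPC_BOUNDS` are hypothetical, and trimming whitespace around entries and rejecting ranges whose start exceeds their end are assumptions the proposal does not spell out. The proposal also does not say what happens when every entry is invalid; the sketch simply returns an empty set.

```python
import logging
import re
from typing import Set

logger = logging.getLogger("resiliency")

# Bounds taken from the "Acceptable Values" section.
HTTP_BOUNDS = (100, 599)
GRPC_BOUNDS = (1, 16)

# A single code ("500") or a range with exactly one dash ("502-504").
ENTRY = re.compile(r"([0-9]+)(?:-([0-9]+))?")


def parse_status_codes(raw: str, low: int, high: int) -> Set[int]:
    """Parse a comma-separated list of codes and inclusive ranges.

    Invalid entries are logged and skipped; they never raise.
    """
    codes: Set[int] = set()
    for entry in raw.split(","):
        entry = entry.strip()  # assumption: surrounding whitespace is tolerated
        if not entry:
            continue  # ignore empty strings, e.g. from a trailing comma
        match = ENTRY.fullmatch(entry)
        if match is None:
            # Covers "404-405-500" (more than one dash) and "-1" (no start).
            logger.error("invalid status code entry %r, ignoring", entry)
            continue
        start = int(match.group(1))
        end = int(match.group(2) or match.group(1))
        if start > end or start < low or end > high:
            logger.error("status code entry %r is outside %d-%d, ignoring",
                         entry, low, high)
            continue
        codes.update(range(start, end + 1))  # ranges are inclusive
    return codes
```

Calling `parse_status_codes("500,502-504,15,404-405-500,-1,0,", *HTTP_BOUNDS)` yields the set `{500, 502, 503, 504}` and logs one error for each of the four invalid entries.
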
### Acceptance Criteria

Integration and unit tests will be added to verify the new functionality.

## Completion Checklist

* Code changes
* Tests added (e2e, unit)
* Documentation

templates/proposal.md

Lines changed: 2 additions & 2 deletions
@@ -13,7 +13,7 @@ A brief description of the proposal; include information such as:

 ## Background

-This section is intented to provide the community with the reasoning behind this proposal -- why is this proposal being made? What problem is it solving for users / developers / operators and how does it solve that for them?
+This section is intended to provide the community with the reasoning behind this proposal -- why is this proposal being made? What problem is it solving for users / developers / operators and how does it solve that for them?

 ## Related Items

@@ -56,7 +56,7 @@ How will this work, technically? Where applicable, include:
 How will success be measured?

 * Performance targets
-* Compabitility requirements
+* Compatibility requirements
 * Metrics

 ## Completion Checklist
