
Commit 75eba08

Grabauskas, heuvea, and johannesbattjes authored
Add Notifications Architecture documentation (#89)
Co-authored-by: heuvea <66989902+heuvea@users.noreply.github.com>
Co-authored-by: Johannes Battjes <johannes.battjes@roxit.nl>
1 parent 95e6a57 commit 75eba08


3 files changed

+155
-6
lines changed


docs/gebruik-van-subscriptions-in-autorisaties.md

Lines changed: 6 additions & 6 deletions
@@ -39,7 +39,7 @@ Filters allow your application to limit the notifications it receives to only th
3939
| Filter Key | Description | Allowed/Example Values |
4040
| ----------------------------- | --------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------- |
4141
| `#resource` | The resources as listed in the ZGW Open Api Specification, and presented in the resource field in the notification message itself | zaak, status, zaakobject, zaakinformatieobject, zaakeigenschap, rol, resultaat, zaakbesluit |
42-
| `#action` | The actions as listed in the ZGW Open Api Specification, and presented in the resource field in the notification message itself | create, update, destroy |
42+
| `#actie` | The actions as listed in the ZGW Open Api Specification, and presented in the actie field in the notification message itself | create, update, destroy |
4343
| `bronorganisatie` | The rsin of the organization that initiated or owns the case in the field bronorganisatie | 000001375 |
4444
| `zaaktype` | URL's of the casetype versions of the case in the field zaaktype | https://ztc.zgw.nl/api/v1/zaaktypen/b1fac1a1-7117-1e50-1d01-d155a715f1ed |
4545
| `vertrouwelijkheidaanduiding` | The confidentiality indication of the case in field vertrouwelijkheidaanduiding | openbaar, beperkt_openbaar, intern, zaakvertrouwelijk, vertrouwelijk, confidentieel, geheim, zeer_geheim |
@@ -56,7 +56,7 @@ Filters allow your application to limit the notifications it receives to only th
5656
| Filter Key | Description | Allowed/Example Values |
5757
| ------------------------------- | --------------------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------- |
5858
| `#resource` | the resources as listed in the ZGW Open Api Specification, and presented in the resource field in the notification message itself | besluit, besluitinformatieobject |
59-
| `#action` | The actions as listed in the ZGW Open Api Specification, and presented in the resource field in the notification message itself | create, update, destroy |
59+
| `#actie` | The actions as listed in the ZGW Open Api Specification, and presented in the actie field in the notification message itself | create, update, destroy |
6060
| `besluittype` | URL of the decision type version in the field besluittype | https://ztc.zgw.nl/api/v1/besluittypen/ce571dae-f0c1-a1e5-115a-f1acc1d171e5 |
6161
| `verantwoordelijke_organisatie` | Rsin of the organisation responsible for the decision in field verantwoordelijkeOrganisatie | 010110100 |
6262
| `besluittype_omschrijving` | The decision type in field omschrijving of besluittype | Beschikken op aanvraag |
@@ -68,7 +68,7 @@ Filters allow your application to limit the notifications it receives to only th
6868
| Filter Key | Description | Allowed/Example Values |
6969
| ----------------------------------- | --------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------- |
7070
| `#resource` | the resources as listed in the ZGW Open Api Specification, and presented in the resource field in the notification message itself | enkelvoudiginformatieobject, gebruiksrechten, verzending |
71-
| `#action` | The actions as listed in the ZGW Open Api Specification, and presented in the resource field in the notification message itself | create, update, destroy |
71+
| `#actie` | The actions as listed in the ZGW Open Api Specification, and presented in the actie field in the notification message itself | create, update, destroy |
7272
| `bronorganisatie` | The rsin of the organization that initiated or received and owns the document in the field bronorganisatie | 813264571 |
7373
| `informatieobjecttype` | URL of the information object type version in field informatieobjecttype | https://ztc.zgw.nl/api/v1/informatieobjecttypen/cadd1ce5-5a1a-7070-dabb-c1a551f1ab1e |
7474
| `vertrouwelijkheidaanduiding` | Confidentiality indication in field vertrouwelijkheidaanduiding | openbaar, beperkt_openbaar, intern, zaakvertrouwelijk, vertrouwelijk, confidentieel, geheim, zeer_geheim |
@@ -83,7 +83,7 @@ Filters allow your application to limit the notifications it receives to only th
8383
| Filter Key | Description | Allowed/Example Values |
8484
| ----------- | --------------------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------- |
8585
| `#resource` | the resources as listed in the ZGW Open Api Specification, and presented in the resource field in the notification message itself | zaaktype |
86-
| `#action` | The actions as listed in the ZGW Open Api Specification, and presented in the resource field in the notification message itself | create, update, destroy |
86+
| `#actie` | The actions as listed in the ZGW Open Api Specification, and presented in the actie field in the notification message itself | create, update, destroy |
8787
| `catalogus` | The URL of the catalog in field catalogus | https://ztc.zgw.nl/api/v1/catalogussen/fe0ff0r5-fdd1-5011-1177-d15ac1d1f1ed |
8888
| `domein` | The domain of the catalog in field domein of catalogus | VTH |
8989

@@ -92,7 +92,7 @@ Filters allow your application to limit the notifications it receives to only th
9292
| Filter Key | Description | Allowed/Example Values |
9393
| ----------- | --------------------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------- |
9494
| `#resource` | the resources as listed in the ZGW Open Api Specification, and presented in the resource field in the notification message itself | informatieobjecttype |
95-
| `#action` | The actions as listed in the ZGW Open Api Specification, and presented in the resource field in the notification message itself | create, update, destroy |
95+
| `#actie` | The actions as listed in the ZGW Open Api Specification, and presented in the actie field in the notification message itself | create, update, destroy |
9696
| `catalogus` | The URL of the catalog in field catalogus | https://ztc.zgw.nl/api/v1/catalogussen/fe0ff0r5-fdd1-5011-1177-d15ac1d1f1ed |
9797
| `domein` | The domain of the catalog in field domein of catalogus | VTH |
9898

@@ -101,7 +101,7 @@ Filters allow your application to limit the notifications it receives to only th
101101
| Filter Key | Description | Allowed/Example Values |
102102
| ----------- | --------------------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------- |
103103
| `#resource` | the resources as listed in the ZGW Open Api Specification, and presented in the resource field in the notification message itself | besluittype |
104-
| `#action` | The actions as listed in the ZGW Open Api Specification, and presented in the resource field in the notification message itself | create, update, destroy |
104+
| `#actie` | The actions as listed in the ZGW Open Api Specification, and presented in the actie field in the notification message itself | create, update, destroy |
105105
| `catalogus` | The URL of the catalog in field catalogus | https://.../catalogi/fe0ff0r5-fdd1-5011-1177-d15ac1d1f1ed |
106106
| `domein` | The domain of the catalog in field domein of catalogus | VTH |
107107

docs/notifications-architecture.md

Lines changed: 144 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,144 @@
1+
---
2+
title: "How does the retry system for notifications work?"
3+
description: "Overview of the Notifications system architecture, detailing retry strategies with Polly and Hangfire, circuit breaker patterns, and HTTP status code handling."
4+
keywords: [notifications, architecture, polly, hangfire, circuit breaker, retry, webhook, OneGround, ZGW]
5+
---
6+
7+
# How does the retry system for notifications work?
8+
9+
When a ZGW entity is created, modified, or deleted, a notification is sent. Client applications can subscribe to these notifications. This requires a webhook receiver on the client side to which the notifications can be delivered. The URL (and authentication) of this webhook receiver is stored in the client's subscription in our Notifications database.
10+
11+
There are four key components to the Notifications system:
12+
13+
1. Polly retries
14+
2. Hangfire retries and priority queue
15+
3. Circuit breaker
16+
4. HTTP status codes that lead to retries
17+
18+
## Polly retries
19+
20+
If a notification cannot be delivered to a client webhook receiver (for example, because the service is temporarily down), a new delivery attempt is made within a few seconds. Both the backoff pattern (linear or exponential) and the number of attempts can be configured. Polly retries are blocking calls, so the total retry sequence must stay short (e.g., under 8 seconds). Polly retries are also not persistent: they take place in memory. A typical Polly retry configuration looks like this:
21+
22+
```json
23+
{
24+
"PollyConfig": {
25+
"NotificatiesSender": {
26+
"Retry": {
27+
"ShouldRetryAfterHeader": true,
28+
"MaxRetryAttempts": 4,
29+
"BackoffType": "Exponential",
30+
"UseJitter": false,
31+
"Delay": "00:00:00.500"
32+
},
33+
"Timeout": {
34+
"Timeout": "00:00:30"
35+
}
36+
}
37+
}
38+
}
39+
```
40+
41+
With this configuration, Polly waits the following delays before each successive retry (7.5 seconds in total, not counting the time taken by the attempts themselves):
42+
43+
- 500 ms
44+
- 1 s
45+
- 2 s
46+
- 4 s
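
The delays listed above can be reproduced from the `Delay`, `BackoffType`, and `MaxRetryAttempts` settings. The following Python sketch only illustrates the arithmetic, assuming that exponential backoff without jitter doubles the delay on every attempt. `exponential_delays` is a hypothetical helper, not part of Polly or the Notifications code base.

```python
from datetime import timedelta


def exponential_delays(base: timedelta, attempts: int) -> list[timedelta]:
    """Return the wait before each retry: base * 2**n for n = 0..attempts-1."""
    return [base * (2 ** n) for n in range(attempts)]


# Values taken from the example configuration: Delay 500 ms, 4 attempts.
delays = exponential_delays(timedelta(milliseconds=500), 4)
total = sum(delays, timedelta())

print([d.total_seconds() for d in delays])  # [0.5, 1.0, 2.0, 4.0]
print(total.total_seconds())                # 7.5
```

The total of 7.5 seconds stays within the suggested bound of about 8 seconds for the blocking retry sequence.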
47+
48+
## Hangfire retries and priority queue
49+
50+
The second level of retries is based on the Hangfire scheduler. If the client webhook receiver is down for an extended period, the Polly retries will not succeed. Hangfire retries cover this case by scheduling new attempts for a later time, for example after four hours or even several days. Unlike Polly retries, Hangfire retries are persistent (stored as jobs in the Notifications database). Hangfire retries use two queues: MAIN and RETRY. New notifications are placed in the MAIN queue, while scheduled retries are placed in the RETRY queue. Because only a limited number of jobs can be executed at the same time, this separation prevents new notifications from waiting until all retries have been processed. More importantly, it allows the retry period to be extended (e.g., up to one day).
51+
52+
After the last failed retry, Hangfire moves the retry job to the 'Failed Jobs' state.
53+
54+
A nice feature is that Hangfire includes a Dashboard that displays all jobs (Retry Jobs, Successfully Executed Jobs, Failed Jobs, and Deleted Jobs). Failed jobs can even be restarted manually.
55+
56+
A typical Hangfire retry configuration looks like this:
57+
58+
```json
59+
{
60+
"Hangfire": {
61+
"RetrySchedule": "0.00:15;0.00:30;0.01:00;0.04:00;1.00:00",
62+
"ExpireFailedJobsScanAt": "05:00",
63+
"ExpireFailedJobAfter": "7.00:00"
64+
}
65+
}
66+
```
67+
68+
The retry pattern (configured in RetrySchedule) is:
69+
70+
- 15 minutes
71+
- 30 minutes
72+
- 1 hour
73+
- 4 hours
74+
- 1 day
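
The `RetrySchedule` value in the example appears to be a semicolon-separated list of intervals in `d.hh:mm` form (days, hours, minutes). This reading is inferred from the list above, not from a documented format. Under that assumption, a small Python sketch with a hypothetical `parse_retry_schedule` helper shows how the string maps to the listed intervals:

```python
from datetime import timedelta


def parse_retry_schedule(schedule: str) -> list[timedelta]:
    """Parse entries like '0.00:15' (assumed d.hh:mm) separated by ';'."""
    intervals = []
    for entry in schedule.split(";"):
        days, clock = entry.split(".")
        hours, minutes = clock.split(":")
        intervals.append(
            timedelta(days=int(days), hours=int(hours), minutes=int(minutes))
        )
    return intervals


# RetrySchedule value copied from the example configuration.
schedule = parse_retry_schedule("0.00:15;0.00:30;0.01:00;0.04:00;1.00:00")
print([int(i.total_seconds() // 60) for i in schedule])  # [15, 30, 60, 240, 1440]
```

The output is expressed in minutes: 15 minutes, 30 minutes, 1 hour, 4 hours, and 1 day.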
75+
76+
There are two other settings intended for automatically cleaning up failed retry jobs:
77+
78+
- ExpireFailedJobsScanAt (when the scan for expired failed jobs runs, or one of the values 'never', 'disabled', 'n/a')
79+
- ExpireFailedJobAfter (how long failed jobs are kept before they expire)
80+
81+
## Circuit breaker
82+
83+
Another component of the notification system is the circuit breaker. This prevents the system from repeatedly attempting to contact unresponsive (or faulty) webhook receivers over a period of time, which can severely and unnecessarily block resources (due to timeouts).
84+
85+
The concept is that a webhook receiver is only allowed to fail a limited number of times. When calls to a webhook receiver fail, the failures are monitored and recorded. For example, after 10 failures, the webhook receiver is marked as BLOCKED (and effectively no longer monitored). The circuit breaker maintains this block for a specified period (e.g., 5 minutes). After this period, it attempts to deliver the notification again. If the receiver fails again, the block is reapplied. This mechanism prevents unnecessary calls to unresponsive webhook receivers, improving system performance.
86+
87+
All webhook receiver blocks are automatically released after a specified period unless triggered again.
88+
89+
A typical Circuit breaker configuration looks like this:
90+
91+
```json
92+
{
93+
"CircuitBreaker": {
94+
"FailureThreshold": 10,
95+
"BreakDuration": "00:05:00",
96+
"CacheExpirationMinutes": 10
97+
}
98+
}
99+
```
100+
101+
The possible settings under CircuitBreaker are:
102+
103+
- FailureThreshold (number of failures allowed before the block takes effect)
104+
- BreakDuration (how long the block lasts)
105+
- CacheExpirationMinutes (time in minutes after which monitoring and blocks are lifted, unless triggered again)
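
The blocking behaviour described in this section can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the actual implementation: the class name `ReceiverCircuitBreaker`, its methods, and the example URL are hypothetical, and the cache expiry controlled by `CacheExpirationMinutes` is left out.

```python
from datetime import datetime, timedelta


class ReceiverCircuitBreaker:
    """Counts failures per webhook receiver and blocks it for a fixed period."""

    def __init__(self, failure_threshold: int, break_duration: timedelta):
        self.failure_threshold = failure_threshold
        self.break_duration = break_duration
        self.failures: dict[str, int] = {}
        self.blocked_until: dict[str, datetime] = {}

    def is_blocked(self, url: str, now: datetime) -> bool:
        until = self.blocked_until.get(url)
        return until is not None and now < until

    def record_failure(self, url: str, now: datetime) -> None:
        self.failures[url] = self.failures.get(url, 0) + 1
        # The count is kept after a block, so one more failure re-blocks.
        if self.failures[url] >= self.failure_threshold:
            self.blocked_until[url] = now + self.break_duration

    def record_success(self, url: str) -> None:
        # A successful delivery clears the failure history and any block.
        self.failures.pop(url, None)
        self.blocked_until.pop(url, None)


# Values from the example configuration: 10 failures, 5 minute block.
breaker = ReceiverCircuitBreaker(10, timedelta(minutes=5))
start = datetime(2024, 1, 1, 12, 0)
url = "https://receiver.example/webhook"  # hypothetical receiver URL
for _ in range(10):
    breaker.record_failure(url, start)

print(breaker.is_blocked(url, start + timedelta(minutes=1)))  # True
print(breaker.is_blocked(url, start + timedelta(minutes=5)))  # False
```

Passing the current time in explicitly keeps the sketch deterministic. A real implementation would more likely read the clock itself.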
106+
107+
## HTTP status codes triggering retries
108+
109+
### Polly retry
110+
111+
Polly performs retries according to the configured policy when a delivery attempt fails in one of the following ways:
112+
113+
- `HttpRequestException`
114+
- All HTTP 5xx codes
115+
- 408 Request Timeout
116+
- 429 Too Many Requests
117+
118+
When other HTTP status codes are returned, Polly stops immediately (but Hangfire will retry).
119+
120+
Retries can be configured for additional HTTP status codes with the AddRetryOnHttpStatusCodes setting in the Polly configuration (placed under NotificatiesSender in the example below). If, for example, you always want to retry on 404 (Not Found) responses, you can configure this as follows:
121+
122+
```json
123+
{
124+
"PollyConfig": {
125+
"NotificatiesSender": {
126+
"Retry": {
127+
"ShouldRetryAfterHeader": true,
128+
"MaxRetryAttempts": 4,
129+
"BackoffType": "Exponential",
130+
"UseJitter": true,
131+
"Delay": "00:00:00.500"
132+
},
133+
"Timeout": {
134+
"Timeout": "00:00:30"
135+
},
136+
"AddRetryOnHttpStatusCodes": "404;..."
137+
}
138+
}
139+
}
140+
```
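
The Polly retry rules in this section can be summarised as a small decision function. The Python sketch below is illustrative only: `polly_should_retry` and `parse_extra_codes` are hypothetical names, a missing response stands in for `HttpRequestException`, and the semicolon-separated format of `AddRetryOnHttpStatusCodes` is inferred from the example value.

```python
from typing import Iterable, Optional

DEFAULT_RETRY_CODES = {408, 429}  # Request Timeout, Too Many Requests


def parse_extra_codes(setting: str) -> set[int]:
    """Parse a semicolon-separated list of status codes (assumed format)."""
    return {int(part) for part in setting.split(";") if part.strip()}


def polly_should_retry(
    status_code: Optional[int], extra_codes: Iterable[int] = ()
) -> bool:
    """Decide whether a failed delivery attempt is retried by Polly."""
    if status_code is None:
        # No HTTP response at all, modelled on HttpRequestException.
        return True
    if 500 <= status_code <= 599:
        return True
    return status_code in DEFAULT_RETRY_CODES or status_code in extra_codes


print(polly_should_retry(503))                            # True
print(polly_should_retry(404))                            # False
print(polly_should_retry(404, parse_extra_codes("404")))  # True
print(polly_should_retry(None))                           # True
```

A result of `False` only means that Polly stops. Hangfire can still schedule a later retry.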
141+
142+
### Hangfire retry
143+
144+
Hangfire retry jobs make no distinction between response codes: a failed delivery is always retried.

sidebars.ts

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -11,6 +11,11 @@ const sidebars: SidebarsConfig = {
1111
id: "gebruik-van-subscriptions-in-autorisaties",
1212
label: "Use of Subscriptions for Notifications"
1313
},
14+
{
15+
type: "doc",
16+
id: "notifications-architecture",
17+
label: "How does the retry system for notifications work?"
18+
},
1419
"version-header",
1520
"example-document-upload/example-document-upload",
1621
"ztc1_3problemsandsolutions"
