* DOC-5567 RS: Added lag-awareness to DB availability REST API references
* DOC-5567 RS: Added lag-aware checks to DB availability doc
* RS: Updated status code links in cluster actions REST API reference
* RS: Added new 406 status code and missing 404 code to initiate cluster-wide action REST API request reference
* DOC-4699 Added version change for change_master cluster action behavior to RS Gilboa release notes
* DOC-5567 Feedback update to remove override option from the adjust availability lag tolerance threshold section and change the section to focus on changing the default only

content/operate/rs/monitoring/db-availability.md

Lines changed: 44 additions & 0 deletions

@@ -45,6 +45,50 @@ Returns HTTP status code 200 OK if all primary (master) shards are reachable fro

If the local database endpoint is unavailable, returns an error status code and a JSON object that contains [`error_code` and `description` fields]({{<relref "/operate/rs/references/rest-api/requests/bdbs/availability#get-endpoint-error-codes">}}).

## Use lag-aware availability checks for disaster recovery {#lag-aware}

The database availability API supports lag-aware availability checks that account for replication lag. To reduce the risk of data inconsistencies during disaster recovery, incorporate lag-aware availability checks into your disaster recovery solution so that failover and failback occur only when databases are accessible and sufficiently synchronized.
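
A recovery script can use the lag-aware check as a gate, waiting with a bounded number of retries before it redirects traffic. The following Python sketch is illustrative only: `wait_until_synchronized` is a hypothetical helper, and `check` stands in for any function that sends a lag-aware availability request and returns `True` when the response is `200 OK`.

```python
import time
from typing import Callable


def wait_until_synchronized(
    check: Callable[[], bool],
    max_attempts: int = 10,
    interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll a lag-aware availability check until it passes or attempts run out.

    `check` is a hypothetical callable that returns True when the lag-aware
    check succeeds. `sleep` is injectable so the loop can be tested quickly.
    """
    for attempt in range(max_attempts):
        if check():
            return True  # Accessible and within the lag tolerance threshold.
        if attempt < max_attempts - 1:
            sleep(interval_s)  # Give replication time to catch up.
    return False  # Still lagging or unavailable; do not fail over or back.
```

Bounding the attempts keeps a lagging replica from blocking the recovery flow indefinitely. What to do when the function returns `False` (alert, keep the current primary, and so on) is left to your disaster recovery policy.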

### Change default availability lag tolerance threshold

The lag tolerance threshold is 100 milliseconds by default. Depending on factors such as workload, network conditions, and throughput, you might want to adjust the lag tolerance threshold.

To change the default threshold for the entire cluster, set `availability_lag_tolerance_ms` with an [update cluster]({{<relref "/operate/rs/references/rest-api/requests/cluster#put-cluster">}}) request:

```sh
PUT /v1/cluster
{ "availability_lag_tolerance_ms": 100 }
```
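
The same request can be prepared from a script. This is a minimal Python sketch that uses only the standard library. The cluster address `https://cluster.example.com:9443` and the `build_tolerance_update` helper are hypothetical, and authentication is omitted, so the request is built but not sent.

```python
import json
from urllib.request import Request


def build_tolerance_update(base_url: str, tolerance_ms: int) -> Request:
    """Build (but do not send) a PUT /v1/cluster request that changes the
    cluster-wide default lag tolerance threshold."""
    # Client-side sanity check added for illustration only.
    if tolerance_ms < 0:
        raise ValueError("tolerance_ms must be non-negative")
    body = json.dumps({"availability_lag_tolerance_ms": tolerance_ms}).encode("utf-8")
    return Request(
        f"{base_url}/v1/cluster",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Hypothetical cluster address; sending the request also requires credentials.
request = build_tolerance_update("https://cluster.example.com:9443", 250)
```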

### Lag-aware database availability checks

To perform a lag-aware database availability check using the cluster's default lag tolerance threshold:

```sh
GET /v1/bdbs/<database_id>/availability?extend_check=lag
```

To perform a lag-aware database availability check and override the cluster's default lag tolerance threshold:

```sh
GET /v1/bdbs/<database_id>/availability?extend_check=lag&availability_lag_tolerance_ms=100
```
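
The two database requests differ only in their query string. This minimal Python sketch builds either form with `urllib.parse.urlencode`. The `db_availability_url` helper is hypothetical, and rejecting a tolerance override without `extend_check=lag` is a client-side choice, made because the override only applies to lag-aware checks.

```python
from typing import Optional
from urllib.parse import urlencode


def db_availability_url(
    base_url: str,
    db_id: int,
    lag_aware: bool = True,
    tolerance_ms: Optional[int] = None,
) -> str:
    """Return the URL for GET /v1/bdbs/<database_id>/availability."""
    path = f"{base_url}/v1/bdbs/{db_id}/availability"
    if not lag_aware:
        # Client-side rule (assumption): the override only makes sense with lag checks.
        if tolerance_ms is not None:
            raise ValueError("availability_lag_tolerance_ms requires extend_check=lag")
        return path
    params = {"extend_check": "lag"}
    if tolerance_ms is not None:
        # Per-request override of the cluster's default threshold.
        params["availability_lag_tolerance_ms"] = str(tolerance_ms)
    return f"{path}?{urlencode(params)}"
```

Calling it with `tolerance_ms=100` produces the same path and query string as the override request.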

### Lag-aware endpoint availability checks

To perform a lag-aware database endpoint availability check using the cluster's default lag tolerance threshold:

```sh
GET /v1/local/bdbs/<database_id>/endpoint/availability?extend_check=lag
```

To perform a lag-aware database endpoint availability check and override the cluster's default lag tolerance threshold:

```sh
GET /v1/local/bdbs/<database_id>/endpoint/availability?extend_check=lag&availability_lag_tolerance_ms=100
```
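
Sending the endpoint check and interpreting the result takes a little more code. This Python sketch is an illustration under stated assumptions, not an official client: `check_endpoint_availability` is hypothetical, the `opener` parameter exists only so the function can be exercised without a cluster, and authentication and TLS settings are omitted. It treats `200 OK` as available, reads the `error_code` and `description` fields from a `503` response, and re-raises any other HTTP error, such as `400` or `404`.

```python
import json
from typing import Any, Callable, Optional, Tuple
from urllib.error import HTTPError
from urllib.parse import urlencode
from urllib.request import urlopen


def check_endpoint_availability(
    base_url: str,
    db_id: int,
    tolerance_ms: Optional[int] = None,
    opener: Callable[..., Any] = urlopen,
    timeout: float = 5.0,
) -> Tuple[bool, Optional[dict]]:
    """Run a lag-aware endpoint availability check.

    Returns (True, None) on 200 OK, or (False, error_details) on
    503 Service Unavailable. Other HTTP errors are re-raised because
    they indicate a problem with the request itself.
    """
    params = {"extend_check": "lag"}
    if tolerance_ms is not None:
        params["availability_lag_tolerance_ms"] = str(tolerance_ms)
    url = f"{base_url}/v1/local/bdbs/{db_id}/endpoint/availability?{urlencode(params)}"
    try:
        with opener(url, timeout=timeout) as resp:
            return resp.status == 200, None
    except HTTPError as err:
        if err.code != 503:
            raise
        # The error body is a JSON object with error_code and description fields.
        try:
            return False, json.loads(err.read() or b"{}")
        except ValueError:
            return False, None
```

Returning the parsed error object instead of raising lets a caller, such as a polling loop, decide whether to retry.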

## Availability by database status

The following table shows the relationship between a database's status and availability. For more details about the database status values, see [BDB status field]({{<relref "/operate/rs/references/rest-api/objects/bdb/status">}}).

| <span class="break-all">availability_lag_tolerance_ms</span> | integer (default: 100) | The maximum replication lag in milliseconds tolerated between source and replicas during [lag-aware database availability checks]({{<relref "/operate/rs/monitoring/db-availability#lag-aware">}}). |

content/operate/rs/references/rest-api/requests/bdbs/availability.md

Lines changed: 49 additions & 2 deletions

@@ -32,12 +32,26 @@ Verifies the local database endpoint is available. This request does not redirec

### Request {#get-endpoint-request}

#### Example HTTP requests

To check database endpoint availability without any additional checks:

```sh
GET /v1/local/bdbs/1/endpoint/availability
```

To perform a lag-aware database endpoint availability check using the cluster's default lag tolerance threshold:

```sh
GET /v1/local/bdbs/1/endpoint/availability?extend_check=lag
```

To perform a lag-aware database endpoint availability check and override the cluster's default lag tolerance threshold:

```sh
GET /v1/local/bdbs/1/endpoint/availability?extend_check=lag&availability_lag_tolerance_ms=100
```

#### Headers

| Key | Value | Description |

@@ -51,6 +65,13 @@ GET /v1/local/bdbs/1/endpoint/availability

|-------|------|-------------|
| uid | integer | The unique ID of the database. |

#### Query parameters

| Field | Type | Description |
|-------|------|-------------|
| extend_check | list of comma-separated strings | List of additional availability checks to perform (optional)<br />Values:<br />**lag**: Enables lag-aware checks to assess replication health. Determines whether a replica is sufficiently synchronized with the primary for failover or failback. |
| availability_lag_tolerance_ms | integer | Overrides the cluster's default lag tolerance threshold when used with `extend_check=lag` (optional). Recommended value: 100 milliseconds. |

### Response {#get-endpoint-response}

Returns the status code `200 OK` if the local database endpoint is available.

@@ -74,6 +95,8 @@ The following are possible `error_code` values:

| Code | Description |
|------|-------------|
|[200 OK](https://www.rfc-editor.org/rfc/rfc9110.html#name-200-ok)| Database endpoint is available. |
|[400 Bad Request](https://www.rfc-editor.org/rfc/rfc9110.html#name-400-bad-request)| Invalid schema. |
|[404 Not Found](https://www.rfc-editor.org/rfc/rfc9110.html#name-404-not-found)| Database not found. |
|[503 Service Unavailable](https://www.rfc-editor.org/rfc/rfc9110.html#name-503-service-unavailable)| Database endpoint is unavailable. |

@@ -97,12 +120,27 @@ Gets the availability status of a database.

### Request {#get-db-request}

#### Example HTTP requests

To check database availability without any additional checks:

```sh
GET /v1/bdbs/1/availability
```

To perform a lag-aware database availability check using the cluster's default lag tolerance threshold:

```sh
GET /v1/bdbs/1/availability?extend_check=lag
```

To perform a lag-aware database availability check and override the cluster's default lag tolerance threshold:

```sh
GET /v1/bdbs/1/availability?extend_check=lag&availability_lag_tolerance_ms=100
```

#### Headers

| Key | Value | Description |

@@ -116,6 +154,13 @@ GET /v1/bdbs/1/availability

|-------|------|-------------|
| uid | integer | The unique ID of the database. |

#### Query parameters

| Field | Type | Description |
|-------|------|-------------|
| extend_check | list of comma-separated strings | List of additional availability checks to perform (optional)<br />Values:<br />**lag**: Enables lag-aware checks to assess replication health. Determines whether a replica is sufficiently synchronized with the primary for failover or failback. |
| availability_lag_tolerance_ms | integer | Overrides the cluster's default lag tolerance threshold when used with `extend_check=lag` (optional). Recommended value: 100 milliseconds. |

### Response {#get-db-response}

Returns the status code `200 OK` if the database is available.

@@ -139,4 +184,6 @@ The following are possible `error_code` values:

| Code | Description |
|------|-------------|
|[200 OK](https://www.rfc-editor.org/rfc/rfc9110.html#name-200-ok)| Database is available. |
|[400 Bad Request](https://www.rfc-editor.org/rfc/rfc9110.html#name-400-bad-request)| Invalid schema. |
|[404 Not Found](https://www.rfc-editor.org/rfc/rfc9110.html#name-404-not-found)| Database not found. |
|[503 Service Unavailable](https://www.rfc-editor.org/rfc/rfc9110.html#name-503-service-unavailable)| Database is unavailable or doesn't have quorum. |
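
Automation that polls this endpoint usually reacts differently to each code in the table: a `503` can clear on a later attempt, while `400` and `404` point to a problem with the request itself. The following Python sketch shows that mapping. The `CheckOutcome` names are hypothetical, and treating `503` as retryable is a design choice rather than documented API behavior.

```python
from enum import Enum


class CheckOutcome(Enum):
    AVAILABLE = "available"      # 200 OK: the check passed
    RETRY_LATER = "retry_later"  # 503: may succeed on a later attempt
    FIX_REQUEST = "fix_request"  # 400/404: retrying the same request will not help


def classify_availability_status(status: int) -> CheckOutcome:
    """Map a status code from the availability table to a hypothetical action."""
    if status == 200:
        return CheckOutcome.AVAILABLE
    if status == 503:
        # Database is unavailable or doesn't have quorum.
        return CheckOutcome.RETRY_LATER
    if status in (400, 404):
        # Invalid schema or database not found.
        return CheckOutcome.FIX_REQUEST
    raise ValueError(f"unexpected status code: {status}")
```

Keeping `400` and `404` out of the retry path means a typo in the database ID or query string fails fast instead of stalling a failover.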