Commit 26bf3aa

DOC-5665 a few fixes
1 parent f2ced87 commit 26bf3aa

content/develop/clients/jedis/failover.md

Lines changed: 45 additions & 37 deletions
@@ -23,27 +23,26 @@ the concepts and describes how to configure Jedis for failover and failback.

You may have several [Active-Active databases]({{< relref "/operate/rs/databases/active-active" >}})
or independent Redis servers that are all suitable to serve your app.
- Typically, you would prefer some database endpoints over others for a particular
+ Typically, you would prefer to use some database endpoints over others for a particular
instance of your app (perhaps the ones that are closest geographically to the app server
to reduce network latency). However, if the best endpoint is not available due
to a failure, it is generally better to switch to another, suboptimal endpoint
than to let the app fail completely.

*Failover* is the technique of actively checking for connection failures or
- unacceptably slow connections and
- automatically switching to another endpoint when they occur. The
- diagram below shows this process:
+ unacceptably slow connections and automatically switching to the best available endpoint
+ when they occur. This requires you to specify a list of endpoints to try, ordered by priority. The diagram below shows this process:

{{< image filename="images/failover/failover-client-reconnect.svg" alt="Failover and client reconnection" >}}

The complementary technique of *failback* then involves periodically checking the health
- of endpoints that are preferred to the current, temporary endpoint.
- If any preferred endpoint has recovered, the connection is switched over to it.
- This could potentially continue until the best endpoint is available again.
+ of all endpoints that have failed. If any endpoints recover, the failback mechanism
+ automatically switches the connection to the one with the highest priority.
+ This could potentially be repeated until the optimal endpoint is available again.

{{< image filename="images/failover/failover-client-failback.svg" alt="Failback: client switches back to original server" width="75%" >}}

- ### Detecting a failed connection
+ ### Detecting connection problems

Jedis uses the [resilience4j](https://resilience4j.readme.io/docs/getting-started)
library to detect connection problems using a
@@ -58,7 +57,8 @@ the command a few times.)

The status of the attempted command calls is kept in a "sliding window", which
is simply a buffer where the least recent item is dropped as each new
- one is added.
+ one is added. The buffer can be configured to have a fixed number of items or to
+ be based on a time window.

{{< image filename="images/failover/failover-sliding-window.svg" alt="Sliding window of recent connection attempts" >}}

@@ -77,15 +77,18 @@ endpoint that is still healthy and uses it for the temporary connection.

Given that the original endpoint had some geographical or other advantage
over the failover target, you will generally want to fail back to it as soon
- as it recovers. To detect when this happens, Jedis periodically
- runs a "health check" on the server. This can be as simple as
- sending a Redis [`ECHO`]({{< relref "/commands/echo" >}}) command and checking
- that it gives a response.
+ as it recovers. In the meantime, another server might recover that is
+ still better than the current failover target, so it might be worth
+ failing back to that server even if it is not optimal.
+
+ Jedis periodically runs a "health check" on each server to see if it has recovered.
+ The health check can be as simple as
+ sending a Redis [`ECHO`]({{< relref "/commands/echo" >}}) command and ensuring
+ that it gives the expected response.

You can also configure Jedis to run health checks on the current target
- server during periods of inactivity. This can help to detect when the
- server has failed and a failover is needed even when your app is not actively
- using it.
+ server during periods of inactivity, even if no failover has occurred. This can
+ help to detect problems even if your app is not actively using the server.

## Failover configuration

@@ -121,10 +124,16 @@ weight being tried first.
MultiClusterClientConfig.ClusterConfig[] clusterConfigs = new MultiClusterClientConfig.ClusterConfig[2];

HostAndPort east = new HostAndPort("redis-east.example.com", 14000);
- clusterConfigs[0] = ClusterConfig.builder(east, config).connectionPoolConfig(poolConfig).weight(1.0f).build();
+ clusterConfigs[0] = ClusterConfig.builder(east, config)
+     .connectionPoolConfig(poolConfig)
+     .weight(1.0f)
+     .build();

HostAndPort west = new HostAndPort("redis-west.example.com", 14000);
- clusterConfigs[1] = ClusterConfig.builder(west, config).connectionPoolConfig(poolConfig).weight(0.5f).build();
+ clusterConfigs[1] = ClusterConfig.builder(west, config)
+     .connectionPoolConfig(poolConfig)
+     .weight(0.5f)
+     .build();
```

Pass the `clusterConfigs` array when you create the `MultiClusterClientConfig` builder.
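
As a rough sketch only (the builder option names come from the tables later on this page, but the provider and constructor names here are assumptions, so check them against your Jedis version), the array might be wired up like this:

```java
// Sketch: build the failover configuration from the ClusterConfig array above and
// wrap it in a connection provider. Class names other than MultiClusterClientConfig
// are assumptions; verify them against the Jedis version you are using.
MultiClusterClientConfig multiConfig = new MultiClusterClientConfig.Builder(clusterConfigs)
        .circuitBreakerFailureRateThreshold(50.0f) // default shown in the table below
        .retryMaxAttempts(3)                       // default shown in the table below
        .build();

MultiClusterPooledConnectionProvider provider =
        new MultiClusterPooledConnectionProvider(multiConfig);
UnifiedJedis jedis = new UnifiedJedis(provider);
```

The endpoint with the highest weight (`east` above) is tried first, and the provider object is also what you would use later for manual failback with `setActiveCluster()`.
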
@@ -175,11 +184,11 @@ the circuit breaker:
| Builder method | Default value | Description|
| --- | --- | --- |
| `circuitBreakerSlidingWindowType()` | `COUNT_BASED` | Type of sliding window. `COUNT_BASED` uses a sliding window based on the number of calls, while `TIME_BASED` uses a sliding window based on time. |
- | `circuitBreakerSlidingWindowSize()` | `100` | Size of the sliding window in number of calls or time in seconds, depending on the sliding window type. |
+ | `circuitBreakerSlidingWindowSize()` | `100` | Size of the sliding window (this is the number of calls for a `COUNT_BASED` window or the time in seconds for a `TIME_BASED` window). |
| `circuitBreakerSlidingWindowMinCalls()` | `10` | Minimum number of calls required (per sliding window period) before the circuit breaker will start calculating the error rate or slow call rate. |
| `circuitBreakerFailureRateThreshold()` | `50.0f` | Percentage of failures to trigger the circuit breaker. |
| `circuitBreakerSlowCallRateThreshold()` | `100.0f` | Percentage of slow calls to trigger the circuit breaker. |
- | `circuitBreakerSlowCallDurationThreshold()` | `60000` | Duration in milliseconds to consider a call as slow. |
+ | `circuitBreakerSlowCallDurationThreshold()` | `60000` | Duration in milliseconds after which a call is considered slow. |
| `circuitBreakerIncludedExceptionList()` | See description | `List` of `Throwable` classes that should be considered as failures. By default, it includes just `JedisConnectionException`. |
| `circuitBreakerIgnoreExceptionList()` | `null` | `List` of `Throwable` classes that should be ignored for failure rate calculation. |
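
Purely as an illustration of the table above (argument types are assumed from the listed defaults: counts and millisecond durations as integers, rates as floats), a few of these thresholds could be tightened on the `MultiClusterClientConfig` builder, here called `builder`:

```java
// Hypothetical tuning: trip the circuit breaker sooner than the defaults.
builder
    .circuitBreakerSlidingWindowSize(50)             // evaluate the last 50 calls
    .circuitBreakerSlidingWindowMinCalls(5)          // but only once at least 5 calls were made
    .circuitBreakerFailureRateThreshold(25.0f)       // open the circuit at 25% failures
    .circuitBreakerSlowCallDurationThreshold(2000);  // count calls slower than 2 s as slow
```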

@@ -190,19 +199,19 @@ The `MultiClusterClientConfig` builder has the following options to configure re
| Builder method | Default value | Description|
| --- | --- | --- |
| `retryMaxAttempts()` | `3` | Maximum number of retry attempts (including the initial call). |
- | `retryWaitDuration()` | `500` | Number of milliseconds to wait between retry attempts. |
- | `retryWaitDurationExponentialBackoffMultiplier()` | `2` | [Exponential backoff](https://en.wikipedia.org/wiki/Exponential_backoff) factor multiplied against wait duration between retries. For example, with a wait duration of 1 second and a multiplier of 2, the retries would occur after 1s, 2s, 4s, 8s, 16s, and so on. |
+ | `retryWaitDuration()` | `500` | Initial number of milliseconds to wait between retry attempts. |
+ | `retryWaitDurationExponentialBackoffMultiplier()` | `2` | [Exponential backoff](https://en.wikipedia.org/wiki/Exponential_backoff) factor multiplied by the wait duration between retries. For example, with a wait duration of 1 second and a multiplier of 2, the retries would occur after 1s, 2s, 4s, 8s, 16s, and so on. |
| `retryIncludedExceptionList()` | See description | `List` of `Throwable` classes that should be considered as failures to be retried. By default, it includes just `JedisConnectionException`. |
| `retryIgnoreExceptionList()` | `null` | `List` of `Throwable` classes that should be ignored for retry. |
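
In the same illustrative spirit (again assuming integer millisecond arguments, as suggested by the defaults above), the retry behaviour could be adjusted on the same builder:

```java
// Hypothetical tuning: wait longer between retries and back off more aggressively.
builder
    .retryMaxAttempts(4)                                // initial call plus up to 3 retries
    .retryWaitDuration(1000)                            // first wait is 1 second
    .retryWaitDurationExponentialBackoffMultiplier(3);  // then 3 s, then 9 s
```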


### Failover callbacks

You may want to take some custom action when a failover occurs.
- For example, you may want to log a warning, increment a metric,
+ For example, you could log a warning, increment a metric,
or externally persist the cluster connection state.

You can provide a custom failover action using a class that
- implements `java.util.function.Consumer`. You should place
+ implements `java.util.function.Consumer`. Place
the custom action in the `accept()` method, as shown in the example below.

```java
@@ -234,10 +243,8 @@ The `accept()` method is now called whenever a failover occurs.

## Health check configuration

- The general strategy for health checks is to ask the Redis server for a
- response that it could only give if it is healthy. There are several
- specific strategies available for health checks that you can configure using the
- `MultiClusterClientConfig` builder. The sections below explain these
+ There are several strategies available for health checks that you can configure using the
+ `MultiClusterClientConfig` builder. The sections below explain these strategies
in more detail.

### `EchoStrategy` (default)
@@ -265,9 +272,9 @@ builder.
BiFunction<HostAndPort, Supplier<RedisCredentials>, MultiClusterClientConfig.StrategySupplier> healthCheckStrategySupplier =
    (HostAndPort clusterHostPort, Supplier<RedisCredentials> credentialsSupplier) -> {
      LagAwareStrategy.Config lagConfig = LagAwareStrategy.Config.builder(clusterHostPort, credentialsSupplier)
-       .interval(5000) // Check every 5 seconds
-       .timeout(3000) // 3 second timeout
-       .extendedCheckEnabled(true) // Check replication lag
+         .interval(5000)             // Check every 5 seconds
+         .timeout(3000)              // 3 second timeout
+         .extendedCheckEnabled(true) // Check replication lag
        .build();

      return (hostAndPort, jedisClientConfig) -> new LagAwareStrategy(lagConfig);
@@ -290,11 +297,10 @@ MultiClusterClientConfig.ClusterConfig clusterConfig =
### Custom health check strategy

You can supply your own custom health check strategy by
- implementing the `HealthCheckStrategy` interface. You might
- use this to implement custom checks or to integrate with
- external monitoring tools, for example. The example below
- shows a simple custom strategy. As with `LagAwareStrategy`, you
- can pass a custom strategy implementation to the `MultiClusterClientConfig.ClusterConfig`
+ implementing the `HealthCheckStrategy` interface. For example, you might
+ use this to integrate with external monitoring tools or to implement
+ checks that are specific to your application. The example below
+ shows a simple custom strategy. Pass your custom strategy implementation to the `MultiClusterClientConfig.ClusterConfig`
builder with the `healthCheckStrategySupplier()` method.

```java
@@ -362,9 +368,11 @@ manually:
```java
// The `setActiveCluster()` method receives the `HostAndPort` of the
// cluster to switch to.
- provider.setActiveCluster("west");
+ provider.setActiveCluster(west);
```

+ Note that `setActiveCluster()` is thread-safe.
+
If you decide to implement manual failback, you will need a way for external
systems to trigger this method in your application. For example, if your application
exposes a REST API, you might consider creating a REST endpoint to call
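
One possible shape for such an endpoint, sketched with the JDK's built-in `com.sun.net.httpserver` server and the `provider` and `east` objects from the earlier examples (the path and port are made up for illustration):

```java
// Hypothetical admin endpoint that lets an external system trigger manual failback.
HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
server.createContext("/failback/east", exchange -> {
    provider.setActiveCluster(east); // switch back to the preferred endpoint
    byte[] body = "Switched to east".getBytes(StandardCharsets.UTF_8);
    exchange.sendResponseHeaders(200, body.length);
    try (OutputStream out = exchange.getResponseBody()) {
        out.write(body);
    }
});
server.start();
```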
