Commit f2ced87

DOC-5665 remaining content added
1 parent 88975d5 commit f2ced87

content/develop/clients/jedis/failover.md

Lines changed: 93 additions & 0 deletions
@@ -195,6 +195,43 @@ The `MultiClusterClientConfig` builder has the following options to configure re
| `retryIncludedExceptionList()` | See description | `List` of `Throwable` classes that should be considered as failures to be retried. By default, it includes just `JedisConnectionException`. |
| `retryIgnoreExceptionList()` | `null` | `List` of `Throwable` classes that should be ignored for retry. |

### Failover callbacks

You may want to take some custom action when a failover occurs.
For example, you may want to log a warning, increment a metric,
or externally persist the cluster connection state.

You can provide a custom failover action using a class that
implements `java.util.function.Consumer`. You should place
the custom action in the `accept()` method, as shown in the example below.

```java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.function.Consumer;

import redis.clients.jedis.mcf.ClusterSwitchEventArgs;

public class FailoverReporter implements Consumer<ClusterSwitchEventArgs> {

    // Create the logger once rather than on every failover event.
    private static final Logger logger = LoggerFactory.getLogger(FailoverReporter.class);

    @Override
    public void accept(ClusterSwitchEventArgs e) {
        logger.warn("Jedis failover to cluster: {} due to {}", e.getClusterName(), e.getReason());
    }
}
```

Then, pass an instance of your class to your `MultiClusterPooledConnectionProvider`
to enable the action (see [Failover configuration](#failover-configuration)
above for an example of creating the `MultiClusterPooledConnectionProvider`).

```java
FailoverReporter reporter = new FailoverReporter();
provider.setClusterSwitchListener(reporter);
```

The `accept()` method is now called whenever a failover occurs.
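
Incrementing a metric follows the same pattern as logging. The sketch below is a minimal illustration that does not depend on Jedis: `SwitchEvent` is a hypothetical stand-in for `ClusterSwitchEventArgs` (used only so the example compiles on its own), and `FailoverCounter` is an illustrative name. It counts failovers per target cluster in a thread-safe map that you could expose to your metrics system.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Consumer;

// Hypothetical stand-in for ClusterSwitchEventArgs (assumption: only the
// cluster name and the reason are needed for this example).
final class SwitchEvent {
    private final String clusterName;
    private final String reason;

    SwitchEvent(String clusterName, String reason) {
        this.clusterName = clusterName;
        this.reason = reason;
    }

    String getClusterName() { return clusterName; }
    String getReason() { return reason; }
}

class FailoverCounter implements Consumer<SwitchEvent> {

    // One counter per target cluster; safe to update from multiple threads.
    private final Map<String, LongAdder> counts = new ConcurrentHashMap<>();

    @Override
    public void accept(SwitchEvent e) {
        counts.computeIfAbsent(e.getClusterName(), k -> new LongAdder()).increment();
    }

    // Number of failovers recorded so far for the given cluster.
    long count(String clusterName) {
        LongAdder adder = counts.get(clusterName);
        return adder == null ? 0L : adder.sum();
    }
}
```

With Jedis, the equivalent class would implement `Consumer<ClusterSwitchEventArgs>` and be registered with `setClusterSwitchListener()` in the same way as `FailoverReporter`.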

## Health check configuration

The general strategy for health checks is to ask the Redis server for a
@@ -314,3 +351,59 @@ MultiClusterClientConfig.ClusterConfig clusterConfig =
    .build();
```

## Manual failback

By default, the failback mechanism runs health checks on all servers in the
weighted list and selects the highest-weighted server that is
healthy. However, you can also use the `setActiveCluster()` method of
`MultiClusterPooledConnectionProvider` to select which cluster to use
manually:

```java
// The `setActiveCluster()` method receives the `HostAndPort` of the
// cluster to switch to. Here, `west` is assumed to be a `HostAndPort`
// variable that identifies that cluster's endpoint.
provider.setActiveCluster(west);
```

If you decide to implement manual failback, you will need a way for external
systems to trigger this method in your application. For example, if your application
exposes a REST API, you might consider creating a REST endpoint to call
`setActiveCluster()`.
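
As a rough sketch of such a trigger, the example below uses the JDK's built-in `com.sun.net.httpserver.HttpServer` to expose a `POST /failback?cluster=<name>` endpoint. The endpoint path, the `FailbackEndpoint` class, and the `switchAction` callback are all hypothetical. In a real application, the callback would look up the endpoint for the named cluster and call `setActiveCluster()`, and the HTTP endpoint would need authentication.

```java
import com.sun.net.httpserver.HttpServer;

import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.Set;
import java.util.function.Consumer;

class FailbackEndpoint {

    // Starts an HTTP server on the loopback interface (port 0 picks a free port).
    // `switchAction` is a placeholder for code that calls setActiveCluster().
    static HttpServer start(int port, Set<String> knownClusters,
                            Consumer<String> switchAction) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", port), 0);
        server.createContext("/failback", exchange -> {
            String query = exchange.getRequestURI().getQuery(); // e.g. "cluster=west"
            String name = (query != null && query.startsWith("cluster="))
                    ? query.substring("cluster=".length())
                    : null;

            int status;
            String body;
            if (!"POST".equals(exchange.getRequestMethod())) {
                status = 405;
                body = "Use POST";
            } else if (name == null || !knownClusters.contains(name)) {
                status = 400;
                body = "Unknown cluster";
            } else {
                switchAction.accept(name); // perform the manual failback
                status = 200;
                body = "Switched to " + name;
            }

            byte[] bytes = body.getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(status, bytes.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(bytes);
            }
        });
        server.start();
        return server;
    }
}
```

Validating the cluster name against a known set before calling the switch action keeps arbitrary input from reaching the connection provider.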
372+
373+
## Troubleshooting
374+
375+
This section lists some common problems and their solutions.
376+
377+
### Excessive or constant health check failures
378+
379+
If all health checks fail, you should first rule out authentication
380+
problems with the Redis server and also make sure there are no persistent
381+
network connectivity problems. If you still see frequent or constant
382+
failures, try increasing the timeout for health checks and the
383+
interval between them:
384+
385+
```java
HealthCheckStrategy.Config config = HealthCheckStrategy.Config.builder()
    .interval(5000) // Less frequent checks
    .timeout(2000) // More generous timeout
    .build();
```
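
To help rule out basic connectivity problems before tuning these values, a plain TCP connection test from the application host can be useful. The sketch below is independent of Jedis; `ReachabilityCheck` and its parameters are illustrative names. It only confirms that the port accepts connections within the timeout, not that authentication succeeds.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

class ReachabilityCheck {

    // Returns true if a TCP connection to host:port succeeds within timeoutMs.
    static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false; // Refused, unreachable, or timed out
        }
    }
}
```

If this check fails intermittently with a timeout similar to the one used for health checks, the problem is more likely in the network path than in the health check settings.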

### Slow failback after recovery

If failback is too slow after a server recovers, you can try
increasing the frequency of health checks and reducing the grace
period before failback is attempted (the grace period is the
minimum time after a failover before Jedis will check if a
failback is possible).

```java
HealthCheckStrategy.Config config = HealthCheckStrategy.Config.builder()
    .interval(1000) // More frequent checks
    .build();

// Adjust failback timing
MultiClusterClientConfig multiConfig = new MultiClusterClientConfig.Builder(clusterConfigs)
    .gracePeriod(5000) // Shorter grace period
    .build();
```
