---
categories:
- docs
- develop
- stack
- oss
- rs
- rc
- oss
- kubernetes
- clients
description: Improve reliability using the failover/failback features of Jedis.
linkTitle: Failover/failback
title: Failover and failback
weight: 50
---

Jedis supports [failover and failback](https://en.wikipedia.org/wiki/Failover)
to improve the availability of connections to Redis databases. This page explains
the concepts and describes how to configure Jedis for failover and failback.

## Concepts

You may have [Active-Active databases]({{< relref "/operate/rs/databases/active-active" >}})
or independent Redis servers that are all suitable to serve your app.
Typically, you would prefer some database endpoints over others for a particular
instance of your app (for example, the ones that are geographically closest to the app server,
to reduce network latency). However, if the preferred endpoint is unavailable due
to a failure, it is generally better to switch to another, suboptimal endpoint
than to let the app fail completely.

*Failover* is the technique of actively checking for connection failures and
automatically switching to another endpoint when a failure is detected.

{{< image filename="images/failover/failover-client-reconnect.svg" alt="Failover and client reconnection" >}}

The complementary technique of *failback* involves periodically checking the original
endpoint to see if it has recovered, and switching back to it
when it is available again.

{{< image filename="images/failover/failover-client-failback.svg" alt="Failback: client switches back to original server" width="75%" >}}

### Detecting a failed connection

Jedis uses the [resilience4j](https://resilience4j.readme.io/docs/getting-started)
library to detect connection failures with the
[circuit breaker design pattern](https://en.wikipedia.org/wiki/Circuit_breaker_design_pattern).

The circuit breaker is a software component that tracks recent connection
attempts in sequence, recording which ones have succeeded and which have failed.
(Note that many connection failures are transient, so before recording a failure,
the first response should usually be to retry the connection a few times.)

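The retry-before-recording idea can be sketched in plain Java as follows. This is a minimal illustration under assumptions: the class `RetryingCaller` and its `call` method are hypothetical names, not part of the Jedis API, and Jedis's own retry behavior is configured separately.

```java
// Hypothetical helper for illustration only; it is not part of the Jedis API.
final class RetryingCaller {
    private final int maxAttempts;

    RetryingCaller(int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        this.maxAttempts = maxAttempts;
    }

    /**
     * Runs the operation up to maxAttempts times and returns true as soon as
     * one attempt succeeds. Returns false only if every attempt threw, which
     * is the point at which a single failure would be recorded.
     */
    boolean call(Runnable operation) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                operation.run();
                return true;
            } catch (RuntimeException e) {
                // Assume the error may be transient and try again. A real
                // client would normally wait (back off) between attempts.
            }
        }
        return false;
    }
}
```

Only a `false` result would then be passed on to the circuit breaker as one failed connection attempt, so a brief network hiccup does not count against the server.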
The status of the connection attempts is kept in a "sliding window", which
is simply a buffer where the least recent item is dropped as each new
one is added.

{{< image filename="images/failover/failover-sliding-window.svg" alt="Sliding window of recent connection attempts" >}}

When the number of failures in the window exceeds a configured
threshold, the circuit breaker declares the server to be unhealthy and triggers
a failover.

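The sliding window and threshold described above can be illustrated with a small count-based breaker. This is a sketch in plain Java, not the resilience4j implementation that Jedis uses; the class name and the `windowSize` and `failureThreshold` parameters are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Minimal count-based sliding window breaker, for illustration only.
final class SlidingWindowBreaker {
    private final int windowSize;        // number of recent outcomes kept
    private final int failureThreshold;  // trip when failures exceed this
    private final Deque<Boolean> window = new ArrayDeque<>();
    private int failures = 0;
    private boolean open = false;        // open = server treated as unhealthy

    SlidingWindowBreaker(int windowSize, int failureThreshold) {
        this.windowSize = windowSize;
        this.failureThreshold = failureThreshold;
    }

    /** Records one outcome and returns true if this call tripped the breaker. */
    boolean record(boolean success) {
        if (window.size() == windowSize) {
            // Drop the least recent outcome to make room for the new one.
            if (!window.removeFirst()) {
                failures--;
            }
        }
        window.addLast(success);
        if (!success) {
            failures++;
        }
        if (!open && failures > failureThreshold) {
            open = true;  // declare the server unhealthy: time to fail over
            return true;
        }
        return false;
    }

    boolean isOpen() { return open; }

    int failuresInWindow() { return failures; }
}
```

Because old outcomes slide out of the window, a burst of failures long ago does not keep counting against a server that has been working since. The real resilience4j breaker also offers time-based windows and expresses its threshold as a failure rate, which this sketch does not attempt to reproduce.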
### Selecting a failover target

Since you may have multiple Redis servers available to fail over to, Jedis
lets you configure a list of endpoints to try, ordered by priority or
"weight". When a failover is triggered, Jedis selects the highest-weighted
endpoint that is still healthy and uses it for the temporary connection.

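The selection rule can be sketched as "filter to healthy endpoints, then take the maximum weight". The `Endpoint` record and `FailoverSelector` class here are hypothetical stand-ins for illustration; Jedis uses its own configuration types for endpoints and weights.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical endpoint description; not a Jedis type.
record Endpoint(String address, double weight, boolean healthy) {}

final class FailoverSelector {
    /** Returns the healthy endpoint with the highest weight, if there is one. */
    static Optional<Endpoint> select(List<Endpoint> endpoints) {
        return endpoints.stream()
                .filter(Endpoint::healthy)                        // skip failed endpoints
                .max(Comparator.comparingDouble(Endpoint::weight)); // prefer highest weight
    }
}
```

For example, if the highest-weighted endpoint has just been marked unhealthy, `select` returns the next-highest healthy one, and an empty result means no endpoint is currently usable.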
### Health checks

Given that the original endpoint had some geographical or other advantage
over the failover target, you will generally want to fail back to it as soon
as it recovers. To detect when this happens, Jedis periodically
runs a "health check" on the original server. This can be as simple as
sending a Redis [`ECHO`]({{< relref "/commands/echo" >}}) command and checking
that it gives a response.

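An `ECHO`-based health check and the resulting failback decision might look like the sketch below. The echo function is injected so the logic can run without a server; passing a method reference such as `jedis::echo` is one possible way to wire it to a real connection, but `EchoHealthCheck` and `FailbackPolicy` are hypothetical names, not Jedis's built-in health check.

```java
import java.util.function.UnaryOperator;

// Sketch of an ECHO-based health check; not the Jedis implementation.
final class EchoHealthCheck {
    private final UnaryOperator<String> echo;  // sends ECHO and returns the reply

    EchoHealthCheck(UnaryOperator<String> echo) {
        this.echo = echo;
    }

    /** Healthy if the server echoes the probe message back unchanged. */
    boolean isHealthy() {
        String probe = "health-check";
        try {
            return probe.equals(echo.apply(probe));
        } catch (RuntimeException e) {
            return false;  // a connection error counts as unhealthy
        }
    }
}

final class FailbackPolicy {
    /** Switches back to the preferred endpoint once its health check passes. */
    static String choose(String preferred, String current, EchoHealthCheck preferredCheck) {
        return preferredCheck.isHealthy() ? preferred : current;
    }
}
```

Running `FailbackPolicy.choose` on a timer gives the periodic failback behavior: the app stays on the failover target until the original endpoint answers again.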
You can also configure Jedis to run health checks on the current target
server during periods of inactivity. This can help to detect when the
server has failed and a failover is needed even when your app is not actively
using it.

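The idle-time check can be sketched as a monitor that only probes once the connection has been quiet for a while, since regular traffic already reveals failures. The class, its parameters, and the explicit `nowMillis` arguments (used to keep the sketch deterministic) are assumptions for illustration, not Jedis configuration options.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.BooleanSupplier;

// Hypothetical idle-time health monitor; not part of the Jedis API.
final class IdleHealthMonitor {
    private final long idleThresholdMillis;
    private final BooleanSupplier healthCheck;  // e.g. an ECHO-based check
    private final AtomicLong lastActivityMillis;

    IdleHealthMonitor(long idleThresholdMillis, BooleanSupplier healthCheck, long nowMillis) {
        this.idleThresholdMillis = idleThresholdMillis;
        this.healthCheck = healthCheck;
        this.lastActivityMillis = new AtomicLong(nowMillis);
    }

    /** Call whenever the app sends a command on the connection. */
    void recordActivity(long nowMillis) {
        lastActivityMillis.set(nowMillis);
    }

    /**
     * Call periodically, for example from a ScheduledExecutorService.
     * Returns false only if the connection has been idle long enough to
     * probe and the probe failed, signalling that a failover may be needed.
     */
    boolean checkIfIdle(long nowMillis) {
        if (nowMillis - lastActivityMillis.get() < idleThresholdMillis) {
            return true;  // recent traffic already exercises the connection
        }
        return healthCheck.getAsBoolean();
    }
}
```

Skipping the probe while the app is busy avoids adding extra commands to a connection whose health is already visible from normal traffic.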