| 1 | +--- |
| 2 | +categories: |
| 3 | +- docs |
| 4 | +- develop |
| 5 | +- stack |
| 6 | +- oss |
| 7 | +- rs |
| 8 | +- rc |
| 9 | +- oss |
| 10 | +- kubernetes |
| 11 | +- clients |
| 12 | +description: Improve reliability using the failover/failback features of Jedis. |
| 13 | +linkTitle: Failover/failback |
| 14 | +title: Failover and failback |
| 15 | +weight: 50 |
| 16 | +--- |
| 17 | + |
| 18 | +Jedis supports [failover and failback](https://en.wikipedia.org/wiki/Failover) |
| 19 | +to improve the availability of connections to Redis databases. This page explains |
| 20 | +the concepts and describes how to configure Jedis for failover and failback. |
| 21 | + |
| 22 | +## Concepts |
| 23 | + |
| 24 | +You may have [Active-Active databases]({{< relref "/operate/rs/databases/active-active" >}}) |
| 25 | +or independent Redis servers that are all suitable to serve your app. |
| 26 | +Typically, you would prefer some database endpoints over others for a particular |
| 27 | +instance of your app (perhaps the ones that are closest geographically to the app server |
| 28 | +to reduce network latency). However, if the best endpoint is not available due |
| 29 | +to a failure, it is generally better to switch to another, suboptimal endpoint |
| 30 | +than to let the app fail completely. |
| 31 | + |
| 32 | +*Failover* is the technique of actively checking for connection failures and |
| 33 | +automatically switching to another endpoint when a failure is detected. |
| 34 | + |
| 35 | +{{< image filename="images/failover/failover-client-reconnect.svg" alt="Failover and client reconnection" >}} |
| 36 | + |
| 37 | +The complementary technique of *failback* then involves checking the original |
| 38 | +endpoint periodically to see if it has recovered, and switching back to it |
| 39 | +when it is available again. |
| 40 | + |
| 41 | +{{< image filename="images/failover/failover-client-failback.svg" alt="Failback: client switches back to original server" width="75%" >}} |
| 42 | + |
| 43 | +### Detecting a failed connection |
| 44 | + |
| 45 | +Jedis uses the [resilience4j](https://resilience4j.readme.io/docs/getting-started) |
| 46 | +library to detect connection failures using the |
| 47 | +[circuit breaker design pattern](https://en.wikipedia.org/wiki/Circuit_breaker_design_pattern). |
| 48 | + |
| 49 | +The circuit breaker is a software component that tracks recent connection |
| 50 | +attempts in sequence, recording which ones have succeeded and which have failed. |
| 51 | +(Note that many connection failures are transient, so before recording a failure, |
| 52 | +the first response should usually be just to retry the connection a few times.) |
| 53 | + |
| 54 | +The status of the connection attempts is kept in a "sliding window", which |
| 55 | +is simply a buffer where the least recent item is dropped as each new |
| 56 | +one is added. |
| 57 | + |
| 58 | +{{< image filename="images/failover/failover-sliding-window.svg" alt="Sliding window of recent connection attempts" >}} |
| 59 | + |
| 60 | +When the number of failures in the window exceeds a configured |
| 61 | +threshold, the circuit breaker declares the server to be unhealthy and triggers |
| 62 | +a failover. |
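The sliding window and threshold described above can be sketched in plain Java. This is only an illustration of the idea under simple assumptions: the class `SlidingWindowBreaker` and its parameters `windowSize` and `failureThreshold` are hypothetical names, not part of the Jedis or resilience4j APIs, and a real circuit breaker offers more configuration options than this.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical illustration class; not part of Jedis or resilience4j.
class SlidingWindowBreaker {
    private final int windowSize;        // how many recent attempts to remember
    private final int failureThreshold;  // failures allowed before tripping
    private final Deque<Boolean> window = new ArrayDeque<>();
    private int failures = 0;

    SlidingWindowBreaker(int windowSize, int failureThreshold) {
        this.windowSize = windowSize;
        this.failureThreshold = failureThreshold;
    }

    // Records one connection attempt and reports whether the
    // server should now be considered unhealthy.
    boolean record(boolean success) {
        if (window.size() == windowSize) {
            // The window is full, so drop the least recent result.
            boolean oldest = window.removeFirst();
            if (!oldest) {
                failures--;
            }
        }
        window.addLast(success);
        if (!success) {
            failures++;
        }
        return isUnhealthy();
    }

    // Unhealthy when the failure count in the window exceeds the threshold.
    boolean isUnhealthy() {
        return failures > failureThreshold;
    }

    public static void main(String[] args) {
        SlidingWindowBreaker breaker = new SlidingWindowBreaker(5, 2);
        boolean[] attempts = {true, false, true, false, false};
        for (boolean ok : attempts) {
            boolean unhealthy = breaker.record(ok);
            System.out.println("success=" + ok + " unhealthy=" + unhealthy);
        }
    }
}
```

In this sketch, a successful attempt can push an old failure out of the window, so a server that recovers gradually stops counting as unhealthy without any explicit reset.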
| 63 | + |
| 64 | +### Selecting a failover target |
| 65 | + |
| 66 | +Since you may have multiple Redis servers available to fail over to, Jedis |
| 67 | +lets you configure a list of endpoints to try, ordered by priority or |
| 68 | +"weight". When a failover is triggered, Jedis selects the highest-weighted |
| 69 | +endpoint that is still healthy and uses it for the temporary connection. |
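As a rough sketch of the selection rule above, the following plain Java picks the highest-weighted endpoint that is still healthy. The `FailoverSelector` and `Endpoint` types, their fields, and the hostnames are hypothetical and exist only for this example; they are not the Jedis configuration API.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical illustration class; not part of the Jedis API.
class FailoverSelector {

    // Minimal description of a candidate endpoint (assumed shape).
    static final class Endpoint {
        final String name;
        final double weight;
        final boolean healthy;

        Endpoint(String name, double weight, boolean healthy) {
            this.name = name;
            this.weight = weight;
            this.healthy = healthy;
        }
    }

    // Returns the healthy endpoint with the highest weight,
    // or an empty Optional if no endpoint is healthy.
    static Optional<Endpoint> selectTarget(List<Endpoint> endpoints) {
        return endpoints.stream()
                .filter(e -> e.healthy)
                .max(Comparator.comparingDouble(e -> e.weight));
    }

    public static void main(String[] args) {
        List<Endpoint> endpoints = List.of(
                new Endpoint("redis-east.example.com", 1.0, false), // preferred, but down
                new Endpoint("redis-west.example.com", 0.5, true),
                new Endpoint("redis-eu.example.com", 0.8, true));

        selectTarget(endpoints)
                .ifPresent(e -> System.out.println("Failing over to " + e.name));
    }
}
```

Returning an `Optional` makes the "no healthy endpoint left" case explicit, so the caller has to decide what to do when every server is down.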
| 70 | + |
| 71 | +### Health checks |
| 72 | + |
| 73 | +Given that the original endpoint had some geographical or other advantage |
| 74 | +over the failover target, you will generally want to fail back to it as soon |
| 75 | +as it recovers. To detect when this happens, Jedis periodically |
| 76 | +runs a "health check" on the server. This can be as simple as |
| 77 | +sending a Redis [`ECHO`]({{< relref "/commands/echo" >}}) command and checking |
| 78 | +that it gives a response. |
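The periodic check for failback might look something like the sketch below. It is a simplified model, not the Jedis implementation: `FailbackMonitor` is a hypothetical class, the probe is a stand-in for a real check such as sending `ECHO` to the original endpoint, and the 100 ms interval is an arbitrary value chosen for the demo.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

// Hypothetical illustration class; not part of the Jedis API.
class FailbackMonitor {
    // Stand-in for a real health check (for example, an ECHO round trip).
    private final BooleanSupplier primaryProbe;
    private volatile boolean usingPrimary = false; // starts in the failed-over state

    FailbackMonitor(BooleanSupplier primaryProbe) {
        this.primaryProbe = primaryProbe;
    }

    // Runs one health check against the original endpoint.
    // Any exception from the probe is treated as a failed check.
    boolean checkOnce() {
        boolean ok;
        try {
            ok = primaryProbe.getAsBoolean();
        } catch (RuntimeException e) {
            ok = false;
        }
        if (ok) {
            usingPrimary = true; // the original endpoint has recovered
        }
        return ok;
    }

    boolean isUsingPrimary() {
        return usingPrimary;
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulated probe: the first two checks fail, later ones succeed.
        AtomicInteger calls = new AtomicInteger();
        FailbackMonitor monitor = new FailbackMonitor(() -> calls.incrementAndGet() > 2);

        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            if (!monitor.isUsingPrimary() && monitor.checkOnce()) {
                System.out.println("Original endpoint recovered; failing back");
            }
        }, 0, 100, TimeUnit.MILLISECONDS);

        Thread.sleep(600);       // let a few checks run
        scheduler.shutdownNow(); // stop the periodic checks
    }
}
```

Keeping the probe behind a `BooleanSupplier` means the same monitor logic can be exercised with a simulated server, as here, or with a real connection check.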
| 79 | + |
| 80 | +You can also configure Jedis to run health checks on the current target |
| 81 | +server during periods of inactivity. This can help to detect when the |
| 82 | +server has failed and a failover is needed even when your app is not actively |
| 83 | +using it. |
| 84 | + |
| 85 | + |