| 1 | +# Failover with Jedis |
| 2 | + |
| 3 | +Jedis supports failover for your Redis deployments. This is useful when: |
| 4 | +1. You have more than one Redis deployment. This might include two independent Redis servers or two or more Redis databases replicated across multiple [active-active Redis Enterprise](https://docs.redis.com/latest/rs/databases/active-active/) clusters. |
| 5 | +2. You want your application to connect to and use one deployment at a time. |
| 6 | +3. You want your application to fail over to the next available deployment if the current deployment becomes unavailable. |
| 7 | + |
| 8 | +Jedis will fail over to a subsequent Redis deployment after reaching a configurable failure threshold. |
| 9 | +This failure threshold is implemented using a [circuit breaker pattern](https://en.wikipedia.org/wiki/Circuit_breaker_design_pattern). |
| 10 | + |
| 11 | +You can also configure Jedis to retry failed calls to Redis. |
| 12 | +Once a maximum number of retries have been exhausted, the circuit breaker will record a failure. |
| 13 | +When the circuit breaker reaches its failure threshold, a failover will be triggered on the subsequent operation. |
| 14 | + |
| 15 | +The remainder of this guide describes: |
| 16 | + |
| 17 | +* A basic failover configuration |
| 18 | +* Supported retry and circuit breaker settings |
| 19 | +* Failback and the cluster selection API |
| 20 | + |
| 21 | +We recommend that you read this guide carefully and understand the configuration settings before enabling Jedis failover |
| 22 | +in production. |
| 23 | + |
| 24 | +## Basic usage |
| 25 | + |
| 26 | +To configure Jedis for failover, you specify an ordered list of Redis databases. |
| 27 | +By default, Jedis will connect to the first Redis database in the list. If the first database becomes unavailable, |
| 28 | +Jedis will attempt to connect to the next database in the list, and so on. |
| 29 | + |
| 30 | +Suppose you run two Redis deployments. |
| 31 | +We'll call them `redis-east` and `redis-west`. |
| 32 | +You want your application to first connect to `redis-east`. |
| 33 | +If `redis-east` becomes unavailable, you want your application to connect to `redis-west`. |
| 34 | + |
| 35 | +Let's look at one way of configuring Jedis for this scenario. |
| 36 | + |
| 37 | +First, create an array of `ClusterConfig` objects, one for each Redis database. |
| 38 | + |
| 39 | +```java |
| 40 | +JedisClientConfig config = DefaultJedisClientConfig.builder().user("cache").password("secret").build(); |
| 41 | + |
| 42 | +ClusterConfig[] clientConfigs = new ClusterConfig[2]; |
| 43 | +clientConfigs[0] = new ClusterConfig(new HostAndPort("redis-east.example.com", 14000), config); |
| 44 | +clientConfigs[1] = new ClusterConfig(new HostAndPort("redis-west.example.com", 14000), config); |
| 45 | +``` |
| 46 | + |
| 47 | +The configuration above represents your two Redis deployments: `redis-east` and `redis-west`. |
| 48 | +You'll use this array of configuration objects to create a connection provider that supports failover. |
| 49 | + |
| 50 | +Use the `MultiClusterClientConfig` builder to set your preferred retry and failover configuration, passing in the client configs you just created. |
| 51 | +Then build a `MultiClusterPooledConnectionProvider`. |
| 52 | + |
| 53 | +```java |
| 54 | +MultiClusterClientConfig.Builder builder = new MultiClusterClientConfig.Builder(clientConfigs); |
| 55 | +builder.circuitBreakerSlidingWindowSize(10); |
| 56 | +builder.circuitBreakerSlidingWindowMinCalls(1); |
| 57 | +builder.circuitBreakerFailureRateThreshold(50.0f); |
| 58 | + |
| 59 | +MultiClusterPooledConnectionProvider provider = new MultiClusterPooledConnectionProvider(builder.build()); |
| 60 | +``` |
| 61 | + |
| 62 | +Internally, the connection provider uses a [highly configurable circuit breaker and retry implementation](https://resilience4j.readme.io/docs/circuitbreaker) to determine when to fail over. |
| 63 | +In the configuration here, we've set a sliding window size of 10 and a failure rate threshold of 50%. |
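| 64 | +This means that a failover will be triggered once at least 50% of the calls recorded in the sliding window (at most the last 10 calls) have failed, for example when 5 of the last 10 calls fail. Because the minimum number of calls is set to 1 here, the failure rate is evaluated as soon as the first call is recorded. |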
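To make the sliding-window arithmetic concrete, here is a minimal, self-contained model of a count-based window. It is only an illustration: `SlidingWindowSketch` and its `record` method are hypothetical names, and this is not the resilience4j implementation that Jedis actually uses. It assumes the breaker opens when the failure rate is greater than or equal to the threshold.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative model only; not the resilience4j implementation used by Jedis.
class SlidingWindowSketch {
    private final int windowSize;             // maximum number of recorded calls
    private final int minCalls;               // calls required before the rate is evaluated
    private final float failureRateThreshold; // percentage, e.g. 50.0f
    private final Deque<Boolean> outcomes = new ArrayDeque<>(); // true = failed call
    private int failures = 0;

    SlidingWindowSketch(int windowSize, int minCalls, float failureRateThreshold) {
        this.windowSize = windowSize;
        this.minCalls = minCalls;
        this.failureRateThreshold = failureRateThreshold;
    }

    /** Records one call outcome and returns true if the breaker would open. */
    boolean record(boolean failed) {
        outcomes.addLast(failed);
        if (failed) {
            failures++;
        }
        // Evict the oldest outcome once the window is over capacity.
        if (outcomes.size() > windowSize && outcomes.removeFirst()) {
            failures--;
        }
        if (outcomes.size() < minCalls) {
            return false; // not enough data to compute a failure rate yet
        }
        float failureRate = 100.0f * failures / outcomes.size();
        return failureRate >= failureRateThreshold;
    }
}
```

In this model, a window of 10 with a minimum of 10 calls opens on the tenth call if 5 of those calls failed, whereas a minimum of 1 call opens on the very first failure.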
| 65 | + |
| 66 | +Once you've configured and created a `MultiClusterPooledConnectionProvider`, instantiate a `UnifiedJedis` instance for your application, passing in the provider you just created: |
| 67 | + |
| 68 | +```java |
| 69 | +UnifiedJedis jedis = new UnifiedJedis(provider); |
| 70 | +``` |
| 71 | + |
| 72 | +You can now use this `UnifiedJedis` instance, and the connection management and failover will be handled transparently. |
| 73 | + |
| 74 | +## Configuration options |
| 75 | + |
| 76 | +Under the hood, Jedis' failover support relies on [resilience4j](https://resilience4j.readme.io/docs/getting-started), |
| 77 | +a fault-tolerance library that implements [retry](https://resilience4j.readme.io/docs/retry) and [circuit breakers](https://resilience4j.readme.io/docs/circuitbreaker). |
| 78 | + |
| 79 | +Once you configure Jedis for failover using the `MultiClusterPooledConnectionProvider`, each call to Redis is decorated with a resilience4j retry and circuit breaker. |
| 80 | + |
| 81 | +By default, any call that throws a `JedisConnectionException` will be attempted up to 3 times (the initial call plus up to 2 retries). |
| 82 | +If the call continues to fail after the maximum number of retry attempts, then the circuit breaker will record a failure. |
| 83 | + |
| 84 | +The circuit breaker maintains a record of failures in a sliding window data structure. |
| 85 | +If the failure rate reaches a configured threshold (e.g., when 50% of the last 10 calls have failed), |
| 86 | +then the circuit breaker's state transitions from `CLOSED` to `OPEN`. |
| 87 | +When this occurs, Jedis will attempt to connect to the next Redis database in its client configuration list. |
| 88 | + |
| 89 | +The supported retry and circuit breaker settings, and their default values, are described below. |
| 90 | +You can configure any of these settings using the `MultiClusterClientConfig.Builder` builder. |
| 91 | +Refer to the basic usage section above for an example of this. |
| 92 | + |
| 93 | +### Retry configuration |
| 94 | + |
| 95 | +Jedis uses the following retry settings: |
| 96 | + |
| 97 | +| Setting | Default value | Description | |
| 98 | +|----------------------------------|----------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| |
| 99 | +| Max retry attempts | 3 | Maximum number of retry attempts (including the initial call) | |
| 100 | +| Retry wait duration | 500 ms | Number of milliseconds to wait between retry attempts | |
| 101 | +| Wait duration backoff multiplier | 2 | Exponential backoff factor multiplied against wait duration between retries. For example, with a wait duration of 1 second and a multiplier of 2, the retries would occur after 1s, 2s, 4s, 8s, 16s, and so on. | |
| 102 | +| Retry included exception list | `JedisConnectionException` | A list of `Throwable` classes that count as failures and should be retried. | |
| 103 | +| Retry ignored exception list | Empty list | A list of `Throwable` classes to explicitly ignore for the purposes of retry. | |
| 104 | + |
| 105 | +To disable retry, set `maxRetryAttempts` to 1. |
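The interaction between the maximum attempts, the wait duration, and the backoff multiplier can be sketched as a small calculation. The class and method names below (`RetryBackoffSketch`, `waitsBetweenAttempts`) are hypothetical and exist only for illustration; the sketch assumes, as the table states, that the maximum attempts value includes the initial call.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative calculation only; names are hypothetical and not part of Jedis.
class RetryBackoffSketch {

    /**
     * Returns the wait, in milliseconds, that precedes each retry.
     * maxAttempts includes the initial call, so there are maxAttempts - 1 retries.
     */
    static List<Long> waitsBetweenAttempts(int maxAttempts, long initialWaitMillis, double multiplier) {
        List<Long> waits = new ArrayList<>();
        double wait = initialWaitMillis;
        for (int retry = 1; retry < maxAttempts; retry++) {
            waits.add(Math.round(wait));
            wait *= multiplier; // exponential backoff
        }
        return waits;
    }
}
```

Under these assumptions, the defaults (3 attempts, 500 ms, multiplier 2) give waits of 500 ms and 1000 ms, and a maximum of 1 attempt gives no waits at all, which is why setting it to 1 disables retry.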
| 106 | + |
| 107 | +### Circuit breaker configuration |
| 108 | + |
| 109 | +Jedis uses the following circuit breaker settings: |
| 110 | + |
| 111 | +| Setting | Default value | Description | |
| 112 | +|-----------------------------------------|----------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| |
| 113 | +| Sliding window type | `COUNT_BASED` | The type of sliding window used to record the outcome of calls. Options are `COUNT_BASED` and `TIME_BASED`. | |
| 114 | +| Sliding window size | 100 | The size of the sliding window. Units depend on sliding window type. When `COUNT_BASED`, the size represents number of calls. When `TIME_BASED`, the size represents seconds. | |
| 115 | +| Sliding window min calls | 100 | Minimum number of calls required (per sliding window period) before the CircuitBreaker will start calculating the error rate or slow call rate. | |
| 116 | +| Failure rate threshold | `50.0f` | Percentage of calls within the sliding window that must fail before the circuit breaker transitions to the `OPEN` state. | |
| 117 | +| Slow call duration threshold | 60000 ms | Duration threshold above which calls are classified as slow and added to the sliding window. | |
| 118 | +| Slow call rate threshold | `100.0f` | Percentage of calls within the sliding window that exceed the slow call duration threshold before circuit breaker transitions to the `OPEN` state. | |
| 119 | +| Circuit breaker included exception list | `JedisConnectionException` | A list of `Throwable` classes that count as failures and add to the failure rate. | |
| 120 | +| Circuit breaker ignored exception list | Empty list | A list of `Throwable` classes to explicitly ignore for failure rate calculations. | | |
| 121 | + |
| 122 | +### Failover callbacks |
| 123 | + |
| 124 | +In the event that Jedis fails over, you may wish to take some action. This might include logging a warning, recording |
| 125 | +a metric, or externally persisting the cluster connection state, to name just a few examples. For this reason, |
| 126 | +`MultiClusterPooledConnectionProvider` lets you register a custom callback that will be called whenever Jedis |
| 127 | +fails over to a new cluster. |
| 128 | + |
| 129 | +To use this feature, you'll need to design a class that implements `java.util.function.Consumer`. |
| 130 | +This class must implement the `accept` method, as you can see below. |
| 131 | + |
| 132 | +```java |
| 133 | +import org.slf4j.Logger; |
| 134 | +import org.slf4j.LoggerFactory; |
| 135 | + |
| 136 | +import java.util.function.Consumer; |
| 137 | + |
| 138 | +public class FailoverReporter implements Consumer<String> { |
| 139 | + |
| 140 | +    @Override |
| 141 | +    public void accept(String clusterName) { |
| 142 | +        Logger logger = LoggerFactory.getLogger(FailoverReporter.class); |
| 143 | +        logger.warn("Jedis failover to cluster: " + clusterName); |
| 144 | +    } |
| 145 | +} |
| 146 | +``` |
| 147 | + |
| 148 | +You can then pass an instance of this class to your `MultiClusterPooledConnectionProvider`. |
| 149 | + |
| 150 | +```java |
| 151 | +FailoverReporter reporter = new FailoverReporter(); |
| 152 | +provider.setClusterFailoverPostProcessor(reporter); |
| 153 | +``` |
| 154 | + |
| 155 | +The provider will call your `accept` method whenever a failover occurs. |
| 156 | + |
| 157 | +## Failing back |
| 158 | + |
| 159 | +We believe that failback should not be automatic. |
| 160 | +If Jedis fails over to a new cluster, Jedis will _not_ automatically fail back to the cluster that it was previously connected to. |
| 161 | +This design prevents a scenario in which Jedis fails back to a cluster that may not be entirely healthy yet. |
| 162 | + |
| 163 | +That said, we do provide an API that you can use to implement automated failback when this is appropriate for your application. |
| 164 | + |
| 165 | +## Failback scenario |
| 166 | + |
| 167 | +When a failover is triggered, Jedis will attempt to connect to the next Redis server in the list of server configurations |
| 168 | +you provide at setup. |
| 169 | + |
| 170 | +For example, recall the `redis-east` and `redis-west` deployments from the basic usage example above. |
| 171 | +Jedis will attempt to connect to `redis-east` first. |
| 172 | +If `redis-east` becomes unavailable (and the circuit breaker transitions), then Jedis will attempt to use `redis-west`. |
| 173 | + |
| 174 | +Now suppose that `redis-east` eventually comes back online. |
| 175 | +You will likely want to fail your application back to `redis-east`. |
| 176 | +However, Jedis will not fail back to `redis-east` automatically. |
| 177 | + |
| 178 | +In this case, we recommend that you first ensure that your `redis-east` deployment is healthy before you fail back your application. |
| 179 | + |
| 180 | +## Failback behavior and cluster selection API |
| 181 | + |
| 182 | +Once you've determined that it's safe to fail back to a previously-unavailable cluster, |
| 183 | +you need to decide how to trigger the failback. There are two ways to accomplish this: |
| 184 | + |
| 185 | +1. Use the cluster selection API |
| 186 | +2. Restart your application |
| 187 | + |
| 188 | +### Fail back using the cluster selection API |
| 189 | + |
| 190 | +`MultiClusterPooledConnectionProvider` exposes a method that you can use to manually select which cluster Jedis should use. |
| 191 | +To select a different cluster to use, pass the cluster's numeric index to `setActiveMultiClusterIndex()`. |
| 192 | + |
| 193 | +The cluster's index is a 1-based index derived from its position in the client configuration. |
| 194 | +For example, suppose you configure Jedis with the following client configs: |
| 195 | + |
| 196 | +```java |
| 197 | +ClusterConfig[] clientConfigs = new ClusterConfig[2]; |
| 198 | +clientConfigs[0] = new ClusterConfig(new HostAndPort("redis-east.example.com", 14000), config); |
| 199 | +clientConfigs[1] = new ClusterConfig(new HostAndPort("redis-west.example.com", 14000), config); |
| 200 | +``` |
| 201 | + |
| 202 | +In this case, `redis-east` will have an index of `1`, and `redis-west` will have an index of `2`. |
| 203 | +To select and fail back to `redis-east`, you would call the function like so: |
| 204 | + |
| 205 | +```java |
| 206 | +provider.setActiveMultiClusterIndex(1); |
| 207 | +``` |
| 208 | + |
| 209 | +This method is thread-safe. |
| 210 | + |
| 211 | +If you decide to implement manual failback, you will need a way for external systems to trigger this method in your |
| 212 | +application. For example, if your application exposes a REST API, you might consider creating a REST endpoint |
| 213 | +to call `setActiveMultiClusterIndex` and fail back the application. |
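As a rough sketch of the piece that such an endpoint might delegate to, the class below validates a 1-based index before handing it to a selector. `FailbackController` and `selectCluster` are hypothetical names, not part of Jedis. The `IntConsumer` stands in for a method reference such as `provider::setActiveMultiClusterIndex`, on the assumption that the method accepts an `int` as in the call shown above.

```java
import java.util.function.IntConsumer;

// Hypothetical helper, not part of Jedis. In a real application the selector
// would typically be provider::setActiveMultiClusterIndex (assumed to take an int).
class FailbackController {
    private final IntConsumer clusterSelector;
    private final int clusterCount;

    FailbackController(IntConsumer clusterSelector, int clusterCount) {
        this.clusterSelector = clusterSelector;
        this.clusterCount = clusterCount;
    }

    /** Returns true if the 1-based index was valid and the selector was invoked. */
    boolean selectCluster(int oneBasedIndex) {
        if (oneBasedIndex < 1 || oneBasedIndex > clusterCount) {
            return false; // reject indexes outside the configured cluster list
        }
        clusterSelector.accept(oneBasedIndex);
        return true;
    }
}
```

A request handler could call `selectCluster(1)` to fail back to `redis-east` and return an error response when the method returns `false`. Keeping the validation separate from the transport layer makes it easy to test without a running Redis deployment.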
| 214 | + |
| 215 | +### Fail back by restarting the application |
| 216 | + |
| 217 | +When your application starts, Jedis will attempt to connect to each cluster in the order that the clusters appear |
| 218 | +in your client configuration. It's important to understand this, especially in the case where Jedis has failed over. |
| 219 | +If Jedis has failed over to a new cluster, then restarting the application may result in an inadvertent failback. |
| 220 | +This can happen only if a failed cluster comes back online and the application subsequently restarts. |
| 221 | + |
| 222 | +If you need to avoid this scenario, consider using a failover callback, as described above, to externally record |
| 223 | +the name of the cluster that your application was most recently connected to. You can then check this state on startup |
| 224 | +to ensure that your application only connects to the most recently used cluster. For assistance with this technique, |
| 225 | +[start a discussion](https://github.com/redis/jedis/discussions/new?category=q-a). |
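One possible way to record the most recently used cluster is a failover callback that writes the cluster name to a file. The sketch below is an assumption-laden illustration: `LastClusterStore` and `lastCluster` are hypothetical names, a local file stands in for whatever external store your application uses, and the callback is assumed to receive the cluster name as a `String`, as in the `FailoverReporter` example above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;
import java.util.function.Consumer;

// Hypothetical helper, not part of Jedis: persists the last cluster name to a file.
class LastClusterStore implements Consumer<String> {
    private final Path stateFile;

    LastClusterStore(Path stateFile) {
        this.stateFile = stateFile;
    }

    /** Called on failover: overwrite the state file with the new cluster name. */
    @Override
    public void accept(String clusterName) {
        try {
            Files.write(stateFile, clusterName.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Called on startup: returns the recorded cluster name, if any. */
    Optional<String> lastCluster() {
        if (!Files.exists(stateFile)) {
            return Optional.empty();
        }
        try {
            String name = new String(Files.readAllBytes(stateFile), StandardCharsets.UTF_8).trim();
            return name.isEmpty() ? Optional.empty() : Optional.of(name);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

On startup, the application could read `lastCluster()` and use the result to decide which cluster to select before serving traffic. A local file only survives restarts on the same host, so a shared store may be a better fit for containerized deployments.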