
Commit 303ed10

atakavci, ggivo, and Copilot authored
[automatic failover] Automatic failover client improvements (part 3) (#4306)
* [automatic failover] Set and test default values for failover config&components (#4298)
* - set & test default values
* - format
* - fix tests failing due to changing defaults
* [automatic failover] Add dual thresholds (min num of failures + failure rate) capability to circuit breaker (#4295)
* [automatic failover] Remove the check for 'GenericObjectPool.getNumWaiters()' in 'TrackingConnectionPool' (#4270)
  - remove the check for number of waiters in TrackingConnectionPool
* [automatic failover] Configure max total connections for EchoStrategy (#4268)
  - set maxtotal connections for echoStrategy
* [automatic failover] Replace 'CircuitBreaker' with 'Cluster' for 'CircuitBreakerFailoverBase.clusterFailover' (#4275)
* - replace CircuitBreaker with Cluster for CircuitBreakerFailoverBase.clusterFailover
  - improve thread safety with provider initialization
* - formatting
* [automatic failover] Minor optimizations on fast failover (#4277)
* - minor optimizations on fail fast
* - volatile failfast
* [automatic failover] Implement health check retries (#4273)
* - replace minConsecutiveSuccessCount with numberOfRetries
  - add retries into healthCheckImpl
  - apply changes to strategy implementations config classes
  - fix unit tests
* - fix typo
* - fix failing tests
* - add tests for retry logic
* - formatting
* - format
* - revisit numRetries for healthCheck, replace with numProbes and implement built in policies
  - new types probecontext, ProbePolicy, HealthProbeContext
  - add delayer executor pool to healthCheckImpl
  - adjustments on worker pool of healthCheckImpl for shared use of workers
* - format
* - expand comment with example case
* - drop pooled executor for delays
* - polish
* - fix tests
* - formatting
* - checking failing tests
* - fix test
* - fix flaky tests
* - fix flaky test
* - add tests for builtin probing policies
* - fix flaky test
* [automatic failover] Move failover provider to mcf (#4294)
* - move failover provider to mcf
* - make iterateActiveCluster package private
* [automatic failover] Add SSL configuration support to LagAwareStrategy (#4291)
* User-provided ssl config for lag-aware health check
* ssl scenario test for lag-aware healthcheck
* format
* format
* address review comments
  - use getters instead of fields
* [automatic failover] Implement max number of failover attempts (#4293)
* - implement max failover attempt
  - add tests
* - fix so the user receives the intended exception
* - clean + format
* - java doc for exceptions
* format
* - more tests on exception types in max failover attempts mechanism
* format
* fix failing timing in test
* disable health checks
* rename to switchToHealthyCluster
* format
* - Add dual-threshold (min failures + failure rate) failover to circuit breaker executor
  - Map config to resilience4j via CircuitBreakerThresholdsAdapter
  - clean up/simplify config: drop slow-call and window type
  - Add thresholdMinNumOfFailures; update some of the defaults
  - Update provider to use thresholds adapter
  - Update docs; align examples with new defaults
  - Add tests for 0% rate, edge thresholds
* polish
* Update src/main/java/redis/clients/jedis/mcf/CircuitBreakerThresholdsAdapter.java
  Co-authored-by: Copilot <[email protected]>
* - fix typo
* - fix min total calls calculation
* format
* - merge issues fixed
* fix javadoc ref
* - move threshold evaluations to failoverbase
  - simplify executor and cbfailoverconnprovider
  - adjust config getters
  - fix failing tests due to COUNT_BASED -> TIME_BASED
  - new tests for thresholds calculations and impact on circuit state transitions
* - avoid facilitating actual CBConfig type in tests
* Update src/test/java/redis/clients/jedis/failover/FailoverIntegrationTest.java
  Co-authored-by: Copilot <[email protected]>
* Trigger workflows
* - evaluate only in failure recorded and failover immediately
  - add more test on threshold calculations
  - enable command line arg for overwriting surefire.excludedGroups
* format
* check pom
* - fix error prone test
* [automatic failover] Set and test default values for failover config&components (#4298)
* - set & test default values
* - format
* - fix tests failing due to changing defaults
* - fix flaky test
* - remove unnecessary checks for failover attempt
* - clean and trim adapter class
  - add docs and more explanation
* fix javadoc issue
* - switch to all_success to fix flaky timing
* - fix issue in CircuitBreakerFailoverConnectionProvider
* introduce ReflectionTestUtil

---------

Co-authored-by: Ivo Gaydazhiev <[email protected]>
Co-authored-by: Copilot <[email protected]>

* [automatic failover] feat: Add MultiDbClient with multi-endpoint failover and circuit breaker support (#4300)
* feat: introduce ResilientRedisClient with multi-endpoint failover support
  Add ResilientRedisClient extending UnifiedJedis with automatic failover capabilities across multiple weighted Redis endpoints. Includes circuit breaker pattern, health monitoring, and configurable retry logic for high-availability Redis deployments.
* format
* mark ResilientRedisClientTest as integration one
* fix test
  - make sure endpoint is healthy before activating it
* Rename ResilientClient to align with design
  - ResilientClient -> MultiDbClient (builder, tests, etc)
* Rename setActiveEndpoint to setActiveDatabaseEndpoint
* Rename clusterSwitchListener to databaseSwitchListener
* Rename multiClusterConfig to multiDbConfig
* fix api doc's error
* fix compilation error after rebase
* format
* fix example in javadoc
* Update ActiveActiveFailoverTest scenario test to use builder's
  # Conflicts:
  # src/test/java/redis/clients/jedis/scenario/ActiveActiveFailoverTest.java
* rename setActiveDatabaseEndpoint -> setActiveDatabase
* is healthy throw exception if cluster does not exist
* format
* [automatic failover] Use Endpoint interface instead of HostAndPort in multi db (#4302)
  [clean up] Use Endpoint interface where possible
* - fix variable name type
* fix typo in variable name
* - fix flaky test

---------

Co-authored-by: Ivo Gaydazhiev <[email protected]>
Co-authored-by: Copilot <[email protected]>
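The dual-threshold rule named in the commit message (minimum number of failures plus failure rate) can be illustrated with a small self-contained sketch. This is not the Jedis implementation: the class `DualThresholdSketch` and its methods are hypothetical, and it assumes that both thresholds must be met, with inclusive comparisons, over a time-based sliding window.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch; class and method names are illustrative, not Jedis APIs.
final class DualThresholdSketch {
    private final int minFailures;          // assumed analogue of "Threshold min number of failures"
    private final float failureRatePercent; // assumed analogue of "Failure rate threshold"
    private final long windowMillis;        // assumed analogue of "Sliding window size", in milliseconds
    // Each entry is {timestampMillis, 1 if the call failed else 0}.
    private final Deque<long[]> calls = new ArrayDeque<>();

    DualThresholdSketch(int minFailures, float failureRatePercent, long windowMillis) {
        this.minFailures = minFailures;
        this.failureRatePercent = failureRatePercent;
        this.windowMillis = windowMillis;
    }

    // Records the outcome of one call at the given time.
    void record(long nowMillis, boolean failed) {
        evictExpired(nowMillis);
        calls.addLast(new long[] { nowMillis, failed ? 1L : 0L });
    }

    // Assumption: the breaker trips only when BOTH thresholds are met.
    boolean shouldTrip(long nowMillis) {
        evictExpired(nowMillis);
        if (calls.isEmpty()) {
            return false;
        }
        long failures = 0;
        for (long[] call : calls) {
            failures += call[1];
        }
        float ratePercent = 100.0f * failures / calls.size();
        return failures >= minFailures && ratePercent >= failureRatePercent;
    }

    // Drops calls that have aged out of the sliding window.
    private void evictExpired(long nowMillis) {
        while (!calls.isEmpty() && nowMillis - calls.peekFirst()[0] >= windowMillis) {
            calls.removeFirst();
        }
    }
}
```

Under these assumptions, a high failure rate on a handful of calls does not trip the breaker until the failure count also reaches the minimum, and a large failure count does not trip it while successes keep the rate below the percentage threshold.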
1 parent f8de2fe commit 303ed10

31 files changed, +1759 -480 lines changed

docs/failover.md

Lines changed: 5 additions & 9 deletions
@@ -69,9 +69,8 @@ Then build a `MultiClusterPooledConnectionProvider`.

 ```java
 MultiClusterClientConfig.Builder builder = new MultiClusterClientConfig.Builder(clientConfigs);
-builder.circuitBreakerSlidingWindowSize(10); // Sliding window size in number of calls
-builder.circuitBreakerSlidingWindowMinCalls(1);
-builder.circuitBreakerFailureRateThreshold(50.0f); // percentage of failures to trigger circuit breaker
+builder.circuitBreakerSlidingWindowSize(2); // Sliding window size in number of calls
+builder.circuitBreakerFailureRateThreshold(10.0f); // percentage of failures to trigger circuit breaker

 builder.failbackSupported(true); // Enable failback
 builder.failbackCheckInterval(1000); // Check every second the unhealthy cluster to see if it has recovered
@@ -140,12 +139,9 @@ Jedis uses the following circuit breaker settings:

 | Setting | Default value | Description |
 |-----------------------------------------|----------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| Sliding window type | `COUNT_BASED` | The type of sliding window used to record the outcome of calls. Options are `COUNT_BASED` and `TIME_BASED`. |
-| Sliding window size | 100 | The size of the sliding window. Units depend on sliding window type. When `COUNT_BASED`, the size represents number of calls. When `TIME_BASED`, the size represents seconds. |
-| Sliding window min calls | 100 | Minimum number of calls required (per sliding window period) before the CircuitBreaker will start calculating the error rate or slow call rate. |
-| Failure rate threshold | `50.0f` | Percentage of calls within the sliding window that must fail before the circuit breaker transitions to the `OPEN` state. |
-| Slow call duration threshold | 60000 ms | Duration threshold above which calls are classified as slow and added to the sliding window. |
-| Slow call rate threshold | `100.0f` | Percentage of calls within the sliding window that exceed the slow call duration threshold before circuit breaker transitions to the `OPEN` state. |
+| Sliding window size | 2 | The size of the sliding window. Units depend on sliding window type. The size represents seconds. |
+| Threshold min number of failures | 1000 | Minimum number of failures before circuit breaker is tripped. |
+| Failure rate threshold | `10.0f` | Percentage of calls within the sliding window that must fail before the circuit breaker transitions to the `OPEN` state. |
 | Circuit breaker included exception list | [JedisConnectionException] | A list of Throwable classes that count as failures and add to the failure rate. |
 | Circuit breaker ignored exception list | null | A list of Throwable classes to explicitly ignore for failure rate calculations. | |

pom.xml

Lines changed: 7 additions & 10 deletions
@@ -62,6 +62,8 @@
 <junit.version>5.13.4</junit.version>
 <!-- Default JVM options for tests -->
 <JVM_OPTS></JVM_OPTS>
+<!-- Default excluded groups for tests - can be overridden from command line -->
+<excludedGroupsForUnitTests>integration,scenario</excludedGroupsForUnitTests>
 </properties>

 <dependencyManagement>
@@ -335,7 +337,7 @@
 <systemPropertyVariables>
 <redis-hosts>${redis-hosts}</redis-hosts>
 </systemPropertyVariables>
-<excludedGroups>integration,scenario</excludedGroups>
+<excludedGroups>${excludedGroupsForUnitTests}</excludedGroups>
 <excludes>
 <exclude>**/examples/*.java</exclude>
 <exclude>**/scenario/*Test.java</exclude>
@@ -482,21 +484,16 @@
 <include>**/Endpoint.java</include>
 <include>src/main/java/redis/clients/jedis/mcf/*.java</include>
 <include>src/test/java/redis/clients/jedis/failover/*.java</include>
-<include>**/mcf/EchoStrategyIntegrationTest.java</include>
-<include>**/mcf/LagAwareStrategyUnitTest.java</include>
-<include>**/mcf/RedisRestAPI*.java</include>
-<include>**/mcf/ActiveActiveLocalFailoverTest*</include>
-<include>**/mcf/FailbackMechanism*.java</include>
-<include>**/mcf/PeriodicFailbackTest*.java</include>
-<include>**/mcf/AutomaticFailoverTest*.java</include>
-<include>**/mcf/MultiCluster*.java</include>
-<include>**/mcf/StatusTracker*.java</include>
+<include>src/test/java/redis/clients/jedis/mcf/*.java</include>
 <include>**/Health*.java</include>
 <include>**/*IT.java</include>
 <include>**/scenario/RestEndpointUtil.java</include>
 <include>src/main/java/redis/clients/jedis/MultiClusterClientConfig.java</include>
 <include>src/main/java/redis/clients/jedis/HostAndPort.java</include>
 <include>**/builders/*.java</include>
+<include>**/MultiDb*.java</include>
+<include>**/ClientTestUtil.java</include>
+<include>**/ReflectionTestUtil.java</include>
 </includes>
 </configuration>
 <executions>
