-
Notifications
You must be signed in to change notification settings - Fork 3.9k
[automatic failover] Add tests: EchoStrategy does recover on connection error #4230
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
ggivo
merged 2 commits into
redis:feature/automatic-failover
from
ggivo:feature/automatic-failover
Aug 15, 2025
Merged
[automatic failover] Add tests: EchoStrategy does recover on connection error #4230
ggivo
merged 2 commits into
redis:feature/automatic-failover
from
ggivo:feature/automatic-failover
Aug 15, 2025
+186
−3
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
d53cf7f
to
6a10cb2
Compare
6 tasks
6a10cb2
to
ee1021d
Compare
atakavci
approved these changes
Aug 15, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
ggivo
added a commit
that referenced
this pull request
Sep 1, 2025
…4228) * Setup test infra for automatic failover tests * Add toxyproxy * Add test endpoints * Add example tests * [automatic-failover] Enforce formatting for automatic failover codebase (#4176) * enforce formating for mcf * enforce formating for mcf * Trigger integration test jobs feature branches * [automatic failover] Improve failover resilience by evaluating circuit breaker before retries (#4178) * fix: Change evaluation order of circuit breaker and retry mechanisms This commit modifies the order in which circuit breaker and retry mechanisms are evaluated during failover scenarios. Previously, the circuit breaker was applied after all retries were exhausted, which could delay failover in unstable network conditions. Now, the circuit breaker counts each connection error separately, even during retry attempts. * format * format * use java-8 compatible formater version * fix AutomaticFailoverTest * formating * [automatic-failover] Implement failover retry for in-flight commands (#4175) * Implement failover retry for in-flight commands This change adds support for retrying in-flight commands when a Redis connection fails and the client automatically fails over to another cluster. The feature is configurable through the FailoverOptions builder. Key changes: - Added retryFailedInflightCommands option to FailoverOptions - Implemented retry logic in CircuitBreakerCommandExecutor - Added integration tests to verify both retry and no-retry behavior - Created utility methods for test setup and configuration This enhancement improves resilience for long-running commands like blocking operations, allowing them to transparently continue on the failover cluster without client-side errors. * Update src/main/java/redis/clients/jedis/mcf/CircuitBreakerCommandExecutor.java Co-authored-by: Copilot <[email protected]> # Conflicts: # src/main/java/redis/clients/jedis/mcf/CircuitBreakerCommandExecutor.java * format format & clean-up * fix FailoverIntegrationTest.testInflightCommandsAreNotRetriedAfterFailover blpop timeout should be less than async command timeout to prevent completing with java.util.concurrent.TimeoutException instead of actuall command failure * Address comments from review - rename retryFailedInflightCommands->retryOnFailover - remove check for CB in OPEN state * remove unused method --------- Co-authored-by: Copilot <[email protected]> * [automatic-failover] Fix formatting for anticipated AA Failover changes (#4183) * fix formatting for anticipated failover changes * additional formatting to cover given folders * [automatic failover] Implement HealtStatusManager + weighted endpoints (#4189) * - weighted cluster seleciton - Healtstatus manager with initial listener and registration logic - pluggable health checker strategy introduced, these are draft NoOpStrategy, EchoStrategy, LagAwareStrategy, - fix failing tests impacted from weighted clusters * - add builder for ClusterConfig - add echo ot CommandObjects and UnifiedJEdis - improve StrategySupplier by accepting jedisclientconfig - adapt EchoStrategy to StrategySupplier. Now it handles the creation of connection by accepting endpoint and JedisClientConfig - make healthchecks disabled by default - drop noOpStrategy - add unit&integration tests for health check * - fix naming * clean up and mark override methods * fix link in javadoc * fix formatting * - fix double registered listeners in healtstatusmgr - clear redundant catch - replace failover options and drop failoveroptions class - remove forced_unhealthy from healthstatus - fix failback check - add disabled flag to cluster - update/fix related tests * Update src/main/java/redis/clients/jedis/mcf/EchoStrategy.java Co-authored-by: Copilot <[email protected]> --------- Co-authored-by: Copilot <[email protected]> * [automatic failover] Implement add/remove endpoints (#4200) * - weighted cluster seleciton - Healtstatus manager with initial listener and registration logic - pluggable health checker strategy introduced, these are draft NoOpStrategy, EchoStrategy, LagAwareStrategy, - fix failing tests impacted from weighted clusters * - add builder for ClusterConfig - add echo ot CommandObjects and UnifiedJEdis - improve StrategySupplier by accepting jedisclientconfig - adapt EchoStrategy to StrategySupplier. Now it handles the creation of connection by accepting endpoint and JedisClientConfig - make healthchecks disabled by default - drop noOpStrategy - add unit&integration tests for health check * - fix naming * clean up and mark override methods * fix link in javadoc * fix formatting * - fix double registered listeners in healtstatusmgr - clear redundant catch - replace failover options and drop failoveroptions class - remove forced_unhealthy from healthstatus - fix failback check - add disabled flag to cluster - update/fix related tests * Update src/main/java/redis/clients/jedis/mcf/EchoStrategy.java Co-authored-by: Copilot <[email protected]> * - add remove endpoints * - replace cluster disabled with failbackCandidate - replace failback enabled with failbacksupported in client - fix formatting - set defaults * - remove failback candidate - fix failing tests * - fix remove logic - fix failing tests --------- Co-authored-by: Copilot <[email protected]> * [automatic failover] Implement failback with given grace period (#4204) * - weighted cluster seleciton - Healtstatus manager with initial listener and registration logic - pluggable health checker strategy introduced, these are draft NoOpStrategy, EchoStrategy, LagAwareStrategy, - fix failing tests impacted from weighted clusters * - add builder for ClusterConfig - add echo ot CommandObjects and UnifiedJEdis - improve StrategySupplier by accepting jedisclientconfig - adapt EchoStrategy to StrategySupplier. Now it handles the creation of connection by accepting endpoint and JedisClientConfig - make healthchecks disabled by default - drop noOpStrategy - add unit&integration tests for health check * - fix naming * clean up and mark override methods * fix link in javadoc * fix formatting * - fix double registered listeners in healtstatusmgr - clear redundant catch - replace failover options and drop failoveroptions class - remove forced_unhealthy from healthstatus - fix failback check - add disabled flag to cluster - update/fix related tests * Update src/main/java/redis/clients/jedis/mcf/EchoStrategy.java Co-authored-by: Copilot <[email protected]> * - add remove endpoints * - replace cluster disabled with failbackCandidate - replace failback enabled with failbacksupported in client - fix formatting - set defaults * - remove failback candidate - fix failing tests * - fix remove logic - fix failing tests * - periodic failback checks - introduce graceperiod - fix issue when CB is forced_open and gracePeriod is completed * - format * - fix timing issue --------- Co-authored-by: Copilot <[email protected]> * [automatic failover] Implement wait on healthCheck results during client init (#4207) * - weighted cluster seleciton - Healtstatus manager with initial listener and registration logic - pluggable health checker strategy introduced, these are draft NoOpStrategy, EchoStrategy, LagAwareStrategy, - fix failing tests impacted from weighted clusters * - add builder for ClusterConfig - add echo ot CommandObjects and UnifiedJEdis - improve StrategySupplier by accepting jedisclientconfig - adapt EchoStrategy to StrategySupplier. Now it handles the creation of connection by accepting endpoint and JedisClientConfig - make healthchecks disabled by default - drop noOpStrategy - add unit&integration tests for health check * - fix naming * clean up and mark override methods * fix link in javadoc * fix formatting * - fix double registered listeners in healtstatusmgr - clear redundant catch - replace failover options and drop failoveroptions class - remove forced_unhealthy from healthstatus - fix failback check - add disabled flag to cluster - update/fix related tests * Update src/main/java/redis/clients/jedis/mcf/EchoStrategy.java Co-authored-by: Copilot <[email protected]> * - add remove endpoints * - replace cluster disabled with failbackCandidate - replace failback enabled with failbacksupported in client - fix formatting - set defaults * - remove failback candidate - fix failing tests * - fix remove logic - fix failing tests * - periodic failback checks - introduce graceperiod - fix issue when CB is forced_open and gracePeriod is completed * - introduce StatusTracker with purpose of waiting initial healthcheck results during consturction of provider - add HealthStatus.UNKNOWN as default for Cluster - handle status changes in order of events during initialization - add tests for status tracker and orderingof events - fix impacted unit&integ tests * - introduce forceActiveCluster by duration - fix formatting * - fix failing tests by waiting on clusters to get healthy * - fix failing scenario test - downgrade logback version for slf4j compatibility - increase timeouts for faultInjector * - adressing reviews and feedback * - fix formatting * - fix formatting * - get rid of the queue and event ordering for healthstatus change in MultiClusterPooledConnectionProvider - add test for init and post init events - fix failing tests * - replace use of reflection with helper methods - fix failing tests due to method name change * - replace 'sleep' with 'await', feedback from @a-TODO-rov --------- Co-authored-by: Copilot <[email protected]> * [automatic failover] Introduce fast failover mode - a thread-sync-free approach with builders (#4226) * - weighted cluster seleciton - Healtstatus manager with initial listener and registration logic - pluggable health checker strategy introduced, these are draft NoOpStrategy, EchoStrategy, LagAwareStrategy, - fix failing tests impacted from weighted clusters * - add builder for ClusterConfig - add echo ot CommandObjects and UnifiedJEdis - improve StrategySupplier by accepting jedisclientconfig - adapt EchoStrategy to StrategySupplier. Now it handles the creation of connection by accepting endpoint and JedisClientConfig - make healthchecks disabled by default - drop noOpStrategy - add unit&integration tests for health check * - fix naming * clean up and mark override methods * fix link in javadoc * fix formatting * - fix double registered listeners in healtstatusmgr - clear redundant catch - replace failover options and drop failoveroptions class - remove forced_unhealthy from healthstatus - fix failback check - add disabled flag to cluster - update/fix related tests * Update src/main/java/redis/clients/jedis/mcf/EchoStrategy.java Co-authored-by: Copilot <[email protected]> * - add remove endpoints * - replace cluster disabled with failbackCandidate - replace failback enabled with failbacksupported in client - fix formatting - set defaults * - remove failback candidate - fix failing tests * - fix remove logic - fix failing tests * - periodic failback checks - introduce graceperiod - fix issue when CB is forced_open and gracePeriod is completed * - introduce StatusTracker with purpose of waiting initial healthcheck results during consturction of provider - add HealthStatus.UNKNOWN as default for Cluster - handle status changes in order of events during initialization - add tests for status tracker and orderingof events - fix impacted unit&integ tests * - introduce forceActiveCluster by duration - fix formatting * - fix failing tests by waiting on clusters to get healthy * - fix failing scenario test - downgrade logback version for slf4j compatibility - increase timeouts for faultInjector * - adressing reviews and feedback * - fix formatting * - fix formatting * - get rid of the queue and event ordering for healthstatus change in MultiClusterPooledConnectionProvider - add test for init and post init events - fix failing tests * - replace use of reflection with helper methods - fix failing tests due to method name change * - introduce clusterSwitchEvent and drop clusterFailover post processor - fix broken echostrategy due to connection issue - make healtthCheckStrategy closable and close on - adding fastfailover mode to config and provider - add local failover tests for total failover duration * - introduce fastfailover using objectMaker injection into connectionFactory * - polish * - cleanup * - improve healtcheck thread visibility * - introduce TrackingConnectionPool with FailFastConnectionFactory - added builders to connection and connectionFactory - introduce initializtionTracker to track list of connections during their construction. * - return broken source as usual - do not throw exception is failover already happening * - unblock waiting threads * - failover by closing the pool * - formatting * - check waiters and active/idle connections to force disconnect * - add builder to trackingconnectionpool * - fix failing tests due to mocked ctor for trackingConnectionPool * - replace initTracker with split initializtion of conn * - refactor on builders and ctors * - undo format * - clena up * - add exceptions to logs * - add max wait duration and minConsecutiveSuccessCount to healthCheckStrategy * - replace 'sleep' with 'await', feedback from @a-TODO-rov * - fix status tracker tests --------- Co-authored-by: Copilot <[email protected]> * [automatic failover] Add tests: EchoStrategy does recover on connection error (#4230) * Add tests: EchoStrategy does recover on connection error * log error's on exception * [automatic failover] Replace 'SimpleEntry' with 'HealthCheckResult' (#4236) * - replace SimpleEntry with HealthCheckResult * - format * [automatic failover] Add rate limiter to test MultiThreadedFakeApp (#4248) * Add option for rate limiter to MultiThreadedFakeApp * retrigger checks * [automatic failover] Implement lag-aware api (#4235) * - lag-aware api * - introduce healtCheckMetaConfig - add config objects for both EchoStrategy and LagAwareStrategy - make StrategySupplier interface generic - fix impacted tests * - fix issues with docs and log * Fix EchoStrategyIntegrationTest after merge * Implement resolve bdbId based on configured HostAndPort * format & merge RedisRestAPI unit tests * introduce builder for LagAwareStrategy * configurable extended-check - provide option to enable/disable lag check - default to lag-aware check with AVAILABILITY_LAG_TOLERANCE_DEFAULT of 100ms * - fix java doc issue * - formatting * -fix doc issue * - formatting * revert introduce healtCheckMetaConfig - introduces type unsafe warnings, revert till they are resovled * Add list constructor * Add option for rate limiter to MultiThreadedFakeApp * Use /v1/local/bdbs/<int:uid>/endpoint/availability - use /v1/local/bdbs/<int:uid>/endpoint/availability instead of availability address vs /v1/bdbs/<int:uid>/availability * Add builder factory method for base HealthCheckStrategy.Config * format * format * revert to /v1/bdbs/<int:uid>/availability reading again the spec /local is inteded to be used when configuring LoadBalancers * revert to /v1/bdbs/<int:uid>/availability reading again the spec /local is inteded to be used when configuring LoadBalancers * format * fix unit test after revert of local * format * Address review comments - remove misleading default value comment - renamed Config.endpoint to Config.restEndpoint - renamed dataPathAvailability -> databaseAvailability --------- Co-authored-by: ggivo <[email protected]> * [automatic failover] Refactor cluster to replace `HealthStatus` with `HealthCheck` itself (#4244) * - set sleep duration 1 ms for forceDisconnect * - refactor cluster to replace HealthStatus with HealthCheck itself * - move Endpoint to parent package * - format * - fix typo * - adjust tests with new provider behaviour with HealthCheck * Address review comments - rename getTimeout() -> getMaxWaitFor() - remove unnecessary noHealthChecks * resolve merge conflicts * format * format --------- Co-authored-by: Ivo Gaydazhiev <[email protected]> * [automatic failover] fix: health checks continue to run even after provider is closed. (#4252) * fix: health checks continue to run even after provider is closed. Closes #4251 * format * format & fix tests after merge format * [automatic failover] Improve AA failover scenario test (#4249) * Improve AA failover test * Fix endpoint name * Count failedCommandsAfterFailover * do not enforce formating on ActiveActiveFailoverTest --------- Co-authored-by: Ivo Gaydazhiev <[email protected]> * [automatic failover ]fix: only notify health status listeners on actual state changes #4255 (#4256) * fix: only notify health status listeners on actual state changes #4255 - Replace overlapping health checks with sequential execution - use wasUpdated flag to prevent false notifications from timestamp conflicts. Eliminates race conditions causing spurious failovers. Closes: #4255 * format * Update src/main/java/redis/clients/jedis/mcf/HealthCheckImpl.java Co-authored-by: atakavci <[email protected]> * revert to fixed rate health checks * fix ActiveActiveLocalFailoverTest - reduced endpoint pause to 10s to speed up test - ensure test endpoint is eavailable before startign the test (awaits up to endpoint pause time) --------- Co-authored-by: atakavci <[email protected]> * Connection.forceDisconnect should also set the connection as broken #4233 (#4258) Address: Connection.forceDisconnect should also set the connection as broken #4233 * [automatic failover] Update failover docs (#4257) * Initial draft * Cover health checks * Add javadoc's for MultiClusterClientConfig * Fix docs for health checks * Fix javadoc's * Update wordlist.txt --------- Co-authored-by: Ivo Gaydazhiev <[email protected]> * rever hbase-formatter.xml changes - ensure hbase-formatter.xml match current one in master * reformat to comply with hbase-formater rules in master * fix errors after merge - setClusterFailoverPostProcessor renamed - format AutomaticFailoverTest * format - Endpoint.java * format - ActiveActiveLocalFailoverTest - FailbackMechanismIntegrationTest - FailbackMechanismUnitTest - PeriodicFailbackTest - StatusTrackerTest format - ActiveActiveLocalFailoverTest.java * Add failover containers for testing to local test infra * [automatic failover] Address review comments (#4264) Address review comments - replace magic numbers in tests with constats --------- Co-authored-by: ggivo <[email protected]> Co-authored-by: Copilot <[email protected]> Co-authored-by: Igor Malinovskiy <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Add tests to ensure