-
Notifications
You must be signed in to change notification settings - Fork 3.9k
[automatic failover] Introduce fast failover mode #4220
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[automatic failover] Introduce fast failover mode #4220
Conversation
- Healtstatus manager with initial listener and registration logic - pluggable health checker strategy introduced, these are draft NoOpStrategy, EchoStrategy, LagAwareStrategy, - fix failing tests impacted from weighted clusters
- add echo ot CommandObjects and UnifiedJEdis - improve StrategySupplier by accepting jedisclientconfig - adapt EchoStrategy to StrategySupplier. Now it handles the creation of connection by accepting endpoint and JedisClientConfig - make healthchecks disabled by default - drop noOpStrategy - add unit&integration tests for health check
- clear redundant catch - replace failover options and drop failoveroptions class - remove forced_unhealthy from healthstatus - fix failback check - add disabled flag to cluster - update/fix related tests
Co-authored-by: Copilot <[email protected]>
- replace failback enabled with failbacksupported in client - fix formatting - set defaults
- fix failing tests
- fix failing tests
- introduce graceperiod - fix issue when CB is forced_open and gracePeriod is completed
… results during consturction of provider - add HealthStatus.UNKNOWN as default for Cluster - handle status changes in order of events during initialization - add tests for status tracker and orderingof events - fix impacted unit&integ tests
- fix formatting
- downgrade logback version for slf4j compatibility - increase timeouts for faultInjector
…MultiClusterPooledConnectionProvider - add test for init and post init events - fix failing tests
- fix failing tests due to method name change
- fix broken echostrategy due to connection issue - make healtthCheckStrategy closable and close on - adding fastfailover mode to config and provider - add local failover tests for total failover duration
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR introduces a fast failover mode for the multi-cluster Redis client by replacing the simple index-based cluster selection with a sophisticated weight-based health checking system. It adds comprehensive health monitoring capabilities, automatic failback mechanisms, and configurable grace periods to improve cluster availability and reliability.
Key Changes
- Replaces index-based cluster management with weight-based endpoint selection using health checks
- Introduces comprehensive health monitoring with configurable strategies and periodic status checks
- Adds automatic failback mechanism with grace periods to prevent rapid switching between clusters
- Implements fast failover mode that can forcibly disconnect active connections for quicker switching
Reviewed Changes
Copilot reviewed 42 out of 43 changed files in this pull request and generated 3 comments.
Show a summary per file
File | Description |
---|---|
MultiClusterPooledConnectionProvider.java |
Core refactoring from index-based to endpoint-based cluster management with health monitoring integration |
TrackingConnectionPool.java |
New connection pool implementation with tracking and force disconnect capabilities for fast failover |
Health Check Infrastructure | Multiple new files implementing health status management, event-driven monitoring, and various health check strategies |
Test Files | Comprehensive test coverage for new health checking, failback mechanisms, and integration scenarios |
Configuration Updates | Updates to support new health check configuration, weights, grace periods, and failback settings |
Comments suppressed due to low confidence (4)
src/test/java/redis/clients/jedis/mcf/ActiveActiveLocalFailoverTest.java:144
- [nitpick] The class name 'FailoverReporter' should be 'ClusterSwitchReporter' to better reflect that it handles all cluster switch events, not just failover events.
class FailoverReporter implements Consumer<ClusterSwitchEventArgs> {
src/main/java/redis/clients/jedis/mcf/ClusterSwitchEventArgs.java:8
- Field name 'ClusterName' should follow Java naming conventions and be 'clusterName' (camelCase).
private final String ClusterName;
src/main/java/redis/clients/jedis/mcf/ClusterSwitchEventArgs.java:9
- Field name 'Endpoint' should follow Java naming conventions and be 'endpoint' (camelCase).
private final Endpoint Endpoint;
src/main/java/redis/clients/jedis/mcf/TrackingConnectionPool.java:47
- [nitpick] The method name 'injector' is unclear. Consider renaming to 'wrapConnectionSupplier' or 'createConnectionWrapper' to better describe its purpose.
private Supplier<Connection> injector(Supplier<Connection> supplier) {
src/main/java/redis/clients/jedis/providers/MultiClusterPooledConnectionProvider.java
Show resolved
Hide resolved
src/test/java/redis/clients/jedis/mcf/ActiveActiveLocalFailoverTest.java
Outdated
Show resolved
Hide resolved
Closing this in favor of #4226. This has been an idea to "failover immediately" while trying to avoid any changes on core components in lib. While this is still valid and viable option, we also start tinkering around a way to enable the core components for more flexibility and decided to put more effort and courage to do so. #4226 presents the way we chose to proceed for a "fast failover". |
Closing this in favor of #4226.
This has been an idea to "failover immediately" while trying to avoid any changes on core components in lib. While this is still valid and viable option, we also start tinkering around a way to enable the core components for more flexibility and decided to put more effort and courage to do so. #4226 presents the way we chose to proceed for a "fast failover".
This PR is based on changes in previous #4207.
Summary of the changes in PR;
Commits essential to this one are;
- introduce clusterSwitchEvent and drop clusterFailover post processor
- introduce fastfailover using objectMaker injection into connectionF…
- polish