From ccc22eba15d425d31b31c327339dfac9a6ce6483 Mon Sep 17 00:00:00 2001
From: vladvildanov
Date: Wed, 20 Aug 2025 12:00:35 +0300
Subject: [PATCH 01/14] Added Active-Active documentation page

---
 docs/active_active.rst | 223 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 223 insertions(+)
 create mode 100644 docs/active_active.rst

diff --git a/docs/active_active.rst b/docs/active_active.rst
new file mode 100644
index 0000000000..33b990930c
--- /dev/null
+++ b/docs/active_active.rst
@@ -0,0 +1,223 @@
+Active-Active
+=============
+
+MultiDBClient explanation
+--------------------------
+
+Starting with redis-py 6.5.0 we introduce a new type of client to communicate
+with databases in an Active-Active setup. `MultiDBClient` is a wrapper around multiple
+Redis or Redis Cluster clients, each of which has a 1:1 relation to a specific
+database. In most cases `MultiDBClient` provides the same API as any other
+client for the best user experience.
+
+The core feature of `MultiDBClient` is automatically triggered failover based on
+database health. The pre-condition is that all databases configured
+to be used by `MultiDBClient` are eventually consistent, so the client can choose
+any database at any point in time for communication. `MultiDBClient` always communicates
+with a single database, so there is 1 active database and N passive databases acting as
+stand-by replicas. By default, the active database is chosen based on the weights that
+have to be assigned to each database.
+
+We have two mechanisms to verify database healthiness: `Healthcheck` and
+`Failure Detector`.
+
+The very basic configuration you need to set up a `MultiDBClient`:
+
+.. code:: python
+
+    // Expected active database (highest weight)
+    database1_config = DatabaseConfig(
+        weight=1.0,
+        from_url="redis://host1:port1",
+        client_kwargs={
+            'username': "username",
+            'password': "password",
+        }
+    )
+
+    // Passive database (stand-by replica)
+    database2_config = DatabaseConfig(
+        weight=0.9,
+        from_url="redis://host2:port2",
+        client_kwargs={
+            'username': "username",
+            'password': "password",
+        }
+    )
+
+    config = MultiDbConfig(
+        databases_config=[database1_config, database2_config],
+    )
+
+    client = MultiDBClient(config)
+
+
+Healthcheck
+-----------
+
+By default, we're using a healthcheck based on the `ECHO` command to verify that the database is
+reachable and ready to serve requests (`PING` guarantees the first, but not the second).
+Additionally, you can add your own healthcheck implementation and extend the list of
+healthchecks.
+
+All healthchecks run in the background with the interval and configuration
+defined in the `MultiDBConfig` class.
+
+
+Failure Detector
+----------------
+
+Unlike a healthcheck, the `Failure Detector` verifies database health based on organic
+traffic: the default one reacts to command failures within a sliding window of
+seconds and marks the database as unhealthy if a threshold has been exceeded. You can extend
+the list of failure detectors by providing your own implementation; the configuration is defined
+in the `MultiDBConfig` class.
+
+
+Databases configuration
+-----------------------
+
+You have to provide a configuration for each database in the setup separately, using
+a `DatabaseConfig` class per database. As mentioned, there's an underlying instance
+of `Redis` or `RedisCluster` client for each database, so you can pass all the
+arguments related to it via the `client_kwargs` argument.
+
+.. 
code:: python
+
+    database_config = DatabaseConfig(
+        weight=1.0,
+        client_kwargs={
+            'host': 'localhost',
+            'port': 6379,
+            'username': "username",
+            'password': "password",
+        }
+    )
+
+It also supports `from_url` and `from_pool` capabilities to set up a client using a
+Redis URL or a custom `ConnectionPool` object.
+
+.. code:: python
+
+    database_config1 = DatabaseConfig(
+        weight=1.0,
+        from_url="redis://host1:port1",
+        client_kwargs={
+            'username': "username",
+            'password': "password",
+        }
+    )
+
+    database_config2 = DatabaseConfig(
+        weight=0.9,
+        from_pool=connection_pool,
+    )
+
+The only exception to `client_kwargs` is the retry configuration. We do not allow
+passing an underlying `Retry` object, to avoid nesting retries. All retries are
+controlled by a top-level `Retry` object that you can set up via the `command_retry`
+argument (see `MultiDBConfig`).
+
+
+Pipeline
+--------
+
+`MultiDBClient` supports pipeline mode with guaranteed pipeline retry in case
+of failover. Unlike the `Redis` and `RedisCluster` clients, you cannot
+execute transactions via pipeline mode, only via the `transaction` method
+on `MultiDBClient`. This was done for better retry handling in case
+of failover.
+
+The overall interface for pipeline execution is the same: you can
+pipeline commands using chained calls or a context manager.
+
+.. code:: python
+
+    // Chaining
+    client = MultiDBClient(config)
+    pipe = client.pipeline()
+    pipe.set('key1', 'value1')
+    pipe.get('key1')
+    pipe.execute() // ['OK', 'value1']
+
+    // Context manager
+    client = MultiDBClient(config)
+    with client.pipeline() as pipe:
+        pipe.set('key1', 'value1')
+        pipe.get('key1')
+        pipe.execute() // ['OK', 'value1']
+
+
+Transaction
+-----------
+
+`MultiDBClient` supports transaction execution via the `transaction()` method
+with guaranteed transaction retry in case of failover. Like any other
+client, it accepts a callback with an underlying `Pipeline` object to build
+your transaction for atomic execution.
+
+CAS behaviour is supported as well, so you can provide a list of keys to track.
+
+.. code:: python
+
+    client = MultiDBClient(config)
+
+    def callback(pipe: Pipeline):
+        pipe.set('key1', 'value1')
+        pipe.get('key1')
+
+    client.transaction(callback, 'key1') // ['OK1', 'value1']
+
+
+Pub/Sub
+-------
+
+`MultiDBClient` supports Pub/Sub mode with guaranteed re-subscription
+to the same channels in case of failover. The expectation is that
+both publisher and subscriber are using a `MultiDBClient` instance to
+provide a seamless experience in terms of failover.
+
+1. The subscriber fails over to another database and re-subscribes to the same
+channels.
+
+2. The publisher fails over to another database and starts publishing
+messages to the same channels.
+
+However, it's still possible to lose messages if the order of failover
+is reversed (the publisher fails over before the subscriber).
+
+Like the other clients, there are two main ways to consume messages:
+in the main thread and in a separate thread.
+
+.. code:: python
+
+    client = MultiDBClient(config)
+    p = client.pubsub()
+
+    // In the main thread
+    while True:
+        message = p.get_message()
+        if message:
+            // do something with the message
+        time.sleep(0.001)
+
+
+.. code:: python
+
+    // In separate thread
+    client = MultiDBClient(config)
+    p = client.pubsub()
+    messages_count = 0
+    data = json.dumps({'message': 'test'})
+
+    def handler(message):
+        nonlocal messages_count
+        messages_count += 1
+
+    // Assign a handler and run in a separate thread.
+ p.ssubscribe(**{'test-channel': handler}) + pubsub_thread = pubsub.run_in_thread(sleep_time=0.1, daemon=True) + + for _ in range(10): + client.publish('test-channel', data) + sleep(0.1) From 8e49c9a5fad48ef7d9a838e277de753b2d4244e3 Mon Sep 17 00:00:00 2001 From: vladvildanov Date: Tue, 26 Aug 2025 15:15:30 +0300 Subject: [PATCH 02/14] Added documentation for Active-Active --- docs/active_active.rst | 162 ++++++++++++++++++++++++++++++++++++++++- 1 file changed, 161 insertions(+), 1 deletion(-) diff --git a/docs/active_active.rst b/docs/active_active.rst index 33b990930c..47b7f6cea1 100644 --- a/docs/active_active.rst +++ b/docs/active_active.rst @@ -21,6 +21,12 @@ has to be assigned for each database. We have two mechanisms to verify database healthiness: `Healthcheck` and `Failure Detector`. +To be able to use `MultiDBClient` you need to install a `pybreaker` package: + +.. code:: python + + pip install pybreaker>=1.4.0 + The very basic configuration you need to setup a `MultiDBClient`: .. code:: python @@ -63,6 +69,71 @@ healthecks All healthchecks are running in the background with given interval and configuration defined in `MultiDBConfig` class. +Lag-Aware Healthcheck +~~~~~~~~~~~~~~~~~~~~~ + +This is a special type of healthcheck available for Redis Software and Redis Cloud +that utilizes REST API endpoint to obtain an information about synchronisation lag +between given database and all other databases in Active-Active setup. + +To be able to use this type of healthcheck, first you need to adjust your +`DatabaseConfig` to expose `health_check_url` used by your deployment. +By default, your Cluster FQDN should be used as URL, unless you have +some kind of reverse proxy behind an actual REST API endpoint. + +.. code:: python + + database1_config = DatabaseConfig( + weight=1.0, + from_url="redis://host1:port1", + health_check_url="https://c1.deployment-name-000000.cto.redislabs.com" + client_kwargs={ + 'username': "username", + 'password': "password", + } + ) + +Since, Lag-Aware Healthcheck only available for Redis Software and Redis Cloud +it's not in the list of the default healthchecks for `MultiDBClient`. You have +to provide it manually during client configuration or in runtime. + +.. code:: python + + // Configuration option + config = MultiDbConfig( + databases_config=[database1_config, database2_config], + health_checks=[ + LagAwareHealthCheck(auth_basic=('username','password'), verify_tls=False) + ] + ) + + client = MultiDBClient(config) + +.. code:: python + + // In runtime + client = MultiDBClient(config) + client.add_health_check( + LagAwareHealthCheck(auth_basic=('username','password'), verify_tls=False) + ) + +As mentioned we utilise REST API endpoint for Lag-Aware healthchecks, so it accepts +different type of HTTP-related configuration: authentication credentials, request +timeout, TLS related configuration, etc. (check `LagAwareHealthCheck` class). + +You can also specify `lag_aware_tolerance` parameter to specify the tolerance in MS +of lag between databases that your application could tolerate. + +.. code:: python + + LagAwareHealthCheck( + rest_api_port=9443, + auth_basic=('username','password'), + lag_aware_tolerance=150, + verify_tls=True, + ca_file="path/to/file" + ) + Failure Detector ---------------- @@ -215,9 +286,98 @@ in the main thread and in the separate thread messages_count += 1 // Assign a handler and run in a separate thread. 
- p.ssubscribe(**{'test-channel': handler}) + p.subscribe(**{'test-channel': handler}) pubsub_thread = pubsub.run_in_thread(sleep_time=0.1, daemon=True) for _ in range(10): client.publish('test-channel', data) sleep(0.1) + + +OSS Cluster API support +----------------------- + +As mentioned `MultiDBClient` also supports integration with OSS Cluster API +databases. If you're instantiating client using Redis URL, the only change +you need comparing to standalone client is the `client_class` argument. +DNS server will resolve given URL and will point you to one of the node that +could be used to discover overall cluster topology. + +.. code:: python + + config = MultiDbConfig( + client_class=RedisCluster, + databases_config=[database1_config, database2_config], + ) + +If you would like to specify the exact node to use for topology +discovery, you can specify it the same way `RedisCluster` does + +.. code:: python + + // Expected active database (highest weight) + database1_config = DatabaseConfig( + weight=1.0, + client_kwargs={ + 'username': "username", + 'password': "password", + 'startup_nodes': [ClusterNode('host1', 'port1')], + } + ) + + // Passive database (stand-by replica) + database2_config = DatabaseConfig( + weight=0.9, + client_kwargs={ + 'username': "username", + 'password': "password", + 'startup_nodes': [ClusterNode('host2', 'port2')], + } + ) + + config = MultiDbConfig( + client_class=RedisCluster, + databases_config=[database1_config, database2_config], + ) + +Sharded Pub/Sub +~~~~~~~~~~~~~~~ + +If you would like to use a Sharded Pub/Sub capabilities make sure to use +correct Pub/Sub configuration. + +.. code:: python + + client = MultiDBClient(config) + p = client.pubsub() + + // In the main thread + while True: + // Reads messaage from sharded channels. + message = p.get_sharded_message() + if message: + // do something with the message + time.sleep(0.001) + + +.. code:: python + + // In separate thread + client = MultiDBClient(config) + p = client.pubsub() + messages_count = 0 + data = json.dumps({'message': 'test'}) + + def handler(message): + nonlocal messages_count + messages_count += 1 + + // Assign a handler and run in a separate thread. + p.ssubscribe(**{'test-channel': handler}) + + // Proactively executes get_sharded_pubsub() method + pubsub_thread = pubsub.run_in_thread(sleep_time=0.1, daemon=True, sharded_pubsub=True) + + for _ in range(10): + client.spublish('test-channel', data) + sleep(0.1) \ No newline at end of file From d83a1eaab87320020a1190603a1384e4100c9e68 Mon Sep 17 00:00:00 2001 From: vladvildanov Date: Mon, 1 Sep 2025 11:07:06 +0300 Subject: [PATCH 03/14] Refactored docs --- .../{active_active.rst => multi_database.rst} | 145 +++++++++++------- 1 file changed, 89 insertions(+), 56 deletions(-) rename docs/{active_active.rst => multi_database.rst} (66%) diff --git a/docs/active_active.rst b/docs/multi_database.rst similarity index 66% rename from docs/active_active.rst rename to docs/multi_database.rst index 47b7f6cea1..92145c3223 100644 --- a/docs/active_active.rst +++ b/docs/multi_database.rst @@ -1,22 +1,23 @@ -Active-Active -============= +Multi-Database Management +========================= MultiDBClient explanation -------------------------- -Starting from redis-py 6.5.0 we introduce a new type of client to communicate -with databases in Active-Active setup. `MultiDBClient` is a wrapper around multiple -Redis or Redis Cluster clients, each of them has 1:1 relation to specific -database. 
`MultiDBClient` in most of the cases provides the same API as any other -client for the best user experience. - -The core feature of `MultiDBClient` is automaticaly triggered failover depends on the -database healthiness. The pre-condition is that each database that is configured -to be used by MultiDBClient are eventually consistent, so client could choose -any database in any point of time for communication. `MultiDBClient` always communicates -with single database, so there's 1 active and N passive databases that acting as a -stand-by replica. By default, active database is choosed based on the weights that -has to be assigned for each database. +The `MultiDBClient` (introduced in version 6.5.0) manages connections to multiple +Redis databases and provides automatic failover when one database becomes unavailable. +Think of it as a smart load balancer that automatically switches to a healthy database +when your primary one goes down, ensuring your application stays online. +`MultiDBClient` in most of the cases provides the same API as any other client for +the best user experience. + +The core feature of MultiDBClient is its ability to automatically trigger failover +when an active database becomes unhealthy.The pre-condition is that all databases +that are configured to be used by `MultiDBClient` are eventually consistent, so client +could choose any database in any point in time for communication. `MultiDBClient` +always communicates with single database, so there's 1 active and N passive +databases that are acting as a stand-by replica. By default, active database is +chosen based on the weights that have to be assigned for each database. We have two mechanisms to verify database healthiness: `Healthcheck` and `Failure Detector`. @@ -58,42 +59,70 @@ The very basic configuration you need to setup a `MultiDBClient`: client = MultiDBClient(config) -Healthcheck ------------ +Health Monitoring +----------------- +The `MultiDBClient` uses two complementary mechanisms to ensure database availability: + +Health Checks (Proactive Monitoring) +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -By default, we're using healthcheck based on `ECHO` command to verify that database is -reachable and ready to serve requests (`PING` guarantees first, but not the second). -Additionaly, you can add your own healthcheck implementation and extend a list of -healthecks +These checks run continuously in the background at configured intervals to proactively +detect database issues. They run in the background with a given interval and +configuration defined in the `MultiDBConfig` class. -All healthchecks are running in the background with given interval and configuration -defined in `MultiDBConfig` class. +By default, MultiDBClient sends ECHO commands to verify each database is healthy. -Lag-Aware Healthcheck +**Custom Health Checks** +~~~~~~~~~~~~~~~~~~~~~ +You can add custom health checks for specific requirements: + +.. 
code:: python + + from redis.multidb.healthcheck import AbstractHealthCheck + from redis.retry import Retry + from redis.utils import dummy_fail + + + class PingHealthCheck(AbstractHealthCheck): + def __init__(self, retry: Retry): + super().__init__(retry=retry) + + def check_health(self, database) -> bool: + return self._retry.call_with_retry( + lambda: self._returns_pong(database), + lambda _: dummy_fail() + ) + + def _returns_pong(self, database) -> bool: + expected_message = ["PONG", b"PONG"] + actual_message = database.client.execute_command("PING") + return actual_message in expected_message + +**Lag-Aware Healthcheck (Redis Enterprise Only)** ~~~~~~~~~~~~~~~~~~~~~ This is a special type of healthcheck available for Redis Software and Redis Cloud -that utilizes REST API endpoint to obtain an information about synchronisation lag -between given database and all other databases in Active-Active setup. +that utilizes a REST API endpoint to obtain information about the synchronisation +lag between a given database and all other databases in an Active-Active setup. -To be able to use this type of healthcheck, first you need to adjust your -`DatabaseConfig` to expose `health_check_url` used by your deployment. -By default, your Cluster FQDN should be used as URL, unless you have -some kind of reverse proxy behind an actual REST API endpoint. +To use this healthcheck, first you need to adjust your `DatabaseConfig` +to expose `health_check_url` used by your deployment. By default, your +Cluster FQDN should be used as URL, unless you have some kind of +reverse proxy behind an actual REST API endpoint. .. code:: python database1_config = DatabaseConfig( weight=1.0, from_url="redis://host1:port1", - health_check_url="https://c1.deployment-name-000000.cto.redislabs.com" + health_check_url="https://c1.deployment-name-000000.project.env.com" client_kwargs={ 'username': "username", 'password': "password", } ) -Since, Lag-Aware Healthcheck only available for Redis Software and Redis Cloud +Since, Lag-Aware Healthcheck is only available for Redis Software and Redis Cloud it's not in the list of the default healthchecks for `MultiDBClient`. You have to provide it manually during client configuration or in runtime. @@ -135,23 +164,26 @@ of lag between databases that your application could tolerate. ) -Failure Detector ----------------- +Failure Detection (Reactive Monitoring) +~~~~~~~~~~~~~~~~~~~~~ -Unlike healthcheck, `Failure Detector` verifies database healthiness based on organic -trafic, so the default one reacts to any command failures within a sliding window of -seconds and mark database as unhealthy if threshold has been exceeded. You can extend -a list of failure detectors providing your own implementation, configuration defined -in `MultiDBConfig` class. +The failure detector watches actual command failures and marks databases as unhealthy +when error rates exceed thresholds within a sliding time window of a few seconds. +This catches issues that proactive health checks might miss during real traffic. +You can extend the list of failure detectors by providing your own implementation, +configuration defined in the `MultiDBConfig` class. Databases configuration ----------------------- -You have to provide a configuration for each database in setup separately, using -`DatabaseConfig` class per database. As mentioned, there's an undelying instance -of `Redis` or `RedisCluster` client for each database, so you can pass all the -arguments related to them via `client_kwargs` argument. 
+Each database needs a `DatabaseConfig` that specifies how to connect. + +Method 1: Using client_kwargs (most flexible) +~~~~~~~~~~~~~~~~~~~~~ + +There's an underlying instance of `Redis` or `RedisCluster` client for each database, +so you can pass all the arguments related to them via `client_kwargs` argument: .. code:: python @@ -165,11 +197,10 @@ arguments related to them via `client_kwargs` argument. } ) -It also supports `from_url` or `from_pool` capabilites to setup a client using -Redis URL or custom `ConnectionPool` object. - -.. code:: python +Method 2: Using Redis URL +~~~~~~~~~~~~~~~~~~~~~~~~~ +```python database_config1 = DatabaseConfig( weight=1.0, from_url="redis://host1:port1", @@ -179,15 +210,17 @@ Redis URL or custom `ConnectionPool` object. } ) - database_config2 = DatabaseConfig( - weight=0.9, - from_pool=connection_pool, - ) +Method 3: Using Custom Connection Pool +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +```python + database_config2 = DatabaseConfig( + weight=0.9, + from_pool=connection_pool, + ) -The only exception from `client_kwargs` is the retry configuration. We do not allow -to pass underlying `Retry` object to avoid nesting retries. All the retries are -controlled by top-level `Retry` object that you can setup via `command_retry` -argument (check `MultiDBConfig`) +**Important**: Don't pass `Retry` objects in `client_kwargs`. `MultiDBClient` +handles all retries at the top level through the `command_retry` configuration. Pipeline @@ -300,7 +333,7 @@ OSS Cluster API support As mentioned `MultiDBClient` also supports integration with OSS Cluster API databases. If you're instantiating client using Redis URL, the only change you need comparing to standalone client is the `client_class` argument. -DNS server will resolve given URL and will point you to one of the node that +DNS server will resolve given URL and will point you to one of the nodes that could be used to discover overall cluster topology. .. code:: python From 2ab08b2128802d68eda68d210145bdf9e3538c84 Mon Sep 17 00:00:00 2001 From: vladvildanov Date: Mon, 1 Sep 2025 11:54:19 +0300 Subject: [PATCH 04/14] Refactored pipeline and transaction section --- docs/multi_database.rst | 69 +++++++++++++++++++++++------------------ 1 file changed, 39 insertions(+), 30 deletions(-) diff --git a/docs/multi_database.rst b/docs/multi_database.rst index 92145c3223..eb80703831 100644 --- a/docs/multi_database.rst +++ b/docs/multi_database.rst @@ -200,7 +200,8 @@ so you can pass all the arguments related to them via `client_kwargs` argument: Method 2: Using Redis URL ~~~~~~~~~~~~~~~~~~~~~~~~~ -```python +.. code:: python + database_config1 = DatabaseConfig( weight=1.0, from_url="redis://host1:port1", @@ -213,7 +214,8 @@ Method 2: Using Redis URL Method 3: Using Custom Connection Pool ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -```python +.. code:: python + database_config2 = DatabaseConfig( weight=0.9, from_pool=connection_pool, @@ -223,28 +225,32 @@ Method 3: Using Custom Connection Pool handles all retries at the top level through the `command_retry` configuration. -Pipeline --------- +Pipeline Operations +------------------- + +The `MultiDBClient` supports pipeline mode with guaranteed retry functionality during +failover scenarios. Unlike standard `Redis` and `RedisCluster` clients, transactions +cannot be executed through pipeline mode - use the dedicated `transaction()` method +instead. This design choice ensures better retry handling during failover events. 
-`MultiDBClient` supports pipeline mode with guaranteed pipeline retry in case -of failover. Unlike, the `Redis` and `RedisCluster` clients you cannot -execute transactions via pipeline mode, only via `transaction` method -on `MultiDBClient`. This was done for better retries handling in case -of failover. +Pipeline operations support both chaining calls and context manager patterns: -The overall interface for pipeline execution is the same, you can -pipeline commands using chaining calls or context manager. +Chaining approach +~~~~~~~~~~~~~~~~~ .. code:: python - // Chaining client = MultiDBClient(config) pipe = client.pipeline() pipe.set('key1', 'value1') pipe.get('key1') pipe.execute() // ['OK', 'value1'] - // Context manager +Context Manager Approach +~~~~~~~~~~~~~~~~~~~~~~~~ + +.. code:: python + client = MultiDBClient(config) with client.pipeline() as pipe: pipe.set('key1', 'value1') @@ -255,12 +261,13 @@ pipeline commands using chaining calls or context manager. Transaction ----------- -`MultiDBClient` supports transaction execution via `transaction()` method -with guaranteed transaction retry in case of failover. Like any other -client it accepts a callback with underlying `Pipeline` object to build -your transaction for atomic execution +The `MultiDBClient` provides transaction support through the `transaction()` +method with guaranteed retry capabilities during failover. Like other +`Redis` clients, it accepts a callback function that receives a `Pipeline` +object for building atomic operations. -CAS behaviour supported as well, so you can provide a list of keys to track. +CAS behavior is fully supported by providing a list of +keys to monitor: .. code:: python @@ -276,22 +283,21 @@ CAS behaviour supported as well, so you can provide a list of keys to track. Pub/Sub ------- -`MultiDBClient` supports Pub/Sub mode with guaranteed re-subscription -to the same channels in case of failover. So the expectation is that -both publisher and subscriber are using `MultiDBClient` instance to -provide seamless experience in terms of failover. +The MultiDBClient offers Pub/Sub functionality with automatic re-subscription +to channels during failover events. For optimal failover handling, +both publishers and subscribers should use MultiDBClient instances. -1. Subscriber failover to another database and re-subscribe to the same -channels. +1. **Subscriber failover**: Automatically reconnects to an alternative database +and re-subscribes to the same channels -2. Publisher failover to another database and starts publishing -messages to the same channels. +2. **Publisher failover**: Seamlessly switches to an alternative database and +continues publishing to the same channels -However, it's still possible to lose messages if order of failover -will be reversed. +**Note**: Message loss may occur if failover events happen in reverse order +(publisher fails before subscriber). -Like the other clients, there's two main methods to consume messages: -in the main thread and in the separate thread +Main Thread Message Processing +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. code:: python @@ -306,6 +312,9 @@ in the main thread and in the separate thread time.sleep(0.001) +Background Thread Processing +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + .. 
code:: python // In separate thread From 881580064c2513e4581753327e4dddc71365a4f0 Mon Sep 17 00:00:00 2001 From: vladvildanov Date: Wed, 17 Sep 2025 12:00:45 +0300 Subject: [PATCH 05/14] Updated docs --- .github/wordlist.txt | 10 +++++++ docs/multi_database.rst | 63 +++++++++++++++++++++++++++++++++++++++-- 2 files changed, 70 insertions(+), 3 deletions(-) diff --git a/.github/wordlist.txt b/.github/wordlist.txt index 150f96a624..48ea9e8737 100644 --- a/.github/wordlist.txt +++ b/.github/wordlist.txt @@ -1,6 +1,7 @@ APM ARGV BFCommands +balancer CacheImpl CAS CFCommands @@ -10,10 +11,17 @@ ClusterNodes ClusterPipeline ClusterPubSub ConnectionPool +config CoreCommands +DatabaseConfig +DNS EVAL EVALSHA +failover +FQDN Grokzen's +Healthcheck +healthchecks INCR IOError Instrumentations @@ -21,6 +29,7 @@ JSONCommands Jaeger Ludovico Magnocavallo +MultiDBClient McCurdy NOSCRIPT NUMPAT @@ -52,6 +61,7 @@ SpanKind Specfiying StatusCode TCP +TLS TOPKCommands TimeSeriesCommands Uptrace diff --git a/docs/multi_database.rst b/docs/multi_database.rst index eb80703831..abca4493d4 100644 --- a/docs/multi_database.rst +++ b/docs/multi_database.rst @@ -61,6 +61,13 @@ The very basic configuration you need to setup a `MultiDBClient`: Health Monitoring ----------------- +To avoid false positives, you can configure amount of health check probes and also +define one of the health check policies to evaluate probes result. + +**HealthCheckPolicies.HEALTHY_ALL** - (default) All probes should be successful +**HealthCheckPolicies.HEALTHY_MAJORITY** - Majority of probes should be successful +**HealthCheckPolicies.HEALTHY_ANY** - Any of probes should be successful + The `MultiDBClient` uses two complementary mechanisms to ensure database availability: Health Checks (Proactive Monitoring) @@ -102,7 +109,7 @@ You can add custom health checks for specific requirements: ~~~~~~~~~~~~~~~~~~~~~ This is a special type of healthcheck available for Redis Software and Redis Cloud -that utilizes a REST API endpoint to obtain information about the synchronisation +that utilizes a REST API endpoint to obtain information about the synchronization lag between a given database and all other databases in an Active-Active setup. To use this healthcheck, first you need to adjust your `DatabaseConfig` @@ -146,7 +153,7 @@ to provide it manually during client configuration or in runtime. LagAwareHealthCheck(auth_basic=('username','password'), verify_tls=False) ) -As mentioned we utilise REST API endpoint for Lag-Aware healthchecks, so it accepts +As mentioned we utilize REST API endpoint for Lag-Aware healthchecks, so it accepts different type of HTTP-related configuration: authentication credentials, request timeout, TLS related configuration, etc. (check `LagAwareHealthCheck` class). @@ -174,6 +181,40 @@ You can extend the list of failure detectors by providing your own implementatio configuration defined in the `MultiDBConfig` class. +Failover strategy +~~~~~~~~~~~~~~~~~ + +This component is responsible for failover when active database becomes unavailable. +By default, we're using `WeightBasedFailoverStrategy` to pick a database with the +highest weight to failover. You can provide your own strategy if you would like +to have your custom mechanism of failover. + +.. 
code:: python + + class CustomFailoverStrategy(FailoverStrategy): + def __init__(self): + self._databases: Databases = None + + def database(self) -> SyncDatabase: + for database, _ in self._databases: + random_int = random.randint(0, 1) + + if random_int == 1 and database.circuit.state == State.CLOSED: + return database + + // Exception should be raised if theres no suitable databases for failover + raise NoValidDatabaseException("No available database for failover") + +In case if there's no available databases for failover, we raise `TemporaryUnavailableException`. +This exception signals that you can still trying to send requests until final +`NoValidDatabaseException` will be thrown. The window for requests is configurable +and depends on two parameters `failover_attempts` and `failover_delay`. By default, +`failover_attempts=10` and `failover_delay=12s`, which means that you can still send requests +for 10*12 = 120 seconds until final exception will be thrown. In meanwhile, you can switch to +another data source (cache) and if healthy database will apears you can switch back making +this transparent to the end user. + + Databases configuration ----------------------- @@ -422,4 +463,20 @@ correct Pub/Sub configuration. for _ in range(10): client.spublish('test-channel', data) - sleep(0.1) \ No newline at end of file + sleep(0.1) + +Async implementation +-------------------- + +`MultiDBClient` is available with async API, which looks exactly as it's sync +analogue. The core difference is that it fully relies on `EventLoop` instead of +`threading` module. + +Async client comes with async context manager support and is recommended for +graceful task cancelling. + +.. code:: python + + async with MultiDBClient(client_config) as client: + await client.set('key', 'value') + return await client.get('key') \ No newline at end of file From 4cb18ef80c625925848a34eae18100921c0d5b3f Mon Sep 17 00:00:00 2001 From: vladvildanov Date: Wed, 17 Sep 2025 12:03:38 +0300 Subject: [PATCH 06/14] Extended list of words --- .github/wordlist.txt | 3 +++ 1 file changed, 3 insertions(+) diff --git a/.github/wordlist.txt b/.github/wordlist.txt index 48ea9e8737..ab209d34be 100644 --- a/.github/wordlist.txt +++ b/.github/wordlist.txt @@ -3,6 +3,7 @@ ARGV BFCommands balancer CacheImpl +cancelling CAS CFCommands CMSCommands @@ -21,6 +22,8 @@ failover FQDN Grokzen's Healthcheck +HealthCheckPolicies +healthcheck healthchecks INCR IOError From fca0e4f45387beddbce0167f096d4733ca6ad7af Mon Sep 17 00:00:00 2001 From: vladvildanov Date: Wed, 24 Sep 2025 11:41:24 +0300 Subject: [PATCH 07/14] Re-write documentation --- docs/multi_database.rst | 719 ++++++++++++++++++++-------------------- 1 file changed, 356 insertions(+), 363 deletions(-) diff --git a/docs/multi_database.rst b/docs/multi_database.rst index abca4493d4..c044ee40f4 100644 --- a/docs/multi_database.rst +++ b/docs/multi_database.rst @@ -1,233 +1,217 @@ -Multi-Database Management -========================= +Multi-database client (Active-Active) +===================================== -MultiDBClient explanation --------------------------- +The multi-database client lets you connect your application to multiple logical Redis databases at once +and operate them as a single, resilient endpoint. It continuously monitors health, detects failures, +and fails over to the next healthy database using a configurable strategy. When the previous primary +becomes healthy again, the client can automatically fall back to it. 
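+
+If no healthy database is available at all, commands first raise `TemporaryUnavailableException`
+and, once the configured failover attempts are exhausted, `NoValidDatabaseException` (both are
+covered in the Troubleshooting section below). The following is a minimal sketch of how an
+application might handle that window; the exception import path is an assumption and not part
+of this document:
+
+.. code-block:: python
+
+    from redis.multidb.client import MultiDBClient
+    # Assumed import location for the exceptions described in this document.
+    from redis.multidb.exception import (
+        NoValidDatabaseException,
+        TemporaryUnavailableException,
+    )
+
+    def read_with_fallback(client: MultiDBClient, key: str, cache: dict):
+        try:
+            return client.get(key)
+        except TemporaryUnavailableException:
+            # No healthy database right now; serve a cached value while the
+            # client keeps probing for a healthy database in the background.
+            return cache.get(key)
+        except NoValidDatabaseException:
+            # Failover attempts are exhausted; surface the outage to the caller.
+            raise
+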
-The `MultiDBClient` (introduced in version 6.5.0) manages connections to multiple -Redis databases and provides automatic failover when one database becomes unavailable. -Think of it as a smart load balancer that automatically switches to a healthy database -when your primary one goes down, ensuring your application stays online. -`MultiDBClient` in most of the cases provides the same API as any other client for -the best user experience. +Key concepts +------------ -The core feature of MultiDBClient is its ability to automatically trigger failover -when an active database becomes unhealthy.The pre-condition is that all databases -that are configured to be used by `MultiDBClient` are eventually consistent, so client -could choose any database in any point in time for communication. `MultiDBClient` -always communicates with single database, so there's 1 active and N passive -databases that are acting as a stand-by replica. By default, active database is -chosen based on the weights that have to be assigned for each database. +- Database and weight: + Each database has a weight indicating its priority. The failover strategy chooses the highest-weight + healthy database as the active one. -We have two mechanisms to verify database healthiness: `Healthcheck` and -`Failure Detector`. +- Circuit breaker: + Each database is guarded by a circuit breaker with states CLOSED (healthy), OPEN (unhealthy), + and HALF_OPEN (probing). Health checks toggle these states to avoid hammering a downed database. -To be able to use `MultiDBClient` you need to install a `pybreaker` package: +- Health checks: + A set of checks determines whether a database is healthy. By default, an "ECHO" check runs against + the database (all cluster nodes must pass for a cluster). You can add custom checks. A Redis Enterprise + specific "lag-aware" health check is also available. -.. code:: python +- Failure detector: + A detector observes command failures over a moving window. You can specify an exact number of failures + and failures rate to have more fine-grain tuned configuration of triggering fail over based on organic + traffic. - pip install pybreaker>=1.4.0 +- Failover strategy: + The default strategy is weight-based. It prefers the highest-weight healthy database. -The very basic configuration you need to setup a `MultiDBClient`: +- Command retry: + Command execution supports retry with backoff. Low-level client retries are disabled and a global retry + setting is applied at the multi-database layer. -.. code:: python +- Auto fallback: + If configured with a positive interval, the client periodically attempts to fall back to a higher-priority + healthy database. - // Expected active database (highest weight) - database1_config = DatabaseConfig( - weight=1.0, - from_url="redis://host1:port1", - client_kwargs={ - 'username': "username", - 'password': "password", - } - ) +- Events: + The client emits events like "active database changed" and "commands failed". Pub/Sub resubscription + on database switch is handled automatically. 
- // Passive database (stand-by replica) - database2_config = DatabaseConfig( - weight=0.9, - from_url="redis://host2:port2", - client_kwargs={ - 'username': "username", - 'password': "password", - } - ) - - config = MultiDbConfig( - databases_config=[database1_config, database2_config], - ) - - client = MultiDBClient(config) - - -Health Monitoring +Synchronous usage ----------------- -To avoid false positives, you can configure amount of health check probes and also -define one of the health check policies to evaluate probes result. - -**HealthCheckPolicies.HEALTHY_ALL** - (default) All probes should be successful -**HealthCheckPolicies.HEALTHY_MAJORITY** - Majority of probes should be successful -**HealthCheckPolicies.HEALTHY_ANY** - Any of probes should be successful - -The `MultiDBClient` uses two complementary mechanisms to ensure database availability: - -Health Checks (Proactive Monitoring) -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -These checks run continuously in the background at configured intervals to proactively -detect database issues. They run in the background with a given interval and -configuration defined in the `MultiDBConfig` class. - -By default, MultiDBClient sends ECHO commands to verify each database is healthy. - -**Custom Health Checks** -~~~~~~~~~~~~~~~~~~~~~ -You can add custom health checks for specific requirements: - -.. code:: python - - from redis.multidb.healthcheck import AbstractHealthCheck - from redis.retry import Retry - from redis.utils import dummy_fail - - - class PingHealthCheck(AbstractHealthCheck): - def __init__(self, retry: Retry): - super().__init__(retry=retry) - - def check_health(self, database) -> bool: - return self._retry.call_with_retry( - lambda: self._returns_pong(database), - lambda _: dummy_fail() - ) - - def _returns_pong(self, database) -> bool: - expected_message = ["PONG", b"PONG"] - actual_message = database.client.execute_command("PING") - return actual_message in expected_message - -**Lag-Aware Healthcheck (Redis Enterprise Only)** -~~~~~~~~~~~~~~~~~~~~~ -This is a special type of healthcheck available for Redis Software and Redis Cloud -that utilizes a REST API endpoint to obtain information about the synchronization -lag between a given database and all other databases in an Active-Active setup. +Minimal example +^^^^^^^^^^^^^^^ -To use this healthcheck, first you need to adjust your `DatabaseConfig` -to expose `health_check_url` used by your deployment. By default, your -Cluster FQDN should be used as URL, unless you have some kind of -reverse proxy behind an actual REST API endpoint. +.. code-block:: python -.. code:: python + from redis.multidb.client import MultiDBClient + from redis.multidb.config import MultiDbConfig, DatabaseConfig - database1_config = DatabaseConfig( - weight=1.0, - from_url="redis://host1:port1", - health_check_url="https://c1.deployment-name-000000.project.env.com" - client_kwargs={ - 'username': "username", - 'password': "password", - } + # Two databases. The first has higher weight -> preferred when healthy. + cfg = MultiDbConfig( + databases_config=[ + DatabaseConfig(from_url="redis://db-primary:6379/0", weight=1.0), + DatabaseConfig(from_url="redis://db-secondary:6379/0", weight=0.5), + ] ) -Since, Lag-Aware Healthcheck is only available for Redis Software and Redis Cloud -it's not in the list of the default healthchecks for `MultiDBClient`. You have -to provide it manually during client configuration or in runtime. - -.. 
code:: python + client = MultiDBClient(cfg) - // Configuration option - config = MultiDbConfig( - databases_config=[database1_config, database2_config], - health_checks=[ - LagAwareHealthCheck(auth_basic=('username','password'), verify_tls=False) - ] - ) + # First call triggers initialization and health checks. + client.set("key", "value") + print(client.get("key")) - client = MultiDBClient(config) - -.. code:: python - - // In runtime - client = MultiDBClient(config) - client.add_health_check( - LagAwareHealthCheck(auth_basic=('username','password'), verify_tls=False) + # Pipeline + with client.pipeline() as pipe: + pipe.set("a", 1) + pipe.incrby("a", 2) + values = pipe.execute() + print(values) + + # Transaction + def txn(pipe): + current = pipe.get("balance") + current = int(current or 0) + pipe.multi() # mark transaction + pipe.set("balance", current + 100) + + client.transaction(txn) + + # Pub/Sub usage - will automatically re-subscribe on database switch + pubsub = client.pubsub() + pubsub.subscribe("events") + + # In your loop: + message = pubsub.get_message(timeout=1.0) + if message: + print(message) + +Asyncio usage +------------- + +The asyncio API mirrors the synchronous one and provides async/await semantics. + +.. code-block:: python + + import asyncio + from redis.asyncio.multidb.client import MultiDBClient + from redis.asyncio.multidb.config import MultiDbConfig, DatabaseConfig + + async def main(): + cfg = MultiDbConfig( + databases_config=[ + DatabaseConfig(from_url="redis://db-primary:6379/0", weight=1.0), + DatabaseConfig(from_url="redis://db-secondary:6379/0", weight=0.5), + ] + ) + + # Context-manager approach for graceful client termination when exits. + # client = MultiDBClient(cfg) could be used instead + async with MultiDBClient(cfg) as client: + await client.set("key", "value") + print(await client.get("key")) + + # Pipeline + async with client.pipeline() as pipe: + pipe.set("a", 1) + pipe.incrby("a", 2) + values = await pipe.execute() + print(values) + + # Transaction + async def txn(pipe): + current = await pipe.get("balance") + current = int(current or 0) + await pipe.multi() + await pipe.set("balance", current + 100) + + await client.transaction(txn) + + # Pub/Sub + pubsub = client.pubsub() + await pubsub.subscribe("events") + message = await pubsub.get_message(timeout=1.0) + if message: + print(message) + + asyncio.run(main()) + +Configuration +------------- + +MultiDbConfig +^^^^^^^^^^^^^ + +.. code-block:: python + + from redis.multidb.config import ( + MultiDbConfig, DatabaseConfig, + DEFAULT_HEALTH_CHECK_INTERVAL, DEFAULT_GRACE_PERIOD ) - -As mentioned we utilize REST API endpoint for Lag-Aware healthchecks, so it accepts -different type of HTTP-related configuration: authentication credentials, request -timeout, TLS related configuration, etc. (check `LagAwareHealthCheck` class). - -You can also specify `lag_aware_tolerance` parameter to specify the tolerance in MS -of lag between databases that your application could tolerate. - -.. 
code:: python - - LagAwareHealthCheck( - rest_api_port=9443, - auth_basic=('username','password'), - lag_aware_tolerance=150, - verify_tls=True, - ca_file="path/to/file" + from redis.retry import Retry + from redis.backoff import ExponentialWithJitterBackoff + + cfg = MultiDbConfig( + databases_config=[ + # Construct via URL + DatabaseConfig( + from_url="redis://db-a:6379/0", + weight=1.0, + # Optional: use a custom circuit breaker grace period + grace_period=DEFAULT_GRACE_PERIOD, + # Optional: Redis Enterprise cluster FQDN for REST health checks + health_check_url="https://cluster.example.com", + # Optional: Underlying Redis client related configuration + client_kwargs={"socket_timeout": 5} + ), + # Or construct via ConnectionPool + # DatabaseConfig(from_pool=my_pool, weight=1.0), + ], + + # Global command retry policy (applied at multi-db layer) + command_retry=Retry( + retries=3, + backoff=ExponentialWithJitterBackoff(base=1, cap=10), + ), + + # Health checks + health_check_interval: float = DEFAULT_HEALTH_CHECK_INTERVAL # seconds + health_check_probes: int = DEFAULT_HEALTH_CHECK_PROBES + health_check_delay: float = DEFAULT_HEALTH_CHECK_DELAY # seconds + health_check_policy: HealthCheckPolicies = DEFAULT_HEALTH_CHECK_POLICY, + + # Failure detector + min_num_failures: int = DEFAULT_MIN_NUM_FAILURES + failure_rate_threshold: float = DEFAULT_FAILURE_RATE_THRESHOLD + failures_detection_window: float = DEFAULT_FAILURES_DETECTION_WINDOW # seconds + + # Failover behavior + failover_attempts: int = DEFAULT_FAILOVER_ATTEMPTS + failover_delay: float = DEFAULT_FAILOVER_DELAY # seconds ) +Notes: -Failure Detection (Reactive Monitoring) -~~~~~~~~~~~~~~~~~~~~~ - -The failure detector watches actual command failures and marks databases as unhealthy -when error rates exceed thresholds within a sliding time window of a few seconds. -This catches issues that proactive health checks might miss during real traffic. -You can extend the list of failure detectors by providing your own implementation, -configuration defined in the `MultiDBConfig` class. - - -Failover strategy -~~~~~~~~~~~~~~~~~ - -This component is responsible for failover when active database becomes unavailable. -By default, we're using `WeightBasedFailoverStrategy` to pick a database with the -highest weight to failover. You can provide your own strategy if you would like -to have your custom mechanism of failover. - -.. code:: python - - class CustomFailoverStrategy(FailoverStrategy): - def __init__(self): - self._databases: Databases = None - - def database(self) -> SyncDatabase: - for database, _ in self._databases: - random_int = random.randint(0, 1) - - if random_int == 1 and database.circuit.state == State.CLOSED: - return database - - // Exception should be raised if theres no suitable databases for failover - raise NoValidDatabaseException("No available database for failover") - -In case if there's no available databases for failover, we raise `TemporaryUnavailableException`. -This exception signals that you can still trying to send requests until final -`NoValidDatabaseException` will be thrown. The window for requests is configurable -and depends on two parameters `failover_attempts` and `failover_delay`. By default, -`failover_attempts=10` and `failover_delay=12s`, which means that you can still send requests -for 10*12 = 120 seconds until final exception will be thrown. In meanwhile, you can switch to -another data source (cache) and if healthy database will apears you can switch back making -this transparent to the end user. 
+- Low-level client retries are disabled automatically per database. The multi-database layer handles retries. +- For clusters, health checks validate all nodes. - -Databases configuration ------------------------ +DatabaseConfig +^^^^^^^^^^^^^^ Each database needs a `DatabaseConfig` that specifies how to connect. Method 1: Using client_kwargs (most flexible) ~~~~~~~~~~~~~~~~~~~~~ - There's an underlying instance of `Redis` or `RedisCluster` client for each database, so you can pass all the arguments related to them via `client_kwargs` argument: .. code:: python - database_config = DatabaseConfig( weight=1.0, client_kwargs={ @@ -242,7 +226,6 @@ Method 2: Using Redis URL ~~~~~~~~~~~~~~~~~~~~~~~~~ .. code:: python - database_config1 = DatabaseConfig( weight=1.0, from_url="redis://host1:port1", @@ -256,7 +239,6 @@ Method 3: Using Custom Connection Pool ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. code:: python - database_config2 = DatabaseConfig( weight=0.9, from_pool=connection_pool, @@ -265,218 +247,229 @@ Method 3: Using Custom Connection Pool **Important**: Don't pass `Retry` objects in `client_kwargs`. `MultiDBClient` handles all retries at the top level through the `command_retry` configuration. +Health checks +------------- -Pipeline Operations -------------------- - -The `MultiDBClient` supports pipeline mode with guaranteed retry functionality during -failover scenarios. Unlike standard `Redis` and `RedisCluster` clients, transactions -cannot be executed through pipeline mode - use the dedicated `transaction()` method -instead. This design choice ensures better retry handling during failover events. - -Pipeline operations support both chaining calls and context manager patterns: - -Chaining approach -~~~~~~~~~~~~~~~~~ - -.. code:: python - - client = MultiDBClient(config) - pipe = client.pipeline() - pipe.set('key1', 'value1') - pipe.get('key1') - pipe.execute() // ['OK', 'value1'] - -Context Manager Approach -~~~~~~~~~~~~~~~~~~~~~~~~ - -.. code:: python - - client = MultiDBClient(config) - with client.pipeline() as pipe: - pipe.set('key1', 'value1') - pipe.get('key1') - pipe.execute() // ['OK', 'value1'] - - -Transaction ------------ - -The `MultiDBClient` provides transaction support through the `transaction()` -method with guaranteed retry capabilities during failover. Like other -`Redis` clients, it accepts a callback function that receives a `Pipeline` -object for building atomic operations. +To avoid false positives, you can configure amount of health check probes and also +define one of the health check policies to evaluate probes result. -CAS behavior is fully supported by providing a list of -keys to monitor: +**HealthCheckPolicies.HEALTHY_ALL** - (default) All probes should be successful +**HealthCheckPolicies.HEALTHY_MAJORITY** - Majority of probes should be successful +**HealthCheckPolicies.HEALTHY_ANY** - Any of probes should be successful -.. code:: python +EchoHealthCheck (default) +^^^^^^^^^^^^^^^^^^^^^^^^^ - client = MultiDBClient(config) +The default health check sends ECHO to the database (and to all nodes for clusters). 
- def callback(pipe: Pipeline): - pipe.set('key1', 'value1') - pipe.get('key1') +Lag-Aware Healthcheck (Redis Enterprise Only) +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - client.transaction(callback, 'key1') // ['OK1', 'value1'] +This is a special type of healthcheck available for Redis Software and Redis Cloud +that utilizes a REST API endpoint to obtain information about the synchronization +lag between a given database and all other databases in an Active-Active setup. +To use this healthcheck, first you need to adjust your `DatabaseConfig` +to expose `health_check_url` used by your deployment. By default, your +Cluster FQDN should be used as URL, unless you have some kind of +reverse proxy behind an actual REST API endpoint. -Pub/Sub -------- +.. code-block:: python -The MultiDBClient offers Pub/Sub functionality with automatic re-subscription -to channels during failover events. For optimal failover handling, -both publishers and subscribers should use MultiDBClient instances. + from redis.multidb.client import MultiDBClient + from redis.multidb.config import MultiDbConfig, DatabaseConfig + from redis.multidb.healthcheck import EchoHealthCheck, LagAwareHealthCheck + from redis.retry import Retry + from redis.backoff import ExponentialWithJitterBackoff + + cfg = MultiDbConfig( + databases_config=[ + DatabaseConfig( + from_url="redis://db-primary:6379/0", + weight=1.0, + health_check_url="https://cluster.example.com", # optional for LagAware + ), + DatabaseConfig( + from_url="redis://db-secondary:6379/0", + weight=0.5, + health_check_url="https://cluster.example.com", + ), + ], + # Add custom checks (in addition to default EchoHealthCheck) + health_checks=[ + # Redis Enterprise REST-based lag-aware check + LagAwareHealthCheck( + # Customize REST port, lag tolerance, TLS, etc. + rest_api_port=9443, + lag_aware_tolerance=100, # ms + verify_tls=True, + # auth_basic=("user", "pass"), + # ca_file="/path/ca.pem", + # client_cert_file="/path/cert.pem", + # client_key_file="/path/key.pem", + ), + ], + ) -1. **Subscriber failover**: Automatically reconnects to an alternative database -and re-subscribes to the same channels + client = MultiDBClient(cfg) -2. **Publisher failover**: Seamlessly switches to an alternative database and -continues publishing to the same channels +Failure detection +----------------- -**Note**: Message loss may occur if failover events happen in reverse order -(publisher fails before subscriber). +A CommandFailureDetector observes failures within a time window, if minimal number of failures +and failures rate reached it triggers fail over. -Main Thread Message Processing -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +.. code-block:: python -.. 
code:: python + from redis.multidb.config import MultiDbConfig, DatabaseConfig + from redis.multidb.client import MultiDBClient + from redis.multidb.failure_detector import CommandFailureDetector - client = MultiDBClient(config) - p = client.pubsub() + cfg = MultiDbConfig( + databases_config=[ + DatabaseConfig(from_url="redis://db-a:6379/0", weight=1.0), + DatabaseConfig(from_url="redis://db-b:6379/0", weight=0.5), + ], + # Default detector also created from config values + ) - // In the main thread - while True: - message = p.get_message() - if message: - // do something with the message - time.sleep(0.001) + client = MultiDBClient(cfg) + # Add an additional detector, optionally limited to specific exception types: + from redis.exceptions import TimeoutError + client.add_failure_detector( + CustomFailureDetector() + ) -Background Thread Processing -~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Failover and auto fallback +-------------------------- -.. code:: python +Weight-based failover chooses the highest-weight database whose circuit is CLOSED. If no database is +healthy it returns `TemporaryUnavailableException`. This exception indicates that application can +still send requests for some time (depends on configuration (`failover_attempts` * `failover_delay`) +120 seconds by default) until `NoValidDatabaseException` will be thrown. - // In separate thread - client = MultiDBClient(config) - p = client.pubsub() - messages_count = 0 - data = json.dumps({'message': 'test'}) +To enable periodic fallback to a higher-priority healthy database, set `auto_fallback_interval` (seconds): - def handler(message): - nonlocal messages_count - messages_count += 1 +.. code-block:: python - // Assign a handler and run in a separate thread. - p.subscribe(**{'test-channel': handler}) - pubsub_thread = pubsub.run_in_thread(sleep_time=0.1, daemon=True) + from redis.multidb.config import MultiDbConfig, DatabaseConfig - for _ in range(10): - client.publish('test-channel', data) - sleep(0.1) + cfg = MultiDbConfig( + databases_config=[ + DatabaseConfig(from_url="redis://db-primary:6379/0", weight=1.0), + DatabaseConfig(from_url="redis://db-secondary:6379/0", weight=0.5), + ], + # Try to fallback to higher-weight healthy database every 30 seconds + auto_fallback_interval=30.0, + ) + client = MultiDBClient(cfg) +Managing databases at runtime +----------------------------- -OSS Cluster API support ------------------------ +You can manually add/remove databases, update weights, and promote a database if it’s healthy. -As mentioned `MultiDBClient` also supports integration with OSS Cluster API -databases. If you're instantiating client using Redis URL, the only change -you need comparing to standalone client is the `client_class` argument. -DNS server will resolve given URL and will point you to one of the nodes that -could be used to discover overall cluster topology. +.. code-block:: python -.. 
code:: python + from redis.multidb.client import MultiDBClient + from redis.multidb.config import MultiDbConfig, DatabaseConfig + from redis.multidb.database import Database + from redis.multidb.circuit import PBCircuitBreakerAdapter + import pybreaker + from redis import Redis - config = MultiDbConfig( - client_class=RedisCluster, - databases_config=[database1_config, database2_config], + cfg = MultiDbConfig( + databases_config=[DatabaseConfig(from_url="redis://db-a:6379/0", weight=1.0)] ) + client = MultiDBClient(cfg) + + # Add a database programmatically + other = Database( + client=Redis.from_url("redis://db-b:6379/0"), + circuit=PBCircuitBreakerAdapter(pybreaker.CircuitBreaker(reset_timeout=5.0)), + weight=0.5, + health_check_url=None, + ) + client.add_database(other) -If you would like to specify the exact node to use for topology -discovery, you can specify it the same way `RedisCluster` does + # Update weight; if it becomes the highest and healthy, it may become active + client.update_database_weight(other, 0.9) -.. code:: python + # Promote a specific healthy database to active + client.set_active_database(other) - // Expected active database (highest weight) - database1_config = DatabaseConfig( - weight=1.0, - client_kwargs={ - 'username': "username", - 'password': "password", - 'startup_nodes': [ClusterNode('host1', 'port1')], - } - ) + # Remove a database + client.remove_database(other) - // Passive database (stand-by replica) - database2_config = DatabaseConfig( - weight=0.9, - client_kwargs={ - 'username': "username", - 'password': "password", - 'startup_nodes': [ClusterNode('host2', 'port2')], - } - ) - - config = MultiDbConfig( - client_class=RedisCluster, - databases_config=[database1_config, database2_config], - ) +Pub/Sub and re-subscription +-------------------------- -Sharded Pub/Sub -~~~~~~~~~~~~~~~ +The MultiDBClient offers Pub/Sub functionality with automatic re-subscription +to channels during failover events. For optimal failover handling, +both publishers and subscribers should use MultiDBClient instances. -If you would like to use a Sharded Pub/Sub capabilities make sure to use -correct Pub/Sub configuration. +1. **Subscriber failover**: Automatically reconnects to an alternative database +and re-subscribes to the same channels +2. **Publisher failover**: Seamlessly switches to an alternative database and +continues publishing to the same channels +**Note**: Message loss may occur if failover events happen in reverse order +(publisher fails before subscriber). -.. code:: python +.. code-block:: python - client = MultiDBClient(config) - p = client.pubsub() + pubsub = client.pubsub() + pubsub.subscribe("news", "alerts") + # If failover happens here, subscriptions are re-established on the new active DB. + msg = pubsub.get_message(timeout=1.0) + if msg: + print(msg) - // In the main thread - while True: - // Reads messaage from sharded channels. - message = p.get_sharded_message() - if message: - // do something with the message - time.sleep(0.001) +Pipelines and transactions +-------------------------- +Pipelines and transactions are executed against the active database at execution time. The client ensures +the active database is healthy and up-to-date before running the stack. -.. code:: python +.. 
code-block:: python - // In separate thread - client = MultiDBClient(config) - p = client.pubsub() - messages_count = 0 - data = json.dumps({'message': 'test'}) + with client.pipeline() as pipe: + pipe.set("x", 1) + pipe.incr("x") + results = pipe.execute() - def handler(message): - nonlocal messages_count - messages_count += 1 + def txn(pipe): + pipe.multi() + pipe.set("y", "42") - // Assign a handler and run in a separate thread. - p.ssubscribe(**{'test-channel': handler}) + client.transaction(txn) - // Proactively executes get_sharded_pubsub() method - pubsub_thread = pubsub.run_in_thread(sleep_time=0.1, daemon=True, sharded_pubsub=True) +Best practices +-------------- - for _ in range(10): - client.spublish('test-channel', data) - sleep(0.1) +- Assign the highest weight to your primary database and lower weights to replicas or DR sites. +- Keep health_check_interval short enough to promptly detect failures but avoid excessive load. +- Tune command_retry and failover attempts to your SLA and workload profile. +- Use auto_fallback_interval if you want the client to fail iver back to your primary automatically. +- Handle `TemporaryUnavailableException` to be able to recover before giving up, in meantime you +can switch data source (f.e cache). `NoValidDatabaseException` indicates that there's no healthy +database to operate. -Async implementation --------------------- +Troubleshooting +--------------- -`MultiDBClient` is available with async API, which looks exactly as it's sync -analogue. The core difference is that it fully relies on `EventLoop` instead of -`threading` module. +- NoValidDatabaseException: + Indicates no healthy database is available. Check circuit breaker states and health checks. -Async client comes with async context manager support and is recommended for -graceful task cancelling. +- TemporaryUnavailableException + Indicates that currently there's no healthy database, but you can still send requests until + NoValidDatabaseException will be thrown. Probe interval configured with `failure_attemtps` + and `failure_delay` parameters. -.. code:: python +- Health checks always failing: + Verify connectivity and, for clusters, that all nodes are reachable. For LagAwareHealthCheck, + ensure health_check_url points to your Redis Enterprise endpoint and authentication/TLS options + are configured properly. - async with MultiDBClient(client_config) as client: - await client.set('key', 'value') - return await client.get('key') \ No newline at end of file +- Pub/Sub not receiving messages after failover: + Ensure you are using the client’s Pub/Sub helper. The client re-subscribes automatically on switch. 
\ No newline at end of file From eb8eaaa4b00d6a3a65d8108dd7b09239dfaf8143 Mon Sep 17 00:00:00 2001 From: vladvildanov Date: Wed, 24 Sep 2025 11:46:44 +0300 Subject: [PATCH 08/14] Fixed spelling --- .github/wordlist.txt | 6 ++++++ docs/multi_database.rst | 10 +++++----- 2 files changed, 11 insertions(+), 5 deletions(-) diff --git a/.github/wordlist.txt b/.github/wordlist.txt index ab209d34be..0a69b9092a 100644 --- a/.github/wordlist.txt +++ b/.github/wordlist.txt @@ -16,6 +16,7 @@ config CoreCommands DatabaseConfig DNS +EchoHealthCheck EVAL EVALSHA failover @@ -32,9 +33,11 @@ JSONCommands Jaeger Ludovico Magnocavallo +MultiDbConfig MultiDBClient McCurdy NOSCRIPT +NoValidDatabaseException NUMPAT NUMPT NUMSUB @@ -55,6 +58,7 @@ RedisInstrumentor RedisJSON RedisTimeSeries SHA +SLA SearchCommands SentinelCommands SentinelConnectionPool @@ -64,6 +68,7 @@ SpanKind Specfiying StatusCode TCP +TemporaryUnavailableException TLS TOPKCommands TimeSeriesCommands @@ -104,6 +109,7 @@ json keyslot keyspace kwarg +kwargs linters localhost lua diff --git a/docs/multi_database.rst b/docs/multi_database.rst index c044ee40f4..122023859b 100644 --- a/docs/multi_database.rst +++ b/docs/multi_database.rst @@ -39,7 +39,7 @@ Key concepts healthy database. - Events: - The client emits events like "active database changed" and "commands failed". Pub/Sub resubscription + The client emits events like "active database changed" and "commands failed". Pub/Sub re-subscription on database switch is handled automatically. Synchronous usage @@ -316,7 +316,7 @@ reverse proxy behind an actual REST API endpoint. Failure detection ----------------- -A CommandFailureDetector observes failures within a time window, if minimal number of failures +A `CommandFailureDetector` observes failures within a time window, if minimal number of failures and failures rate reached it triggers fail over. .. code-block:: python @@ -450,7 +450,7 @@ Best practices - Assign the highest weight to your primary database and lower weights to replicas or DR sites. - Keep health_check_interval short enough to promptly detect failures but avoid excessive load. - Tune command_retry and failover attempts to your SLA and workload profile. -- Use auto_fallback_interval if you want the client to fail iver back to your primary automatically. +- Use auto_fallback_interval if you want the client to fail over back to your primary automatically. - Handle `TemporaryUnavailableException` to be able to recover before giving up, in meantime you can switch data source (f.e cache). `NoValidDatabaseException` indicates that there's no healthy database to operate. @@ -463,11 +463,11 @@ Troubleshooting - TemporaryUnavailableException Indicates that currently there's no healthy database, but you can still send requests until - NoValidDatabaseException will be thrown. Probe interval configured with `failure_attemtps` + `NoValidDatabaseException` will be thrown. Probe interval configured with `failure_attemtps` and `failure_delay` parameters. - Health checks always failing: - Verify connectivity and, for clusters, that all nodes are reachable. For LagAwareHealthCheck, + Verify connectivity and, for clusters, that all nodes are reachable. For `LagAwareHealthCheck`, ensure health_check_url points to your Redis Enterprise endpoint and authentication/TLS options are configured properly. 
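
The exception handling recommended in the Best practices and Troubleshooting sections above can be
sketched as follows. This is a minimal illustration rather than the library's documented API: the
import path `redis.multidb.exception` for the two exceptions is an assumption that may differ
between redis-py versions, and `local_cache` stands in for whatever secondary data source your
application falls back to.

.. code-block:: python

    from redis.multidb.client import MultiDBClient
    from redis.multidb.config import MultiDbConfig, DatabaseConfig

    # Assumed import path for the exceptions named in the docs; adjust to your version.
    from redis.multidb.exception import (
        NoValidDatabaseException,
        TemporaryUnavailableException,
    )

    cfg = MultiDbConfig(
        databases_config=[
            DatabaseConfig(from_url="redis://db-a:6379/0", weight=1.0),
            DatabaseConfig(from_url="redis://db-b:6379/0", weight=0.5),
        ],
    )
    client = MultiDBClient(cfg)

    def read_with_fallback(key, local_cache):
        try:
            return client.get(key)
        except TemporaryUnavailableException:
            # No healthy database right now, but the client may still recover within
            # roughly failover_attempts * failover_delay; serve stale data in the meantime.
            return local_cache.get(key)
        except NoValidDatabaseException:
            # No healthy database is left; stop hitting Redis for this request.
            return local_cache.get(key)
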
From 7618d737ca1bd2a6c626853677031468c507703d Mon Sep 17 00:00:00 2001 From: Vladyslav Vildanov <117659936+vladvildanov@users.noreply.github.com> Date: Tue, 30 Sep 2025 15:38:48 +0300 Subject: [PATCH 09/14] Update docs/multi_database.rst Co-authored-by: Elena Kolevska --- docs/multi_database.rst | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/multi_database.rst b/docs/multi_database.rst index 122023859b..d197424d9b 100644 --- a/docs/multi_database.rst +++ b/docs/multi_database.rst @@ -253,9 +253,9 @@ Health checks To avoid false positives, you can configure amount of health check probes and also define one of the health check policies to evaluate probes result. -**HealthCheckPolicies.HEALTHY_ALL** - (default) All probes should be successful -**HealthCheckPolicies.HEALTHY_MAJORITY** - Majority of probes should be successful -**HealthCheckPolicies.HEALTHY_ANY** - Any of probes should be successful +**HealthCheckPolicies.HEALTHY_ALL** - (default) All probes should be successful. +**HealthCheckPolicies.HEALTHY_MAJORITY** - Majority of probes should be successful. +**HealthCheckPolicies.HEALTHY_ANY** - Any of probes should be successful. EchoHealthCheck (default) ^^^^^^^^^^^^^^^^^^^^^^^^^ From c5e896d3e4c46640747bd81407da1fc3f2110ebe Mon Sep 17 00:00:00 2001 From: vladvildanov Date: Tue, 30 Sep 2025 16:08:18 +0300 Subject: [PATCH 10/14] Apply suggested comments --- docs/multi_database.rst | 95 ++++++++++++++++++++++++++++++----------- 1 file changed, 70 insertions(+), 25 deletions(-) diff --git a/docs/multi_database.rst b/docs/multi_database.rst index 122023859b..7dfe4db1ac 100644 --- a/docs/multi_database.rst +++ b/docs/multi_database.rst @@ -18,14 +18,15 @@ Key concepts and HALF_OPEN (probing). Health checks toggle these states to avoid hammering a downed database. - Health checks: - A set of checks determines whether a database is healthy. By default, an "ECHO" check runs against - the database (all cluster nodes must pass for a cluster). You can add custom checks. A Redis Enterprise - specific "lag-aware" health check is also available. + A set of checks determines whether a database is healthy in proactive manner. + By default, an "ECHO" check runs against the database (all cluster nodes must + pass for a cluster). You can add custom checks. A Redis Enterprise specific + "lag-aware" health check is also available. - Failure detector: - A detector observes command failures over a moving window. You can specify an exact number of failures - and failures rate to have more fine-grain tuned configuration of triggering fail over based on organic - traffic. + A detector observes command failures over a moving window (reactive monitoring). + You can specify an exact number of failures and failures rate to have more + fine-grain tuned configuration of triggering fail over based on organic traffic. - Failover strategy: The default strategy is weight-based. It prefers the highest-weight healthy database. @@ -142,6 +143,15 @@ The asyncio API mirrors the synchronous one and provides async/await semantics. asyncio.run(main()) + +MultiDBClient +^^^^^^^^^^^^^ + +The client provides the same API as `Redis` or `RedisCluster` client, so it's +interchangable to provide a seemless upgrade for your application. As well +client provides an option to reconfigure it in runtime (add health checks, +failure detectors or even new databases). 
+ Configuration ------------- @@ -247,8 +257,15 @@ Method 3: Using Custom Connection Pool **Important**: Don't pass `Retry` objects in `client_kwargs`. `MultiDBClient` handles all retries at the top level through the `command_retry` configuration. -Health checks -------------- +Health Monitoring +----------------- +The `MultiDBClient` uses two complementary mechanisms to ensure database availability: + +Health Checks (Proactive Monitoring) +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +These checks run continuously in the background at configured intervals to proactively +detect database issues. They run in the background with a given interval and +configuration defined in the `MultiDBConfig` class. To avoid false positives, you can configure amount of health check probes and also define one of the health check policies to evaluate probes result. @@ -260,7 +277,8 @@ define one of the health check policies to evaluate probes result. EchoHealthCheck (default) ^^^^^^^^^^^^^^^^^^^^^^^^^ -The default health check sends ECHO to the database (and to all nodes for clusters). +The default health check sends the [ECHO](https://redis.io/docs/latest/commands/echo/) +to the database (and to all nodes for clusters). Lag-Aware Healthcheck (Redis Enterprise Only) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ @@ -313,17 +331,46 @@ reverse proxy behind an actual REST API endpoint. client = MultiDBClient(cfg) -Failure detection + +**Custom Health Checks** +~~~~~~~~~~~~~~~~~~~~~ +You can add custom health checks for specific requirements: + +.. code:: python + from redis.multidb.healthcheck import AbstractHealthCheck + from redis.retry import Retry + from redis.utils import dummy_fail + class PingHealthCheck(AbstractHealthCheck): + def __init__(self, retry: Retry): + super().__init__(retry=retry) + def check_health(self, database) -> bool: + return self._retry.call_with_retry( + lambda: self._returns_pong(database), + lambda _: dummy_fail() + ) + def _returns_pong(self, database) -> bool: + expected_message = ["PONG", b"PONG"] + actual_message = database.client.execute_command("PING") + return actual_message in expected_message + + +Failure Detection (Reactive Monitoring) ----------------- -A `CommandFailureDetector` observes failures within a time window, if minimal number of failures -and failures rate reached it triggers fail over. +The failure detector monitor actual command failures and marks databases as unhealthy +when failures count and failure rate exceed thresholds within a sliding time window +of a few seconds. This catches issues that proactive health checks might miss during +real traffic. You can extend the list of failure detectors by providing your own +implementation, configuration defined in the `MultiDBConfig` class. + +By default failure detector is configured for 1000 failures and 10% failure rate +threshold within a 2 seconds sliding window, this could be adjusted regarding +your applciation specifics and traffic. .. code-block:: python from redis.multidb.config import MultiDbConfig, DatabaseConfig from redis.multidb.client import MultiDBClient - from redis.multidb.failure_detector import CommandFailureDetector cfg = MultiDbConfig( databases_config=[ @@ -336,7 +383,6 @@ and failures rate reached it triggers fail over. 
client = MultiDBClient(cfg)
 
     # Add an additional detector, optionally limited to specific exception types:
-    from redis.exceptions import TimeoutError
     client.add_failure_detector(
         CustomFailureDetector()
     )
 
@@ -447,13 +493,13 @@ the active database is healthy and up-to-date before running the stack.
 Best practices
 --------------
 
-- Assign the highest weight to your primary database and lower weights to replicas or DR sites.
-- Keep health_check_interval short enough to promptly detect failures but avoid excessive load.
-- Tune command_retry and failover attempts to your SLA and workload profile.
-- Use auto_fallback_interval if you want the client to fail over back to your primary automatically.
-- Handle `TemporaryUnavailableException` to be able to recover before giving up, in meantime you
-can switch data source (f.e cache). `NoValidDatabaseException` indicates that there's no healthy
-database to operate.
+- Assign the highest weight to your primary database and lower weights to replicas or disaster recovery sites.
+- Keep `health_check_interval` short enough to promptly detect failures but avoid excessive load.
+- Tune `command_retry` and failover attempts to your SLA and workload profile.
+- Use `auto_fallback_interval` if you want the client to fall back to your primary automatically.
+- Handle `TemporaryUnavailableException` to be able to recover before giving up. In the meantime, you
+can switch the data source (e.g. cache). `NoValidDatabaseException` indicates that there are no healthy
+databases to operate.
 
 Troubleshooting
 ---------------
@@ -462,13 +508,12 @@ Troubleshooting
   Indicates no healthy database is available. Check circuit breaker states and health checks.
 
 - TemporaryUnavailableException
-  Indicates that currently there's no healthy database, but you can still send requests until
-  `NoValidDatabaseException` will be thrown. Probe interval configured with `failure_attemtps`
-  and `failure_delay` parameters.
+  Indicates that currently there are no healthy databases, but you can still send requests until
+  `NoValidDatabaseException` is thrown. The probing window is controlled by the `failover_attempts` and `failover_delay` settings.
 
 - Health checks always failing:
   Verify connectivity and, for clusters, that all nodes are reachable. For `LagAwareHealthCheck`,
-  ensure health_check_url points to your Redis Enterprise endpoint and authentication/TLS options
+  ensure `health_check_url` points to your Redis Enterprise endpoint and authentication/TLS options
   are configured properly.
 
 - Pub/Sub not receiving messages after failover:

From cd42837ef9d7ce93b22628f1b73c4f1478ae51a6 Mon Sep 17 00:00:00 2001
From: vladvildanov
Date: Tue, 30 Sep 2025 16:13:28 +0300
Subject: [PATCH 11/14] Fixed spelling

---
 docs/multi_database.rst | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/docs/multi_database.rst b/docs/multi_database.rst
index b3ff5b09b7..0c2547c45c 100644
--- a/docs/multi_database.rst
+++ b/docs/multi_database.rst
@@ -148,7 +148,7 @@ MultiDBClient
 ^^^^^^^^^^^^^
 
 The client provides the same API as `Redis` or `RedisCluster` client, so it's
-interchangable to provide a seemless upgrade for your application. As well
+interchangeable to provide a seamless upgrade for your application. As well
 client provides an option to reconfigure it in runtime (add health checks,
 failure detectors or even new databases).
 
@@ -336,7 +336,8 @@ reverse proxy behind an actual REST API endpoint.
 ~~~~~~~~~~~~~~~~~~~~~
 You can add custom health checks for specific requirements:
 
-.. 
code-block:: python
+
     from redis.multidb.healthcheck import AbstractHealthCheck
     from redis.retry import Retry
     from redis.utils import dummy_fail
@@ -365,7 +366,7 @@ implementation, configuration defined in the `MultiDBConfig` class.
 
 By default failure detector is configured for 1000 failures and 10% failure rate
 threshold within a 2 seconds sliding window, this could be adjusted regarding
-your applciation specifics and traffic.
+your application specifics and traffic.
 
 .. code-block:: python
 

From 947320c7516558d048e5a26986a2b33c480a5b4c Mon Sep 17 00:00:00 2001
From: Vladyslav Vildanov <117659936+vladvildanov@users.noreply.github.com>
Date: Thu, 2 Oct 2025 10:22:19 +0300
Subject: [PATCH 12/14] Update docs/multi_database.rst

Co-authored-by: Elena Kolevska
---
 docs/multi_database.rst | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/docs/multi_database.rst b/docs/multi_database.rst
index 0c2547c45c..bd1de95763 100644
--- a/docs/multi_database.rst
+++ b/docs/multi_database.rst
@@ -260,6 +260,9 @@ handles all retries at the top level through the `command_retry` configuration.
 Health Monitoring
 -----------------
 The `MultiDBClient` uses two complementary mechanisms to ensure database availability:
+- Health Checks (Proactive Monitoring)
+- Failure Detection (Reactive Monitoring)
+
 
 Health Checks (Proactive Monitoring)
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

From 7ba26bea15ab2819971bf11434517c017994778c Mon Sep 17 00:00:00 2001
From: Vladyslav Vildanov <117659936+vladvildanov@users.noreply.github.com>
Date: Thu, 2 Oct 2025 10:22:35 +0300
Subject: [PATCH 13/14] Update docs/multi_database.rst

Co-authored-by: Elena Kolevska
---
 docs/multi_database.rst | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/docs/multi_database.rst b/docs/multi_database.rst
index bd1de95763..f1d42a9490 100644
--- a/docs/multi_database.rst
+++ b/docs/multi_database.rst
@@ -147,10 +147,7 @@
 MultiDBClient
 ^^^^^^^^^^^^^
 
-The client provides the same API as `Redis` or `RedisCluster` client, so it's
-interchangeable to provide a seamless upgrade for your application. As well
-client provides an option to reconfigure it in runtime (add health checks,
-failure detectors or even new databases).
+The client exposes the same API as the `Redis` or `RedisCluster` client, making it fully interchangeable and ensuring a seamless upgrade for your application. Additionally, it supports runtime reconfiguration, allowing you to add features such as health checks, failure detectors, or even new databases without restarting.
 
 Configuration
 -------------

From 614ed51c55d6a7b8990b0d0979feef6053d5de9b Mon Sep 17 00:00:00 2001
From: Vladyslav Vildanov <117659936+vladvildanov@users.noreply.github.com>
Date: Thu, 2 Oct 2025 10:22:47 +0300
Subject: [PATCH 14/14] Update docs/multi_database.rst

Co-authored-by: Elena Kolevska
---
 docs/multi_database.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/multi_database.rst b/docs/multi_database.rst
index f1d42a9490..6906b09e4d 100644
--- a/docs/multi_database.rst
+++ b/docs/multi_database.rst
@@ -277,7 +277,7 @@ define one of the health check policies to evaluate probes result.
 EchoHealthCheck (default)
 ^^^^^^^^^^^^^^^^^^^^^^^^^
 
-The default health check sends the [ECHO](https://redis.io/docs/latest/commands/echo/)
+The default health check sends the `ECHO <https://redis.io/docs/latest/commands/echo/>`_ command
 to the database (and to all nodes for clusters).
Lag-Aware Healthcheck (Redis Enterprise Only)
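
To tie the health check settings above together, the sketch below shows how the probe count and
evaluation policy could be configured. `health_check_interval` is named in the documentation; the
`health_check_probes` and `health_check_policy` keyword arguments and the import path for
`HealthCheckPolicies` are assumptions about the configuration surface, so verify the exact names
against `MultiDbConfig` in your redis-py version.

.. code-block:: python

    from redis.multidb.client import MultiDBClient
    from redis.multidb.config import MultiDbConfig, DatabaseConfig

    # Assumed import path for the policy enum referenced in the docs.
    from redis.multidb.healthcheck import HealthCheckPolicies

    cfg = MultiDbConfig(
        databases_config=[
            DatabaseConfig(from_url="redis://db-a:6379/0", weight=1.0),
            DatabaseConfig(from_url="redis://db-b:6379/0", weight=0.5),
        ],
        health_check_interval=5.0,  # run background probes against every database every 5 seconds
        health_check_probes=3,      # evaluate 3 probes per round to reduce false positives
        # Require only a majority of probes to pass, trading strictness for
        # resilience against one-off network blips.
        health_check_policy=HealthCheckPolicies.HEALTHY_MAJORITY,
    )
    client = MultiDBClient(cfg)
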