Merge pull request #2218 from redis/RDSC-4108-rdi-1-15-0-release-notes

andy-stark-redis · web-flow · commit 985c16c7d25c · 2025-10-08T20:20:28.000+01:00
RDSC-4108: RDI 1.15.0 release notes
diff --git a/config.toml b/config.toml
@@ -55,7 +55,7 @@ rdi_redis_gears_version = "1.2.6"
 rdi_debezium_server_version = "2.3.0.Final"
 rdi_db_types = "cassandra|mysql|oracle|postgresql|sqlserver"
 rdi_cli_latest = "latest"
-rdi_current_version = "1.14.1"
+rdi_current_version = "1.15.0"
 
 [params.clientsConfig]
 "Python"={quickstartSlug="redis-py"}
diff --git a/content/integrate/redis-data-integration/data-pipelines/pipeline-config.md b/content/integrate/redis-data-integration/data-pipelines/pipeline-config.md
@@ -144,9 +144,6 @@ processors:
   # Time (in ms) after which data will be read from stream even if
   # read_batch_size was not reached.
   # duration: 100
-  # Data type to use in Redis target database: `hash` for Redis Hash,
-  # `json` for JSON (which requires the RedisJSON module).
-  # target_data_type: hash
   # The batch size for writing data to the target Redis database. Should be
   # less than or equal to the read_batch_size.
   # write_batch_size: 200
@@ -155,8 +152,26 @@ processors:
   # Max size of the deduplication set (default: 1024).
   # dedup_max_size: <DEDUP_MAX_SIZE>
   # Error handling strategy: ignore - skip, dlq - store rejected messages
-  # in a dead letter queue
+  # in a dead letter queue.
   # error_handling: dlq
+  # Dead letter queue max messages per stream.
+  # dlq_max_messages: 1000
+  # Data type to use in Redis target database: `hash` for Redis Hash,
+  # `json` for JSON (which requires the RedisJSON module).
+  # target_data_type: hash
+  # Number of processes to use when syncing initial data.
+  # initial_sync_processes: 4
+  # Checks if the batch has been written to the replica shard.
+  # wait_enabled: false
+  # Timeout in milliseconds when checking write to the replica shard.
+  # wait_timeout: 1000
+  # Ensures that a batch has been written to the replica shard and keeps
+  # retrying if not.
+  # retry_on_replica_failure: true
+  # Enable merge as the default strategy for writing JSON documents.
+  # json_update_strategy: merge
+  # Use native JSON merge if the target RedisJSON module supports it.
+  # use_native_json_merge: true
 ```
 
 ## Sections
diff --git a/content/integrate/redis-data-integration/installation/install-vm.md b/content/integrate/redis-data-integration/installation/install-vm.md
@@ -178,11 +178,27 @@ it without the `noexec` option. See
     or your company policy forbids you to install there. You can
     select a different directory for the K3s installation using the
     `--installation-dir` option with `install.sh`:
+```bash
+sudo ./install.sh --installation-dir <custom-installation-directory>
+```
+    {{< /note >}}
+
+    **Advanced**: You can also pass custom K3s parameters to the installer using the
+    `INSTALL_K3S_EXEC` environment variable. For example, to make the kubeconfig file
+    readable by all users:
 
     ```bash
-    sudo ./install.sh --installation-dir <custom-installation-directory>
+    sudo INSTALL_K3S_EXEC='--write-kubeconfig-mode=644' ./install.sh
     ```
-    {{< /note >}}
+
+    You can combine multiple K3s options in the `INSTALL_K3S_EXEC` variable. See the
+    [K3s documentation](https://docs.k3s.io/installation/configuration) for a full list of 
+    available options.
+
+    {{< warning >}}Only modify K3s parameters if you understand exactly what you are changing 
+    and why. Incorrect K3s configuration can cause RDI installation to fail or result in an 
+    unstable deployment. {{< /warning >}}
+    
 
 The RDI installer collects all necessary configuration details and alerts you to potential issues, 
 offering options to abort, apply fixes, or provide additional information. 
diff --git a/content/integrate/redis-data-integration/reference/config-yaml-reference.md b/content/integrate/redis-data-integration/reference/config-yaml-reference.md
@@ -53,6 +53,7 @@ Configuration settings that control how data is processed, including batch sizes
 | **dlq_max_messages**<br/>(DLQ message limit)                         | `integer`, `string` | Maximum number of messages to store in dead letter queue per stream<br/>Default: `1000`<br/>Pattern: `^\${.*}$`<br/>Minimum: `1`<br/>                                                                                                |          |
 | **target_data_type**<br/>(Target Redis data type)                    | `string`            | Data type to use in Redis: hash for Redis Hash, json for RedisJSON (requires RedisJSON module)<br/>Default: `"hash"`<br/>Pattern: `^\${.*}$\|hash\|json`<br/>                                                                        |          |
 | **json_update_strategy**                                             | `string`            | (DEPRECATED)<br/>Property 'json_update_strategy' will be deprecated in future releases. Use 'on_update' job-level property to define the json update strategy.<br/>Default: `"replace"`<br/>Pattern: `^\${.*}$\|replace\|merge`<br/> |          |
+| **use_native_json_merge**<br/>(Use native JSON merge)               | `boolean`           | Controls whether to use the native `JSON.MERGE` command (when `true`) or Lua scripts (when `false`) for JSON merge operations. Introduced in RDI 1.15.0. The native command provides a 2x performance improvement but handles null values differently:<br/><br/>**Previous behavior (Lua merge)**: When merging `{"field1": "value1", "field2": "value2"}` with `{"field2": null, "field3": "value3"}`, the result was `{"field1": "value1", "field2": null, "field3": "value3"}` (null value is preserved)<br/><br/>**New behavior (JSON.MERGE)**: The same merge produces `{"field1": "value1", "field3": "value3"}` (null value removes the field, following [RFC 7396](https://datatracker.ietf.org/doc/html/rfc7396))<br/><br/>**Note**: The native `JSON.MERGE` command requires RedisJSON 2.6.0 or higher. If the target database has an older version of RedisJSON, RDI automatically falls back to Lua-based merge operations regardless of this setting.<br/><br/>**Impact**: If your application logic distinguishes between a field with a `null` value and a missing field, you may need to adjust your data handling. The new behavior follows the JSON Merge Patch RFC standard but differs from the previous Lua implementation. Set to `false` to revert to the previous Lua-based merge behavior if needed.<br/>Default: `true`<br/>                                                                                                                                      |          |
 | **initial_sync_processes**                                           | `integer`, `string` | Number of parallel processes for performing initial data synchronization<br/>Default: `4`<br/>Pattern: `^\${.*}$`<br/>Minimum: `1`<br/>Maximum: `32`<br/>                                                                            |          |
 | **idle_sleep_time_ms**<br/>(Idle sleep interval)                     | `integer`, `string` | Time in milliseconds to sleep between processing batches when idle<br/>Default: `200`<br/>Pattern: `^\${.*}$`<br/>Minimum: `1`<br/>Maximum: `999999`<br/>                                                                            |          |
 | **idle_streams_check_interval_ms**<br/>(Idle streams check interval) | `integer`, `string` | Time in milliseconds between checking for new streams when processor is idle<br/>Default: `1000`<br/>Pattern: `^\${.*}$`<br/>Minimum: `1`<br/>Maximum: `999999`<br/>                                                                 |          |
diff --git a/content/integrate/redis-data-integration/release-notes/rdi-1-15-0.md b/content/integrate/redis-data-integration/release-notes/rdi-1-15-0.md
@@ -0,0 +1,72 @@
+---
+Title: Redis Data Integration release notes 1.15.0 (October 2025)
+alwaysopen: false
+categories:
+- docs
+- operate
+- rs
+description: |
+  Flink collector for Spanner enabled by default for improved user experience.
+  Enhanced high availability with configurable leader election and standby mode.
+  Support for sharded RDI Redis databases.
+  Improved configuration validation and monitoring capabilities.
+  Better resource management and security enhancements.
+linkTitle: 1.15.0 (October 2025)
+toc: 'true'
+weight: 976
+---
+
+RDI's mission is to help Redis customers sync Redis Enterprise with live data from their slow disk-based databases to:
+
+- Meet the required speed and scale of read queries and provide an excellent and predictable user experience.
+- Save resources and time when building pipelines and coding data transformations.
+- Reduce the total cost of ownership by saving money on expensive database read replicas.
+
+RDI keeps the Redis cache up to date with changes in the primary database, using a [_Change Data Capture (CDC)_](https://en.wikipedia.org/wiki/Change_data_capture) mechanism.
+It also lets you _transform_ the data from relational tables into convenient and fast data structures that match your app's requirements. You specify the transformations using a configuration system, so no coding is required.
+
+## What's New in 1.15.0
+
+{{<warning>}}
+**Breaking change when using JSON with `json_update_strategy: merge`**
+
+RDI now uses the native `JSON.MERGE` command instead of Lua scripts for JSON merge operations. While this provides significant performance improvements (2x faster), there is a **functional difference** in how null values are handled:
+
+- **Previous behavior (Lua merge)**: When merging `{"field1": "value1", "field2": "value2"}` with `{"field2": null, "field3": "value3"}`, the result was `{"field1": "value1", "field2": null, "field3": "value3"}` (null value is preserved)
+- **New behavior (JSON.MERGE)**: The same merge produces `{"field1": "value1", "field3": "value3"}` (null value removes the field, following [RFC 7396](https://datatracker.ietf.org/doc/html/rfc7396))
+
+**Impact**: If your application logic distinguishes between a field with a `null` value and a missing field, you may need to adjust your data handling. This follows the JSON Merge Patch RFC standard but differs from the previous Lua implementation.
+
+**Configuration**: You can control this behavior using the `use_native_json_merge` property in the processors section of your configuration. Set it to `false` to revert to the previous Lua-based merge behavior if needed.
+{{</warning>}}
+
+- **Native JSON merge for improved performance**: RDI now automatically uses the native `JSON.MERGE` command from RedisJSON 2.6.0+ instead of Lua scripts for JSON merge operations, providing a 2x performance improvement. This feature is enabled by default and can be controlled via the `use_native_json_merge` property in the processors section of the configuration. **Note**: If the target Redis database has a RedisJSON version lower than 2.6.0, the processor automatically falls back to the Lua-based merge implementation.
+- **Support for sharded Redis databases**: RDI now supports using a multi-sharded Redis Enterprise database as the RDI database, resolving cross-slot violations when reading from streams.
+- **Enhanced processor performance metrics**: Detailed performance metrics are now exposed through the metrics exporter and statistics endpoint, with separate tracking for transformation time and write time.
+- **Resource management improvements**: Collector and processor pods now support configurable resource requests, limits, and node affinity/tolerations for better cluster resource utilization.
+  - The `collector` defaults to 1 CPU and 1024Mi memory (requests), with limits of 4 CPUs and 4096Mi memory.
+  - The `processor` defaults to 1 CPU and 512Mi memory (requests), with limits of 4 CPUs and 3072Mi memory.
+
+- **Leadership status monitoring**: New metrics expose leadership status and pipeline phase information for better monitoring of HA deployments.
+  - The `rdi_operator_is_leader` metric tracks the current leadership status of the operator: `1` indicates the instance is the leader, `0` indicates it is not the leader.
+  - The `rdi_operator_pipeline_phase` metric tracks the current phase of the pipeline. The phase is one of `Active`, `Inactive`, `Resetting`, `Pending`, or `Error`.
+- **Improved configuration validation**: Stricter validation for `config.yaml` and `jobs.yaml` files helps catch configuration errors earlier in the deployment process.
+- **Custom K3s installation options**: The installer now supports passing custom arguments to K3s installation for more flexible on-premises deployments.
+  - Example: `sudo INSTALL_K3S_EXEC='--write-kubeconfig-mode=644' ./install.sh`
+- **Workload Identity authentication**: Added support for Google Cloud Workload Identity authentication to Google Cloud Storage (GCS), eliminating the need for service account JSON files when using GCS-based leader election.
+- Exposed new processor performance metrics showing transformation and write times separately:
+  - `{namespace}_processor_process_time_ms_total` - Total time spent in the processor (transform + write)
+  - `{namespace}_processor_transform_time_ms_total` - Time spent transforming data
+  - `{namespace}_processor_write_time_ms_total` - Time spent writing data to Redis
+- Exposed all existing processor metrics that were only available through the RDI CLI status and the API statistics endpoint.
+- Enhanced the statistics endpoint with new metrics for transform and process time.
+
+### Bug Fixes and Stability Improvements
+
+- **Fixed task reconciliation errors**: Resolved "The ID argument cannot be a complete ID because xadd-id-uniqueness-mode is strict" errors during task reconciliation when using an RDI Redis database with strict XADD id uniqueness mode.
+- **Fixed Debezium unavailable values**: Addressed issues where `__debezium_unavailable_value` was appearing in Redis data.
+- **Improved operator stability**: Disabled operator webhooks by default to simplify deployments and reduce potential issues.
+
+## Limitations
+
+RDI can write data to a Redis Active-Active database. However, it doesn't support writing data to two or more Active-Active replicas. Writing data from RDI to several Active-Active replicas could easily harm data integrity as RDI is not synchronous with the source database commits.
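
The null-handling difference described in the 1.15.0 breaking-change warning can be illustrated with a minimal Python sketch. This is not RDI's implementation: `merge_patch` follows the RFC 7396 algorithm that the notes say `JSON.MERGE` follows, and `merge_keep_nulls` is a hypothetical stand-in for the null-preserving result the notes attribute to the previous Lua merge. Both function names are assumptions made for illustration only.

```python
import copy


def merge_patch(target, patch):
    """RFC 7396 JSON Merge Patch: a null (None) in the patch removes the key."""
    if not isinstance(patch, dict):
        # A non-object patch simply replaces the target.
        return copy.deepcopy(patch)
    result = dict(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)  # null means "delete this field"
        else:
            result[key] = merge_patch(result.get(key), value)
    return result


def merge_keep_nulls(target, patch):
    """Hypothetical null-preserving merge: a null in the patch is stored as-is."""
    if not isinstance(patch, dict):
        return copy.deepcopy(patch)
    result = dict(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if isinstance(value, dict):
            result[key] = merge_keep_nulls(result.get(key), value)
        else:
            result[key] = copy.deepcopy(value)  # None is kept, not deleted
    return result


# The example documents from the release notes.
existing = {"field1": "value1", "field2": "value2"}
update = {"field2": None, "field3": "value3"}

native_result = merge_patch(existing, update)
legacy_result = merge_keep_nulls(existing, update)

print(native_result)
print(legacy_result)
```

With the documents from the release notes, `native_result` has no `field2` key, while `legacy_result` keeps `field2` with a null value. Application code that checks for key presence, rather than for a null value, is where the two behaviors diverge.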