From 7171e22f78b6bc7e5728ef504207f37500a2166e Mon Sep 17 00:00:00 2001 From: Zdravko Donev Date: Mon, 18 Aug 2025 13:54:56 +0300 Subject: [PATCH 1/6] RDSC-3972: Initial draft of the Spanner public docs support --- .../embeds/rdi-supported-source-versions.md | 1 + .../data-pipelines/prepare-dbs/spanner.md | 211 ++++++++++++++++++ 2 files changed, 212 insertions(+) create mode 100644 content/integrate/redis-data-integration/data-pipelines/prepare-dbs/spanner.md diff --git a/content/embeds/rdi-supported-source-versions.md b/content/embeds/rdi-supported-source-versions.md index f03217708f..3bd2b1881a 100644 --- a/content/embeds/rdi-supported-source-versions.md +++ b/content/embeds/rdi-supported-source-versions.md @@ -6,5 +6,6 @@ | MySQL | 5.7, 8.0.x, 8.2 | 8.0.x | 8.0 | | PostgreSQL | 10, 11, 12, 13, 14, 15, 16 | 11, 12, 13, 14, 15, 16 | 15 | | SQL Server | 2017, 2019, 2022 | 2016, 2017, 2019, 2022 | 2019 | +| Spanner | - | - | All versions | | AlloyDB for PostgreSQL | 14.2, 15.7 | - | 14.2, 15.7 | | AWS Aurora/PostgreSQL | 15 | 15 | - | \ No newline at end of file diff --git a/content/integrate/redis-data-integration/data-pipelines/prepare-dbs/spanner.md b/content/integrate/redis-data-integration/data-pipelines/prepare-dbs/spanner.md new file mode 100644 index 0000000000..ddaad40e23 --- /dev/null +++ b/content/integrate/redis-data-integration/data-pipelines/prepare-dbs/spanner.md @@ -0,0 +1,211 @@ +--- +Title: Prepare Spanner for RDI +aliases: /integrate/redis-data-integration/ingest/data-pipelines/prepare-dbs/spanner/ +alwaysopen: false +categories: +- docs +- integrate +- rs +- rdi +description: Prepare Google Cloud Spanner databases to work with RDI +group: di +linkTitle: Prepare Spanner +summary: Redis Data Integration keeps Redis in sync with the primary database in near + real time. +type: integration +weight: 2 +--- + +Google Cloud Spanner requires specific configuration to enable change data capture (CDC) with RDI. 
+RDI operates in two phases with Spanner: snapshot (initial sync) and streaming. During the snapshot +phase, RDI uses the JDBC driver to connect directly to Spanner and read the current state of the +database. In the streaming phase, RDI uses Spanner's Change Streams to capture changes related to +the monitored schemas and tables. + +You must have the necessary privileges to manage the database schema and create service accounts +with the appropriate permissions, so that RDI can access the Spanner database. + +## Prepare for snapshot + +During the snapshot phase, RDI executes multiple transactions to capture data at an exact point +in time that remains consistent across all queries. This is achieved using a Spanner feature called +[Timestamp bounds with Exact staleness](https://cloud.google.com/spanner/docs/timestamp-bounds#exact_staleness). + +This feature relies on the +[version_retention_period](https://cloud.google.com/spanner/docs/reference/rest/v1/projects.instances.databases#Database.FIELDS.version_retention_period), +which is set to 1 hour by default. Depending on the database tier, the volume of data to be +ingested into RDI, and the load on the database, this setting may need to be increased. You can +update it using [this method](https://cloud.google.com/spanner/docs/use-pitr#set-period). + +## Prepare for streaming + +To enable streaming, you must create a change stream in Spanner at the database level. Use the +option `value_capture_type = 'NEW_ROW_AND_OLD_VALUES'` to capture both the previous and updated +row values. + +Be sure to specify only the tables you want to ingest from—and optionally, the specific columns +you're interested in. 
Here's an example using GoogleSQL syntax: + +```sql +CREATE CHANGE STREAM change_stream_table1_and_table2 + FOR table1, table2 + OPTIONS ( + value_capture_type = 'NEW_ROW_AND_OLD_VALUES' + ); +``` + +Refer to the [official documentation](https://cloud.google.com/spanner/docs/change-streams/manage#googlesql) +for more details, including additional configuration options and dialect-specific syntax. + +## Create a service account + +To allow RDI to access the Spanner instance, you'll need to create a service account with the +appropriate permissions. This service account will then be provided to RDI as a secret for +authentication. + +### Step 1: Create the service account + +```bash +gcloud iam service-accounts create spanner-reader-account \ + --display-name="Spanner Reader Service Account" \ + --description="Service account for reading from Spanner databases" \ + --project=YOUR_PROJECT_ID +``` + +### Step 2: Grant required roles + +**Database Reader** (read access to Spanner data): + +```bash +gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \ + --member="serviceAccount:spanner-reader-account@YOUR_PROJECT_ID.iam.gserviceaccount.com" \ + --role="roles/spanner.databaseReader" +``` + +**Database User** (query execution and metadata access): + +```bash +gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \ + --member="serviceAccount:spanner-reader-account@YOUR_PROJECT_ID.iam.gserviceaccount.com" \ + --role="roles/spanner.databaseUser" +``` + +**Viewer** (viewing instance and database configuration): + +```bash +gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \ + --member="serviceAccount:spanner-reader-account@YOUR_PROJECT_ID.iam.gserviceaccount.com" \ + --role="roles/spanner.viewer" +``` + +### Step 3: Download the service account key + +Save the credentials locally so they can be used later by RDI: + +```bash +gcloud iam service-accounts keys create ~/spanner-reader-account.json \ 
--iam-account=spanner-reader-account@YOUR_PROJECT_ID.iam.gserviceaccount.com \ + --project=YOUR_PROJECT_ID +``` + +## Set up secrets for Kubernetes deployment + +Before deploying the RDI pipeline, you need to configure the necessary secrets for both the source +and target databases. Instructions for setting up the target database secrets are available in the +[RDI deployment guide](/integrate/redis-data-integration/data-pipelines/deploy#set-secrets-for-k8shelm-deployment-using-kubectl-command). + +In addition to the target database secrets, you'll also need to create a Spanner-specific secret +named `source-db-credentials`. This secret should contain the service account key file generated +during the Spanner setup phase. Use the command below to create it: + +```bash +kubectl create secret generic source-db-credentials --namespace=rdi \ +--from-file=gcp-service-account.json="$HOME/spanner-reader-account.json" \ +--save-config --dry-run=client -o yaml | kubectl apply -f - +``` + +Be sure to adjust the file path (`~/spanner-reader-account.json`) if your service account key is +stored elsewhere. 
+ +## Configure RDI for Spanner + +When configuring your RDI pipeline for Spanner, use the following example configuration in your +`config.yaml` file: + +```yaml +sources: + source: + type: flink + connection: + type: spanner + project_id: your-project-id + instance_id: your-spanner-instance + database_id: your-spanner-database + change_streams: + change_stream_all: + {} + # retention_hours: 24 + # schemas: + # - DEFAULT + # tables: + # products: {} + # orders: {} + # order_items: {} + # logging: + # level: debug + # advanced: + # source: + # spanner.change.stream.retention.hours: 24 + # spanner.fetch.timeout.milliseconds: 20000 + # spanner.dialect: POSTGRESQL + # flink: + # jobmanager.rpc.port: 7123 + # jobmanager.memory.process.size: 1024m + # taskmanager.numberOfTaskSlots: 3 + # taskmanager.rpc.port: 7122 + # taskmanager.memory.process.size: 2g + # blob.server.port: 7124 + # rest.port: 8082 + # parallelism.default: 4 + # restart-strategy.type: fixed-delay + # restart-strategy.fixed-delay.attempts: 3 +targets: + target: + connection: + type: redis + host: ${HOST_IP} + port: 12000 + user: ${TARGET_DB_USERNAME} + password: ${TARGET_DB_PASSWORD} +processors: + target_data_type: hash +``` + +Make sure to replace the relevant connection details with your own for both the Spanner and target +Redis databases. + +## Additional Kubernetes configuration + +In your `rdi-values.yaml` file for Kubernetes deployment, make sure to configure the `dataPlane` +section like this: + +```yaml +operator: + dataPlane: + flinkCollector: + enabled: true + jobManager: + ingress: + enabled: true + className: traefik # Replace with your ingress controller + hosts: + - hostname # Replace with your Spanner DB hostname +``` + +## Next steps + +After completing the Spanner preparation steps, you can proceed with: + +1. [Installing RDI on Kubernetes](/integrate/redis-data-integration/installation/install-k8s) +2. 
[Deploying your RDI pipeline](/integrate/redis-data-integration/data-pipelines/deploy") +3. [Using Redis Insight to manage your RDI pipeline](/develop/tools/insight/rdi-connector) From 4546a7189ae69770b0c61b260560b87f6591ce93 Mon Sep 17 00:00:00 2001 From: Zdravko Donev Date: Mon, 18 Aug 2025 13:58:20 +0300 Subject: [PATCH 2/6] Apply suggestions from code review Copilot Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --- .../data-pipelines/prepare-dbs/spanner.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/content/integrate/redis-data-integration/data-pipelines/prepare-dbs/spanner.md b/content/integrate/redis-data-integration/data-pipelines/prepare-dbs/spanner.md index ddaad40e23..1e351ddfd8 100644 --- a/content/integrate/redis-data-integration/data-pipelines/prepare-dbs/spanner.md +++ b/content/integrate/redis-data-integration/data-pipelines/prepare-dbs/spanner.md @@ -199,7 +199,7 @@ operator: enabled: true className: traefik # Replace with your ingress controller hosts: - - hostname # Replace with your Spanner DB hostname + - hostname # Replace with your desired ingress hostname ``` ## Next steps @@ -207,5 +207,5 @@ operator: After completing the Spanner preparation steps, you can proceed with: 1. [Installing RDI on Kubernetes](/integrate/redis-data-integration/installation/install-k8s) -2. [Deploying your RDI pipeline](/integrate/redis-data-integration/data-pipelines/deploy") +2. [Deploying your RDI pipeline](/integrate/redis-data-integration/data-pipelines/deploy) 3. 
[Using Redis Insight to manage your RDI pipeline](/develop/tools/insight/rdi-connector) From 11f558c8df98012db46c315a20447f7cd794eec1 Mon Sep 17 00:00:00 2001 From: Zdravko Donev Date: Mon, 18 Aug 2025 14:12:34 +0300 Subject: [PATCH 3/6] Add numbering and fix links --- .../data-pipelines/prepare-dbs/spanner.md | 36 +++++++++---------- 1 file changed, 18 insertions(+), 18 deletions(-) diff --git a/content/integrate/redis-data-integration/data-pipelines/prepare-dbs/spanner.md b/content/integrate/redis-data-integration/data-pipelines/prepare-dbs/spanner.md index 1e351ddfd8..9bf10eac20 100644 --- a/content/integrate/redis-data-integration/data-pipelines/prepare-dbs/spanner.md +++ b/content/integrate/redis-data-integration/data-pipelines/prepare-dbs/spanner.md @@ -16,16 +16,20 @@ type: integration weight: 2 --- -Google Cloud Spanner requires specific configuration to enable change data capture (CDC) with RDI. -RDI operates in two phases with Spanner: snapshot (initial sync) and streaming. During the snapshot -phase, RDI uses the JDBC driver to connect directly to Spanner and read the current state of the -database. In the streaming phase, RDI uses Spanner's Change Streams to capture changes related to +Google Cloud Spanner requires specific configuration to enable change data capture (CDC) with RDI. +RDI operates in two phases with Spanner: snapshot (initial sync) and streaming. During the snapshot +phase, RDI uses the JDBC driver to connect directly to Spanner and read the current state of the +database. In the streaming phase, RDI uses Spanner's Change Streams to capture changes related to the monitored schemas and tables. -You must have the necessary privileges to manage the database schema and create service accounts +{{< note >}} +Spanner is only supported with RDI deployed on Kubernetes/Helm. RDI VM mode does not support Spanner as a source database. 
+{{< /note >}} + +You must have the necessary privileges to manage the database schema and create service accounts with the appropriate permissions, so that RDI can access the Spanner database. -## Prepare for snapshot +## 1. Prepare for snapshot During the snapshot phase, RDI executes multiple transactions to capture data at an exact point in time that remains consistent across all queries. This is achieved using a Spanner feature called @@ -37,7 +41,7 @@ which is set to 1 hour by default. Depending on the database tier, the volume of ingested into RDI, and the load on the database, this setting may need to be increased. You can update it using [this method](https://cloud.google.com/spanner/docs/use-pitr#set-period). -## Prepare for streaming +## 2. Prepare for streaming To enable streaming, you must create a change stream in Spanner at the database level. Use the option `value_capture_type = 'NEW_ROW_AND_OLD_VALUES'` to capture both the previous and updated @@ -57,7 +61,7 @@ CREATE CHANGE STREAM change_stream_table1_and_table2 Refer to the [official documentation](https://cloud.google.com/spanner/docs/change-streams/manage#googlesql) for more details, including additional configuration options and dialect-specific syntax. -## Create a service account +## 3. Create a service account To allow RDI to access the Spanner instance, you'll need to create a service account with the appropriate permissions. This service account will then be provided to RDI as a secret for @@ -108,11 +112,11 @@ gcloud iam service-accounts keys create ~/spanner-reader-account.json \ --project=YOUR_PROJECT_ID ``` -## Set up secrets for Kubernetes deployment +## 4. Set up secrets for Kubernetes deployment Before deploying the RDI pipeline, you need to configure the necessary secrets for both the source and target databases. 
Instructions for setting up the target database secrets are available in the -[RDI deployment guide](/integrate/redis-data-integration/data-pipelines/deploy#set-secrets-for-k8shelm-deployment-using-kubectl-command). +[RDI deployment guide]({{< relref "/integrate/redis-data-integration/data-pipelines/deploy#set-secrets-for-k8shelm-deployment-using-kubectl-command" >}}). In addition to the target database secrets, you'll also need to create a Spanner-specific secret named `source-db-credentials`. This secret should contain the service account key file generated @@ -127,7 +131,7 @@ kubectl create secret generic source-db-credentials --namespace=rdi \ Be sure to adjust the file path (`~/spanner-reader-account.json`) if your service account key is stored elsewhere. -## Configure RDI for Spanner +## 5. Configure RDI for Spanner When configuring your RDI pipeline for Spanner, use the following example configuration in your `config.yaml` file: @@ -184,7 +188,7 @@ processors: Make sure to replace the relevant connection details with your own for both the Spanner and target Redis databases. -## Additional Kubernetes configuration +## 6. Additional Kubernetes configuration In your `rdi-values.yaml` file for Kubernetes deployment, make sure to configure the `dataPlane` section like this: @@ -202,10 +206,6 @@ operator: - hostname # Replace with your desired ingress hostname ``` -## Next steps - -After completing the Spanner preparation steps, you can proceed with: +## 7. Configuration is complete -1. [Installing RDI on Kubernetes](/integrate/redis-data-integration/installation/install-k8s) -2. [Deploying your RDI pipeline](/integrate/redis-data-integration/data-pipelines/deploy) -3. [Using Redis Insight to manage your RDI pipeline](/develop/tools/insight/rdi-connector) +Once you have followed the steps above, your Google Spanner database is ready for RDI to use. 
From 34fcdd08c3b686c277efabb09b62f9538878cde4 Mon Sep 17 00:00:00 2001 From: Zdravko Donev Date: Tue, 19 Aug 2025 15:19:46 +0300 Subject: [PATCH 4/6] Apply suggestions from code review Co-authored-by: Yaron Parasol Co-authored-by: David Dougherty --- .../data-pipelines/prepare-dbs/spanner.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/content/integrate/redis-data-integration/data-pipelines/prepare-dbs/spanner.md b/content/integrate/redis-data-integration/data-pipelines/prepare-dbs/spanner.md index 9bf10eac20..f40f34e448 100644 --- a/content/integrate/redis-data-integration/data-pipelines/prepare-dbs/spanner.md +++ b/content/integrate/redis-data-integration/data-pipelines/prepare-dbs/spanner.md @@ -19,7 +19,7 @@ weight: 2 Google Cloud Spanner requires specific configuration to enable change data capture (CDC) with RDI. RDI operates in two phases with Spanner: snapshot (initial sync) and streaming. During the snapshot phase, RDI uses the JDBC driver to connect directly to Spanner and read the current state of the -database. In the streaming phase, RDI uses Spanner's Change Streams to capture changes related to +database. In the streaming phase, RDI uses [Spanner's Change Streams](https://cloud.google.com/spanner/docs/change-streams) to capture changes related to the monitored schemas and tables. {{< note >}} @@ -27,17 +27,17 @@ Spanner is only supported with RDI deployed on Kubernetes/Helm. RDI VM mode does {{< /note >}} You must have the necessary privileges to manage the database schema and create service accounts -with the appropriate permissions, so that RDI can access the Spanner database. +with the appropriate permissions so that RDI can access the Spanner database. ## 1. Prepare for snapshot During the snapshot phase, RDI executes multiple transactions to capture data at an exact point in time that remains consistent across all queries. 
This is achieved using a Spanner feature called -[Timestamp bounds with Exact staleness](https://cloud.google.com/spanner/docs/timestamp-bounds#exact_staleness). +[Timestamp bounds with exact staleness](https://cloud.google.com/spanner/docs/timestamp-bounds#exact_staleness). This feature relies on the [version_retention_period](https://cloud.google.com/spanner/docs/reference/rest/v1/projects.instances.databases#Database.FIELDS.version_retention_period), -which is set to 1 hour by default. Depending on the database tier, the volume of data to be +which is set to one hour by default. Depending on the database tier, the volume of data to be ingested into RDI, and the load on the database, this setting may need to be increased. You can update it using [this method](https://cloud.google.com/spanner/docs/use-pitr#set-period). @@ -47,7 +47,7 @@ To enable streaming, you must create a change stream in Spanner at the database option `value_capture_type = 'NEW_ROW_AND_OLD_VALUES'` to capture both the previous and updated row values. -Be sure to specify only the tables you want to ingest from—and optionally, the specific columns +Be sure to specify only the tables you want to ingest from and, optionally, the specific columns you're interested in. 
Here's an example using GoogleSQL syntax: ```sql From 69d1a4da049119cad824d0fe6d41b9cc0ae2b49e Mon Sep 17 00:00:00 2001 From: Zdravko Donev Date: Tue, 19 Aug 2025 15:26:10 +0300 Subject: [PATCH 5/6] Address code review comments --- .../data-pipelines/prepare-dbs/spanner.md | 69 +++++++++---------- 1 file changed, 33 insertions(+), 36 deletions(-) diff --git a/content/integrate/redis-data-integration/data-pipelines/prepare-dbs/spanner.md b/content/integrate/redis-data-integration/data-pipelines/prepare-dbs/spanner.md index f40f34e448..a196e7ae37 100644 --- a/content/integrate/redis-data-integration/data-pipelines/prepare-dbs/spanner.md +++ b/content/integrate/redis-data-integration/data-pipelines/prepare-dbs/spanner.md @@ -26,9 +26,6 @@ the monitored schemas and tables. Spanner is only supported with RDI deployed on Kubernetes/Helm. RDI VM mode does not support Spanner as a source database. {{< /note >}} -You must have the necessary privileges to manage the database schema and create service accounts -with the appropriate permissions so that RDI can access the Spanner database. - ## 1. Prepare for snapshot During the snapshot phase, RDI executes multiple transactions to capture data at an exact point @@ -67,50 +64,50 @@ To allow RDI to access the Spanner instance, you'll need to create a service acc appropriate permissions. This service account will then be provided to RDI as a secret for authentication. -### Step 1: Create the service account +1. Create the service account -```bash -gcloud iam service-accounts create spanner-reader-account \ - --display-name="Spanner Reader Service Account" \ - --description="Service account for reading from Spanner databases" \ - --project=YOUR_PROJECT_ID -``` + ```bash + gcloud iam service-accounts create spanner-reader-account \ + --display-name="Spanner Reader Service Account" \ + --description="Service account for reading from Spanner databases" \ + --project=YOUR_PROJECT_ID + ``` -### Step 2: Grant required roles +1. 
Grant required roles -**Database Reader** (read access to Spanner data): + **Database Reader** (read access to Spanner data): -```bash -gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \ - --member="serviceAccount:spanner-reader-account@YOUR_PROJECT_ID.iam.gserviceaccount.com" \ - --role="roles/spanner.databaseReader" -``` + ```bash + gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \ + --member="serviceAccount:spanner-reader-account@YOUR_PROJECT_ID.iam.gserviceaccount.com" \ + --role="roles/spanner.databaseReader" + ``` -**Database User** (query execution and metadata access): + **Database User** (query execution and metadata access): -```bash -gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \ - --member="serviceAccount:spanner-reader-account@YOUR_PROJECT_ID.iam.gserviceaccount.com" \ - --role="roles/spanner.databaseUser" -``` + ```bash + gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \ + --member="serviceAccount:spanner-reader-account@YOUR_PROJECT_ID.iam.gserviceaccount.com" \ + --role="roles/spanner.databaseUser" + ``` -**Viewer** (viewing instance and database configuration): + **Viewer** (viewing instance and database configuration): -```bash -gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \ - --member="serviceAccount:spanner-reader-account@YOUR_PROJECT_ID.iam.gserviceaccount.com" \ - --role="roles/spanner.viewer" -``` + ```bash + gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \ + --member="serviceAccount:spanner-reader-account@YOUR_PROJECT_ID.iam.gserviceaccount.com" \ + --role="roles/spanner.viewer" + ``` -### Step 3: Download the service account key +1. 
Download the service account key -Save the credentials locally so they can be used later by RDI: + Save the credentials locally so they can be used later by RDI: -```bash -gcloud iam service-accounts keys create ~/spanner-reader-account.json \ - --iam-account=spanner-reader-account@YOUR_PROJECT_ID.iam.gserviceaccount.com \ - --project=YOUR_PROJECT_ID -``` + ```bash + gcloud iam service-accounts keys create ~/spanner-reader-account.json \ + --iam-account=spanner-reader-account@YOUR_PROJECT_ID.iam.gserviceaccount.com \ + --project=YOUR_PROJECT_ID + ``` ## 4. Set up secrets for Kubernetes deployment From 57da52d4c677378e3e08aa78d6196f93563244fc Mon Sep 17 00:00:00 2001 From: "David W. Dougherty" Date: Tue, 19 Aug 2025 09:45:42 -0700 Subject: [PATCH 6/6] Link to K8s installation doc --- .../redis-data-integration/installation/install-k8s.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/content/integrate/redis-data-integration/installation/install-k8s.md b/content/integrate/redis-data-integration/installation/install-k8s.md index 513717e481..9f68e0dc77 100644 --- a/content/integrate/redis-data-integration/installation/install-k8s.md +++ b/content/integrate/redis-data-integration/installation/install-k8s.md @@ -209,6 +209,10 @@ also use mTLS, you must set the client certificate and private key contents in --set-file connection.ssl.key= ``` +{{< note >}} +If you are installing RDI for use with Google Cloud Spanner, see [Additional Kubernetes configuration]({{< relref "/integrate/redis-data-integration/data-pipelines/prepare-dbs/spanner#6-additional-kubernetes-configuration" >}}) for the extra `rdi-values.yaml` settings it requires. +{{< /note >}} + ## Check the installation To verify the status of the K8s deployment, run the following command: