RDSC-3972: RDI now supports Google Spanner public docs (#2004)

ZdravkoDonev-redis · web-flow · commit fe80b2260f79 · 2025-08-20T02:23:41.000-07:00
diff --git a/content/embeds/rdi-supported-source-versions.md b/content/embeds/rdi-supported-source-versions.md
@@ -6,5 +6,6 @@
 | MySQL | 5.7, 8.0.x, 8.2 | 8.0.x | 8.0 |
 | PostgreSQL | 10, 11, 12, 13, 14, 15, 16  | 11, 12, 13, 14, 15, 16 | 15 |
 | SQL Server | 2017, 2019, 2022 | 2016, 2017, 2019, 2022 | 2019 |
+| Spanner | - | - | All versions |
 | AlloyDB for PostgreSQL | 14.2, 15.7 | - | 14.2, 15.7 |
 | AWS Aurora/PostgreSQL | 15 | 15 | - |
diff --git a/content/integrate/redis-data-integration/data-pipelines/prepare-dbs/spanner.md b/content/integrate/redis-data-integration/data-pipelines/prepare-dbs/spanner.md
@@ -0,0 +1,208 @@
+---
+Title: Prepare Spanner for RDI
+aliases: /integrate/redis-data-integration/ingest/data-pipelines/prepare-dbs/spanner/
+alwaysopen: false
+categories:
+- docs
+- integrate
+- rs
+- rdi
+description: Prepare Google Cloud Spanner databases to work with RDI
+group: di
+linkTitle: Prepare Spanner
+summary: Redis Data Integration keeps Redis in sync with the primary database in near
+  real time.
+type: integration
+weight: 2
+---
+
+Google Cloud Spanner requires specific configuration to enable change data capture (CDC) with RDI.
+RDI operates in two phases with Spanner: snapshot (initial sync) and streaming. During the snapshot
+phase, RDI uses the Spanner JDBC driver to connect directly to Spanner and read the current state
+of the database. During the streaming phase, RDI uses
+[Spanner change streams](https://cloud.google.com/spanner/docs/change-streams) to capture changes
+to the monitored schemas and tables.
+
+{{< note >}}
+Spanner is only supported with RDI deployed on Kubernetes/Helm. VM installations of RDI do not support Spanner as a source database.
+{{< /note >}}
+
+## 1. Prepare for snapshot
+
+During the snapshot phase, RDI runs multiple transactions that must all read the data as it was at
+the same point in time, so that the snapshot is consistent across queries. RDI achieves this with
+Spanner's [exact staleness timestamp bound](https://cloud.google.com/spanner/docs/timestamp-bounds#exact_staleness).
+
+Reads at a past timestamp only succeed while that timestamp is still within the database's
+[version_retention_period](https://cloud.google.com/spanner/docs/reference/rest/v1/projects.instances.databases#Database.FIELDS.version_retention_period),
+which defaults to one hour. Depending on the database tier, the volume of data to be ingested into
+RDI, and the load on the database, the snapshot may take longer than that, in which case you need
+to increase the retention period. See Google's instructions for
+[setting the retention period](https://cloud.google.com/spanner/docs/use-pitr#set-period).
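+
+If you manage Spanner from the command line, you can check and raise the retention period with the
+gcloud CLI. The sketch below assumes a GoogleSQL-dialect database (PostgreSQL-dialect databases use
+a different statement) and a gcloud configuration whose default project is the one that hosts your
+Spanner instance. The helper function names are hypothetical. Spanner accepts retention periods
+from one hour to seven days.
+
+```bash
+# Hypothetical helper. Usage: show_version_retention INSTANCE DATABASE
+show_version_retention() {
+  local instance="$1" database="$2"
+  # Prints the current retention period, for example "1h"
+  gcloud spanner databases describe "$database" \
+    --instance="$instance" \
+    --format="value(versionRetentionPeriod)"
+}
+
+# Hypothetical helper. Usage: set_version_retention INSTANCE DATABASE PERIOD (for example 1d)
+set_version_retention() {
+  local instance="$1" database="$2" period="$3"
+  # Backticks quote the database ID in GoogleSQL DDL, since IDs may contain hyphens
+  gcloud spanner databases ddl update "$database" \
+    --instance="$instance" \
+    --ddl="ALTER DATABASE \`${database}\` SET OPTIONS (version_retention_period = '${period}')"
+}
+```
+
+For example, `set_version_retention your-spanner-instance your-spanner-database 1d` keeps one day
+of row versions.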
+
+## 2. Prepare for streaming
+
+To enable streaming, create a change stream in Spanner at the database level. Use the option
+`value_capture_type = 'NEW_ROW_AND_OLD_VALUES'` so that each change record contains the full new
+row along with the previous values of the modified columns.
+
+Be sure to list only the tables you want RDI to ingest and, optionally, the specific columns you're
+interested in. Here's an example using GoogleSQL syntax:
+
+```sql
+CREATE CHANGE STREAM change_stream_table1_and_table2
+  FOR table1, table2
+  OPTIONS (
+    value_capture_type = 'NEW_ROW_AND_OLD_VALUES'
+  );
+```
+
+See Google's [change stream management guide](https://cloud.google.com/spanner/docs/change-streams/manage#googlesql)
+for more details, including additional configuration options and the syntax for PostgreSQL-dialect databases.
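+
+You can run the DDL statement above in the Google Cloud console or with the gcloud CLI. The sketch
+below, again with hypothetical helper names and assuming the GoogleSQL dialect, applies a DDL
+statement and then lists the change streams defined in the database, so you can confirm the exact
+name to use in the RDI configuration.
+
+```bash
+# Hypothetical helper. Usage: apply_ddl INSTANCE DATABASE "DDL_STATEMENT"
+apply_ddl() {
+  local instance="$1" database="$2" ddl="$3"
+  gcloud spanner databases ddl update "$database" \
+    --instance="$instance" \
+    --ddl="$ddl"
+}
+
+# Hypothetical helper. Usage: list_change_streams INSTANCE DATABASE
+list_change_streams() {
+  local instance="$1" database="$2"
+  # INFORMATION_SCHEMA.CHANGE_STREAMS has one row per change stream
+  gcloud spanner databases execute-sql "$database" \
+    --instance="$instance" \
+    --sql="SELECT CHANGE_STREAM_NAME FROM INFORMATION_SCHEMA.CHANGE_STREAMS"
+}
+```
+
+For example, pass the `CREATE CHANGE STREAM` statement shown above as the third argument to
+`apply_ddl`, then run `list_change_streams` to check that `change_stream_table1_and_table2` appears.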
+
+## 3. Create a service account
+
+To allow RDI to access the Spanner instance, you'll need to create a service account with the 
+appropriate permissions. This service account will then be provided to RDI as a secret for 
+authentication.
+
+1. Create the service account
+
+    ```bash
+    gcloud iam service-accounts create spanner-reader-account \
+        --display-name="Spanner Reader Service Account" \
+        --description="Service account for reading from Spanner databases" \
+        --project=YOUR_PROJECT_ID
+    ```
+
+1. Grant required roles
+
+    **Database Reader** (read access to Spanner data):
+
+    ```bash
+    gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
+        --member="serviceAccount:spanner-reader-account@YOUR_PROJECT_ID.iam.gserviceaccount.com" \
+        --role="roles/spanner.databaseReader"
+    ```
+
+    **Database User** (query execution and metadata access):
+
+    ```bash
+    gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
+        --member="serviceAccount:spanner-reader-account@YOUR_PROJECT_ID.iam.gserviceaccount.com" \
+        --role="roles/spanner.databaseUser"
+    ```
+
+    **Viewer** (viewing instance and database configuration):
+
+    ```bash
+    gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
+        --member="serviceAccount:spanner-reader-account@YOUR_PROJECT_ID.iam.gserviceaccount.com" \
+        --role="roles/spanner.viewer"
+    ```
+
+1. Download the service account key
+
+    Save the key file locally. You'll store it in a Kubernetes secret for RDI in the next step:
+
+    ```bash
+    gcloud iam service-accounts keys create ~/spanner-reader-account.json \
+        --iam-account=spanner-reader-account@YOUR_PROJECT_ID.iam.gserviceaccount.com \
+        --project=YOUR_PROJECT_ID
+    ```
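+
+To confirm that the bindings are in place, you can read back the project's IAM policy. The sketch
+below uses a hypothetical helper name and assumes `gcloud` is authenticated with permission to view
+the project's IAM policy. It lists the roles bound to the service account and reports any of the
+three roles above that are missing.
+
+```bash
+# Roles granted in the previous step
+REQUIRED_ROLES=(roles/spanner.databaseReader roles/spanner.databaseUser roles/spanner.viewer)
+
+# Hypothetical helper. Usage: check_sa_roles PROJECT_ID SERVICE_ACCOUNT_EMAIL
+# Returns non-zero, and names each missing role on stderr, if any role is absent.
+check_sa_roles() {
+  local project="$1" sa_email="$2" granted role missing=0
+  granted="$(gcloud projects get-iam-policy "$project" \
+    --flatten="bindings[].members" \
+    --filter="bindings.members:${sa_email}" \
+    --format="value(bindings.role)")" || return 1
+  for role in "${REQUIRED_ROLES[@]}"; do
+    # -x matches whole lines and -F treats the role name as a fixed string
+    if ! grep -qxF "$role" <<<"$granted"; then
+      echo "missing role: $role" >&2
+      missing=1
+    fi
+  done
+  return "$missing"
+}
+```
+
+For example: `check_sa_roles YOUR_PROJECT_ID spanner-reader-account@YOUR_PROJECT_ID.iam.gserviceaccount.com`.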
+
+## 4. Set up secrets for Kubernetes deployment
+
+Before deploying the RDI pipeline, you need to configure the necessary secrets for both the source 
+and target databases. Instructions for setting up the target database secrets are available in the 
+[RDI deployment guide]({{< relref "/integrate/redis-data-integration/data-pipelines/deploy#set-secrets-for-k8shelm-deployment-using-kubectl-command" >}}).
+
+In addition to the target database secrets, you'll also need to create a Spanner-specific secret
+named `source-db-credentials`. This secret must contain the service account key file you downloaded
+in the previous step. Use the command below to create it:
+
+```bash
+kubectl create secret generic source-db-credentials --namespace=rdi \
+    --from-file=gcp-service-account.json="$HOME/spanner-reader-account.json" \
+    --save-config --dry-run=client -o yaml | kubectl apply -f -
+```
+
+Be sure to adjust the file path (`$HOME/spanner-reader-account.json`) if your service account key is
+stored elsewhere. Use `$HOME` or an absolute path rather than `~`, because the shell does not expand
+`~` after the `=` inside the `--from-file` option.
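+
+Before you create the secret, you may want to confirm that the file really is a service account key
+for the account you set up in step 3. The hypothetical helper below does this with `grep` and `sed`
+only (no `jq`), relying on the `type` and `client_email` fields that Google Cloud writes into JSON
+key files. It is a sketch rather than a full JSON parser.
+
+```bash
+# Hypothetical helper. Usage: check_sa_key KEY_FILE
+# Prints the key's client_email on success; returns non-zero otherwise.
+check_sa_key() {
+  local key_file="$1" email
+  if [[ ! -f "$key_file" ]]; then
+    echo "key file not found: $key_file" >&2
+    return 1
+  fi
+  # Service account keys declare "type": "service_account"
+  if ! grep -Eq '"type"[[:space:]]*:[[:space:]]*"service_account"' "$key_file"; then
+    echo "not a service account key: $key_file" >&2
+    return 1
+  fi
+  # Extract the account email without requiring jq
+  email="$(sed -nE 's/.*"client_email"[[:space:]]*:[[:space:]]*"([^"]+)".*/\1/p' "$key_file")"
+  if [[ -z "$email" ]]; then
+    echo "no client_email in: $key_file" >&2
+    return 1
+  fi
+  echo "$email"
+}
+```
+
+For example, `check_sa_key "$HOME/spanner-reader-account.json"` should print
+`spanner-reader-account@YOUR_PROJECT_ID.iam.gserviceaccount.com`. After you create the secret,
+`kubectl describe secret source-db-credentials --namespace=rdi` should list a
+`gcp-service-account.json` data key.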
+
+## 5. Configure RDI for Spanner
+
+Use the following example as a starting point for the `config.yaml` file of your Spanner pipeline.
+The commented-out lines show optional settings:
+
+```yaml
+sources:
+  source:
+    type: flink
+    connection:
+      type: spanner
+      project_id: your-project-id
+      instance_id: your-spanner-instance
+      database_id: your-spanner-database
+      change_streams:
+        change_stream_all:
+          {}
+          # retention_hours: 24
+    # schemas:
+    #  - DEFAULT
+    # tables:
+    #   products: {}
+    #   orders: {}
+    #   order_items: {}
+    # logging:
+    #   level: debug
+    # advanced:
+    #   source:
+    #     spanner.change.stream.retention.hours: 24
+    #     spanner.fetch.timeout.milliseconds: 20000
+    #     spanner.dialect: POSTGRESQL
+    #   flink:
+    #     jobmanager.rpc.port: 7123
+    #     jobmanager.memory.process.size: 1024m
+    #     taskmanager.numberOfTaskSlots: 3
+    #     taskmanager.rpc.port: 7122
+    #     taskmanager.memory.process.size: 2g
+    #     blob.server.port: 7124
+    #     rest.port: 8082
+    #     parallelism.default: 4
+    #     restart-strategy.type: fixed-delay
+    #     restart-strategy.fixed-delay.attempts: 3
+targets:
+  target:
+    connection:
+      type: redis
+      host: ${HOST_IP}
+      port: 12000
+      user: ${TARGET_DB_USERNAME}
+      password: ${TARGET_DB_PASSWORD}
+processors:
+  target_data_type: hash
+```
+
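+Replace the connection details with your own for both the Spanner source and the target Redis
+database. The key under `change_streams` (`change_stream_all` in this example) should match the
+name of the change stream you created in [step 2](#2-prepare-for-streaming).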
+
+## 6. Additional Kubernetes configuration
+
+In the `rdi-values.yaml` file you use for the Kubernetes (Helm) installation, enable the Flink
+collector in the `operator.dataPlane` section and configure an ingress for its job manager:
+
+```yaml
+operator:
+  dataPlane:
+    flinkCollector:
+      enabled: true
+      jobManager:
+        ingress:
+          enabled: true
+          className: traefik # Replace with your ingress controller
+          hosts:
+            - hostname # Replace with your desired ingress hostname
+```
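+
+After you install or upgrade RDI with these values, you can check that the job manager ingress was
+created with the hostname you chose. The sketch below assumes the `rdi` namespace used elsewhere on
+this page; the helper name is hypothetical.
+
+```bash
+# Hypothetical helper. Usage: show_ingress_hosts [NAMESPACE]   (defaults to rdi)
+# Prints one "name<TAB>hosts" line per ingress in the namespace.
+show_ingress_hosts() {
+  local namespace="${1:-rdi}"
+  kubectl get ingress --namespace="$namespace" \
+    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.rules[*].host}{"\n"}{end}'
+}
+```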
+
+## 7. Configuration is complete
+
+After you complete the steps above, your Google Cloud Spanner database is ready for RDI to use.
diff --git a/content/integrate/redis-data-integration/installation/install-k8s.md b/content/integrate/redis-data-integration/installation/install-k8s.md
@@ -209,6 +209,10 @@ also use mTLS, you must set the client certificate and private key contents in
       --set-file connection.ssl.key=<path-to-client-key>
     ```
 
+{{< note >}}
+If you're installing RDI to use Google Cloud Spanner as a source, also follow the [additional Kubernetes configuration]({{< relref "/integrate/redis-data-integration/data-pipelines/prepare-dbs/spanner#6-additional-kubernetes-configuration" >}}) steps for Spanner.
+{{< /note >}}
+
 ## Check the installation
 
 To verify the status of the K8s deployment, run the following command: