| 1 | +--- |
| 2 | +Title: Prepare Spanner for RDI |
| 3 | +aliases: /integrate/redis-data-integration/ingest/data-pipelines/prepare-dbs/spanner/ |
| 4 | +alwaysopen: false |
| 5 | +categories: |
| 6 | +- docs |
| 7 | +- integrate |
| 8 | +- rs |
| 9 | +- rdi |
| 10 | +description: Prepare Google Cloud Spanner databases to work with RDI |
| 11 | +group: di |
| 12 | +linkTitle: Prepare Spanner |
| 13 | +summary: Redis Data Integration keeps Redis in sync with the primary database in near |
| 14 | + real time. |
| 15 | +type: integration |
| 16 | +weight: 2 |
| 17 | +--- |
| 18 | + |
| 19 | +Google Cloud Spanner requires specific configuration to enable change data capture (CDC) with RDI. |
| 20 | +RDI operates in two phases with Spanner: snapshot (initial sync) and streaming. During the snapshot |
| 21 | +phase, RDI uses the JDBC driver to connect directly to Spanner and read the current state of the |
| 22 | +database. In the streaming phase, RDI uses [Spanner's Change Streams](https://cloud.google.com/spanner/docs/change-streams) to capture changes related to |
| 23 | +the monitored schemas and tables. |
| 24 | + |
| 25 | +{{< note >}} |
| 26 | +Spanner is only supported with RDI deployed on Kubernetes/Helm. RDI VM mode does not support Spanner as a source database. |
| 27 | +{{< /note >}} |
| 28 | + |
| 29 | +## 1. Prepare for snapshot |
| 30 | + |
| 31 | +During the snapshot phase, RDI executes multiple transactions to capture data at an exact point |
| 32 | +in time that remains consistent across all queries. This is achieved using a Spanner feature called |
| 33 | +[Timestamp bounds with exact staleness](https://cloud.google.com/spanner/docs/timestamp-bounds#exact_staleness). |
| 34 | + |
| 35 | +This feature relies on the |
| 36 | +[version_retention_period](https://cloud.google.com/spanner/docs/reference/rest/v1/projects.instances.databases#Database.FIELDS.version_retention_period), |
| 37 | +which is set to one hour by default. Depending on the database tier, the volume of data to be |
| 38 | +ingested into RDI, and the load on the database, you may need to increase this setting (up to a maximum of seven days) so that the snapshot timestamp stays within the retention window until the snapshot finishes. See |
| 39 | +[Set the retention period](https://cloud.google.com/spanner/docs/use-pitr#set-period) for instructions. |
| 40 | + |
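As a rough illustration of raising the retention period, the sketch below builds the GoogleSQL `ALTER DATABASE` statement and applies it with `gcloud spanner databases ddl update`. The function names, the instance and database IDs, and the `3d` value are placeholders rather than values from this guide, and the statement assumes a GoogleSQL-dialect database (PostgreSQL-dialect databases use different syntax).

```bash
#!/usr/bin/env bash
# Sketch only: the IDs and the 3d period below are placeholders.

# Build the GoogleSQL statement that sets version_retention_period.
retention_ddl() {
  local database_id="$1" period="$2"
  printf "ALTER DATABASE \`%s\` SET OPTIONS (version_retention_period = '%s')" \
    "$database_id" "$period"
}

# Apply the statement to a database (requires an authenticated gcloud CLI).
apply_retention() {
  local instance_id="$1" database_id="$2" period="$3"
  gcloud spanner databases ddl update "$database_id" \
    --instance="$instance_id" \
    --ddl="$(retention_ddl "$database_id" "$period")"
}

# Example:
# apply_retention your-spanner-instance your-spanner-database 3d
```

Choose a period that comfortably exceeds the time you expect the initial snapshot to take.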
| 41 | +## 2. Prepare for streaming |
| 42 | + |
| 43 | +To enable streaming, you must create a change stream in Spanner at the database level. Use the |
| 44 | +option `value_capture_type = 'NEW_ROW_AND_OLD_VALUES'` to capture both the previous and updated |
| 45 | +row values. |
| 46 | + |
| 47 | +Be sure to specify only the tables you want to ingest from and, optionally, the specific columns |
| 48 | +you're interested in. Here's an example using GoogleSQL syntax: |
| 49 | + |
| 50 | +```sql |
| 51 | +CREATE CHANGE STREAM change_stream_table1_and_table2 |
| 52 | +  FOR table1, table2 |
| 53 | +  OPTIONS ( |
| 54 | +    value_capture_type = 'NEW_ROW_AND_OLD_VALUES' |
| 55 | +  ); |
| 56 | +``` |
| 57 | + |
| 58 | +Refer to the [official documentation](https://cloud.google.com/spanner/docs/change-streams/manage#googlesql) |
| 59 | +for more details, including additional configuration options and dialect-specific syntax. |
| 60 | + |
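To illustrate the column-level option mentioned above, here is a minimal sketch that creates a change stream tracking only selected columns of one table and then lists the change streams defined in the database. The table names (`orders`, `order_items`), the column names (`status`, `total`), the stream name, and the function names are all hypothetical, and the DDL assumes the GoogleSQL dialect.

```bash
#!/usr/bin/env bash
# Sketch only: table, column, and stream names are hypothetical placeholders.

# Print GoogleSQL DDL for a change stream that tracks two columns of `orders`
# and every column of `order_items`.
change_stream_ddl() {
  cat <<'SQL'
CREATE CHANGE STREAM change_stream_orders
  FOR orders(status, total), order_items
  OPTIONS (
    value_capture_type = 'NEW_ROW_AND_OLD_VALUES'
  )
SQL
}

# Apply the DDL, then list the change streams that exist in the database.
# Requires an authenticated gcloud CLI.
create_and_list_change_streams() {
  local instance_id="$1" database_id="$2"
  gcloud spanner databases ddl update "$database_id" \
    --instance="$instance_id" \
    --ddl="$(change_stream_ddl)"
  gcloud spanner databases execute-sql "$database_id" \
    --instance="$instance_id" \
    --sql="SELECT CHANGE_STREAM_NAME FROM INFORMATION_SCHEMA.CHANGE_STREAMS"
}

# Example:
# create_and_list_change_streams your-spanner-instance your-spanner-database
```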
| 61 | +## 3. Create a service account |
| 62 | + |
| 63 | +To allow RDI to access the Spanner instance, you'll need to create a service account with the |
| 64 | +appropriate permissions. This service account will then be provided to RDI as a secret for |
| 65 | +authentication. |
| 66 | + |
| 67 | +1. Create the service account |
| 68 | + |
| 69 | + ```bash |
| 70 | + gcloud iam service-accounts create spanner-reader-account \ |
| 71 | + --display-name="Spanner Reader Service Account" \ |
| 72 | + --description="Service account for reading from Spanner databases" \ |
| 73 | + --project=YOUR_PROJECT_ID |
| 74 | + ``` |
| 75 | + |
| 76 | +1. Grant required roles |
| 77 | + |
| 78 | + **Database Reader** (read access to Spanner data): |
| 79 | + |
| 80 | + ```bash |
| 81 | + gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \ |
| 82 | + --member="serviceAccount:spanner-reader-account@YOUR_PROJECT_ID.iam.gserviceaccount.com" \ |
| 83 | + --role="roles/spanner.databaseReader" |
| 84 | + ``` |
| 85 | + |
| 86 | + **Database User** (query execution and metadata access): |
| 87 | + |
| 88 | + ```bash |
| 89 | + gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \ |
| 90 | + --member="serviceAccount:spanner-reader-account@YOUR_PROJECT_ID.iam.gserviceaccount.com" \ |
| 91 | + --role="roles/spanner.databaseUser" |
| 92 | + ``` |
| 93 | + |
| 94 | + **Viewer** (viewing instance and database configuration): |
| 95 | + |
| 96 | + ```bash |
| 97 | + gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \ |
| 98 | + --member="serviceAccount:spanner-reader-account@YOUR_PROJECT_ID.iam.gserviceaccount.com" \ |
| 99 | + --role="roles/spanner.viewer" |
| 100 | + ``` |
| 101 | + |
| 102 | +1. Download the service account key |
| 103 | + |
| 104 | + Save the credentials locally so they can be used later by RDI: |
| 105 | + |
| 106 | + ```bash |
| 107 | + gcloud iam service-accounts keys create ~/spanner-reader-account.json \ |
| 108 | + --iam-account=spanner-reader-account@YOUR_PROJECT_ID.iam.gserviceaccount.com \ |
| 109 | + --project=YOUR_PROJECT_ID |
| 110 | + ``` |
| 111 | + |
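Before moving on, it can help to confirm that the role bindings took effect. The sketch below lists the roles bound to the service account at the project level; the helper function names are illustrative, and `YOUR_PROJECT_ID` remains a placeholder as in the steps above.

```bash
#!/usr/bin/env bash
# Sketch only: helper names are illustrative; YOUR_PROJECT_ID is a placeholder.

# Compose the service account email from its name and the project ID.
sa_email() {
  printf '%s@%s.iam.gserviceaccount.com' "$1" "$2"
}

# Print every project-level role bound to the RDI service account.
# Requires an authenticated gcloud CLI.
list_sa_roles() {
  local project_id="$1" email
  email="$(sa_email spanner-reader-account "$project_id")"
  gcloud projects get-iam-policy "$project_id" \
    --flatten="bindings[].members" \
    --filter="bindings.members:serviceAccount:${email}" \
    --format="value(bindings.role)"
}

# Example:
# list_sa_roles YOUR_PROJECT_ID
```

If the grants succeeded, the output should include `roles/spanner.databaseReader`, `roles/spanner.databaseUser`, and `roles/spanner.viewer`.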
| 112 | +## 4. Set up secrets for Kubernetes deployment |
| 113 | + |
| 114 | +Before deploying the RDI pipeline, you need to configure the necessary secrets for both the source |
| 115 | +and target databases. Instructions for setting up the target database secrets are available in the |
| 116 | +[RDI deployment guide]({{< relref "/integrate/redis-data-integration/data-pipelines/deploy#set-secrets-for-k8shelm-deployment-using-kubectl-command" >}}). |
| 117 | + |
| 118 | +In addition to the target database secrets, you'll also need to create a Spanner-specific secret |
| 119 | +named `source-db-credentials`. This secret should contain the service account key file generated |
| 120 | +during the Spanner setup phase. Use the command below to create it: |
| 121 | + |
| 122 | +```bash |
| 123 | +kubectl create secret generic source-db-credentials --namespace=rdi \ |
| 124 | +--from-file=gcp-service-account.json="$HOME/spanner-reader-account.json" \ |
| 125 | +--save-config --dry-run=client -o yaml | kubectl apply -f - |
| 126 | +``` |
| 127 | + |
| 128 | +Be sure to adjust the file path (`~/spanner-reader-account.json`) if your service account key is |
| 129 | +stored elsewhere. |
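As a sanity check around this step, the following sketch verifies that a file looks like a service account key before it is loaded into the secret, and shows how to inspect the resulting secret without printing its contents. The function name is hypothetical; the check relies only on the `"type": "service_account"` field that Google service account key files contain.

```bash
#!/usr/bin/env bash
# Sketch only: is_service_account_key is a hypothetical helper name.

# Succeed only if the file is readable and declares itself a service account key.
is_service_account_key() {
  local key_file="$1"
  [ -r "$key_file" ] &&
    grep -Eq '"type"[[:space:]]*:[[:space:]]*"service_account"' "$key_file"
}

# Example: validate the key, then confirm the secret exists and contains the
# gcp-service-account.json entry (describe shows key names and sizes only).
# is_service_account_key "$HOME/spanner-reader-account.json" &&
#   kubectl describe secret source-db-credentials --namespace=rdi
```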
| 130 | + |
| 131 | +## 5. Configure RDI for Spanner |
| 132 | + |
| 133 | +When configuring your RDI pipeline for Spanner, use the following example configuration in your |
| 134 | +`config.yaml` file: |
| 135 | + |
| 136 | +```yaml |
| 137 | +sources: |
| 138 | +  source: |
| 139 | +    type: flink |
| 140 | +    connection: |
| 141 | +      type: spanner |
| 142 | +      project_id: your-project-id |
| 143 | +      instance_id: your-spanner-instance |
| 144 | +      database_id: your-spanner-database |
| 145 | +      change_streams: |
| 146 | +        change_stream_all: |
| 147 | +          {} |
| 148 | +          # retention_hours: 24 |
| 149 | +    # schemas: |
| 150 | +    #   - DEFAULT |
| 151 | +    # tables: |
| 152 | +    #   products: {} |
| 153 | +    #   orders: {} |
| 154 | +    #   order_items: {} |
| 155 | +    # logging: |
| 156 | +    #   level: debug |
| 157 | +    # advanced: |
| 158 | +    #   source: |
| 159 | +    #     spanner.change.stream.retention.hours: 24 |
| 160 | +    #     spanner.fetch.timeout.milliseconds: 20000 |
| 161 | +    #     spanner.dialect: POSTGRESQL |
| 162 | +    #   flink: |
| 163 | +    #     jobmanager.rpc.port: 7123 |
| 164 | +    #     jobmanager.memory.process.size: 1024m |
| 165 | +    #     taskmanager.numberOfTaskSlots: 3 |
| 166 | +    #     taskmanager.rpc.port: 7122 |
| 167 | +    #     taskmanager.memory.process.size: 2g |
| 168 | +    #     blob.server.port: 7124 |
| 169 | +    #     rest.port: 8082 |
| 170 | +    #     parallelism.default: 4 |
| 171 | +    #     restart-strategy.type: fixed-delay |
| 172 | +    #     restart-strategy.fixed-delay.attempts: 3 |
| 173 | +targets: |
| 174 | +  target: |
| 175 | +    connection: |
| 176 | +      type: redis |
| 177 | +      host: ${HOST_IP} |
| 178 | +      port: 12000 |
| 179 | +      user: ${TARGET_DB_USERNAME} |
| 180 | +      password: ${TARGET_DB_PASSWORD} |
| 181 | +processors: |
| 182 | +  target_data_type: hash |
| 183 | +``` |
| 184 | + |
| 185 | +Make sure to replace the relevant connection details with your own for both the Spanner and target |
| 186 | +Redis databases. |
| 187 | + |
| 188 | +## 6. Additional Kubernetes configuration |
| 189 | + |
| 190 | +In your `rdi-values.yaml` file for Kubernetes deployment, make sure to configure the `dataPlane` |
| 191 | +section like this: |
| 192 | + |
| 193 | +```yaml |
| 194 | +operator: |
| 195 | +  dataPlane: |
| 196 | +    flinkCollector: |
| 197 | +      enabled: true |
| 198 | +      jobManager: |
| 199 | +        ingress: |
| 200 | +          enabled: true |
| 201 | +          className: traefik # Replace with your ingress controller |
| 202 | +          hosts: |
| 203 | +            - hostname # Replace with your desired ingress hostname |
| 204 | +``` |
| 205 | + |
| 206 | +## 7. Configuration is complete |
| 207 | + |
| 208 | +Once you have followed the steps above, your Google Spanner database is ready for RDI to use. |