
Commit fe80b22

RDSC-3972: RDI now supports Google Spanner public docs (#2004)
1 parent 8a9554b commit fe80b22

3 files changed: +213 -0 lines changed

content/embeds/rdi-supported-source-versions.md

Lines changed: 1 addition & 0 deletions
@@ -6,5 +6,6 @@
| MySQL | 5.7, 8.0.x, 8.2 | 8.0.x | 8.0 |
| PostgreSQL | 10, 11, 12, 13, 14, 15, 16 | 11, 12, 13, 14, 15, 16 | 15 |
| SQL Server | 2017, 2019, 2022 | 2016, 2017, 2019, 2022 | 2019 |
| Spanner | - | - | All versions |
| AlloyDB for PostgreSQL | 14.2, 15.7 | - | 14.2, 15.7 |
| AWS Aurora/PostgreSQL | 15 | 15 | - |
Lines changed: 208 additions & 0 deletions
@@ -0,0 +1,208 @@
---
Title: Prepare Spanner for RDI
aliases: /integrate/redis-data-integration/ingest/data-pipelines/prepare-dbs/spanner/
alwaysopen: false
categories:
- docs
- integrate
- rs
- rdi
description: Prepare Google Cloud Spanner databases to work with RDI
group: di
linkTitle: Prepare Spanner
summary: Redis Data Integration keeps Redis in sync with the primary database in near
  real time.
type: integration
weight: 2
---

Google Cloud Spanner requires specific configuration to enable change data capture (CDC) with RDI.
RDI operates in two phases with Spanner: snapshot (initial sync) and streaming. During the snapshot
phase, RDI uses the JDBC driver to connect directly to Spanner and read the current state of the
database. In the streaming phase, RDI uses [Spanner's Change Streams](https://cloud.google.com/spanner/docs/change-streams) to capture changes related to
the monitored schemas and tables.

{{< note >}}
Spanner is only supported with RDI deployed on Kubernetes/Helm. RDI VM mode does not support Spanner as a source database.
{{< /note >}}

## 1. Prepare for snapshot

During the snapshot phase, RDI executes multiple transactions to capture data at an exact point
in time that remains consistent across all queries. This is achieved using a Spanner feature called
[Timestamp bounds with exact staleness](https://cloud.google.com/spanner/docs/timestamp-bounds#exact_staleness).

This feature relies on the
[version_retention_period](https://cloud.google.com/spanner/docs/reference/rest/v1/projects.instances.databases#Database.FIELDS.version_retention_period),
which is set to one hour by default. Spanner rejects reads at a timestamp older than this period,
so a snapshot that runs longer than the retention period will fail. Depending on the database tier,
the volume of data to be ingested into RDI, and the load on the database, you may need to increase
this setting. See
[Set the retention period](https://cloud.google.com/spanner/docs/use-pitr#set-period) to learn how to update it.

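
As a rough sketch of the update for a GoogleSQL-dialect database, the statement can be built and printed as shown below. The `retention_ddl` helper, the `2d` value, and the `your-spanner-*` identifiers are illustrative assumptions, not part of RDI. The commented `gcloud spanner databases ddl update` call shows one way to apply the statement; PostgreSQL-dialect databases use a different syntax.

```bash
# Build the GoogleSQL statement that sets version_retention_period.
# Arguments: database ID, retention period (for example 2d or 48h).
# retention_ddl is a hypothetical helper used only for this example.
retention_ddl() {
  printf "ALTER DATABASE \`%s\` SET OPTIONS (version_retention_period = '%s')\n" "$1" "$2"
}

# Placeholder identifiers: replace them with your own values.
DATABASE_ID="your-spanner-database"
INSTANCE_ID="your-spanner-instance"

DDL="$(retention_ddl "$DATABASE_ID" 2d)"
echo "$DDL"

# To apply the change, pass the statement to gcloud:
# gcloud spanner databases ddl update "$DATABASE_ID" \
#   --instance="$INSTANCE_ID" --ddl="$DDL"
```

Choose a period comfortably longer than the time you expect the snapshot to take.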
## 2. Prepare for streaming

To enable streaming, you must create a change stream in Spanner at the database level. Use the
option `value_capture_type = 'NEW_ROW_AND_OLD_VALUES'` to capture both the previous and updated
row values.

Be sure to specify only the tables you want to ingest from and, optionally, the specific columns
you're interested in. Here's an example using GoogleSQL syntax:

```sql
CREATE CHANGE STREAM change_stream_table1_and_table2
FOR table1, table2
OPTIONS (
value_capture_type = 'NEW_ROW_AND_OLD_VALUES'
);
```

Refer to the [official documentation](https://cloud.google.com/spanner/docs/change-streams/manage#googlesql)
for more details, including additional configuration options and dialect-specific syntax.

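
If you create change streams for several sets of tables, a small helper can generate the statement for you. The sketch below is only an illustration: `change_stream_ddl`, the stream name, and the table names are hypothetical, and the output uses GoogleSQL syntax with the `value_capture_type` option described above.

```bash
# Build a CREATE CHANGE STREAM statement (GoogleSQL) for a list of tables.
# Arguments: change stream name, then one or more table names.
# change_stream_ddl is a hypothetical helper used only for this example.
change_stream_ddl() {
  name="$1"
  shift
  tables=""
  for t in "$@"; do
    # Join the table names with ", ".
    tables="${tables:+$tables, }$t"
  done
  printf "CREATE CHANGE STREAM %s FOR %s OPTIONS (value_capture_type = 'NEW_ROW_AND_OLD_VALUES')\n" \
    "$name" "$tables"
}

# Hypothetical stream and table names.
change_stream_ddl rdi_orders_stream orders order_items
```

Review the printed statement before you run it against the database, for example with `gcloud spanner databases ddl update`.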
## 3. Create a service account

To allow RDI to access the Spanner instance, you'll need to create a service account with the
appropriate permissions. This service account will then be provided to RDI as a secret for
authentication.

1. Create the service account

    ```bash
    gcloud iam service-accounts create spanner-reader-account \
        --display-name="Spanner Reader Service Account" \
        --description="Service account for reading from Spanner databases" \
        --project=YOUR_PROJECT_ID
    ```

1. Grant required roles

    **Database Reader** (read access to Spanner data):

    ```bash
    gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
        --member="serviceAccount:spanner-reader-account@YOUR_PROJECT_ID.iam.gserviceaccount.com" \
        --role="roles/spanner.databaseReader"
    ```

    **Database User** (query execution and metadata access):

    ```bash
    gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
        --member="serviceAccount:spanner-reader-account@YOUR_PROJECT_ID.iam.gserviceaccount.com" \
        --role="roles/spanner.databaseUser"
    ```

    **Viewer** (viewing instance and database configuration):

    ```bash
    gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
        --member="serviceAccount:spanner-reader-account@YOUR_PROJECT_ID.iam.gserviceaccount.com" \
        --role="roles/spanner.viewer"
    ```

1. Download the service account key

    Save the credentials locally so they can be used later by RDI:

    ```bash
    gcloud iam service-accounts keys create ~/spanner-reader-account.json \
        --iam-account=spanner-reader-account@YOUR_PROJECT_ID.iam.gserviceaccount.com \
        --project=YOUR_PROJECT_ID
    ```

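
The three role bindings differ only in the role name, so they can also be generated in a loop. The following sketch prints the commands instead of running them, so you can review them first. The `print_role_bindings` helper is a hypothetical name, and `YOUR_PROJECT_ID` is the same placeholder used in the steps above.

```bash
# Print one role-binding command for each role that RDI needs.
# Arguments: project ID, service account email.
# print_role_bindings is a hypothetical helper used only for this example.
print_role_bindings() {
  for role in roles/spanner.databaseReader roles/spanner.databaseUser roles/spanner.viewer; do
    printf 'gcloud projects add-iam-policy-binding %s --member="serviceAccount:%s" --role="%s"\n' \
      "$1" "$2" "$role"
  done
}

# Placeholder project ID: replace it with your own value.
PROJECT_ID="YOUR_PROJECT_ID"
print_role_bindings "$PROJECT_ID" "spanner-reader-account@${PROJECT_ID}.iam.gserviceaccount.com"
```

Once the output looks right, you can run the printed commands, for example by piping the output to `sh`.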
## 4. Set up secrets for Kubernetes deployment

Before deploying the RDI pipeline, you need to configure the necessary secrets for both the source
and target databases. Instructions for setting up the target database secrets are available in the
[RDI deployment guide]({{< relref "/integrate/redis-data-integration/data-pipelines/deploy#set-secrets-for-k8shelm-deployment-using-kubectl-command" >}}).

In addition to the target database secrets, you'll also need to create a Spanner-specific secret
named `source-db-credentials`. This secret should contain the service account key file that you
downloaded when you created the service account. Use the command below to create it:

```bash
kubectl create secret generic source-db-credentials --namespace=rdi \
    --from-file=gcp-service-account.json="$HOME/spanner-reader-account.json" \
    --save-config --dry-run=client -o yaml | kubectl apply -f -
```

The command uses `$HOME` rather than `~` because the shell does not expand `~` in the middle of
an argument. Be sure to adjust the file path (`$HOME/spanner-reader-account.json`) if your service
account key is stored elsewhere.

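
Before you create the secret, it can help to confirm that the key file exists and looks like a service account key. The check below is a minimal sketch: `is_service_account_key` is a hypothetical helper, and it relies only on the `"type": "service_account"` field that Google Cloud writes into service account key files. It does not validate the credentials themselves.

```bash
# Return success if the file exists and contains "type": "service_account".
# is_service_account_key is a hypothetical helper used only for this example.
is_service_account_key() {
  [ -f "$1" ] && grep -q '"type"[[:space:]]*:[[:space:]]*"service_account"' "$1"
}

# Path used in the earlier steps: adjust it if your key is stored elsewhere.
KEY_FILE="$HOME/spanner-reader-account.json"

if is_service_account_key "$KEY_FILE"; then
  echo "Key file looks valid: $KEY_FILE"
else
  echo "Missing or unexpected key file: $KEY_FILE" >&2
fi
```

After you create the secret, `kubectl get secret source-db-credentials --namespace=rdi` confirms that it exists in the cluster.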
## 5. Configure RDI for Spanner

When configuring your RDI pipeline for Spanner, use the following example configuration in your
`config.yaml` file:

```yaml
sources:
  source:
    type: flink
    connection:
      type: spanner
      project_id: your-project-id
      instance_id: your-spanner-instance
      database_id: your-spanner-database
      change_streams:
        change_stream_all:
          {}
          # retention_hours: 24
    # schemas:
    #   - DEFAULT
    # tables:
    #   products: {}
    #   orders: {}
    #   order_items: {}
    # logging:
    #   level: debug
    # advanced:
    #   source:
    #     spanner.change.stream.retention.hours: 24
    #     spanner.fetch.timeout.milliseconds: 20000
    #     spanner.dialect: POSTGRESQL
    #   flink:
    #     jobmanager.rpc.port: 7123
    #     jobmanager.memory.process.size: 1024m
    #     taskmanager.numberOfTaskSlots: 3
    #     taskmanager.rpc.port: 7122
    #     taskmanager.memory.process.size: 2g
    #     blob.server.port: 7124
    #     rest.port: 8082
    #     parallelism.default: 4
    #     restart-strategy.type: fixed-delay
    #     restart-strategy.fixed-delay.attempts: 3
targets:
  target:
    connection:
      type: redis
      host: ${HOST_IP}
      port: 12000
      user: ${TARGET_DB_USERNAME}
      password: ${TARGET_DB_PASSWORD}
processors:
  target_data_type: hash
```

Make sure to replace the relevant connection details with your own for both the Spanner and target
Redis databases.

## 6. Additional Kubernetes configuration

In your `rdi-values.yaml` file for Kubernetes deployment, make sure to configure the `dataPlane`
section like this:

```yaml
operator:
  dataPlane:
    flinkCollector:
      enabled: true
      jobManager:
        ingress:
          enabled: true
          className: traefik # Replace with your ingress controller
          hosts:
            - hostname # Replace with your desired ingress hostname
```

## 7. Configuration is complete

Once you have followed the steps above, your Google Spanner database is ready for RDI to use.

content/integrate/redis-data-integration/installation/install-k8s.md

Lines changed: 4 additions & 0 deletions
@@ -209,6 +209,10 @@ also use mTLS, you must set the client certificate and private key contents in
--set-file connection.ssl.key=<path-to-client-key>
```

{{< note >}}
If you are installing RDI for use with Google Cloud Spanner, see [Additional Kubernetes configuration]({{< relref "/integrate/redis-data-integration/data-pipelines/prepare-dbs/spanner#6-additional-kubernetes-configuration" >}}) for the extra settings that Spanner requires.
{{< /note >}}

## Check the installation

To verify the status of the K8s deployment, run the following command:
