Commit 7171e22

RDSC-3972: Initial draft of the Spanner public docs support
1 parent f1fc6e9 commit 7171e22

2 files changed, +212 -0 lines changed

content/embeds/rdi-supported-source-versions.md

Lines changed: 1 addition & 0 deletions
@@ -6,5 +6,6 @@
 | MySQL | 5.7, 8.0.x, 8.2 | 8.0.x | 8.0 |
 | PostgreSQL | 10, 11, 12, 13, 14, 15, 16 | 11, 12, 13, 14, 15, 16 | 15 |
 | SQL Server | 2017, 2019, 2022 | 2016, 2017, 2019, 2022 | 2019 |
+| Spanner | - | - | All versions |
 | AlloyDB for PostgreSQL | 14.2, 15.7 | - | 14.2, 15.7 |
 | AWS Aurora/PostgreSQL | 15 | 15 | - |
Lines changed: 211 additions & 0 deletions
@@ -0,0 +1,211 @@
---
Title: Prepare Spanner for RDI
aliases: /integrate/redis-data-integration/ingest/data-pipelines/prepare-dbs/spanner/
alwaysopen: false
categories:
- docs
- integrate
- rs
- rdi
description: Prepare Google Cloud Spanner databases to work with RDI
group: di
linkTitle: Prepare Spanner
summary: Redis Data Integration keeps Redis in sync with the primary database in near
  real time.
type: integration
weight: 2
---

Google Cloud Spanner requires specific configuration to enable change data capture (CDC) with RDI.
RDI operates in two phases with Spanner: snapshot (initial sync) and streaming. During the snapshot
phase, RDI uses the JDBC driver to connect directly to Spanner and read the current state of the
database. In the streaming phase, RDI uses Spanner's change streams to capture changes to
the monitored schemas and tables.

To complete the steps on this page, you need privileges to manage the database schema and to
create service accounts with the appropriate permissions, so that RDI can access the Spanner database.

## Prepare for snapshot

During the snapshot phase, RDI executes multiple transactions that all read the data as of the same
exact point in time, so the results remain consistent across all queries. This is achieved using a
Spanner feature called
[Timestamp bounds with Exact staleness](https://cloud.google.com/spanner/docs/timestamp-bounds#exact_staleness).

This feature relies on the
[version_retention_period](https://cloud.google.com/spanner/docs/reference/rest/v1/projects.instances.databases#Database.FIELDS.version_retention_period),
which is set to 1 hour by default. Depending on the database tier, the volume of data to be
ingested into RDI, and the load on the database, the snapshot may need more time than that, in which
case you should increase this setting. See
[Set the retention period](https://cloud.google.com/spanner/docs/use-pitr#set-period) for instructions.
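
As an illustration of the retention change described above, here is a minimal sketch that builds the GoogleSQL `ALTER DATABASE` statement and applies it with `gcloud spanner databases ddl update`. The function names `retention_ddl` and `set_retention` are hypothetical helpers introduced for this example, and the database ID, instance ID, and period are placeholders. PostgreSQL-dialect databases use a different statement, so treat this as a GoogleSQL-only sketch.

```bash
# Hypothetical helper: print the GoogleSQL DDL that sets the retention period.
# $1 = database ID, $2 = period such as 3d or 12h
retention_ddl() {
    printf "ALTER DATABASE \`%s\` SET OPTIONS (version_retention_period = '%s')\n" "$1" "$2"
}

# Hypothetical helper: apply the statement (requires an authenticated gcloud CLI).
# $1 = database ID, $2 = instance ID, $3 = period
set_retention() {
    gcloud spanner databases ddl update "$1" \
        --instance="$2" \
        --ddl="$(retention_ddl "$1" "$3")"
}

# Example (placeholders):
# set_retention your-spanner-database your-spanner-instance 3d
```

Printing the statement separately makes it easy to review the DDL before running it against the database.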

## Prepare for streaming

To enable streaming, you must create a change stream in Spanner at the database level. Use the
option `value_capture_type = 'NEW_ROW_AND_OLD_VALUES'` to capture both the previous and the updated
row values.

Specify only the tables you want to ingest from and, optionally, the specific columns
you're interested in. Here's an example using GoogleSQL syntax:

```sql
CREATE CHANGE STREAM change_stream_table1_and_table2
    FOR table1, table2
    OPTIONS (
        value_capture_type = 'NEW_ROW_AND_OLD_VALUES'
    );
```

Refer to the [official documentation](https://cloud.google.com/spanner/docs/change-streams/manage#googlesql)
for more details, including additional configuration options and dialect-specific syntax.
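
To watch only specific columns, GoogleSQL lets you list them in parentheses after the table name. The sketch below builds such a statement and applies it with `gcloud spanner databases ddl update`. The helper names `change_stream_ddl` and `create_change_stream`, the stream name, and the column names `col_a` and `col_b` are assumptions made for illustration only.

```bash
# Hypothetical helper: print a GoogleSQL CREATE CHANGE STREAM statement.
# $1 = stream name
# $2 = watch list, for example "table1, table2(col_a, col_b)"
change_stream_ddl() {
    printf "CREATE CHANGE STREAM %s FOR %s OPTIONS (value_capture_type = 'NEW_ROW_AND_OLD_VALUES')\n" "$1" "$2"
}

# Hypothetical helper: create the change stream (requires an authenticated gcloud CLI).
# $1 = database ID, $2 = instance ID, $3 = stream name, $4 = watch list
create_change_stream() {
    gcloud spanner databases ddl update "$1" \
        --instance="$2" \
        --ddl="$(change_stream_ddl "$3" "$4")"
}

# Example (placeholders): watch all of table1 but only two columns of table2.
# create_change_stream your-spanner-database your-spanner-instance \
#     change_stream_table1_and_table2 "table1, table2(col_a, col_b)"
```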

## Create a service account

To allow RDI to access the Spanner instance, create a service account with the
appropriate permissions. You will later provide this service account's key to RDI as a secret for
authentication. In the commands below, replace `YOUR_PROJECT_ID` with your Google Cloud project ID.

### Step 1: Create the service account

```bash
gcloud iam service-accounts create spanner-reader-account \
    --display-name="Spanner Reader Service Account" \
    --description="Service account for reading from Spanner databases" \
    --project=YOUR_PROJECT_ID
```

### Step 2: Grant required roles

**Database Reader** (read access to Spanner data):

```bash
gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
    --member="serviceAccount:spanner-reader-account@YOUR_PROJECT_ID.iam.gserviceaccount.com" \
    --role="roles/spanner.databaseReader"
```

**Database User** (query execution and metadata access):

```bash
gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
    --member="serviceAccount:spanner-reader-account@YOUR_PROJECT_ID.iam.gserviceaccount.com" \
    --role="roles/spanner.databaseUser"
```

**Viewer** (viewing instance and database configuration):

```bash
gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
    --member="serviceAccount:spanner-reader-account@YOUR_PROJECT_ID.iam.gserviceaccount.com" \
    --role="roles/spanner.viewer"
```
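
The three grants above differ only in the role, so they can also be run in a loop. This is a minimal sketch rather than a required step; the function names `rdi_spanner_roles`, `sa_email`, and `grant_rdi_roles` are hypothetical helpers, and the project ID and account name are placeholders.

```bash
# Hypothetical helper: list the roles granted in this step, one per line.
rdi_spanner_roles() {
    printf '%s\n' \
        roles/spanner.databaseReader \
        roles/spanner.databaseUser \
        roles/spanner.viewer
}

# Hypothetical helper: build the service account email.
# $1 = service account name, $2 = project ID
sa_email() {
    printf '%s@%s.iam.gserviceaccount.com\n' "$1" "$2"
}

# Hypothetical helper: grant every role, stopping at the first failure.
# $1 = project ID, $2 = service account name
grant_rdi_roles() {
    for role in $(rdi_spanner_roles); do
        gcloud projects add-iam-policy-binding "$1" \
            --member="serviceAccount:$(sa_email "$2" "$1")" \
            --role="$role" || return 1
    done
}

# Example (placeholders):
# grant_rdi_roles YOUR_PROJECT_ID spanner-reader-account
```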

### Step 3: Download the service account key

Create a key for the service account and save it locally so that you can provide it to RDI later:

```bash
gcloud iam service-accounts keys create ~/spanner-reader-account.json \
    --iam-account=spanner-reader-account@YOUR_PROJECT_ID.iam.gserviceaccount.com \
    --project=YOUR_PROJECT_ID
```

## Set up secrets for Kubernetes deployment

Before deploying the RDI pipeline, you need to configure the necessary secrets for both the source
and target databases. Instructions for setting up the target database secrets are available in the
[RDI deployment guide](/integrate/redis-data-integration/data-pipelines/deploy#set-secrets-for-k8shelm-deployment-using-kubectl-command).

In addition to the target database secrets, you also need to create a Spanner-specific secret
named `source-db-credentials`. This secret must contain the service account key file that you
downloaded when you created the service account. Use the command below to create it:

```bash
kubectl create secret generic source-db-credentials --namespace=rdi \
    --from-file=gcp-service-account.json="$HOME/spanner-reader-account.json" \
    --save-config --dry-run=client -o yaml | kubectl apply -f -
```

Adjust the file path (`$HOME/spanner-reader-account.json`) if your service account key is
stored elsewhere. The command uses `$HOME` rather than `~` because the shell does not expand a
tilde that appears in the middle of an argument.
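
To confirm that the secret holds the key file, you can read it back and check that it looks like a service account key. This is a sketch under stated assumptions: the function names `is_service_account_key` and `check_rdi_secret` are hypothetical, and the check relies only on the `"type": "service_account"` field that Google Cloud writes into service account key files.

```bash
# Hypothetical helper: succeed if the JSON on stdin looks like a service account key.
is_service_account_key() {
    grep -Eq '"type"[[:space:]]*:[[:space:]]*"service_account"'
}

# Hypothetical helper: decode the key stored in the secret and check it.
# $1 = namespace (for example, rdi)
check_rdi_secret() {
    kubectl get secret source-db-credentials --namespace="$1" \
        -o jsonpath='{.data.gcp-service-account\.json}' \
        | base64 --decode \
        | is_service_account_key
}

# Example:
# check_rdi_secret rdi && echo "secret contains a service account key"
```

The backslash in the JSONPath expression escapes the dot in the `gcp-service-account.json` key name.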

## Configure RDI for Spanner

When configuring your RDI pipeline for Spanner, use the following example configuration in your
`config.yaml` file:

```yaml
sources:
  source:
    type: flink
    connection:
      type: spanner
      project_id: your-project-id
      instance_id: your-spanner-instance
      database_id: your-spanner-database
      change_streams:
        change_stream_all:
          {}
          # retention_hours: 24
    # schemas:
    #   - DEFAULT
    # tables:
    #   products: {}
    #   orders: {}
    #   order_items: {}
    # logging:
    #   level: debug
    # advanced:
    #   source:
    #     spanner.change.stream.retention.hours: 24
    #     spanner.fetch.timeout.milliseconds: 20000
    #     spanner.dialect: POSTGRESQL
    #   flink:
    #     jobmanager.rpc.port: 7123
    #     jobmanager.memory.process.size: 1024m
    #     taskmanager.numberOfTaskSlots: 3
    #     taskmanager.rpc.port: 7122
    #     taskmanager.memory.process.size: 2g
    #     blob.server.port: 7124
    #     rest.port: 8082
    #     parallelism.default: 4
    #     restart-strategy.type: fixed-delay
    #     restart-strategy.fixed-delay.attempts: 3
targets:
  target:
    connection:
      type: redis
      host: ${HOST_IP}
      port: 12000
      user: ${TARGET_DB_USERNAME}
      password: ${TARGET_DB_PASSWORD}
processors:
  target_data_type: hash
```

Replace the connection details with your own values for both the Spanner source and the target
Redis database.

## Additional Kubernetes configuration

In your `rdi-values.yaml` file for Kubernetes deployment, configure the `dataPlane`
section like this:

```yaml
operator:
  dataPlane:
    flinkCollector:
      enabled: true
      jobManager:
        ingress:
          enabled: true
          className: traefik # Replace with your ingress controller
          hosts:
            - hostname # Replace with your Spanner DB hostname
```

## Next steps

After completing the Spanner preparation steps, you can proceed with:

1. [Installing RDI on Kubernetes](/integrate/redis-data-integration/installation/install-k8s)
2. [Deploying your RDI pipeline](/integrate/redis-data-integration/data-pipelines/deploy)
3. [Using Redis Insight to manage your RDI pipeline](/develop/tools/insight/rdi-connector)
