RDSC-3972: RDI now supports Google Spanner public docs #2004
Merged
Changes from 5 commits
Commits (6):
7171e22 RDSC-3972: Initial draft of the Spanner public docs support (ZdravkoDonev-redis)
4546a71 Apply suggestions from code review Copilot (ZdravkoDonev-redis)
11f558c Add numbering and fix links (ZdravkoDonev-redis)
34fcdd0 Apply suggestions from code review (ZdravkoDonev-redis)
69d1a4d Address code review comments (ZdravkoDonev-redis)
57da52d Link to K8s installation doc (dwdougherty)
208 changes: 208 additions & 0 deletions
content/integrate/redis-data-integration/data-pipelines/prepare-dbs/spanner.md
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,208 @@ | ||
| --- | ||
| Title: Prepare Spanner for RDI | ||
| aliases: /integrate/redis-data-integration/ingest/data-pipelines/prepare-dbs/spanner/ | ||
| alwaysopen: false | ||
| categories: | ||
| - docs | ||
| - integrate | ||
| - rs | ||
| - rdi | ||
| description: Prepare Google Cloud Spanner databases to work with RDI | ||
| group: di | ||
| linkTitle: Prepare Spanner | ||
| summary: Redis Data Integration keeps Redis in sync with the primary database in near | ||
| real time. | ||
| type: integration | ||
| weight: 2 | ||
| --- | ||
|
|
||
| Google Cloud Spanner requires specific configuration to enable change data capture (CDC) with RDI. | ||
| RDI operates in two phases with Spanner: snapshot (initial sync) and streaming. During the snapshot | ||
| phase, RDI uses the JDBC driver to connect directly to Spanner and read the current state of the | ||
| database. In the streaming phase, RDI uses [Spanner's Change Streams](https://cloud.google.com/spanner/docs/change-streams) to capture changes related to | ||
| the monitored schemas and tables. | ||
|
|
||
| {{< note >}} | ||
| Spanner is only supported with RDI deployed on Kubernetes/Helm. RDI VM mode does not support Spanner as a source database. | ||
| {{< /note >}} | ||
|
|
||
| ## 1. Prepare for snapshot | ||
|
|
||
| During the snapshot phase, RDI executes multiple transactions to capture data at an exact point | ||
| in time that remains consistent across all queries. This is achieved using a Spanner feature called | ||
| [Timestamp bounds with exact staleness](https://cloud.google.com/spanner/docs/timestamp-bounds#exact_staleness). | ||
|
|
||
| This feature relies on the | ||
| [version_retention_period](https://cloud.google.com/spanner/docs/reference/rest/v1/projects.instances.databases#Database.FIELDS.version_retention_period), | ||
| which is set to one hour by default and can be raised to a maximum of seven days. The snapshot must | ||
| finish within this window, because reads at a timestamp older than the retention period fail. | ||
| Depending on the database tier, the volume of data to be ingested into RDI, and the load on the | ||
| database, you may need to increase this setting. To update it, see | ||
| [Set the retention period](https://cloud.google.com/spanner/docs/use-pitr#set-period). | ||
|
|
||
| ## 2. Prepare for streaming | ||
|
|
||
| To enable streaming, you must create a change stream in Spanner at the database level. Use the | ||
| option `value_capture_type = 'NEW_ROW_AND_OLD_VALUES'` to capture both the previous and updated | ||
| row values. | ||
|
|
||
| Be sure to specify only the tables you want to ingest from and, optionally, the specific columns | ||
| you're interested in. Here's an example using GoogleSQL syntax: | ||
|
|
||
| ```sql | ||
| CREATE CHANGE STREAM change_stream_table1_and_table2 | ||
| FOR table1, table2 | ||
| OPTIONS ( | ||
| value_capture_type = 'NEW_ROW_AND_OLD_VALUES' | ||
| ); | ||
| ``` | ||
|
|
||
| Refer to the [official documentation](https://cloud.google.com/spanner/docs/change-streams/manage#googlesql) | ||
| for more details, including additional configuration options and dialect-specific syntax. | ||
|
|
||
| ## 3. Create a service account | ||
|
|
||
| To allow RDI to access the Spanner instance, you'll need to create a service account with the | ||
| appropriate permissions. This service account will then be provided to RDI as a secret for | ||
| authentication. | ||
|
|
||
| 1. Create the service account | ||
|
|
||
| ```bash | ||
| gcloud iam service-accounts create spanner-reader-account \ | ||
| --display-name="Spanner Reader Service Account" \ | ||
| --description="Service account for reading from Spanner databases" \ | ||
| --project=YOUR_PROJECT_ID | ||
| ``` | ||
|
|
||
| 1. Grant required roles | ||
|
|
||
| **Database Reader** (read access to Spanner data): | ||
|
|
||
| ```bash | ||
| gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \ | ||
| --member="serviceAccount:spanner-reader-account@YOUR_PROJECT_ID.iam.gserviceaccount.com" \ | ||
| --role="roles/spanner.databaseReader" | ||
| ``` | ||
|
|
||
| **Database User** (query execution and metadata access): | ||
|
|
||
| ```bash | ||
| gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \ | ||
| --member="serviceAccount:spanner-reader-account@YOUR_PROJECT_ID.iam.gserviceaccount.com" \ | ||
| --role="roles/spanner.databaseUser" | ||
| ``` | ||
|
|
||
| **Viewer** (viewing instance and database configuration): | ||
|
|
||
| ```bash | ||
| gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \ | ||
| --member="serviceAccount:spanner-reader-account@YOUR_PROJECT_ID.iam.gserviceaccount.com" \ | ||
| --role="roles/spanner.viewer" | ||
| ``` | ||
|
|
||
| 1. Download the service account key | ||
|
|
||
| Save the credentials locally so they can be used later by RDI: | ||
|
|
||
| ```bash | ||
| gcloud iam service-accounts keys create ~/spanner-reader-account.json \ | ||
| --iam-account=spanner-reader-account@YOUR_PROJECT_ID.iam.gserviceaccount.com \ | ||
| --project=YOUR_PROJECT_ID | ||
| ``` | ||
|
|
||
| ## 4. Set up secrets for Kubernetes deployment | ||
|
|
||
| Before deploying the RDI pipeline, you need to configure the necessary secrets for both the source | ||
| and target databases. Instructions for setting up the target database secrets are available in the | ||
| [RDI deployment guide]({{< relref "/integrate/redis-data-integration/data-pipelines/deploy#set-secrets-for-k8shelm-deployment-using-kubectl-command" >}}). | ||
|
|
||
| In addition to the target database secrets, you'll also need to create a Spanner-specific secret | ||
| named `source-db-credentials`. This secret should contain the service account key file that you | ||
| downloaded in step 3. Use the command below to create it: | ||
|
|
||
| ```bash | ||
| kubectl create secret generic source-db-credentials --namespace=rdi \ | ||
| --from-file=gcp-service-account.json="$HOME/spanner-reader-account.json" \ | ||
| --save-config --dry-run=client -o yaml | kubectl apply -f - | ||
| ``` | ||
|
|
||
| The command uses `$HOME` rather than `~` because Bash does not expand `~` in the middle of an | ||
| option value. Be sure to adjust the file path (`$HOME/spanner-reader-account.json`) if your | ||
| service account key is stored elsewhere. | ||
|
|
||
| ## 5. Configure RDI for Spanner | ||
|
|
||
| When configuring your RDI pipeline for Spanner, use the following example configuration in your | ||
| `config.yaml` file: | ||
|
|
||
| ```yaml | ||
| sources: | ||
|   source: | ||
|     type: flink | ||
|     connection: | ||
|       type: spanner | ||
|       project_id: your-project-id | ||
|       instance_id: your-spanner-instance | ||
|       database_id: your-spanner-database | ||
|     change_streams: | ||
|       change_stream_all: | ||
|         {} | ||
|         # retention_hours: 24 | ||
|     # schemas: | ||
|     #   - DEFAULT | ||
|     # tables: | ||
|     #   products: {} | ||
|     #   orders: {} | ||
|     #   order_items: {} | ||
|     # logging: | ||
|     #   level: debug | ||
|     # advanced: | ||
|     #   source: | ||
|     #     spanner.change.stream.retention.hours: 24 | ||
|     #     spanner.fetch.timeout.milliseconds: 20000 | ||
|     #     spanner.dialect: POSTGRESQL | ||
|     #   flink: | ||
|     #     jobmanager.rpc.port: 7123 | ||
|     #     jobmanager.memory.process.size: 1024m | ||
|     #     taskmanager.numberOfTaskSlots: 3 | ||
|     #     taskmanager.rpc.port: 7122 | ||
|     #     taskmanager.memory.process.size: 2g | ||
|     #     blob.server.port: 7124 | ||
|     #     rest.port: 8082 | ||
|     #     parallelism.default: 4 | ||
|     #     restart-strategy.type: fixed-delay | ||
|     #     restart-strategy.fixed-delay.attempts: 3 | ||
| targets: | ||
|   target: | ||
|     connection: | ||
|       type: redis | ||
|       host: ${HOST_IP} | ||
|       port: 12000 | ||
|       user: ${TARGET_DB_USERNAME} | ||
|       password: ${TARGET_DB_PASSWORD} | ||
| processors: | ||
|   target_data_type: hash | ||
| ``` | ||
|
|
||
| Make sure to replace the relevant connection details with your own for both the Spanner and target | ||
| Redis databases. | ||
|
|
||
| ## 6. Additional Kubernetes configuration | ||
|
|
||
| In your `rdi-values.yaml` file for Kubernetes deployment, make sure to configure the `dataPlane` | ||
| section like this: | ||
|
|
||
| ```yaml | ||
| operator: | ||
|   dataPlane: | ||
|     flinkCollector: | ||
|       enabled: true | ||
|       jobManager: | ||
|         ingress: | ||
|           enabled: true | ||
|           className: traefik # Replace with your ingress controller | ||
|           hosts: | ||
|             - hostname # Replace with your desired ingress hostname | ||
| ``` | ||
|
|
||
| ## 7. Configuration is complete | ||
|
|
||
| Once you have followed the steps above, your Google Spanner database is ready for RDI to use. | ||
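
As a supplement to step 1, the following is a minimal sketch of how the DDL statement for raising `version_retention_period` might be built. The helper name `retention_ddl`, the database ID, and the `6h` period are illustrative assumptions and are not part of the PR. The script only prints the statement and does not contact Spanner.

```bash
#!/usr/bin/env bash
# Sketch only: the database ID and the 6h period below are placeholders.

# Build the GoogleSQL DDL statement that sets version_retention_period.
# retention_ddl is a hypothetical helper name used for illustration.
retention_ddl() {
  local database_id="$1" retention="$2"
  # Backticks quote the database name, which may contain hyphens.
  printf "ALTER DATABASE \`%s\` SET OPTIONS (version_retention_period = '%s');\n" \
    "$database_id" "$retention"
}

# Print the statement so it can be reviewed before it is applied.
retention_ddl "your-spanner-database" "6h"
```

One way to apply the printed statement would be to pass it to `gcloud spanner databases ddl update your-spanner-database --instance=your-spanner-instance --ddl="..."`. Spanner accepts periods between one hour and seven days, expressed in days, hours, minutes, or seconds.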
this is done during installation IIRC as part of values.yaml
so needs to be added to k8s installation - configure to install when using Spanner as source
@yaronp68 This is specific to the spanner as we're configuring the flinkCollector. I think here is the correct place?
We could add it to the other doc as well...
@ZdravkoDonev-redis how would the customer discover this when installing? Maybe add a If you plan to use Spanner with this installation of RDI please have a look at link
@dwdougherty FYI