v2/sourcedb-to-spanner/README_Sourcedb_to_Spanner.md
+121 lines changed: 121 additions & 0 deletions
@@ -339,6 +339,127 @@ In case your job fails due to many exceptions like the above, here are a few ste
#### Throughput on Spanner rises and falls in sharp bursts
With the default configuration, Spanner throughput can rise and fall in sharp bursts. If you observe this, you can disable Spanner batch writes by setting `batchSizeForSpannerMutations` to 0.

+## AstraDB to Spanner Bulk Migration
+### Prerequisites
+Bulk data migration from AstraDB to Spanner has the following prerequisites:
+
+#### Prerequisite-1: Network Connectivity
+1. Choose a VPC in the project where you would like to run the Dataflow job (the default is the VPC named `default` in the project).
+2. Ensure that the VPC has network connectivity to your AstraDB instance.
+#### Prerequisite-2: AstraDB credentials and related details
+You will need the following AstraDB details:
+1. AstraDB token.
+   1. The AstraDB token can be generated from the database page.
+   2. Ensure that the token remains valid for the duration of the migration. Depending on the size of the database, the migration can take a few hours.
+2. AstraDB Database ID
+3. AstraDB Region - Leave it empty for the default region.
+4. AstraDB Keyspace - The keyspace you want to migrate to Spanner.
+Note that the template automatically downloads the security bundle from the database.
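The AstraDB details listed above can be sanity-checked before a job is launched. The following is a minimal sketch and not part of the template: the class `AstraDbSettingsCheck` and its method are hypothetical names, and the sketch assumes (from the list above) that the token, database ID, and keyspace must be non-empty, while the region may be left empty to use the default region.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical pre-flight check; the names here are illustrative, not the template's API.
final class AstraDbSettingsCheck {

  /** Returns the names of the AstraDB details that are missing (null or blank). */
  static List<String> missingDetails(
      String token, String databaseId, String region, String keyspace) {
    List<String> missing = new ArrayList<>();
    if (isBlank(token)) {
      missing.add("token");
    }
    if (isBlank(databaseId)) {
      missing.add("databaseId");
    }
    if (isBlank(keyspace)) {
      missing.add("keyspace");
    }
    // The region is intentionally not checked: an empty region means the default region.
    return missing;
  }

  private static boolean isBlank(String value) {
    return value == null || value.trim().isEmpty();
  }

  public static void main(String[] args) {
    // Placeholder values only; the database ID is left empty on purpose.
    System.out.println(missingDetails("AstraCS:placeholder-token", "", "", "my_keyspace"));
    // prints [databaseId]
  }
}
```

A check like this only catches empty values. It cannot tell whether the token is still valid or whether the database is reachable from the chosen VPC.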
+
+#### Prerequisite-3: Active AstraDB database
+Ensure that the AstraDB instance is active (not hibernated) throughout the migration.
+#### Prerequisite-4: Spanner
+You will need to provision a Spanner database to migrate the data into. The database needs to have tables with a schema that maps to the schema on the source.
+Only the tables that are present on both Spanner and Cassandra are migrated.
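The rule that only tables present on both sides are migrated amounts to a set intersection of table names. The sketch below illustrates that idea only. The class and method names are hypothetical, it assumes table names are compared exactly (case-sensitive), and it ignores any table name overrides. The template's real matching logic is not shown in this diff.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration of "tables present on both source and Spanner".
final class TablesToMigrateSketch {

  /** Returns the table names that appear in both lists, in sorted order. */
  static Set<String> tablesToMigrate(List<String> sourceTables, List<String> spannerTables) {
    Set<String> common = new TreeSet<>(sourceTables);
    common.retainAll(spannerTables); // keep only names that also exist on Spanner
    return common;
  }

  public static void main(String[] args) {
    List<String> source = List.of("singers", "albums", "audit_log");
    List<String> spanner = List.of("albums", "singers", "concerts");
    System.out.println(tablesToMigrate(source, spanner)); // prints [albums, singers]
  }
}
```

In this example `audit_log` is skipped because it has no Spanner counterpart, and `concerts` is untouched because it has no source table.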
+#### Prerequisite-5: GCS
+You will need a GCS bucket to stage your build and driver configuration file, and to provide an output directory for DLQs.
+### Run Migration
+
+**Using the staged template**:
+
+Follow the steps [above](#staging-the-template) to build the template and stage it in GCS.
+This step prints the path of the staged template, which is passed as `TEMPLATE_SPEC_GCSPATH` below.
+
+To start a job with the staged template at any time using `gcloud`, you are going to
+need valid resources for the required parameters.
+
+Provided that, the following command line can be used:
v2/sourcedb-to-spanner/README_Sourcedb_to_Spanner_Flex.md
+3 -3 lines changed: 3 additions & 3 deletions
@@ -28,7 +28,7 @@ on [Metadata Annotations](https://github.com/GoogleCloudPlatform/DataflowTemplat

### Optional parameters

-* **sourceDbDialect**: Possible values are `CASSANDRA`, `MYSQL` and `POSTGRESQL`. Defaults to: MYSQL.
+* **sourceDbDialect**: Possible values are `ASTRA_DB`, `CASSANDRA`, `MYSQL` and `POSTGRESQL`. Defaults to: MYSQL.
* **jdbcDriverJars**: The comma-separated list of driver JAR files. For example, `gs://your-bucket/driver_jar1.jar,gs://your-bucket/driver_jar2.jar`. Defaults to empty.
* **jdbcDriverClassName**: The JDBC driver class name. For example, `com.mysql.jdbc.Driver`. Defaults to: com.mysql.jdbc.Driver.
* **username**: The username to be used for the JDBC connection. Defaults to empty.
@@ -46,8 +46,8 @@ on [Metadata Annotations](https://github.com/GoogleCloudPlatform/DataflowTemplat
* **insertOnlyModeForSpannerMutations**: By default, the pipeline uses upserts to write rows to Spanner, which means existing rows are overwritten. If InsertOnly mode is enabled, inserts are used instead of upserts and existing rows are not overwritten.
* **batchSizeForSpannerMutations**: Batch size in bytes for Spanner mutations. If set to less than 0, the default of Apache Beam's SpannerIO is used, which is 1MB. Set this to 0 or 10 to disable batching mutations.
* **spannerPriority**: The request priority for Cloud Spanner calls. The value must be one of: [`HIGH`,`MEDIUM`,`LOW`]. Defaults to `MEDIUM`.
-* **tableOverrides**: These are the table name overrides from source to spanner. They are written in thefollowing format: [{SourceTableName1, SpannerTableName1}, {SourceTableName2, SpannerTableName2}]This example shows mapping Singers table to Vocalists and Albums table to Records. For example, `[{Singers, Vocalists}, {Albums, Records}]`. Defaults to empty.
-* **columnOverrides**: These are the column name overrides from source to spanner. They are written in thefollowing format: [{SourceTableName1.SourceColumnName1, SourceTableName1.SpannerColumnName1}, {SourceTableName2.SourceColumnName1, SourceTableName2.SpannerColumnName1}]Note that the SourceTableName should remain the same in both the source and spanner pair. To override table names, use tableOverrides.The example shows mapping SingerName to TalentName and AlbumName to RecordName in Singers and Albums table respectively. For example, `[{Singers.SingerName, Singers.TalentName}, {Albums.AlbumName, Albums.RecordName}]`. Defaults to empty.
+* **tableOverrides**: These are the table name overrides from source to Spanner. They are written in the following format: [{SourceTableName1, SpannerTableName1}, {SourceTableName2, SpannerTableName2}]. This example shows mapping the Singers table to Vocalists and the Albums table to Records. For example, `[{Singers, Vocalists}, {Albums, Records}]`. Defaults to empty.
+* **columnOverrides**: These are the column name overrides from source to Spanner. They are written in the following format: [{SourceTableName1.SourceColumnName1, SourceTableName1.SpannerColumnName1}, {SourceTableName2.SourceColumnName1, SourceTableName2.SpannerColumnName1}]. Note that the SourceTableName should remain the same in both the source and Spanner pair. To override table names, use tableOverrides. The example shows mapping SingerName to TalentName and AlbumName to RecordName in the Singers and Albums tables respectively. For example, `[{Singers.SingerName, Singers.TalentName}, {Albums.AlbumName, Albums.RecordName}]`. Defaults to empty.
* **schemaOverridesFilePath**: A file which specifies the table and the column name overrides from source to Spanner. Defaults to empty.
* **uniformizationStageCountHint**: Hint for the number of uniformization stages. Currently applicable only for JDBC-based sources like MySQL or PostgreSQL. Leave 0 or default to disable uniformization. Set to -1 for a log(numPartition) number of stages. If your source primary key space is uniformly distributed (for example an auto-incrementing key with sparse holes), it's best to leave it disabled. If your keyspace is not uniform, you might encounter a laggard VM in your Dataflow run. In such a case, you can set it to -1 to enable uniformization. Manually setting it to values other than 0 or -1 helps you fine-tune the tradeoff between the overhead added by uniformization stages and the performance improvement due to better distribution of work.
* **disabledAlgorithms**: Comma-separated algorithms to disable. If this value is set to `none`, no algorithm is disabled. Use this parameter with caution, because the algorithms disabled by default might have vulnerabilities or performance issues. For example, `SSLv3, RC4`.
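The `tableOverrides` and `columnOverrides` values share the `[{Source, Target}, ...]` shape described in the list above. As a rough illustration of that format, the sketch below reads such a string into an ordered map and checks the documented rule that a column override keeps the same table name on both sides. `OverridesFormatSketch` and its methods are hypothetical names written for this example. They are not the template's parser, which may handle whitespace, quoting, and errors differently.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical reader for the documented override format; not the template's own parser.
final class OverridesFormatSketch {
  // Matches one "{source, target}" pair and captures both names without surrounding spaces.
  private static final Pattern PAIR =
      Pattern.compile("\\{\\s*([^,{}]+?)\\s*,\\s*([^,{}]+?)\\s*\\}");

  /** Parses "[{A, B}, {C, D}]" into an insertion-ordered map {A=B, C=D}. */
  static Map<String, String> parseOverrides(String spec) {
    Map<String, String> overrides = new LinkedHashMap<>();
    if (spec == null || spec.trim().isEmpty()) {
      return overrides; // the parameter defaults to empty
    }
    Matcher matcher = PAIR.matcher(spec);
    while (matcher.find()) {
      overrides.put(matcher.group(1), matcher.group(2));
    }
    return overrides;
  }

  /** Checks that "Table.Column" names on both sides of a column override share the table. */
  static boolean keepsTableName(String sourceColumn, String spannerColumn) {
    int s = sourceColumn.indexOf('.');
    int t = spannerColumn.indexOf('.');
    return s > 0 && t > 0
        && sourceColumn.substring(0, s).equals(spannerColumn.substring(0, t));
  }

  public static void main(String[] args) {
    System.out.println(parseOverrides("[{Singers, Vocalists}, {Albums, Records}]"));
    // prints {Singers=Vocalists, Albums=Records}
    System.out.println(keepsTableName("Singers.SingerName", "Singers.TalentName"));
    // prints true
  }
}
```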
"URL to connect to the source database host. It can be either of "
               + "1. The JDBC connection URL - which must contain the host, port and source db name and can optionally contain properties like autoReconnect, maxReconnects etc. Format: `jdbc:{mysql|postgresql}://{host}:{port}/{dbName}?{parameters}`"
               + "2. The shard config path",
       helpText =
-          "The JDBC connection URL string. For example, `jdbc:mysql://127.4.5.30:3306/my-db?autoReconnect=true&maxReconnects=10&unicode=true&characterEncoding=UTF-8` or the shard config")
+          "The JDBC connection URL string. For example, `jdbc:mysql://127.4.5.30:3306/my-db?autoReconnect=true&maxReconnects=10&unicode=true&characterEncoding=UTF-8` or the shard config. This parameter is required except for ASTRA_DB source.")
+  @Default.String("")
   String getSourceConfigURL();

   void setSourceConfigURL(String url);
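The description above gives the JDBC form of `sourceConfigURL` as `jdbc:{mysql|postgresql}://{host}:{port}/{dbName}?{parameters}`. The sketch below shows one way to recognise that form and split it into its parts. `JdbcUrlFormatSketch` is a hypothetical helper written for illustration. How the template itself tells a JDBC URL from a shard config path is not shown in this diff, and the `gs://` path in the example is an assumed placeholder.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper illustrating the documented JDBC URL format.
final class JdbcUrlFormatSketch {
  // dialect, host, port and dbName are mandatory; the "?parameters" suffix is optional.
  private static final Pattern JDBC_URL =
      Pattern.compile("^jdbc:(mysql|postgresql)://([^:/?]+):(\\d+)/([^?]+)(?:\\?(.*))?$");

  /** Returns the URL parts, or an empty map when the value does not match the format. */
  static Map<String, String> parse(String sourceConfigUrl) {
    Map<String, String> parts = new LinkedHashMap<>();
    Matcher matcher = JDBC_URL.matcher(sourceConfigUrl);
    if (!matcher.matches()) {
      return parts; // for example a shard config path, or an empty value for ASTRA_DB
    }
    parts.put("dialect", matcher.group(1));
    parts.put("host", matcher.group(2));
    parts.put("port", matcher.group(3));
    parts.put("dbName", matcher.group(4));
    parts.put("parameters", matcher.group(5) == null ? "" : matcher.group(5));
    return parts;
  }

  public static void main(String[] args) {
    Map<String, String> parts =
        parse("jdbc:mysql://127.4.5.30:3306/my-db?autoReconnect=true&maxReconnects=10");
    System.out.println(parts.get("host") + " " + parts.get("port") + " " + parts.get("dbName"));
    // prints 127.4.5.30 3306 my-db
    System.out.println(parse("gs://your-bucket/shard-config.json").isEmpty()); // prints true
  }
}
```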
@@ -355,4 +359,45 @@ public interface SourceDbToSpannerOptions extends CommonTemplateOptions {
   Long getUniformizationStageCountHint();

   void setUniformizationStageCountHint(Long value);
+
+  @TemplateParameter.Text(
+      order = 28,
+      optional = true,
+      description = "Astra DB token",
+      helpText =
+          "AstraDB token, ignored for non-AstraDB dialects. This token is used by the template to automatically download the secure bundle.")
+  @Default.String("")
+  String getAstraDBToken();
+
+  void setAstraDBToken(String value);
+
+  @TemplateParameter.Text(
+      order = 29,
+      optional = true,
+      description = "Astra DB databaseID",
+      helpText = "AstraDB databaseID, ignored for non-AstraDB dialects")
+  @Default.String("")
+  String getAstraDBDatabaseId();
+
+  void setAstraDBDatabaseId(String value);
+
+  @TemplateParameter.Text(
+      order = 30,
+      optional = true,
+      description = "Astra DB keySpace",
+      helpText = "AstraDB keySpace, ignored for non-AstraDB dialects")
+  @Default.String("")
+  String getAstraDBKeySpace();
+
+  void setAstraDBKeySpace(String value);
+
+  @TemplateParameter.Text(
+      order = 31,
+      optional = true,
+      description = "Astra DB Region",
+      helpText = "AstraDB region, ignored for non-AstraDB dialects")
v2/sourcedb-to-spanner/src/main/java/com/google/cloud/teleport/v2/source/reader/auth/dbauth/GuardedStringValueProvider.java
+13 lines changed: 13 additions & 0 deletions
@@ -42,6 +42,19 @@ public static GuardedStringValueProvider create(String value) {