
Commit fcf0475

Support for AstraDB Dialect (#2501)
* Support for AstraDB Dialect * Adding UT coverage
1 parent 97a597b commit fcf0475

34 files changed, +2006 −340 lines

v2/sourcedb-to-spanner/README_Sourcedb_to_Spanner.md

Lines changed: 121 additions & 0 deletions
@@ -339,6 +339,127 @@ In case your job fails due to many exceptions like the above, here are a few ste
#### Throughput on Spanner rises and falls in sharp bursts
It's possible that the default configuration could lead to Spanner throughput rising and falling in sharp bursts. If this is observed, you can disable Spanner batch writes by setting `batchSizeForSpannerMutations` to 0.

## AstraDB to Spanner Bulk Migration

### Prerequisites

For a bulk data migration from AstraDB to Spanner, you will need the following prerequisites:

#### Prerequisite-1: Network Connectivity

1. Choose a VPC in the project where you would like to run the Dataflow job (the default is the VPC named `default` in the project).
2. Ensure that the VPC has network connectivity to your AstraDB instance.

#### Prerequisite-2: AstraDB credentials and related details

You will need the following Astra DB details:

1. AstraDB token.
   1. The AstraDB token can be generated from the database page.
   2. Please ensure that the token remains valid for the duration of the migration. Depending on the size of the database, the migration can take a few hours.
2. AstraDB Database ID.
3. AstraDB Region - leave it empty for the default region.
4. AstraDB Keyspace - the keyspace you want to migrate to Spanner.

Note that the template will automatically download the security bundle from the database.

#### Prerequisite-3: Active Astra DB database

Please ensure that the AstraDB instance is active (not hibernated) throughout the migration.
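As a quick sanity check for this prerequisite, the database status can be queried through the DataStax Astra DevOps API. This is a sketch, not part of the template; the `astra_db_status` helper name is hypothetical, and the two arguments are the token and database ID from Prerequisite-2:

```shell
# Sketch: verify the AstraDB instance is ACTIVE (not hibernated) before starting.
# Queries the Astra DevOps API and extracts the "status" field from the JSON reply.
astra_db_status() {
  curl -s -H "Authorization: Bearer $1" \
    "https://api.astra.datastax.com/v2/databases/$2" |
    sed -n 's/.*"status" *: *"\([A-Z]*\)".*/\1/p'
}
# astra_db_status "$ASTRA_DB_APPLICATION_TOKEN" "$ASTRA_DB_ID"   # expect ACTIVE
```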
#### Prerequisite-4: Spanner

You will need to provision a Spanner database to migrate the data into. The database needs tables with a schema that maps to the schema on the source. Only the tables present on both Spanner and Cassandra are migrated.

#### Prerequisite-5: GCS

You will need a GCS bucket to stage your build and driver configuration file, and to provide an output directory for DLQs.
### Run Migration

**Using the staged template**:

Follow the steps [above](#staging-the-template) to build the template and stage it in GCS.
This step prints the path of the staged template, which is passed as `TEMPLATE_SPEC_GCSPATH` below.

To start a job with the staged template at any time using `gcloud`, you need valid resources for the required parameters.

Given that, the following command line can be used:

```shell
### Basic Job Parameters
export PROJECT=<your-project>
export BUCKET_NAME=<bucket-name>
export REGION=<GCP-Region-where-the-dataflow-machines-will-be-provisioned-like-us-central1>
export TEMPLATE_SPEC_GCSPATH="gs://$BUCKET_NAME/templates/flex/Sourcedb_to_Spanner_Flex"
### The number of workers controls the fanout of the Dataflow job reading from Cassandra.
### While you might need to fine-tune this for best performance, a number close to the number of nodes on the Cassandra cluster is a good place to start.
export MAX_WORKERS="<MAX_NUMBER_OF_DATAFLOW_WORKERS_TO_READ_FROM_CASSANDRA>"
export NUM_WORKERS="<INITIAL_NUMBER_OF_DATAFLOW_WORKERS_TO_READ_FROM_CASSANDRA>"
### The type of machine. `e2-standard-32` might be a good starting point for most use cases.
export MACHINE_TYPE="<WORKER_MACHINE_TYPE>"

### Required
export INSTANCE_ID=<spanner instanceId>
export DATABASE_ID=<spanner databaseId>
export PROJECT_ID=<spanner projectId>
## Either the token directly (starting with `AstraCS`), or a URL to GCP Secret Manager.
ASTRA_DB_APPLICATION_TOKEN="AstraCS:<Your-Astra-DB-Token>"
## Astra DB database ID.
ASTRA_DB_ID="<Your-Astra-DB-ID>"
ASTRA_DB_KEYSPACE="<Your-Astra-DB-Key-Space>"
## Astra DB region. Leave empty for the default region.
ASTRA_DB_REGION="<Your-Astra-DB-Region>"
#### Stores the DLQ.
export OUTPUT_DIRECTORY=<outputDirectory>

### Optional
#### Use a session file in case you would like the Cassandra and Spanner tables to have different names.
export SESSION_FILE_PATH=""
export DISABLED_ALGORITHMS=<disabledAlgorithms>
export EXTRA_FILES_TO_STAGE=<extraFilesToStage>
export DEFAULT_LOG_LEVEL=INFO
#### Set insert-only mode to true in case you run the bulk migration in parallel with dual writes.
#### This mode stops the bulk template from overwriting rows that already exist in Spanner.
#### If you are not replicating live changes to Spanner in parallel, you can set this mode to false.
#### Setting this mode to false causes the bulk template to overwrite existing rows in Spanner.
#### false is the default if unset.
export INSERT_ONLY_MODE_FOR_SPANNER_MUTATIONS="true"
#### Region for Dataflow workers (required only if you want to configure the network and subnetwork).
export WORKER_REGION="${REGION}"
#### Network where you would like to run Dataflow. Defaults to `default`. This VPC must have access to the Cassandra nodes you would like to migrate from.
export NETWORK="<VPC_NAME>"
#### Subnet where you would like to run Dataflow. Defaults to `default`. This subnet must have access to the Cassandra nodes you would like to migrate from.
export SUBNETWORK="regions/${WORKER_REGION}/subnetworks/<SUBNET_NAME>"
#### Number of partitions for parallel reads.
##### By default, Apache Beam's CassandraIO sets NUM_PARTITIONS equal to the number
##### of nodes on the Cassandra cluster. This default does not give good performance
##### for larger workloads, as it limits parallelization.
##### The specifics depend on many factors, like the number of Cassandra nodes and the
##### distribution of the table's partitions across the nodes.
##### In general, a partition of average size 150 MB gives good throughput and is a good place to start fine-tuning.
NUM_PARTITIONS="<NUM_PARTITIONS>"
#### Disable Spanner batch writes.
BATCH_SIZE_FOR_SPANNER_MUTATIONS=1

gcloud dataflow flex-template run "sourcedb-to-spanner-flex-job" \
  --project "$PROJECT" \
  --region "$REGION" \
  --network "$NETWORK" \
  --max-workers "$MAX_WORKERS" \
  --num-workers "$NUM_WORKERS" \
  --worker-machine-type "$MACHINE_TYPE" \
  --subnetwork "$SUBNETWORK" \
  --template-file-gcs-location "$TEMPLATE_SPEC_GCSPATH" \
  --additional-experiments="[\"disable_runner_v2\"]" \
  --parameters "sourceDbDialect=ASTRA_DB" \
  --parameters "insertOnlyModeForSpannerMutations=$INSERT_ONLY_MODE_FOR_SPANNER_MUTATIONS" \
  --parameters "astraDBToken=${ASTRA_DB_APPLICATION_TOKEN}" \
  --parameters "astraDBRegion=${ASTRA_DB_REGION}" \
  --parameters "astraDBDatabaseId=${ASTRA_DB_ID}" \
  --parameters "astraDBKeySpace=${ASTRA_DB_KEYSPACE}" \
  --parameters "instanceId=$INSTANCE_ID" \
  --parameters "databaseId=$DATABASE_ID" \
  --parameters "projectId=$PROJECT_ID" \
  --parameters "sessionFilePath=$SESSION_FILE_PATH" \
  --parameters "outputDirectory=$OUTPUT_DIRECTORY" \
  --parameters "disabledAlgorithms=$DISABLED_ALGORITHMS" \
  --parameters "extraFilesToStage=$EXTRA_FILES_TO_STAGE" \
  --parameters "defaultLogLevel=$DEFAULT_LOG_LEVEL" \
  --parameters "numPartitions=${NUM_PARTITIONS}" \
  --parameters "batchSizeForSpannerMutations=${BATCH_SIZE_FOR_SPANNER_MUTATIONS}"
```
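The `NUM_PARTITIONS` guidance above (roughly 150 MB per partition) can be turned into a quick back-of-envelope estimate. This is a hypothetical helper, not part of the template; `TABLE_SIZE_GB` is an assumed rough size of your largest table:

```shell
# Estimate numPartitions by targeting ~150 MB per partition (ceiling division).
TABLE_SIZE_GB=300        # assumption: approximate size of the largest table
PARTITION_MB=150
NUM_PARTITIONS=$(( (TABLE_SIZE_GB * 1024 + PARTITION_MB - 1) / PARTITION_MB ))
echo "$NUM_PARTITIONS"   # → 2048 for a 300 GB table
```

Treat the result as a starting point for fine-tuning, not a final value.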
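Once the job is launched, progress can be followed with the standard `gcloud dataflow` commands (these are general CLI commands, not part of this commit; `<JOB_ID>` is a placeholder):

```shell
# List active Dataflow jobs in the region to find the job ID.
gcloud dataflow jobs list --project "$PROJECT" --region "$REGION" --status active
# Inspect the state of a specific job (replace <JOB_ID> with the listed ID).
gcloud dataflow jobs show <JOB_ID> --project "$PROJECT" --region "$REGION"
```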

## Terraform

v2/sourcedb-to-spanner/README_Sourcedb_to_Spanner_Flex.md

Lines changed: 3 additions & 3 deletions
```diff
@@ -28,7 +28,7 @@ on [Metadata Annotations](https://github.com/GoogleCloudPlatform/DataflowTemplat
 
 ### Optional parameters
 
-* **sourceDbDialect**: Possible values are `CASSANDRA`, `MYSQL` and `POSTGRESQL`. Defaults to: MYSQL.
+* **sourceDbDialect**: Possible values are `ASTRA_DB`, `CASSANDRA`, `MYSQL` and `POSTGRESQL`. Defaults to: MYSQL.
 * **jdbcDriverJars**: The comma-separated list of driver JAR files. For example, `gs://your-bucket/driver_jar1.jar,gs://your-bucket/driver_jar2.jar`. Defaults to empty.
 * **jdbcDriverClassName**: The JDBC driver class name. For example, `com.mysql.jdbc.Driver`. Defaults to: com.mysql.jdbc.Driver.
 * **username**: The username to be used for the JDBC connection. Defaults to empty.
@@ -46,8 +46,8 @@ on [Metadata Annotations](https://github.com/GoogleCloudPlatform/DataflowTemplat
 * **insertOnlyModeForSpannerMutations**: By default the pipeline uses upserts to write rows to Spanner, which means existing rows get overwritten. If insert-only mode is enabled, inserts are used instead of upserts and existing rows won't be overwritten.
 * **batchSizeForSpannerMutations**: Batch size in bytes for Spanner mutations. If set to less than 0, the default of Apache Beam's SpannerIO is used, which is 1 MB. Set this to 0 or 1 to disable batching mutations.
 * **spannerPriority**: The request priority for Cloud Spanner calls. The value must be one of: [`HIGH`,`MEDIUM`,`LOW`]. Defaults to `MEDIUM`.
-* **tableOverrides**: These are the table name overrides from source to spanner. They are written in thefollowing format: [{SourceTableName1, SpannerTableName1}, {SourceTableName2, SpannerTableName2}]This example shows mapping Singers table to Vocalists and Albums table to Records. For example, `[{Singers, Vocalists}, {Albums, Records}]`. Defaults to empty.
-* **columnOverrides**: These are the column name overrides from source to spanner. They are written in thefollowing format: [{SourceTableName1.SourceColumnName1, SourceTableName1.SpannerColumnName1}, {SourceTableName2.SourceColumnName1, SourceTableName2.SpannerColumnName1}]Note that the SourceTableName should remain the same in both the source and spanner pair. To override table names, use tableOverrides.The example shows mapping SingerName to TalentName and AlbumName to RecordName in Singers and Albums table respectively. For example, `[{Singers.SingerName, Singers.TalentName}, {Albums.AlbumName, Albums.RecordName}]`. Defaults to empty.
+* **tableOverrides**: These are the table name overrides from source to spanner. They are written in the following format: [{SourceTableName1, SpannerTableName1}, {SourceTableName2, SpannerTableName2}]This example shows mapping Singers table to Vocalists and Albums table to Records. For example, `[{Singers, Vocalists}, {Albums, Records}]`. Defaults to empty.
+* **columnOverrides**: These are the column name overrides from source to spanner. They are written in the following format: [{SourceTableName1.SourceColumnName1, SourceTableName1.SpannerColumnName1}, {SourceTableName2.SourceColumnName1, SourceTableName2.SpannerColumnName1}]Note that the SourceTableName should remain the same in both the source and spanner pair. To override table names, use tableOverrides.The example shows mapping SingerName to TalentName and AlbumName to RecordName in Singers and Albums table respectively. For example, `[{Singers.SingerName, Singers.TalentName}, {Albums.AlbumName, Albums.RecordName}]`. Defaults to empty.
 * **schemaOverridesFilePath**: A file which specifies the table and the column name overrides from source to spanner. Defaults to empty.
 * **uniformizationStageCountHint**: Hint for the number of uniformization stages. Currently applicable only for JDBC-based sources like MySQL or PostgreSQL. Leave 0 or default to disable uniformization. Set to -1 for a log(numPartition) number of stages. If your source primary key space is uniformly distributed (for example, an auto-incrementing key with sparse holes), it's best to leave it disabled. If your key space is not uniform, you might encounter a laggard VM in your Dataflow run. In such a case, you can set it to -1 to enable uniformization. Manually setting it to values other than 0 or -1 helps you fine-tune the tradeoff between the overhead added by uniformization stages and the performance improvement due to better distribution of work.
 * **disabledAlgorithms**: Comma separated algorithms to disable. If this value is set to `none`, no algorithm is disabled. Use this parameter with caution, because the algorithms disabled by default might have vulnerabilities or performance issues. For example, `SSLv3, RC4`.
```
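As an illustration of the two override parameters described above, here is a hypothetical launch fragment (the job name and GCS paths are placeholders; the parameter values follow the documented format):

```shell
# Hypothetical fragment: rename tables and columns between source and Spanner.
gcloud dataflow flex-template run "sourcedb-to-spanner-overrides-example" \
  --region "<REGION>" \
  --template-file-gcs-location "gs://<BUCKET>/templates/flex/Sourcedb_to_Spanner_Flex" \
  --parameters "tableOverrides=[{Singers, Vocalists}, {Albums, Records}]" \
  --parameters "columnOverrides=[{Singers.SingerName, Singers.TalentName}]"
```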

v2/sourcedb-to-spanner/pom.xml

Lines changed: 40 additions & 16 deletions
```diff
@@ -27,6 +27,10 @@
   </parent>
 
   <artifactId>sourcedb-to-spanner</artifactId>
+  <properties>
+    <cassandra-java-driver-core.version>4.18.1</cassandra-java-driver-core.version>
+  </properties>
 
   <dependencies>
     <dependency>
@@ -74,18 +78,11 @@
       <artifactId>postgresql</artifactId>
       <version>${postgresql.version}</version>
     </dependency>
-
-    <!-- https://mvnrepository.com/artifact/com.datastax.oss/java-driver-core -->
+    <!-- https://mvnrepository.com/artifact/org.apache.cassandra/java-driver-core -->
     <dependency>
-      <groupId>com.datastax.oss</groupId>
+      <groupId>org.apache.cassandra</groupId>
       <artifactId>java-driver-core</artifactId>
-      <version>4.17.0</version>
-      <exclusions>
-        <exclusion>
-          <groupId>org.slf4j</groupId>
-          <artifactId>slf4j-api</artifactId>
-        </exclusion>
-      </exclusions>
+      <version>${cassandra-java-driver-core.version}</version>
     </dependency>
 
     <!-- Needed for Beam CassandraIO -->
@@ -119,6 +116,12 @@
       <groupId>org.apache.beam</groupId>
       <artifactId>beam-it-cassandra</artifactId>
       <scope>test</scope>
+      <exclusions>
+        <exclusion>
+          <groupId>com.datastax.oss</groupId>
+          <artifactId>java-driver-core</artifactId>
+        </exclusion>
+      </exclusions>
     </dependency>
 
     <!-- https://mvnrepository.com/artifact/org.apache.derby/derby -->
@@ -166,12 +169,12 @@
       <version>1.0-SNAPSHOT</version>
       <scope>compile</scope>
     </dependency>
-    <dependency>
-      <groupId>com.github.nosan</groupId>
-      <artifactId>embedded-cassandra</artifactId>
-      <version>5.0.0</version>
-      <scope>test</scope>
-    </dependency>
+    <dependency>
+      <groupId>com.github.nosan</groupId>
+      <artifactId>embedded-cassandra</artifactId>
+      <version>5.0.0</version>
+      <scope>test</scope>
+    </dependency>
     <dependency>
       <groupId>org.apache.beam</groupId>
       <artifactId>beam-sdks-java-io-cassandra</artifactId>
@@ -210,6 +213,17 @@
       <version>1.19.0</version>
       <scope>test</scope>
     </dependency>
+    <dependency>
+      <groupId>com.datastax.astra</groupId>
+      <artifactId>beam-sdks-java-io-astra</artifactId>
+      <version>4.18.1</version>
+    </dependency>
+    <!-- Downloading Secure Bundle -->
+    <dependency>
+      <groupId>com.datastax.astra</groupId>
+      <artifactId>astra-sdk-devops</artifactId>
+      <version>0.6.3</version>
+    </dependency>
 
     <!-- test dependencies for localCassandraIO end -->
 
@@ -226,4 +240,14 @@
       <scope>test</scope>
     </dependency>
   </dependencies>
+  <dependencyManagement>
+    <dependencies>
+      <dependency>
+        <groupId>org.apache.cassandra</groupId>
+        <artifactId>java-driver-core</artifactId>
+        <version>${cassandra-java-driver-core.version}</version>
+      </dependency>
+    </dependencies>
+  </dependencyManagement>
+
 </project>
```

v2/sourcedb-to-spanner/src/main/java/com/google/cloud/teleport/v2/options/SourceDbToSpannerOptions.java

Lines changed: 47 additions & 2 deletions
```diff
@@ -22,13 +22,15 @@
 /** Interface used by the SourcedbToSpanner pipeline to accept user input. */
 public interface SourceDbToSpannerOptions extends CommonTemplateOptions {
   String CASSANDRA_SOURCE_DIALECT = "CASSANDRA";
+  String ASTRA_DB_SOURCE_DIALECT = "ASTRA_DB";
   String MYSQL_SOURCE_DIALECT = "MYSQL";
   String PG_SOURCE_DIALECT = "POSTGRESQL";
 
   @TemplateParameter.Enum(
       order = 1,
       optional = true,
       enumOptions = {
+        @TemplateParameter.TemplateEnumOption(ASTRA_DB_SOURCE_DIALECT),
         @TemplateParameter.TemplateEnumOption(CASSANDRA_SOURCE_DIALECT),
         @TemplateParameter.TemplateEnumOption(MYSQL_SOURCE_DIALECT),
         @TemplateParameter.TemplateEnumOption(PG_SOURCE_DIALECT)
@@ -66,14 +68,16 @@ public interface SourceDbToSpannerOptions extends CommonTemplateOptions {
 
   @TemplateParameter.Text(
       order = 4,
-      regexes = {"(^jdbc:mysql://.*|^jdbc:postgresql://.*|^gs://.*)"},
+      optional = true,
+      regexes = {"(^jdbc:mysql://.*|^jdbc:postgresql://.*|^gs://.*|^$)"},
       groupName = "Source",
       description =
           "URL to connect to the source database host. It can be either of "
               + "1. The JDBC connection URL - which must contain the host, port and source db name and can optionally contain properties like autoReconnect, maxReconnects etc. Format: `jdbc:{mysql|postgresql}://{host}:{port}/{dbName}?{parameters}`"
               + "2. The shard config path",
       helpText =
-          "The JDBC connection URL string. For example, `jdbc:mysql://127.4.5.30:3306/my-db?autoReconnect=true&maxReconnects=10&unicode=true&characterEncoding=UTF-8` or the shard config")
+          "The JDBC connection URL string. For example, `jdbc:mysql://127.4.5.30:3306/my-db?autoReconnect=true&maxReconnects=10&unicode=true&characterEncoding=UTF-8` or the shard config. This parameter is required except for the ASTRA_DB source.")
+  @Default.String("")
   String getSourceConfigURL();
 
   void setSourceConfigURL(String url);
@@ -355,4 +359,45 @@ public interface SourceDbToSpannerOptions extends CommonTemplateOptions {
   Long getUniformizationStageCountHint();
 
   void setUniformizationStageCountHint(Long value);
+
+  @TemplateParameter.Text(
+      order = 28,
+      optional = true,
+      description = "Astra DB token",
+      helpText =
+          "AstraDB token, ignored for non-AstraDB dialects. This token is used by the template to automatically download the secure bundle.")
+  @Default.String("")
+  String getAstraDBToken();
+
+  void setAstraDBToken(String value);
+
+  @TemplateParameter.Text(
+      order = 29,
+      optional = true,
+      description = "Astra DB databaseID",
+      helpText = "AstraDB databaseID, ignored for non-AstraDB dialects")
+  @Default.String("")
+  String getAstraDBDatabaseId();
+
+  void setAstraDBDatabaseId(String value);
+
+  @TemplateParameter.Text(
+      order = 30,
+      optional = true,
+      description = "Astra DB keySpace",
+      helpText = "AstraDB keySpace, ignored for non-AstraDB dialects")
+  @Default.String("")
+  String getAstraDBKeySpace();
+
+  void setAstraDBKeySpace(String value);
+
+  @TemplateParameter.Text(
+      order = 31,
+      optional = true,
+      description = "Astra DB Region",
+      helpText = "AstraDB region, ignored for non-AstraDB dialects")
+  @Default.String("")
+  String getAstraDBRegion();
+
+  void setAstraDBRegion(String value);
 }
```

v2/sourcedb-to-spanner/src/main/java/com/google/cloud/teleport/v2/source/reader/auth/dbauth/GuardedStringValueProvider.java

Lines changed: 13 additions & 0 deletions
```diff
@@ -42,6 +42,19 @@ public static GuardedStringValueProvider create(String value) {
     return new GuardedStringValueProvider(new GuardedString(value.toCharArray()));
   }
 
+  @Override
+  public boolean equals(Object other) {
+    boolean result;
+    if ((other == null) || (getClass() != other.getClass())) {
+      result = false;
+    } else {
+      GuardedStringValueProvider otherGuardedString = (GuardedStringValueProvider) other;
+      result = this.get().equals(otherGuardedString.get());
+    }
+    return result;
+  }
+
   /**
    * Implementation {@link ValueProvider#get()}.
    *
```
v2/sourcedb-to-spanner/src/main/java/com/google/cloud/teleport/v2/source/reader/io/cassandra/exception/AstraDBNotFoundException.java

Lines changed: 25 additions & 0 deletions

```java
/*
 * Copyright (C) 2025 Google LLC
 *
 * Licensed under the Apache License, Version 2.0 (the "License"); you may not
 * use this file except in compliance with the License. You may obtain a copy of
 * the License at
 *
 *   http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
 * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
 * License for the specific language governing permissions and limitations under
 * the License.
 */
package com.google.cloud.teleport.v2.source.reader.io.cassandra.exception;

import com.google.cloud.teleport.v2.source.reader.io.exception.SchemaDiscoveryException;

public class AstraDBNotFoundException extends SchemaDiscoveryException {

  public AstraDBNotFoundException(String msg) {
    super(new Throwable(msg));
  }
}
```
