
Commit 9a35b15

Merge pull request #199845 from TheovanKraay/cassandra-api-spark-updates
add Spark 3 version warning
2 parents a5198db + 85f927b

8 files changed: +30 −6 lines changed

articles/cosmos-db/cassandra/migrate-data-databricks.md

Lines changed: 3 additions & 0 deletions
@@ -52,6 +52,9 @@ Select **Install**, and then restart the cluster when installation is complete.
 > [!NOTE]
 > Make sure that you restart the Databricks cluster after the Cassandra Connector library has been installed.
 
+> [!WARNING]
+> The samples shown in this article have been tested with Spark **version 3.0.1** and the corresponding Cassandra Spark Connector **com.datastax.spark:spark-cassandra-connector-assembly_2.12:3.0.0**. Later versions of Spark and/or the Cassandra connector may not function as expected.
+
 ## Create Scala Notebook for migration
 
 Create a Scala Notebook in Databricks. Replace your source and target Cassandra configurations with the corresponding credentials, and source and target keyspaces and tables. Then run the following code:
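
For orientation, the migration notebook this hunk refers to reduces to a cross-account table copy. A minimal sketch, assuming the tested Spark 3.0.1 / connector 3.0.0 pairing; the account names, keys, keyspace `books_ks`, and table `books` are placeholders, not values from the commit:

```scala
// Source and target Cosmos DB Cassandra API accounts (placeholder values).
val sourceCassandra = Map(
  "spark.cassandra.connection.host" -> "<source-account>.cassandra.cosmos.azure.com",
  "spark.cassandra.connection.port" -> "10350",
  "spark.cassandra.connection.ssl.enabled" -> "true",
  "spark.cassandra.auth.username" -> "<source-account>",
  "spark.cassandra.auth.password" -> "<source-key>",
  "keyspace" -> "books_ks",
  "table" -> "books")

val targetCassandra = Map(
  "spark.cassandra.connection.host" -> "<target-account>.cassandra.cosmos.azure.com",
  "spark.cassandra.connection.port" -> "10350",
  "spark.cassandra.connection.ssl.enabled" -> "true",
  "spark.cassandra.auth.username" -> "<target-account>",
  "spark.cassandra.auth.password" -> "<target-key>",
  "keyspace" -> "books_ks",
  "table" -> "books")

// Read the full source table, then append into the target table.
// Cassandra writes are upserts, so re-running the copy is safe.
val df = spark.read.format("org.apache.spark.sql.cassandra").options(sourceCassandra).load()
df.write.format("org.apache.spark.sql.cassandra").options(targetCassandra).mode("append").save()
```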

articles/cosmos-db/cassandra/spark-create-operations.md

Lines changed: 4 additions & 1 deletion
@@ -47,7 +47,10 @@ spark.conf.set("spark.cassandra.connection.keep_alive_ms", "600000000")
 ```
 
 > [!NOTE]
-> If you are using Spark 3.0 or higher, you do not need to install the Cosmos DB helper and connection factory. You should also use `remoteConnectionsPerExecutor` instead of `connections_per_executor_max` for the Spark 3 connector (see above). You will see that connection related properties are defined within the notebook above. Using the syntax below, connection properties can be defined in this manner without needing to be defined at the cluster level (Spark context initialization).
+> If you are using Spark 3.0, you do not need to install the Cosmos DB helper and connection factory. You should also use `remoteConnectionsPerExecutor` instead of `connections_per_executor_max` for the Spark 3 connector (see above). You will see that connection related properties are defined within the notebook above. Using the syntax below, connection properties can be defined in this manner without needing to be defined at the cluster level (Spark context initialization).
+
+> [!WARNING]
+> The Spark 3 samples shown in this article have been tested with Spark **version 3.0.1** and the corresponding Cassandra Spark Connector **com.datastax.spark:spark-cassandra-connector-assembly_2.12:3.0.0**. Later versions of Spark and/or the Cassandra connector may not function as expected.
 
 ## Dataframe API
 
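
The note's point about notebook-level connection properties is easiest to see in code. A hedged sketch of a create via the Dataframe API; the account, key, and the `books_ks.books` schema are illustrative, not from the commit:

```scala
import spark.implicits._

// Connection properties attached to a single write, rather than set at
// Spark context initialization. Note the Spark 3 connector property
// remoteConnectionsPerExecutor replacing connections_per_executor_max.
val cosmosCassandra = Map(
  "spark.cassandra.connection.host" -> "<account>.cassandra.cosmos.azure.com",
  "spark.cassandra.connection.port" -> "10350",
  "spark.cassandra.connection.ssl.enabled" -> "true",
  "spark.cassandra.auth.username" -> "<account>",
  "spark.cassandra.auth.password" -> "<key>",
  "spark.cassandra.connection.remoteConnectionsPerExecutor" -> "10",
  "keyspace" -> "books_ks",
  "table" -> "books")

val booksDF = Seq(
  ("b00001", "Arthur Conan Doyle", "A study in scarlet", 1887))
  .toDF("book_id", "book_author", "book_name", "book_pub_year")

// No cluster-level Cassandra settings are required for this path.
booksDF.write
  .format("org.apache.spark.sql.cassandra")
  .options(cosmosCassandra)
  .mode("append")
  .save()
```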

articles/cosmos-db/cassandra/spark-databricks.md

Lines changed: 4 additions & 1 deletion
@@ -49,7 +49,10 @@ This article details how to work with Azure Cosmos DB Cassandra API from Spark o
 * **Azure Cosmos DB Cassandra API-specific library:** - If you are using Spark 2.x, a custom connection factory is required to configure the retry policy from the Cassandra Spark connector to Azure Cosmos DB Cassandra API. Add the `com.microsoft.azure.cosmosdb:azure-cosmos-cassandra-spark-helper:1.2.0` [maven coordinates](https://search.maven.org/artifact/com.microsoft.azure.cosmosdb/azure-cosmos-cassandra-spark-helper/1.2.0/jar) to attach the library to the cluster.
 
 > [!NOTE]
-> If you are using Spark 3.0 or higher, you do not need to install the Cosmos DB Cassandra API-specific library mentioned above.
+> If you are using Spark 3.0, you do not need to install the Cosmos DB Cassandra API-specific library mentioned above.
+
+> [!WARNING]
+> The Spark 3 samples shown in this article have been tested with Spark **version 3.0.1** and the corresponding Cassandra Spark Connector **com.datastax.spark:spark-cassandra-connector-assembly_2.12:3.0.0**. Later versions of Spark and/or the Cassandra connector may not function as expected.
 
 ## Sample notebooks
 
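
On the Spark 2.x path, the helper library's connection factory is wired in through cluster-level Spark config. A sketch of those settings under stated assumptions: the factory class name is taken from the azure-cosmos-cassandra-spark-helper documentation as I recall it, so verify it against the helper version you attach:

```scala
// Spark 2.x only -- not needed on Spark 3.0 (see the note above).
// Routes connections through the helper's factory so its Cosmos DB
// retry policy handles rate-limited (429) responses.
spark.conf.set("spark.cassandra.connection.factory",
  "com.microsoft.azure.cosmosdb.cassandra.CosmosDbConnectionFactory")
// Spark 2 connector name; Spark 3 uses remoteConnectionsPerExecutor.
spark.conf.set("spark.cassandra.connection.connections_per_executor_max", "10")
spark.conf.set("spark.cassandra.output.concurrent.writes", "1000")
spark.conf.set("spark.cassandra.concurrent.reads", "512")
```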

articles/cosmos-db/cassandra/spark-ddl-operations.md

Lines changed: 4 additions & 1 deletion
@@ -51,7 +51,10 @@ spark.conf.set("spark.cassandra.connection.keep_alive_ms", "600000000")
 ```
 
 > [!NOTE]
-> If you are using Spark 3.0 or higher, you do not need to install the Cosmos DB helper and connection factory. You should also use `remoteConnectionsPerExecutor` instead of `connections_per_executor_max` for the Spark 3 connector (see above).
+> If you are using Spark 3.0, you do not need to install the Cosmos DB helper and connection factory. You should also use `remoteConnectionsPerExecutor` instead of `connections_per_executor_max` for the Spark 3 connector (see above).
+
+> [!WARNING]
+> The Spark 3 samples shown in this article have been tested with Spark **version 3.0.1** and the corresponding Cassandra Spark Connector **com.datastax.spark:spark-cassandra-connector-assembly_2.12:3.0.0**. Later versions of Spark and/or the Cassandra connector may not function as expected.
 
 ## Keyspace DDL operations
 
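
Keyspace DDL in these articles goes through the connector's CQL session rather than the Dataframe API. A minimal sketch in that style; the keyspace name and throughput value are illustrative:

```scala
import com.datastax.spark.connector.cql.CassandraConnector

// CassandraConnector(sc) picks up connection properties from the
// Spark context, so they must be defined at the cluster level here.
val cdbConnector = CassandraConnector(sc)
cdbConnector.withSessionDo { session =>
  session.execute(
    "CREATE KEYSPACE IF NOT EXISTS books_ks WITH REPLICATION = " +
      "{'class': 'SimpleStrategy', 'replication_factor': 1} " +
      "AND cosmosdb_provisioned_throughput = 4000;")
}
```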

articles/cosmos-db/cassandra/spark-delete-operation.md

Lines changed: 4 additions & 1 deletion
@@ -47,7 +47,10 @@ spark.conf.set("spark.cassandra.connection.keep_alive_ms", "600000000")
 ```
 
 > [!NOTE]
-> If you are using Spark 3.0 or higher, you do not need to install the Cosmos DB helper and connection factory. You should also use `remoteConnectionsPerExecutor` instead of `connections_per_executor_max` for the Spark 3 connector (see above). You will see that connection related properties are defined within the notebook above. Using the syntax below, connection properties can be defined in this manner without needing to be defined at the cluster level (Spark context initialization). However, when using operations that require spark context (for example, `CassandraConnector(sc)` for `delete` as shown below), connection properties need to be defined at the cluster level.
+> If you are using Spark 3.0, you do not need to install the Cosmos DB helper and connection factory. You should also use `remoteConnectionsPerExecutor` instead of `connections_per_executor_max` for the Spark 3 connector (see above). You will see that connection related properties are defined within the notebook above. Using the syntax below, connection properties can be defined in this manner without needing to be defined at the cluster level (Spark context initialization). However, when using operations that require the Spark context (for example, `CassandraConnector(sc)` for `delete` as shown below), connection properties need to be defined at the cluster level.
+
+> [!WARNING]
+> The Spark 3 samples shown in this article have been tested with Spark **version 3.0.1** and the corresponding Cassandra Spark Connector **com.datastax.spark:spark-cassandra-connector-assembly_2.12:3.0.0**. Later versions of Spark and/or the Cassandra connector may not function as expected.
 
 ## Sample data generator
 We will use this code fragment to generate sample data:
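
To make the cluster-level requirement concrete: the delete sample drives a raw CQL statement through `CassandraConnector(sc)`, which only sees connection properties set at Spark context initialization. A sketch, with the keyspace, table, and key value illustrative:

```scala
import com.datastax.spark.connector.cql.CassandraConnector

// Row-level delete via CQL; notebook-level connection options set on
// individual reads/writes are invisible to CassandraConnector(sc).
val cdbConnector = CassandraConnector(sc)
cdbConnector.withSessionDo { session =>
  session.execute("DELETE FROM books_ks.books WHERE book_id = 'b00300';")
}
```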

articles/cosmos-db/cassandra/spark-read-operation.md

Lines changed: 4 additions & 1 deletion
@@ -48,7 +48,10 @@ spark.conf.set("spark.cassandra.connection.keep_alive_ms", "600000000")
 ```
 
 > [!NOTE]
-> If you are using Spark 3.0 or higher, you do not need to install the Cosmos DB helper and connection factory. You should also use `remoteConnectionsPerExecutor` instead of `connections_per_executor_max` for the Spark 3 connector(see above). You will see that connection related properties are defined within the notebook above. Using the syntax below, connection properties can be defined in this manner without needing to be defined at the cluster level (Spark context initialization).
+> If you are using Spark 3.0, you do not need to install the Cosmos DB helper and connection factory. You should also use `remoteConnectionsPerExecutor` instead of `connections_per_executor_max` for the Spark 3 connector (see above). You will see that connection related properties are defined within the notebook above. Using the syntax below, connection properties can be defined in this manner without needing to be defined at the cluster level (Spark context initialization).
+
+> [!WARNING]
+> The Spark 3 samples shown in this article have been tested with Spark **version 3.0.1** and the corresponding Cassandra Spark Connector **com.datastax.spark:spark-cassandra-connector-assembly_2.12:3.0.0**. Later versions of Spark and/or the Cassandra connector may not function as expected.
 
 ## Dataframe API
 
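
As a concrete illustration of per-operation connection properties on the read path; account, key, and schema are placeholders:

```scala
// Connection options scoped to this read alone -- nothing needs to be
// set at Spark context initialization for the Dataframe API path.
val readBooksDF = spark.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map(
    "spark.cassandra.connection.host" -> "<account>.cassandra.cosmos.azure.com",
    "spark.cassandra.connection.port" -> "10350",
    "spark.cassandra.connection.ssl.enabled" -> "true",
    "spark.cassandra.auth.username" -> "<account>",
    "spark.cassandra.auth.password" -> "<key>",
    "keyspace" -> "books_ks",
    "table" -> "books"))
  .load()

// Filters are pushed down to Cassandra where the connector can.
readBooksDF.filter("book_pub_year > 1891").show()
```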

articles/cosmos-db/cassandra/spark-table-copy-operations.md

Lines changed: 3 additions & 0 deletions
@@ -49,6 +49,9 @@ spark.conf.set("spark.cassandra.connection.keep_alive_ms", "600000000")
 > [!NOTE]
 > If you are using Spark 3.0 or higher, you do not need to install the Cosmos DB helper and connection factory. You should also use `remoteConnectionsPerExecutor` instead of `connections_per_executor_max` for the Spark 3 connector (see above). You will see that connection related properties are defined within the notebook above. Using the syntax below, connection properties can be defined in this manner without needing to be defined at the cluster level (Spark context initialization).
 
+> [!WARNING]
+> The Spark 3 samples shown in this article have been tested with Spark **version 3.0.1** and the corresponding Cassandra Spark Connector **com.datastax.spark:spark-cassandra-connector-assembly_2.12:3.0.0**. Later versions of Spark and/or the Cassandra connector may not function as expected.
+
 ## Insert sample data
 ```scala
 val booksDF = Seq(
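
The table-copy flow this article builds on that sample data is, in outline, a read followed by an append into a second table. A sketch assuming cluster-level connection properties are already set and a pre-created target table `books_copy` (a hypothetical name):

```scala
// Read the source table, then append every row into the target table.
val sourceDF = spark.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "books_ks", "table" -> "books"))
  .load()

sourceDF.write
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "books_ks", "table" -> "books_copy"))
  .mode("append")
  .save()
```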

articles/cosmos-db/cassandra/spark-upsert-operations.md

Lines changed: 4 additions & 1 deletion
@@ -47,7 +47,10 @@ spark.conf.set("spark.cassandra.connection.keep_alive_ms", "600000000")
 ```
 
 > [!NOTE]
-> If you are using Spark 3.0 or higher, you do not need to install the Cosmos DB helper and connection factory. You should also use `remoteConnectionsPerExecutor` instead of `connections_per_executor_max` for the Spark 3 connector (see above). You will see that connection related properties are defined within the notebook above. Using the syntax below, connection properties can be defined in this manner without needing to be defined at the cluster level (Spark context initialization). However, when using operations that require spark context (for example, `CassandraConnector(sc)` for `update` as shown below), connection properties need to be defined at the cluster level.
+> If you are using Spark 3.0, you do not need to install the Cosmos DB helper and connection factory. You should also use `remoteConnectionsPerExecutor` instead of `connections_per_executor_max` for the Spark 3 connector (see above). You will see that connection related properties are defined within the notebook above. Using the syntax below, connection properties can be defined in this manner without needing to be defined at the cluster level (Spark context initialization). However, when using operations that require the Spark context (for example, `CassandraConnector(sc)` for `update` as shown below), connection properties need to be defined at the cluster level.
+
+> [!WARNING]
+> The Spark 3 samples shown in this article have been tested with Spark **version 3.0.1** and the corresponding Cassandra Spark Connector **com.datastax.spark:spark-cassandra-connector-assembly_2.12:3.0.0**. Later versions of Spark and/or the Cassandra connector may not function as expected.
 
 ## Dataframe API
 
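
Upserts through the Dataframe API need no special call: a Cassandra write is inherently an upsert, so appending a row whose primary key already exists overwrites its non-key columns. A sketch, with schema and values illustrative:

```scala
import spark.implicits._

// Re-writing book_id "b00001" with a new price updates the existing
// row in place; unknown keys would simply be inserted as new rows.
val upsertDF = Seq(
  ("b00001", "Arthur Conan Doyle", "A study in scarlet", 1887, 23.30))
  .toDF("book_id", "book_author", "book_name", "book_pub_year", "book_price")

upsertDF.write
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "books_ks", "table" -> "books"))
  .mode("append")
  .save()
```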
