
Commit 9a35b15

Merge pull request #199845 from TheovanKraay/cassandra-api-spark-updates
add Spark 3 version warning
2 parents a5198db + 85f927b

8 files changed: +30 −6 lines changed

articles/cosmos-db/cassandra/migrate-data-databricks.md

Lines changed: 3 additions & 0 deletions
@@ -52,6 +52,9 @@ Select **Install**, and then restart the cluster when installation is complete.
 > [!NOTE]
 > Make sure that you restart the Databricks cluster after the Cassandra Connector library has been installed.
 
+> [!WARNING]
+> The samples shown in this article have been tested with Spark **version 3.0.1** and the corresponding Cassandra Spark Connector **com.datastax.spark:spark-cassandra-connector-assembly_2.12:3.0.0**. Later versions of Spark and/or the Cassandra connector may not function as expected.
+
 ## Create Scala Notebook for migration
 
 Create a Scala Notebook in Databricks. Replace your source and target Cassandra configurations with the corresponding credentials, and source and target keyspaces and tables. Then run the following code:
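
For orientation, the migration notebook this hunk refers to reduces to a cross-account table copy. A minimal sketch, assuming the tested Spark 3.0.1 / connector 3.0.0 pairing; the account names, keys, keyspace `books_ks`, and table `books` are placeholders, not values from the commit:

```scala
// Source and target Cosmos DB Cassandra API accounts (placeholder values).
val sourceCassandra = Map(
  "spark.cassandra.connection.host" -> "<source-account>.cassandra.cosmos.azure.com",
  "spark.cassandra.connection.port" -> "10350",
  "spark.cassandra.connection.ssl.enabled" -> "true",
  "spark.cassandra.auth.username" -> "<source-account>",
  "spark.cassandra.auth.password" -> "<source-key>",
  "keyspace" -> "books_ks",
  "table" -> "books")

val targetCassandra = Map(
  "spark.cassandra.connection.host" -> "<target-account>.cassandra.cosmos.azure.com",
  "spark.cassandra.connection.port" -> "10350",
  "spark.cassandra.connection.ssl.enabled" -> "true",
  "spark.cassandra.auth.username" -> "<target-account>",
  "spark.cassandra.auth.password" -> "<target-key>",
  "keyspace" -> "books_ks",
  "table" -> "books")

// Read the full source table, then append into the target table.
// Cassandra writes are upserts, so re-running the copy is safe.
val df = spark.read.format("org.apache.spark.sql.cassandra").options(sourceCassandra).load()
df.write.format("org.apache.spark.sql.cassandra").options(targetCassandra).mode("append").save()
```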

articles/cosmos-db/cassandra/spark-create-operations.md

Lines changed: 4 additions & 1 deletion
@@ -47,7 +47,10 @@ spark.conf.set("spark.cassandra.connection.keep_alive_ms", "600000000")
 ```
 
 > [!NOTE]
-> If you are using Spark 3.0 or higher, you do not need to install the Cosmos DB helper and connection factory. You should also use `remoteConnectionsPerExecutor` instead of `connections_per_executor_max` for the Spark 3 connector (see above). You will see that connection related properties are defined within the notebook above. Using the syntax below, connection properties can be defined in this manner without needing to be defined at the cluster level (Spark context initialization).
+> If you are using Spark 3.0, you do not need to install the Cosmos DB helper and connection factory. You should also use `remoteConnectionsPerExecutor` instead of `connections_per_executor_max` for the Spark 3 connector (see above). You will see that connection related properties are defined within the notebook above. Using the syntax below, connection properties can be defined in this manner without needing to be defined at the cluster level (Spark context initialization).
+
+> [!WARNING]
+> The Spark 3 samples shown in this article have been tested with Spark **version 3.0.1** and the corresponding Cassandra Spark Connector **com.datastax.spark:spark-cassandra-connector-assembly_2.12:3.0.0**. Later versions of Spark and/or the Cassandra connector may not function as expected.
 
 ## Dataframe API
 
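
The note's point about notebook-level connection properties is easiest to see in code. A hedged sketch of a create via the Dataframe API; the account, key, and the `books_ks.books` schema are illustrative, not from the commit:

```scala
import spark.implicits._

// Connection properties attached to a single write, rather than set at
// Spark context initialization. Note the Spark 3 connector property
// remoteConnectionsPerExecutor replacing connections_per_executor_max.
val cosmosCassandra = Map(
  "spark.cassandra.connection.host" -> "<account>.cassandra.cosmos.azure.com",
  "spark.cassandra.connection.port" -> "10350",
  "spark.cassandra.connection.ssl.enabled" -> "true",
  "spark.cassandra.auth.username" -> "<account>",
  "spark.cassandra.auth.password" -> "<key>",
  "spark.cassandra.connection.remoteConnectionsPerExecutor" -> "10",
  "keyspace" -> "books_ks",
  "table" -> "books")

val booksDF = Seq(
  ("b00001", "Arthur Conan Doyle", "A study in scarlet", 1887))
  .toDF("book_id", "book_author", "book_name", "book_pub_year")

// No cluster-level Cassandra settings are required for this path.
booksDF.write
  .format("org.apache.spark.sql.cassandra")
  .options(cosmosCassandra)
  .mode("append")
  .save()
```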

articles/cosmos-db/cassandra/spark-databricks.md

Lines changed: 4 additions & 1 deletion
@@ -49,7 +49,10 @@ This article details how to work with Azure Cosmos DB Cassandra API from Spark o
 * **Azure Cosmos DB Cassandra API-specific library:** - If you are using Spark 2.x, a custom connection factory is required to configure the retry policy from the Cassandra Spark connector to Azure Cosmos DB Cassandra API. Add the `com.microsoft.azure.cosmosdb:azure-cosmos-cassandra-spark-helper:1.2.0` [maven coordinates](https://search.maven.org/artifact/com.microsoft.azure.cosmosdb/azure-cosmos-cassandra-spark-helper/1.2.0/jar) to attach the library to the cluster.
 
 > [!NOTE]
-> If you are using Spark 3.0 or higher, you do not need to install the Cosmos DB Cassandra API-specific library mentioned above.
+> If you are using Spark 3.0, you do not need to install the Cosmos DB Cassandra API-specific library mentioned above.
+
+> [!WARNING]
+> The Spark 3 samples shown in this article have been tested with Spark **version 3.0.1** and the corresponding Cassandra Spark Connector **com.datastax.spark:spark-cassandra-connector-assembly_2.12:3.0.0**. Later versions of Spark and/or the Cassandra connector may not function as expected.
 
 ## Sample notebooks
 
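
On the Spark 2.x path, the helper library's connection factory is wired in through cluster-level Spark config. A sketch of those settings under stated assumptions: the factory class name is taken from the azure-cosmos-cassandra-spark-helper documentation as I recall it, so verify it against the helper version you attach:

```scala
// Spark 2.x only -- not needed on Spark 3.0 (see the note above).
// Routes connections through the helper's factory so its Cosmos DB
// retry policy handles rate-limited (429) responses.
spark.conf.set("spark.cassandra.connection.factory",
  "com.microsoft.azure.cosmosdb.cassandra.CosmosDbConnectionFactory")
// Spark 2 connector name; Spark 3 uses remoteConnectionsPerExecutor.
spark.conf.set("spark.cassandra.connection.connections_per_executor_max", "10")
spark.conf.set("spark.cassandra.output.concurrent.writes", "1000")
spark.conf.set("spark.cassandra.concurrent.reads", "512")
```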

articles/cosmos-db/cassandra/spark-ddl-operations.md

Lines changed: 4 additions & 1 deletion
@@ -51,7 +51,10 @@ spark.conf.set("spark.cassandra.connection.keep_alive_ms", "600000000")
 ```
 
 > [!NOTE]
-> If you are using Spark 3.0 or higher, you do not need to install the Cosmos DB helper and connection factory. You should also use `remoteConnectionsPerExecutor` instead of `connections_per_executor_max` for the Spark 3 connector (see above).
+> If you are using Spark 3.0, you do not need to install the Cosmos DB helper and connection factory. You should also use `remoteConnectionsPerExecutor` instead of `connections_per_executor_max` for the Spark 3 connector (see above).
+
+> [!WARNING]
+> The Spark 3 samples shown in this article have been tested with Spark **version 3.0.1** and the corresponding Cassandra Spark Connector **com.datastax.spark:spark-cassandra-connector-assembly_2.12:3.0.0**. Later versions of Spark and/or the Cassandra connector may not function as expected.
 
 ## Keyspace DDL operations
 
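
Keyspace DDL in these articles goes through the connector's CQL session rather than the Dataframe API. A minimal sketch in that style; the keyspace name and throughput value are illustrative:

```scala
import com.datastax.spark.connector.cql.CassandraConnector

// CassandraConnector(sc) picks up connection properties from the
// Spark context, so they must be defined at the cluster level here.
val cdbConnector = CassandraConnector(sc)
cdbConnector.withSessionDo { session =>
  session.execute(
    "CREATE KEYSPACE IF NOT EXISTS books_ks WITH REPLICATION = " +
      "{'class': 'SimpleStrategy', 'replication_factor': 1} " +
      "AND cosmosdb_provisioned_throughput = 4000;")
}
```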

articles/cosmos-db/cassandra/spark-delete-operation.md

Lines changed: 4 additions & 1 deletion
@@ -47,7 +47,10 @@ spark.conf.set("spark.cassandra.connection.keep_alive_ms", "600000000")
 ```
 
 > [!NOTE]
-> If you are using Spark 3.0 or higher, you do not need to install the Cosmos DB helper and connection factory. You should also use `remoteConnectionsPerExecutor` instead of `connections_per_executor_max` for the Spark 3 connector (see above). You will see that connection related properties are defined within the notebook above. Using the syntax below, connection properties can be defined in this manner without needing to be defined at the cluster level (Spark context initialization). However, when using operations that require spark context (for example, `CassandraConnector(sc)` for `delete` as shown below), connection properties need to be defined at the cluster level.
+> If you are using Spark 3.0, you do not need to install the Cosmos DB helper and connection factory. You should also use `remoteConnectionsPerExecutor` instead of `connections_per_executor_max` for the Spark 3 connector (see above). You will see that connection related properties are defined within the notebook above. Using the syntax below, connection properties can be defined in this manner without needing to be defined at the cluster level (Spark context initialization). However, when using operations that require the Spark context (for example, `CassandraConnector(sc)` for `delete` as shown below), connection properties need to be defined at the cluster level.
+
+> [!WARNING]
+> The Spark 3 samples shown in this article have been tested with Spark **version 3.0.1** and the corresponding Cassandra Spark Connector **com.datastax.spark:spark-cassandra-connector-assembly_2.12:3.0.0**. Later versions of Spark and/or the Cassandra connector may not function as expected.
 
 ## Sample data generator
 We will use this code fragment to generate sample data:
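
To make the cluster-level requirement concrete: the delete sample drives a raw CQL statement through `CassandraConnector(sc)`, which only sees connection properties set at Spark context initialization. A sketch, with the keyspace, table, and key value illustrative:

```scala
import com.datastax.spark.connector.cql.CassandraConnector

// Row-level delete via CQL; notebook-level connection options set on
// individual reads/writes are invisible to CassandraConnector(sc).
val cdbConnector = CassandraConnector(sc)
cdbConnector.withSessionDo { session =>
  session.execute("DELETE FROM books_ks.books WHERE book_id = 'b00300';")
}
```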

articles/cosmos-db/cassandra/spark-read-operation.md

Lines changed: 4 additions & 1 deletion
@@ -48,7 +48,10 @@ spark.conf.set("spark.cassandra.connection.keep_alive_ms", "600000000")
 ```
 
 > [!NOTE]
-> If you are using Spark 3.0 or higher, you do not need to install the Cosmos DB helper and connection factory. You should also use `remoteConnectionsPerExecutor` instead of `connections_per_executor_max` for the Spark 3 connector(see above). You will see that connection related properties are defined within the notebook above. Using the syntax below, connection properties can be defined in this manner without needing to be defined at the cluster level (Spark context initialization).
+> If you are using Spark 3.0, you do not need to install the Cosmos DB helper and connection factory. You should also use `remoteConnectionsPerExecutor` instead of `connections_per_executor_max` for the Spark 3 connector (see above). You will see that connection related properties are defined within the notebook above. Using the syntax below, connection properties can be defined in this manner without needing to be defined at the cluster level (Spark context initialization).
+
+> [!WARNING]
+> The Spark 3 samples shown in this article have been tested with Spark **version 3.0.1** and the corresponding Cassandra Spark Connector **com.datastax.spark:spark-cassandra-connector-assembly_2.12:3.0.0**. Later versions of Spark and/or the Cassandra connector may not function as expected.
 
 ## Dataframe API
 
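
As a concrete illustration of per-operation connection properties on the read path; account, key, and schema are placeholders:

```scala
// Connection options scoped to this read alone -- nothing needs to be
// set at Spark context initialization for the Dataframe API path.
val readBooksDF = spark.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map(
    "spark.cassandra.connection.host" -> "<account>.cassandra.cosmos.azure.com",
    "spark.cassandra.connection.port" -> "10350",
    "spark.cassandra.connection.ssl.enabled" -> "true",
    "spark.cassandra.auth.username" -> "<account>",
    "spark.cassandra.auth.password" -> "<key>",
    "keyspace" -> "books_ks",
    "table" -> "books"))
  .load()

// Filters are pushed down to Cassandra where the connector can.
readBooksDF.filter("book_pub_year > 1891").show()
```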

articles/cosmos-db/cassandra/spark-table-copy-operations.md

Lines changed: 3 additions & 0 deletions
@@ -49,6 +49,9 @@ spark.conf.set("spark.cassandra.connection.keep_alive_ms", "600000000")
 > [!NOTE]
 > If you are using Spark 3.0 or higher, you do not need to install the Cosmos DB helper and connection factory. You should also use `remoteConnectionsPerExecutor` instead of `connections_per_executor_max` for the Spark 3 connector (see above). You will see that connection related properties are defined within the notebook above. Using the syntax below, connection properties can be defined in this manner without needing to be defined at the cluster level (Spark context initialization).
 
+> [!WARNING]
+> The Spark 3 samples shown in this article have been tested with Spark **version 3.0.1** and the corresponding Cassandra Spark Connector **com.datastax.spark:spark-cassandra-connector-assembly_2.12:3.0.0**. Later versions of Spark and/or the Cassandra connector may not function as expected.
+
 ## Insert sample data
 ```scala
 val booksDF = Seq(
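
The table-copy flow this article builds on that sample data is, in outline, a read followed by an append into a second table. A sketch assuming cluster-level connection properties are already set and a pre-created target table `books_copy` (a hypothetical name):

```scala
// Read the source table, then append every row into the target table.
val sourceDF = spark.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "books_ks", "table" -> "books"))
  .load()

sourceDF.write
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "books_ks", "table" -> "books_copy"))
  .mode("append")
  .save()
```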

articles/cosmos-db/cassandra/spark-upsert-operations.md

Lines changed: 4 additions & 1 deletion
@@ -47,7 +47,10 @@ spark.conf.set("spark.cassandra.connection.keep_alive_ms", "600000000")
 ```
 
 > [!NOTE]
-> If you are using Spark 3.0 or higher, you do not need to install the Cosmos DB helper and connection factory. You should also use `remoteConnectionsPerExecutor` instead of `connections_per_executor_max` for the Spark 3 connector (see above). You will see that connection related properties are defined within the notebook above. Using the syntax below, connection properties can be defined in this manner without needing to be defined at the cluster level (Spark context initialization). However, when using operations that require spark context (for example, `CassandraConnector(sc)` for `update` as shown below), connection properties need to be defined at the cluster level.
+> If you are using Spark 3.0, you do not need to install the Cosmos DB helper and connection factory. You should also use `remoteConnectionsPerExecutor` instead of `connections_per_executor_max` for the Spark 3 connector (see above). You will see that connection related properties are defined within the notebook above. Using the syntax below, connection properties can be defined in this manner without needing to be defined at the cluster level (Spark context initialization). However, when using operations that require the Spark context (for example, `CassandraConnector(sc)` for `update` as shown below), connection properties need to be defined at the cluster level.
+
+> [!WARNING]
+> The Spark 3 samples shown in this article have been tested with Spark **version 3.0.1** and the corresponding Cassandra Spark Connector **com.datastax.spark:spark-cassandra-connector-assembly_2.12:3.0.0**. Later versions of Spark and/or the Cassandra connector may not function as expected.
 
 ## Dataframe API
 
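
Upserts through the Dataframe API need no special call: a Cassandra write is inherently an upsert, so appending a row whose primary key already exists overwrites its non-key columns. A sketch, with schema and values illustrative:

```scala
import spark.implicits._

// Re-writing book_id "b00001" with a new price updates the existing
// row in place; unknown keys would simply be inserted as new rows.
val upsertDF = Seq(
  ("b00001", "Arthur Conan Doyle", "A study in scarlet", 1887, 23.30))
  .toDF("book_id", "book_author", "book_name", "book_pub_year", "book_price")

upsertDF.write
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "books_ks", "table" -> "books"))
  .mode("append")
  .save()
```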
