Commit 3d94585

Merge pull request #259814 from seesharprun/cosmos-build-validation-fixes
Cosmos DB | Fix build validation issues
2 parents: 1ec616d + 0e41da3

3 files changed: +15, -17 lines

articles/cosmos-db/cassandra/spark-databricks.md

Lines changed: 1 addition & 1 deletion
@@ -45,7 +45,7 @@ This article details how to work with Azure Cosmos DB for Apache Cassandra from
  * **Cassandra Spark connector:** To integrate Azure Cosmos DB for Apache Cassandra with Spark, the Cassandra connector should be attached to the Azure Databricks cluster. To attach it:

-   * Review the Databricks runtime version and the Spark version. Then find the [maven coordinates](https://mvnrepository.com/artifact/com.datastax.spark/spark-cassandra-connector-assembly) that are compatible with the Cassandra Spark connector, and attach it to the cluster. See the ["Upload a Maven package or Spark package"](https://docs.databricks.com/user-guide/libraries.html) article to attach the connector library to the cluster. We recommend selecting Databricks runtime version 10.4 LTS, which supports Spark 3.2.1. To add the Apache Spark Cassandra Connector to your cluster, select **Libraries** > **Install New** > **Maven**, and then add `com.datastax.spark:spark-cassandra-connector-assembly_2.12:3.2.0` in Maven coordinates. If you're using Spark 2.x, we recommend an environment with Spark version 2.4.5, using the Spark connector at Maven coordinates `com.datastax.spark:spark-cassandra-connector_2.11:2.4.3`.
+   * Review the Databricks runtime version and the Spark version. Then find the [maven coordinates](https://mvnrepository.com/artifact/com.datastax.spark/spark-cassandra-connector-assembly) that are compatible with the Cassandra Spark connector, and attach it to the cluster. See the ["Upload a Maven package or Spark package"](https://docs.databricks.com/libraries) article to attach the connector library to the cluster. We recommend selecting Databricks runtime version 10.4 LTS, which supports Spark 3.2.1. To add the Apache Spark Cassandra Connector to your cluster, select **Libraries** > **Install New** > **Maven**, and then add `com.datastax.spark:spark-cassandra-connector-assembly_2.12:3.2.0` in Maven coordinates. If you're using Spark 2.x, we recommend an environment with Spark version 2.4.5, using the Spark connector at Maven coordinates `com.datastax.spark:spark-cassandra-connector_2.11:2.4.3`.

  * **Azure Cosmos DB for Apache Cassandra-specific library:** If you're using Spark 2.x, a custom connection factory is required to configure the retry policy from the Cassandra Spark connector to Azure Cosmos DB for Apache Cassandra. Add the `com.microsoft.azure.cosmosdb:azure-cosmos-cassandra-spark-helper:1.2.0` [maven coordinates](https://search.maven.org/artifact/com.microsoft.azure.cosmosdb/azure-cosmos-cassandra-spark-helper/1.2.0/jar) to attach the library to the cluster.
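Once the connector is attached, reading an API for Cassandra table from a Databricks notebook is a short exercise. The following PySpark snippet is a minimal sketch, assuming the Spark 3 connector assembly recommended above is installed; the account name, key, keyspace (`books_ks`), and table (`books`) are placeholders, not values from this commit.

```python
# Minimal sketch: read an API for Cassandra table from a Databricks notebook.
# Assumes com.datastax.spark:spark-cassandra-connector-assembly_2.12:3.2.0 is
# attached to the cluster; `spark` is the notebook's preexisting SparkSession.
# Account name, credentials, keyspace, and table below are placeholders.
spark.conf.set("spark.cassandra.connection.host", "YOUR_ACCOUNT.cassandra.cosmos.azure.com")
spark.conf.set("spark.cassandra.connection.port", "10350")  # Cosmos DB Cassandra port
spark.conf.set("spark.cassandra.connection.ssl.enabled", "true")
spark.conf.set("spark.cassandra.auth.username", "YOUR_ACCOUNT")
spark.conf.set("spark.cassandra.auth.password", "YOUR_KEY")

# Read through the Cassandra data source and preview a few rows.
df = (
    spark.read.format("org.apache.spark.sql.cassandra")
    .options(keyspace="books_ks", table="books")
    .load()
)
df.show(5)
```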

articles/cosmos-db/nosql/migrate-relational-data.md

Lines changed: 1 addition & 1 deletion
@@ -119,7 +119,7 @@ We can also use Spark in [Azure Databricks](https://azure.microsoft.com/services
  > [!NOTE]
  > For clarity and simplicity, the code snippets include dummy database passwords explicitly inline, but you should ideally use Azure Databricks secrets.

- First, we create and attach the required [SQL connector](/connectors/sql/) and [Azure Cosmos DB connector](https://docs.databricks.com/data/data-sources/azure/cosmosdb-connector.html) libraries to our Azure Databricks cluster. Restart the cluster to make sure libraries are loaded.
+ First, we create and attach the required [SQL connector](/connectors/sql/) and [Azure Cosmos DB connector](/azure/databricks/external-data/cosmosdb-connector) libraries to our Azure Databricks cluster. Restart the cluster to make sure libraries are loaded.

  :::image type="content" source="./media/migrate-relational-data/databricks1.png" alt-text="Screenshot that shows where to create and attach the required SQL connector and Azure Cosmos DB connector libraries to our Azure Databricks cluster.":::
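With both libraries attached, the migration step this link documents reduces to a JDBC read followed by a Cosmos DB write. The snippet below is a hedged sketch assuming the Spark 3 Cosmos DB OLTP connector (`cosmos.oltp` format) rather than an older library; the server, database, table, and container names are hypothetical, and, per the note above, real passwords belong in Databricks secrets.

```python
# Minimal sketch: copy a relational table into Azure Cosmos DB for NoSQL.
# Assumes the Spark 3 Cosmos DB OLTP connector is attached to the cluster;
# all server, database, table, and container names are placeholders.
jdbc_url = "jdbc:sqlserver://YOUR_SERVER.database.windows.net:1433;database=YOUR_DB"

# Read the source table over JDBC.
orders = (
    spark.read.format("jdbc")
    .option("url", jdbc_url)
    .option("dbtable", "dbo.Orders")       # hypothetical source table
    .option("user", "YOUR_USER")
    .option("password", "YOUR_PASSWORD")   # use dbutils.secrets.get(...) in practice
    .load()
)

# Write the rows to a Cosmos DB container through the OLTP connector.
cosmos_options = {
    "spark.cosmos.accountEndpoint": "https://YOUR_ACCOUNT.documents.azure.com:443/",
    "spark.cosmos.accountKey": "YOUR_KEY",
    "spark.cosmos.database": "OrdersDB",
    "spark.cosmos.container": "Orders",
}
orders.write.format("cosmos.oltp").options(**cosmos_options).mode("append").save()
```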

articles/cosmos-db/postgresql/concepts-sharding-models.md

Lines changed: 13 additions & 15 deletions
@@ -51,18 +51,16 @@ Drawbacks:
  ## Sharding tradeoffs

- <br />
-
- || Schema-based sharding | Row-based sharding|
- |---|---|---|
- |Multi-tenancy model|Separate schema per tenant|Shared tables with tenant ID columns|
- |Citus version|12.0+|All versions|
- |Extra steps compared to vanilla PostgreSQL|None, only a config change|Use create_distributed_table on each table to distribute & colocate tables by tenant ID|
- |Number of tenants|1-10k|1-1 M+|
- |Data modeling requirement|No foreign keys across distributed schemas|Need to include a tenant ID column (a distribution column, also known as a sharding key) in each table, and in primary keys, foreign keys|
- |SQL requirement for single node queries|Use a single distributed schema per query|Joins and WHERE clauses should include tenant_id column|
- |Parallel cross-tenant queries|No|Yes|
- |Custom table definitions per tenant|Yes|No|
- |Access control|Schema permissions|Schema permissions|
- |Data sharing across tenants|Yes, using reference tables (in a separate schema)|Yes, using reference tables|
- |Tenant to shard isolation|Every tenant has its own shard group by definition|Can give specific tenant IDs their own shard group via isolate_tenant_to_new_shard|
+ | | Schema-based sharding | Row-based sharding |
+ | --- | --- | --- |
+ | **Multi-tenancy model** | Separate schema per tenant | Shared tables with tenant ID columns |
+ | **Citus version** | 12.0+ | All versions |
+ | **Extra steps compared to vanilla PostgreSQL** | None, only a config change | Use create_distributed_table on each table to distribute & colocate tables by tenant ID |
+ | **Number of tenants** | 1-10k | 1-1 M+ |
+ | **Data modeling requirement** | No foreign keys across distributed schemas | Need to include a tenant ID column (a distribution column, also known as a sharding key) in each table, and in primary keys, foreign keys |
+ | **SQL requirement for single node queries** | Use a single distributed schema per query | Joins and WHERE clauses should include tenant_id column |
+ | **Parallel cross-tenant queries** | No | Yes |
+ | **Custom table definitions per tenant** | Yes | No |
+ | **Access control** | Schema permissions | Schema permissions |
+ | **Data sharing across tenants** | Yes, using reference tables (in a separate schema) | Yes, using reference tables |
+ | **Tenant to shard isolation** | Every tenant has its own shard group by definition | Can give specific tenant IDs their own shard group via isolate_tenant_to_new_shard |
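To make the table's rows concrete, the sketch below (Python with psycopg2; host, credentials, table name, and tenant ID are all hypothetical) shows the two Citus calls the table names, `create_distributed_table` and `isolate_tenant_to_new_shard`, plus the single config change that enables schema-based sharding in Citus 12+.

```python
# Minimal sketch of the sharding operations named in the table above.
# Host, credentials, table name, and tenant ID are placeholders.
import psycopg2

conn = psycopg2.connect(
    host="YOUR_CLUSTER.postgres.cosmos.azure.com",
    dbname="citus", user="citus", password="YOUR_PASSWORD", sslmode="require",
)
conn.autocommit = True
cur = conn.cursor()

# Schema-based sharding (Citus 12+): the "only a config change" row.
cur.execute("SET citus.enable_schema_based_sharding TO on;")

# Row-based sharding: distribute a table by its tenant ID column
# (the distribution column, also known as the sharding key).
cur.execute("SELECT create_distributed_table('events', 'tenant_id');")

# Optionally give one busy tenant its own shard group.
cur.execute("SELECT isolate_tenant_to_new_shard('events', %s, 'CASCADE');", (42,))

cur.close()
conn.close()
```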
