articles/cosmos-db/cassandra/spark-databricks.md
Lines changed: 1 addition & 1 deletion
@@ -45,7 +45,7 @@ This article details how to work with Azure Cosmos DB for Apache Cassandra from
* **Cassandra Spark connector:** - To integrate Azure Cosmos DB for Apache Cassandra with Spark, the Cassandra connector should be attached to the Azure Databricks cluster. To attach the cluster:
- * Review the Databricks runtime version, the Spark version. Then find the [maven coordinates](https://mvnrepository.com/artifact/com.datastax.spark/spark-cassandra-connector-assembly) that are compatible with the Cassandra Spark connector, and attach it to the cluster. See ["Upload a Maven package or Spark package"](https://docs.databricks.com/user-guide/libraries.html) article to attach the connector library to the cluster. We recommend selecting Databricks runtime version 10.4 LTS, which supports Spark 3.2.1. To add the Apache Spark Cassandra Connector, your cluster, select **Libraries** > **Install New** > **Maven**, and then add `com.datastax.spark:spark-cassandra-connector-assembly_2.12:3.2.0` in Maven coordinates. If using Spark 2.x, we recommend an environment with Spark version 2.4.5, using spark connector at maven coordinates `com.datastax.spark:spark-cassandra-connector_2.11:2.4.3`.
+ * Review the Databricks runtime version, the Spark version. Then find the [maven coordinates](https://mvnrepository.com/artifact/com.datastax.spark/spark-cassandra-connector-assembly) that are compatible with the Cassandra Spark connector, and attach it to the cluster. See ["Upload a Maven package or Spark package"](https://docs.databricks.com/libraries) article to attach the connector library to the cluster. We recommend selecting Databricks runtime version 10.4 LTS, which supports Spark 3.2.1. To add the Apache Spark Cassandra Connector, your cluster, select **Libraries** > **Install New** > **Maven**, and then add `com.datastax.spark:spark-cassandra-connector-assembly_2.12:3.2.0` in Maven coordinates. If using Spark 2.x, we recommend an environment with Spark version 2.4.5, using spark connector at maven coordinates `com.datastax.spark:spark-cassandra-connector_2.11:2.4.3`.
* **Azure Cosmos DB for Apache Cassandra-specific library:** - If you're using Spark 2.x, a custom connection factory is required to configure the retry policy from the Cassandra Spark connector to Azure Cosmos DB for Apache Cassandra. Add the `com.microsoft.azure.cosmosdb:azure-cosmos-cassandra-spark-helper:1.2.0`[maven coordinates](https://search.maven.org/artifact/com.microsoft.azure.cosmosdb/azure-cosmos-cassandra-spark-helper/1.2.0/jar) to attach the library to the cluster.
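
The version guidance in this hunk can be summarized as a small lookup. The following is a minimal Python sketch, not part of the article or the PR: the function name `connector_coordinates` is hypothetical, and it maps only the two Spark major versions that the text names (3.x on Databricks runtime 10.4 LTS, and 2.x). Any other version is rejected rather than guessed.

```python
from typing import List


def connector_coordinates(spark_version: str) -> List[str]:
    """Return the Maven coordinates the article recommends for a Spark version.

    Hypothetical helper: only the Spark 3.x and 2.x recommendations quoted
    in the article are covered.
    """
    major = spark_version.split(".")[0]
    if major == "3":
        # Recommended with Databricks runtime 10.4 LTS (Spark 3.2.1).
        return ["com.datastax.spark:spark-cassandra-connector-assembly_2.12:3.2.0"]
    if major == "2":
        # Spark 2.x (the article suggests 2.4.5) also needs the Azure Cosmos DB
        # helper library for the custom connection factory.
        return [
            "com.datastax.spark:spark-cassandra-connector_2.11:2.4.3",
            "com.microsoft.azure.cosmosdb:azure-cosmos-cassandra-spark-helper:1.2.0",
        ]
    raise ValueError(f"No recommendation in the article for Spark {spark_version}")
```

Each returned string would be pasted into the **Maven** coordinates field under **Libraries** > **Install New**, one library at a time.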
articles/cosmos-db/nosql/migrate-relational-data.md
Lines changed: 1 addition & 1 deletion
@@ -119,7 +119,7 @@ We can also use Spark in [Azure Databricks](https://azure.microsoft.com/services
> [!NOTE]
> For clarity and simplicity, the code snippets include dummy database passwords explicitly inline, but you should ideally use Azure Databricks secrets.
- First, we create and attach the required [SQL connector](/connectors/sql/) and [Azure Cosmos DB connector](https://docs.databricks.com/data/data-sources/azure/cosmosdb-connector.html) libraries to our Azure Databricks cluster. Restart the cluster to make sure libraries are loaded.
+ First, we create and attach the required [SQL connector](/connectors/sql/) and [Azure Cosmos DB connector](/azure/databricks/external-data/cosmosdb-connector) libraries to our Azure Databricks cluster. Restart the cluster to make sure libraries are loaded.
:::image type="content" source="./media/migrate-relational-data/databricks1.png" alt-text="Screenshot that shows where to create and attach the required SQL connector and Azure Cosmos DB connector libraries to our Azure Databricks cluster.":::
articles/cosmos-db/postgresql/concepts-sharding-models.md
Lines changed: 13 additions & 15 deletions
@@ -51,18 +51,16 @@ Drawbacks:
## Sharding tradeoffs
- <br />
-
- || Schema-based sharding | Row-based sharding|
- |---|---|---|
- |Multi-tenancy model|Separate schema per tenant|Shared tables with tenant ID columns|
- |Citus version|12.0+|All versions|
- |Extra steps compared to vanilla PostgreSQL|None, only a config change|Use create_distributed_table on each table to distribute & colocate tables by tenant ID|
- |Number of tenants|1-10k|1-1 M+|
- |Data modeling requirement|No foreign keys across distributed schemas|Need to include a tenant ID column (a distribution column, also known as a sharding key) in each table, and in primary keys, foreign keys|
- |SQL requirement for single node queries|Use a single distributed schema per query|Joins and WHERE clauses should include tenant_id column|
- |Data sharing across tenants|Yes, using reference tables (in a separate schema)|Yes, using reference tables|
- |Tenant to shard isolation|Every tenant has its own shard group by definition|Can give specific tenant IDs their own shard group via isolate_tenant_to_new_shard|
+ || Schema-based sharding | Row-based sharding |
+ | --- | --- | --- |
+ |**Multi-tenancy model**| Separate schema per tenant | Shared tables with tenant ID columns |
+ |**Citus version**| 12.0+ | All versions |
+ |**Extra steps compared to vanilla PostgreSQL**| None, only a config change | Use create_distributed_table on each table to distribute & colocate tables by tenant ID |
+ |**Number of tenants**| 1-10k | 1-1 M+ |
+ |**Data modeling requirement**| No foreign keys across distributed schemas | Need to include a tenant ID column (a distribution column, also known as a sharding key) in each table, and in primary keys, foreign keys |
+ |**SQL requirement for single node queries**| Use a single distributed schema per query | Joins and WHERE clauses should include tenant_id column |
+ |**Parallel cross-tenant queries**| No | Yes |
+ |**Custom table definitions per tenant**| Yes | No |
+ |**Data sharing across tenants**| Yes, using reference tables (in a separate schema) | Yes, using reference tables |
+ |**Tenant to shard isolation**| Every tenant has its own shard group by definition | Can give specific tenant IDs their own shard group via isolate_tenant_to_new_shard |
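
The "Extra steps compared to vanilla PostgreSQL" row can be made concrete with a small sketch. The Python below is illustrative only and not part of the PR: the helper names `row_based_sharding_sql` and `schema_based_sharding_sql` are hypothetical, the table names are made up, and the setting name `citus.enable_schema_based_sharding` is taken from the Citus 12 documentation as an assumption rather than from this table. The helpers only build SQL strings; they do not connect to a database.

```python
from typing import List


def row_based_sharding_sql(tables: List[str], tenant_column: str = "tenant_id") -> List[str]:
    """Build one create_distributed_table call per table.

    Every table is distributed by the tenant column, and every table after
    the first is colocated with the first one. Table and column names are
    interpolated directly, so they must come from a trusted source.
    """
    if not tables:
        return []
    first, rest = tables[0], tables[1:]
    statements = [f"SELECT create_distributed_table('{first}', '{tenant_column}');"]
    for table in rest:
        statements.append(
            f"SELECT create_distributed_table('{table}', '{tenant_column}', "
            f"colocate_with => '{first}');"
        )
    return statements


def schema_based_sharding_sql() -> List[str]:
    """Return the single config change assumed for schema-based sharding."""
    return ["SET citus.enable_schema_based_sharding TO on;"]
```

For example, `row_based_sharding_sql(["tenants", "orders"])` yields two statements, one per table, whereas the schema-based variant needs no per-table call. That difference is the tradeoff the table row describes.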