articles/data-factory/transform-data-databricks-jar.md (2 additions, 2 deletions)
@@ -121,13 +121,13 @@ For more information, see the [Databricks documentation](/azure/databricks/dev-t
 1. [Use the Databricks workspace UI](/azure/databricks/libraries/cluster-libraries#install-a-library-on-a-cluster)
-2. To obtain the dbfs path of the library added using UI, you can use [Databricks CLI](/azure/databricks/dev-tools/cli/fs-commands#list-the-contents-of-a-directory).
+2. To obtain the dbfs path of the library added using UI, you can use [Databricks CLI](/azure/databricks/dev-tools/cli/reference/fs-commands#databricks-fs-ls).

    Typically the Jar libraries are stored under dbfs:/FileStore/jars while using the UI. You can list all through the CLI: *databricks fs ls dbfs:/FileStore/job-jars*

 ### Or you can use the Databricks CLI:

-1. Follow [Copy the library using Databricks CLI](/azure/databricks/dev-tools/cli/fs-commands#copy-a-directory-or-a-file)
+1. Follow [Copy the library using Databricks CLI](/azure/databricks/dev-tools/cli/reference/fs-commands#databricks-fs-cp)

 2. Use Databricks CLI [(installation steps)](/azure/databricks/dev-tools/cli/commands#compute-commands)
articles/data-factory/transform-data-databricks-notebook.md (2 additions, 2 deletions)
@@ -140,13 +140,13 @@ In certain cases, you might require to pass back certain values from notebook ba
 1. [Use the Databricks workspace UI](/azure/databricks/libraries/cluster-libraries#install-a-library-on-a-cluster)
-2. To obtain the dbfs path of the library added using UI, you can use [Databricks CLI](/azure/databricks/dev-tools/cli/fs-commands#list-the-contents-of-a-directory).
+2. To obtain the dbfs path of the library added using UI, you can use [Databricks CLI](/azure/databricks/dev-tools/cli/reference/fs-commands#databricks-fs-ls).

    Typically the Jar libraries are stored under dbfs:/FileStore/jars while using the UI. You can list all through the CLI: *databricks fs ls dbfs:/FileStore/job-jars*

 ### Or you can use the Databricks CLI:

-1. Follow [Copy the library using Databricks CLI](/azure/databricks/dev-tools/cli/fs-commands#copy-a-directory-or-a-file)
+1. Follow [Copy the library using Databricks CLI](/azure/databricks/dev-tools/cli/reference/fs-commands#databricks-fs-cp)

 2. Use Databricks CLI [(installation steps)](/azure/databricks/dev-tools/cli/commands#compute-commands)
articles/synapse-analytics/synapse-link/how-to-query-analytical-store-spark-3.md (14 additions, 14 deletions)
@@ -1,12 +1,12 @@
 ---
 title: Interact with Azure Cosmos DB using Apache Spark 3 in Azure Synapse Link
 description: How to interact with Azure Cosmos DB using Apache Spark 3 in Azure Synapse Link
-author: Rodrigossz
+author: im-microsoft
 ms.service: azure-synapse-analytics
 ms.topic: quickstart
 ms.subservice: synapse-link
 ms.date: 03/04/2025
-ms.author: rosouz
+ms.author: imotiwala
 ms.custom: cosmos-db, mode-other
 ---
@@ -18,24 +18,24 @@ The following capabilities are supported while interacting with Azure Cosmos DB:
 * Synapse Apache Spark 3 allows you to analyze data in your Azure Cosmos DB containers that are enabled with Azure Synapse Link in near real-time without impacting the performance of your transactional workloads. The following two options are available to query the Azure Cosmos DB [analytical store](/azure/cosmos-db/analytical-store-introduction) from Spark:
     + Load to Spark DataFrame
     + Create Spark table
-* Synapse Apache Spark also allows you to ingest data into Azure Cosmos DB. It is important to note that data is always ingested into Azure Cosmos DB containers through the transactional store. When Synapse Link is enabled, any new inserts, updates, and deletes are then automatically synced to the analytical store.
+* Synapse Apache Spark also allows you to ingest data into Azure Cosmos DB. It's important to note that data is always ingested into Azure Cosmos DB containers through the transactional store. When Azure Synapse Link is enabled, any new inserts, updates, and deletes are then automatically synced to the analytical store.
 * Synapse Apache Spark also supports Spark structured streaming with Azure Cosmos DB as a source and a sink.

-The following sections walk you through the syntax. You can also checkout the Learn module on how to [Query Azure Cosmos DB with Apache Spark for Azure Synapse Analytics](/training/modules/query-azure-cosmos-db-with-apache-spark-for-azure-synapse-analytics/). Gestures in Azure Synapse Analytics workspace are designed to provide an easy out-of-the-box experience to get started. Gestures are visible when you right-click on an Azure Cosmos DB container in the **Data** tab of the Synapse workspace. With gestures, you can quickly generate code and tailor it to your needs. Gestures are also perfect for discovering data with a single click.
+The following sections walk you through the syntax. You can also check out the Learn module on how to [Query Azure Cosmos DB with Apache Spark for Azure Synapse Analytics](/training/modules/query-azure-cosmos-db-with-apache-spark-for-azure-synapse-analytics/). Gestures in the Azure Synapse Analytics workspace are designed to provide an easy out-of-the-box experience to get started. Gestures are visible when you right-click on an Azure Cosmos DB container in the **Data** tab of the Synapse workspace. With gestures, you can quickly generate code and tailor it to your needs. Gestures are also perfect for discovering data with a single click.

 > [!IMPORTANT]
 > You should be aware of some constraints in the analytical schema that could lead to unexpected behavior in data loading operations.
-> As an example, only first 1,000 properties from transactional schema are available in the analytical schema, properties with spaces are not available, etc. If you are experiencing some unexpected results, check the [analytical store schema constraints](/azure/cosmos-db/analytical-store-introduction#schema-constraints) for more details.
+> As an example, only the first 1,000 properties from the transactional schema are available in the analytical schema, properties with spaces aren't available, etc. If you're experiencing some unexpected results, check the [analytical store schema constraints](/azure/cosmos-db/analytical-store-introduction#schema-constraints) for more details.

 ## Query Azure Cosmos DB analytical store

 Customers can load analytical store data to Spark DataFrames or create Spark tables.

-The difference in experience is around whether underlying data changes in the Azure Cosmos DB container should be automatically reflected in the analysis performed in Spark. When Spark DataFrames are registered, or a Spark table is created, Spark fetches analytical store metadata for efficient pushdown. It is important to note that since Spark follows a lazy evaluation policy. You need to take action to fecth the last snapshot of the data in Spark DataFrames or SparkSQL queries.
+The difference in experience is around whether underlying data changes in the Azure Cosmos DB container should be automatically reflected in the analysis performed in Spark. When Spark DataFrames are registered, or a Spark table is created, Spark fetches analytical store metadata for efficient pushdown. It's important to note that Spark follows a lazy evaluation policy, so you need to take an action to fetch the latest snapshot of the data in Spark DataFrames or SparkSQL queries.

 In the case of **loading to Spark DataFrame**, the fetched metadata is cached through the lifetime of the Spark session, and hence subsequent actions invoked on the DataFrame are evaluated against the snapshot of the analytical store at the time of DataFrame creation.

-On the other hand, in the case of **creating a Spark table**, the metadata of the analytical store state is not cached in Spark and is reloaded on every SparkSQL query execution against the Spark table.
+On the other hand, in the case of **creating a Spark table**, the metadata of the analytical store state isn't cached in Spark and is reloaded on every SparkSQL query execution against the Spark table.

 To conclude, you can choose between loading a snapshot to Spark DataFrame or querying a Spark table for the latest snapshot.
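To make the load-to-DataFrame path concrete, here's a minimal Scala sketch. The linked service and container names are placeholders, and the `cosmos.olap` format with the `spark.synapse.linkedService`/`spark.cosmos.container` options follows the Synapse Cosmos DB connector used elsewhere in this article:

```scala
// Minimal sketch: load the analytical store into a Spark DataFrame.
// "CosmosDbLinkedService" and "Orders" are hypothetical names.
val dfSnapshot = spark.read.format("cosmos.olap")
  .option("spark.synapse.linkedService", "CosmosDbLinkedService")
  .option("spark.cosmos.container", "Orders")
  .load()

// Because metadata is cached for the session, actions on this DataFrame are
// evaluated against the snapshot taken at DataFrame creation time.
dfSnapshot.count()
```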
@@ -47,7 +47,7 @@ To conclude, you can choose between loading a snapshot to Spark DataFrame or que
 ## Authentication

-Now Spark 3.x customers can authenticate to Azure Cosmos DB analytical store using trusted identities access tokens or database account keys. Tokens are more secure as they are short lived, and assigned to the required permission using Cosmos DB RBAC.
+Now Spark 3.x customers can authenticate to the Azure Cosmos DB analytical store using access tokens from trusted identities or database account keys. Tokens are more secure, as they're short lived and assigned the required permissions using Cosmos DB RBAC.

 The connector now supports two auth types, `MasterKey` and `AccessToken`, for the `spark.cosmos.auth.type` property.
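For context on how that property is used, a minimal Scala sketch; the linked service and container names are hypothetical, and the `AccessToken` variant additionally requires passing the generated token through a connector option not shown in this diff:

```scala
// Minimal sketch: choose the auth type when reading the analytical store.
// With MasterKey, the account key is resolved from the linked service
// definition; with AccessToken, a short-lived RBAC-scoped token is supplied
// instead (via an additional connector option).
val df = spark.read.format("cosmos.olap")
  .option("spark.synapse.linkedService", "CosmosDbLinkedService") // hypothetical
  .option("spark.cosmos.container", "Orders")                     // hypothetical
  .option("spark.cosmos.auth.type", "MasterKey")                  // or "AccessToken"
  .load()
df.show(10)
```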
@@ -85,13 +85,13 @@ df.show(10)
 ```

 > [!NOTE]
-> Azure Cosmos DB's Synapse Link Spark connector does not support Managed Identity.
+> Azure Cosmos DB's Azure Synapse Link Spark connector doesn't support Managed Identity.

 #### Access token authentication requires role assignment

 To use the access token approach, you need to generate access tokens. Since access tokens are associated with Azure identities, correct role-based access control (RBAC) must be assigned to the identity. The role assignment is at the data plane level, and you must have minimum control plane permissions to perform the role assignment.

-The Identity Access Management (IAM) role assignments from Azure portal are on control plane level and don't affect the role assignments on data plane. Data plane role assignments are only available via Azure CLI. The `readAnalytics` action is required to read data from analytical store in Cosmos DB and is not part of any predefined roles. As such we must create a custom role definition. In addition to the `readAnalytics` action, also add the actions required for Data Reader. Create a JSON file with the following content and name it role_definition.json
+The Identity Access Management (IAM) role assignments from the Azure portal are at the control plane level and don't affect role assignments on the data plane. Data plane role assignments are only available via the Azure CLI. The `readAnalytics` action is required to read data from the analytical store in Cosmos DB and isn't part of any predefined roles, so we must create a custom role definition. In addition to the `readAnalytics` action, also add the actions required for Data Reader. Create a JSON file with the following content and name it role_definition.json.

 ```JSON
 {
@@ -113,7 +113,7 @@ The Identity Access Management (IAM) role assignments from Azure portal are on c
-- Set the default subscription which has your Cosmos DB account: `az account set --subscription <name or id>`
+- Set the default subscription, which has your Cosmos DB account: `az account set --subscription <name or id>`

 - Create the role definition in the desired Cosmos DB account: `az cosmosdb sql role definition create --account-name <cosmos-account-name> --resource-group <resource-group-name> --body @role_definition.json`
 - Copy over the role `definition id` returned: `/subscriptions/<subscription-id>/resourceGroups/<resource-group-name>/providers/Microsoft.DocumentDB/databaseAccounts/< cosmos-account-name >/sqlRoleDefinitions/<a-random-generated-guid>`
 - Get the principal ID of the identity that you want to assign the role to. The identity could be an Azure app registration, a virtual machine, or any other supported Azure resource. Assign the role to the principal using: `az cosmosdb sql role assignment create --account-name "<cosmos-account-name>" --resource-group "<resource-group>" --scope "/" --principal-id "<principal-id-of-identity>" --role-definition-id "<role-definition-id-from-previous-step>"`
@@ -171,7 +171,7 @@ val df_olap = spark.read.format("cosmos.olap").
 ### Create Spark table

-In this example, you create a Spark table that points the Azure Cosmos DB analytical store. You can then perform additional analysis by invoking SparkSQL queries against the table. This operation doesn't impact transactional store or incur data movement. If you decide to delete this Spark table, the underlying Azure Cosmos DB container and the corresponding analytical store will not be affected.
+In this example, you create a Spark table that points to the Azure Cosmos DB analytical store. You can then perform more analysis by invoking SparkSQL queries against the table. This operation doesn't impact the transactional store or incur data movement. If you decide to delete this Spark table, the underlying Azure Cosmos DB container and the corresponding analytical store won't be affected.

 This scenario is convenient for reusing Spark tables through third-party tools and providing accessibility to the underlying data at run time.
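A minimal SparkSQL sketch of this pattern, with hypothetical table, linked service, and container names:

```scala
// Minimal sketch: register a Spark table over the analytical store.
// Dropping the table later leaves the Cosmos DB container and its
// analytical store untouched.
spark.sql(
  """CREATE TABLE IF NOT EXISTS orders_olap
    |USING cosmos.olap
    |OPTIONS (
    |  spark.synapse.linkedService 'CosmosDbLinkedService',
    |  spark.cosmos.container 'Orders'
    |)""".stripMargin)

// Each SparkSQL query reloads analytical store metadata, so it reflects
// the latest snapshot rather than a session-cached one.
spark.sql("SELECT COUNT(*) FROM orders_olap").show()
```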
 In this gesture, you use the Spark Streaming capability to load data from a container into a dataframe. The data is stored in the primary data lake account (and file system) you connected to the workspace.

 > [!NOTE]
-> If you are looking to reference external libraries in Synapse Apache Spark, learn more [here](../spark/apache-spark-azure-portal-add-libraries.md). For instance, if you are looking to ingest a Spark DataFrame to a container of Azure Cosmos DB for MongoDB, you can use the MongoDB connector for Spark [here](https://docs.mongodb.com/spark-connector/master/).
+> If you're looking to reference external libraries in Synapse Apache Spark, learn more [here](../spark/apache-spark-azure-portal-add-libraries.md). For instance, if you're looking to ingest a Spark DataFrame to a container of Azure Cosmos DB for MongoDB, you can use the MongoDB connector for Spark [here](https://docs.mongodb.com/spark-connector/master/).

 ## Load streaming DataFrame from Azure Cosmos DB container

 In this example, you use Spark's structured streaming to load data from an Azure Cosmos DB container into a Spark streaming DataFrame, using the change feed functionality in Azure Cosmos DB. The checkpoint data used by Spark is stored in the primary data lake account (and file system) that you connected to the workspace.
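A hedged Scala sketch of that streaming read; the change feed source name and options follow the Azure Cosmos DB Spark 3 connector and should be verified against the connector version in your workspace, and the linked service and container names are placeholders:

```scala
// Minimal sketch: stream the Cosmos DB change feed into a streaming DataFrame.
// Format and option names follow the Spark 3 OLTP connector (verify for your
// connector version); "CosmosDbLinkedService" and "Orders" are hypothetical.
val dfStream = spark.readStream.format("cosmos.oltp.changeFeed")
  .option("spark.synapse.linkedService", "CosmosDbLinkedService")
  .option("spark.cosmos.container", "Orders")
  .option("spark.cosmos.changeFeed.startFrom", "Beginning")
  .option("spark.cosmos.changeFeed.mode", "Incremental")
  .load()
```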
@@ -294,5 +294,5 @@ query.awaitTermination()
 * [Samples to get started with Azure Synapse Link on GitHub](https://aka.ms/cosmosdb-synapselink-samples)
 * [Learn what is supported in Azure Synapse Link for Azure Cosmos DB](./concept-synapse-link-cosmos-db-support.md)
-* [Connect to Synapse Link for Azure Cosmos DB](../quickstart-connect-synapse-link-cosmos-db.md)
+* [Connect to Azure Synapse Link for Azure Cosmos DB](../quickstart-connect-synapse-link-cosmos-db.md)
 * Check out the Learn module on how to [Query Azure Cosmos DB with Apache Spark for Azure Synapse Analytics](/training/modules/query-azure-cosmos-db-with-apache-spark-for-azure-synapse-analytics/).