Commit bbbec12

Merge pull request #275782 from paulth1/three-spark-connector-articles
[AQ] edit pass: Three spark connector articles
2 parents e0ccd08 + 195bf91 commit bbbec12

File tree: 3 files changed (+106, −110 lines)

articles/cosmos-db/nosql/how-to-spark-service-principal.md

Lines changed: 30 additions & 30 deletions
@@ -1,51 +1,51 @@
 ---
 title: Use a service principal with Spark
 titleSuffix: Azure Cosmos DB for NoSQL
-description: Use a Microsoft Entra service principal to authenticate to Azure Cosmos DB for NoSQL from Spark.
+description: Learn how to use a Microsoft Entra service principal to authenticate to Azure Cosmos DB for NoSQL from Spark.
 author: seesharprun
 ms.author: sidandrews
 ms.service: cosmos-db
 ms.subservice: nosql
 ms.topic: how-to
 ms.date: 04/01/2024
 zone_pivot_groups: programming-languages-spark-all-minus-sql-r-csharp
-#CustomerIntent: As a data scientist, I want to connect to Azure Cosmos DB for NoSQL using Spark and a service principal, so that I can avoid using connection strings.
+#CustomerIntent: As a data scientist, I want to connect to Azure Cosmos DB for NoSQL by using Spark and a service principal so that I can avoid using connection strings.
 ---

 # Use a service principal with the Spark 3 connector for Azure Cosmos DB for NoSQL

-In this article, you learn how to create a Microsoft Entra application and service principal that can be used with the role-based access control. You can then use this service principal to connect to an Azure Cosmos DB for NoSQL account from Spark 3.
+In this article, you learn how to create a Microsoft Entra application and service principal that can be used with role-based access control. You can then use this service principal to connect to an Azure Cosmos DB for NoSQL account from Spark 3.

 ## Prerequisites

 - An existing Azure Cosmos DB for NoSQL account.
   - If you have an existing Azure subscription, [create a new account](how-to-create-account.md?tabs=azure-portal).
   - No Azure subscription? You can [try Azure Cosmos DB free](../try-free.md) with no credit card required.
 - An existing Azure Databricks workspace.
-- Registered Microsoft Entra application and service principal
-  - If you don't have a service principal and application, [register an application using the Azure portal](/entra/identity-platform/howto-create-service-principal-portal).
+- Registered Microsoft Entra application and service principal.
+  - If you don't have a service principal and application, [register an application by using the Azure portal](/entra/identity-platform/howto-create-service-principal-portal).

-## Create secret and record credentials
+## Create a secret and record credentials

-In this section we will create a client secret and record the value for use later.
+In this section, you create a client secret and record the value for use later.

-1. Open the Azure portal (<https://portal.azure.com>).
+1. Open the [Azure portal](<https://portal.azure.com>).

-1. Navigate to your existing Microsoft Entra application.
+1. Go to your existing Microsoft Entra application.

-1. Navigate to the **Certificates & secrets** page. Then, create a new secret. Save the **Client Secret** value to use later in this guide.
+1. Go to the **Certificates & secrets** page. Then, create a new secret. Save the **Client Secret** value to use later in this article.

-1. Navigate to the **Overview** page. Locate and record the values for **Application (client) ID**, **Object ID**, and **Directory (tenant) ID**. You also use these values later in this guide.
+1. Go to the **Overview** page. Locate and record the values for **Application (client) ID**, **Object ID**, and **Directory (tenant) ID**. You also use these values later in this article.

-1. Navigate to your existing Azure Cosmos DB for NoSQL account.
+1. Go to your existing Azure Cosmos DB for NoSQL account.

-1. Record the **URI** value in the **Overview** page. Also record the **Subscription ID** and **Resource Group** values. You' use these values too later in this guide.
+1. Record the **URI** value on the **Overview** page. Also record the **Subscription ID** and **Resource Group** values. You use these values later in this article.

-## Create definition and assignment
+## Create a definition and an assignment

-In this section we will create a Microsoft Entra ID role definition and assign that role with permissions to read and write items in the containers.
+In this section, you create a Microsoft Entra ID role definition. Then you assign that role with permissions to read and write items in the containers.

-1. Create a role using the `az role definition create` command. Pass in the Azure Cosmos DB for NoSQL account name and resource group, followed by a body of JSON that defines the custom role. The role is also scoped to the account level using `/`. Ensure that you provide a unique name for your role using the `RoleName` property of the request body.
+1. Create a role by using the `az role definition create` command. Pass in the Azure Cosmos DB for NoSQL account name and resource group, followed by a body of JSON that defines the custom role. The role is also scoped to the account level by using `/`. Ensure that you provide a unique name for your role by using the `RoleName` property of the request body.

     ```azurecli
     az cosmosdb sql role definition create \
@@ -94,7 +94,7 @@ In this section we will create a Microsoft Entra ID role definition and assign t
     ]
     ```
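
For context, the JSON body of this command is elided by the hunk above. A sketch of the full command, with placeholder values and an illustrative `DataActions` list (the article's exact action list may differ):

```azurecli
az cosmosdb sql role definition create \
    --account-name "<nosql-account-name>" \
    --resource-group "<resource-group-name>" \
    --body '{
        "RoleName": "Write to Azure Cosmos DB for NoSQL data plane",
        "Type": "CustomRole",
        "AssignableScopes": ["/"],
        "Permissions": [{
            "DataActions": [
                "Microsoft.DocumentDB/databaseAccounts/readMetadata",
                "Microsoft.DocumentDB/databaseAccounts/sqlDatabases/containers/*",
                "Microsoft.DocumentDB/databaseAccounts/sqlDatabases/containers/items/*"
            ]
        }]
    }'
```

Scoping `AssignableScopes` to `/` makes the role assignable anywhere in the account; a narrower scope such as a single database or container path can be used instead.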

-1. Use `az cosmosdb sql role assignment create` to create a role assignment. Replace the`<aad-principal-id>` with the **Object ID** you recorded earlier in this guide. Also, replace `<role-definition-id>` with the `id` value fetched from running the `az cosmosdb sql role definition list` command in a previous step.
+1. Use `az cosmosdb sql role assignment create` to create a role assignment. Replace `<aad-principal-id>` with the **Object ID** you recorded earlier in this article. Also, replace `<role-definition-id>` with the `id` value fetched from running the `az cosmosdb sql role definition list` command in a previous step.

     ```azurecli
     az cosmosdb sql role assignment create \
@@ -105,26 +105,26 @@ In this section we will create a Microsoft Entra ID role definition and assign t
         --role-definition-id "<role-definition-id>"
     ```
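
The middle of this command is also elided by the hunk above. A sketch of the full assignment, with placeholder values:

```azurecli
az cosmosdb sql role assignment create \
    --account-name "<nosql-account-name>" \
    --resource-group "<resource-group-name>" \
    --scope "/" \
    --principal-id "<aad-principal-id>" \
    --role-definition-id "<role-definition-id>"
```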

-## Use service principal
+## Use a service principal

-Now that you created a Microsoft Entra application and service principal, created a custom role, and assigned that role permissions to your Azure Cosmos DB for NoSQL account, you should be able to run a notebook.
+Now that you've created a Microsoft Entra application and service principal, created a custom role, and assigned that role permissions to your Azure Cosmos DB for NoSQL account, you should be able to run a notebook.

 1. Open your Azure Databricks workspace.

 1. In the workspace interface, create a new **cluster**. Configure the cluster with these settings, at a minimum:

-    | | **Value** |
+    | Version | Value |
     | --- | --- |
-    | **Runtime version** | `13.3 LTS (Scala 2.12, Spark 3.4.1)` |
+    | Runtime version | `13.3 LTS (Scala 2.12, Spark 3.4.1)` |

-1. Use the workspace interface to search for **Maven** packages from **Maven Central** with a **Group Id** of `com.azure.cosmos.spark`. Install the package specific for Spark 3.4 with an **Artifact Id** prefixed with `azure-cosmos-spark_3-4` to the cluster.
+1. Use the workspace interface to search for **Maven** packages from **Maven Central** with a **Group ID** of `com.azure.cosmos.spark`. Install the package specifically for Spark 3.4 with an **Artifact ID** prefixed with `azure-cosmos-spark_3-4` to the cluster.

 1. Finally, create a new **notebook**.

    > [!TIP]
-   > By default, the notebook will be attached to the recently created cluster.
+   > By default, the notebook is attached to the recently created cluster.

-1. Within the notebook, set Cosmos DB Spark Connector configuration settings for NoSQL account endpoint, database name, and container name. Use the **Subscription ID**, **Resource Group**, **Application (client) ID**, **Directory (tenant) ID**, and **Client Secret** values recorded earlier in this guide.
+1. Within the notebook, set Azure Cosmos DB Spark connector configuration settings for the NoSQL account endpoint, database name, and container name. Use the **Subscription ID**, **Resource Group**, **Application (client) ID**, **Directory (tenant) ID**, and **Client Secret** values recorded earlier in this article.

 ::: zone pivot="programming-language-python"

@@ -164,7 +164,7 @@ Now that you created a Microsoft Entra application and service principal, create
 ::: zone-end
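
The configuration itself sits inside the pivot hidden by the hunk above. A minimal sketch of the service-principal settings, using the connector's documented `spark.cosmos.*` keys with placeholder values (the article may name the dictionary differently):

```python
# Sketch: service-principal auth settings for the Azure Cosmos DB Spark
# connector. Every value is a placeholder for one recorded earlier.
config = {
    "spark.cosmos.accountEndpoint": "https://<nosql-account-name>.documents.azure.com:443/",
    "spark.cosmos.auth.type": "ServicePrincipal",
    "spark.cosmos.account.subscriptionId": "<subscription-id>",
    "spark.cosmos.account.resourceGroupName": "<resource-group-name>",
    "spark.cosmos.account.tenantId": "<tenant-id>",
    "spark.cosmos.auth.aad.clientId": "<client-id>",
    "spark.cosmos.auth.aad.clientSecret": "<client-secret>",
    "spark.cosmos.database": "<database-name>",
    "spark.cosmos.container": "<container-name>",
}
```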

-1. Configure the Catalog API to manage API for NoSQL resources using Spark.
+1. Configure the Catalog API to manage API for NoSQL resources by using Spark.

 ::: zone pivot="programming-language-python"

@@ -198,7 +198,7 @@ Now that you created a Microsoft Entra application and service principal, create
 ::: zone-end
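
The hidden pivot registers the catalog on the Spark session. A sketch of that pattern: the same credentials, prefixed with the catalog name (`cosmosCatalog` here is illustrative) and applied via `spark.conf.set`:

```python
# Sketch: Catalog API settings mirror the auth settings, prefixed with
# the catalog name, and are applied to the session one key at a time.
prefix = "spark.sql.catalog.cosmosCatalog"
catalog_settings = {
    prefix: "com.azure.cosmos.spark.CosmosCatalog",
    f"{prefix}.spark.cosmos.accountEndpoint": "https://<nosql-account-name>.documents.azure.com:443/",
    f"{prefix}.spark.cosmos.auth.type": "ServicePrincipal",
    f"{prefix}.spark.cosmos.account.subscriptionId": "<subscription-id>",
    f"{prefix}.spark.cosmos.account.resourceGroupName": "<resource-group-name>",
    f"{prefix}.spark.cosmos.account.tenantId": "<tenant-id>",
    f"{prefix}.spark.cosmos.auth.aad.clientId": "<client-id>",
    f"{prefix}.spark.cosmos.auth.aad.clientSecret": "<client-secret>",
}
# In a notebook session with an active SparkSession named "spark":
# for key, value in catalog_settings.items():
#     spark.conf.set(key, value)
```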

-1. Create a new database using `CREATE DATABASE IF NOT EXISTS`. Ensure that you provide your database name.
+1. Create a new database by using `CREATE DATABASE IF NOT EXISTS`. Ensure that you provide your database name.

 ::: zone pivot="programming-language-python"

@@ -218,7 +218,7 @@ Now that you created a Microsoft Entra application and service principal, create
 ::: zone-end

-1. Create a new container using database name, container name, partition key path, and throughput values that you specify.
+1. Create a new container by using the database name, container name, partition key path, and throughput values that you specify.

 ::: zone pivot="programming-language-python"

@@ -238,7 +238,7 @@ Now that you created a Microsoft Entra application and service principal, create
 ::: zone-end
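
Both of these hidden pivots run Spark SQL through the catalog. A sketch of the two statements, with placeholder names and values (`cosmos.oltp` is the connector's table provider; the property names follow the Catalog API's documented `TBLPROPERTIES`):

```sql
CREATE DATABASE IF NOT EXISTS cosmosCatalog.`<database-name>`;

CREATE TABLE IF NOT EXISTS cosmosCatalog.`<database-name>`.`<container-name>`
USING cosmos.oltp
TBLPROPERTIES (
    partitionKeyPath = '/id',
    manualThroughput = '400'
);
```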

-1. Create a sample data set.
+1. Create a sample dataset.

 ::: zone pivot="programming-language-python"

@@ -264,7 +264,7 @@ Now that you created a Microsoft Entra application and service principal, create
 ::: zone-end

-1. Use `spark.createDataFrame` and the previously saved OLTP configuration to add sample data to the target container.
+1. Use `spark.createDataFrame` and the previously saved online transaction processing (OLTP) configuration to add sample data to the target container.

 ::: zone pivot="programming-language-python"

@@ -297,7 +297,7 @@ Now that you created a Microsoft Entra application and service principal, create
 ::: zone-end
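
The write itself is elided by the hunk above. A sketch of the pattern: build a small dataset, then write it through the connector's `cosmos.oltp` format. The sample fields are illustrative, and `config` is assumed to be the auth-settings dictionary set earlier in the notebook:

```python
# Sketch: a tiny sample dataset; every item needs an "id" and a value
# for the container's partition key path.
products = [
    {"id": "68719518391", "name": "Yamba Surfboard", "quantity": 12, "sale": False},
]
# In a notebook session with an active SparkSession named "spark":
# df = spark.createDataFrame(products)
# df.write.format("cosmos.oltp").options(**config).mode("APPEND").save()
```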

    > [!TIP]
-   > In this quickstart example credentials are assigned to variables in clear-text, but for security we recommend the usage of secrets. For more information on configuring secrets, see [add secrets to your Spark configuration](/azure/databricks/security/secrets/secrets#read-a-secret).
+   > In this quickstart example, credentials are assigned to variables in clear text. For security, we recommend that you use secrets. For more information on how to configure secrets, see [Add secrets to your Spark configuration](/azure/databricks/security/secrets/secrets#read-a-secret).

 ## Related content
