Commit 93191bc

Update how-to-query-analytical-store-spark-3.md
1 parent 235ac6a commit 93191bc

File tree

1 file changed: +99 −1 lines changed


articles/synapse-analytics/synapse-link/how-to-query-analytical-store-spark-3.md

Lines changed: 99 additions & 1 deletion
@@ -45,6 +45,104 @@ Thus, you can choose between loading to Spark DataFrame and creating a Spark tab
> [!NOTE]
> Please note that all `options` in the commands below are case sensitive.

## Authentication

Spark 3.x customers can now authenticate to the Azure Cosmos DB analytical store using access tokens as well as database account keys. Access tokens are more secure because they're short-lived and can only be generated by trusted identities that have been granted the required permissions through Azure Cosmos DB RBAC, which reduces the risk of credential exposure.

The connector supports two authentication types, `MasterKey` and `AccessToken`, which can be configured using the property `spark.cosmos.auth.type`.
### Master key authentication

Use the account key to read a DataFrame with Spark:

```scala
val config = Map(
  "spark.cosmos.accountEndpoint" -> "<endpoint>",
  "spark.cosmos.accountKey" -> "<key>",
  "spark.cosmos.database" -> "<db>",
  "spark.cosmos.container" -> "<container>"
)

val df = spark.read.format("cosmos.olap").options(config).load()
df.show(10)
```
### Access token authentication

The new keyless authentication introduces support for access tokens:

```scala
val config = Map(
  "spark.cosmos.accountEndpoint" -> "<endpoint>",
  "spark.cosmos.auth.type" -> "AccessToken",
  "spark.cosmos.auth.accessToken" -> "<accessToken>",
  "spark.cosmos.database" -> "<db>",
  "spark.cosmos.container" -> "<container>"
)

val df = spark.read.format("cosmos.olap").options(config).load()
df.show(10)
```
#### Access token authentication requires role assignment

To use the access token approach, you need to generate access tokens. Because access tokens are associated with Azure identities, the correct role-based access control (RBAC) must be assigned to the identity. The role assignment is at the data plane level, and you need minimum control plane permissions to perform it. For more information, see [Grant data plane role-based access](https://learn.microsoft.com/azure/cosmos-db/nosql/security/how-to-grant-data-plane-role-based-access).

The Access control (IAM) role assignments in the Azure portal are at the control plane level and don't affect role assignments on the data plane. Data plane role assignments are currently only available through Azure CLI. The `readAnalytics` action is required to read data from the analytical store in Azure Cosmos DB and isn't part of any predefined roles, so you must create a custom role definition. In addition to the `readAnalytics` action, add the actions required for the Data Reader role. These are the minimum actions required to read data from the analytical store. Create a JSON file named `role_definition.json` with the following content:
```JSON
{
  "RoleName": "CosmosAnalyticsRole",
  "Type": "CustomRole",
  "AssignableScopes": ["/"],
  "Permissions": [{
    "DataActions": [
      "Microsoft.DocumentDB/databaseAccounts/readAnalytics",
      "Microsoft.DocumentDB/databaseAccounts/readMetadata",
      "Microsoft.DocumentDB/databaseAccounts/sqlDatabases/containers/items/read",
      "Microsoft.DocumentDB/databaseAccounts/sqlDatabases/containers/executeQuery",
      "Microsoft.DocumentDB/databaseAccounts/sqlDatabases/containers/readChangeFeed"
    ]
  }]
}
```
#### Access token authentication requires Azure CLI

- Log in to Azure CLI: `az login`
- Set the default subscription that contains your Azure Cosmos DB account: `az account set --subscription <name or id>`
- Create the role definition in the desired Azure Cosmos DB account: `az cosmosdb sql role definition create --account-name <cosmos-account-name> --resource-group <resource-group-name> --body @role_definition.json`
- Copy the role definition ID returned by the above command: `/subscriptions/<subscription-id>/resourceGroups/<resource-group-name>/providers/Microsoft.DocumentDB/databaseAccounts/<cosmos-account-name>/sqlRoleDefinitions/<a-random-generated-guid>`
- Get the principal ID of the identity that you want to assign the role to. The identity could be an Azure app registration, a virtual machine, or any other supported Azure resource. Assign the role to the principal using: `az cosmosdb sql role assignment create --account-name "<cosmos-account-name>" --resource-group "<resource-group>" --scope "/" --principal-id "<principal-id-of-identity>" --role-definition-id "<role-definition-id-from-previous-step>"`
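The CLI steps above can be collected into a single script. This is a sketch, not an official procedure: the variable names are assumptions, and the `--query id --output tsv` flags (standard global Azure CLI arguments) are used to capture the role definition ID instead of copying it by hand.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Placeholder values -- replace with your own (assumptions, not from the article).
SUBSCRIPTION="<subscription-name-or-id>"
RESOURCE_GROUP="<resource-group-name>"
ACCOUNT_NAME="<cosmos-account-name>"
PRINCIPAL_ID="<principal-id-of-identity>"

# Sign in and select the subscription that contains the Azure Cosmos DB account.
az login
az account set --subscription "$SUBSCRIPTION"

# Create the custom role definition from role_definition.json and capture its ID.
ROLE_DEFINITION_ID=$(az cosmosdb sql role definition create \
  --account-name "$ACCOUNT_NAME" \
  --resource-group "$RESOURCE_GROUP" \
  --body @role_definition.json \
  --query id --output tsv)

# Assign the custom role to the identity at account scope ("/").
az cosmosdb sql role assignment create \
  --account-name "$ACCOUNT_NAME" \
  --resource-group "$RESOURCE_GROUP" \
  --scope "/" \
  --principal-id "$PRINCIPAL_ID" \
  --role-definition-id "$ROLE_DEFINITION_ID"
```

The script requires an authenticated Azure environment and the `role_definition.json` file from the previous section in the working directory.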
> [!NOTE]
> When using an Azure app registration, use the object ID as the service principal ID in the step above. Also, the principal ID and the Azure Cosmos DB account must be in the same tenant.
#### Generating the access token - Synapse Notebooks

The recommended method for Synapse Notebooks is to use a service principal with a certificate to generate access tokens. For more information, see [Secure credentials with linked services using the TokenLibrary](https://learn.microsoft.com/azure/synapse-analytics/spark/apache-spark-secure-credentials-with-tokenlibrary).

```scala
// The following code snippet has been validated to work in a Synapse notebook
val tenantId = "<azure-tenant-id>"
val clientId = "<client-id-of-service-principal>"
val kvLinkedService = "<azure-key-vault-linked-service>"
val certName = "<certificate-name>"
val token = mssparkutils.credentials.getSPTokenWithCertLS(
  "https://<cosmos-account-name>.documents.azure.com/.default",
  "https://login.microsoftonline.com/" + tenantId, clientId, kvLinkedService, certName)
```
Now you can use the access token generated in this step to read data from the analytical store when the authentication type is set to `AccessToken`.
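Putting the pieces together, the generated token plugs into the connector configuration from the access token authentication section. This is a sketch for a Synapse notebook: `<endpoint>`, `<db>`, and `<container>` are placeholders as in the earlier examples, and `token` is assumed to come from the `getSPTokenWithCertLS` call above.

```scala
// Sketch: pass the token generated above to the cosmos.olap reader.
val config = Map(
  "spark.cosmos.accountEndpoint" -> "<endpoint>",
  "spark.cosmos.auth.type" -> "AccessToken",
  "spark.cosmos.auth.accessToken" -> token,  // from mssparkutils.credentials.getSPTokenWithCertLS
  "spark.cosmos.database" -> "<db>",
  "spark.cosmos.container" -> "<container>"
)

val df = spark.read.format("cosmos.olap").options(config).load()
df.show(10)
```

Because the token is short-lived, regenerate it rather than persisting it between notebook sessions.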
> [!NOTE]
> When using an Azure app registration, use the application (client) ID in the step above.

> [!NOTE]
> Currently, Synapse doesn't support generating access tokens using the `azure-identity` package in notebooks. Furthermore, Synapse VHDs don't include the `azure-identity` package and its dependencies. For more information, see [Synapse service identity](https://learn.microsoft.com/azure/synapse-analytics/synapse-service-identity).
### Load to Spark DataFrame

In this example, you'll create a Spark DataFrame that points to the Azure Cosmos DB analytical store. You can then perform additional analysis by invoking Spark actions against the DataFrame. This operation doesn't impact the transactional store.
@@ -60,7 +158,7 @@ df = spark.read.format("cosmos.olap")\
```

The equivalent syntax in **Scala** would be the following:
-```java
+```scala
// To select a preferred list of regions in a multi-region Azure Cosmos DB account, add option("spark.cosmos.preferredRegions", "<Region1>,<Region2>")

val df_olap = spark.read.format("cosmos.olap").
