Commit 01510ce
Fix typos and improve clarity in documentation
1 parent f23026d commit 01510ce

1 file changed: +17 -17 lines changed

articles/synapse-analytics/synapse-link/how-to-query-analytical-store-spark-3.md
Lines changed: 17 additions & 17 deletions
@@ -12,24 +12,24 @@ ms.custom: cosmos-db, mode-other
# Interact with Azure Cosmos DB using Apache Spark 3 in Azure Synapse Link

- In this article, you'll learn how to interact with Azure Cosmos DB using Synapse Apache Spark 3. With its full support for Scala, Python, SparkSQL, and C#, Synapse Apache Spark 3 is central to analytics, data engineering, data science, and data exploration scenarios in [Azure Synapse Link for Azure Cosmos DB](/azure/cosmos-db/synapse-link).
+ In this article, you learn how to interact with Azure Cosmos DB using Synapse Apache Spark 3. Customers can use Scala, Python, SparkSQL, and C# for analytics, data engineering, data science, and data exploration scenarios in [Azure Synapse Link for Azure Cosmos DB](/azure/cosmos-db/synapse-link).
The following capabilities are supported while interacting with Azure Cosmos DB:
* Synapse Apache Spark 3 allows you to analyze data in your Azure Cosmos DB containers that are enabled with Azure Synapse Link in near real-time without impacting the performance of your transactional workloads. The following two options are available to query the Azure Cosmos DB [analytical store](/azure/cosmos-db/analytical-store-introduction) from Spark:
+ Load to Spark DataFrame
+ Create Spark table
* Synapse Apache Spark also allows you to ingest data into Azure Cosmos DB. It is important to note that data is always ingested into Azure Cosmos DB containers through the transactional store. When Synapse Link is enabled, any new inserts, updates, and deletes are then automatically synced to the analytical store.
- * Synapse Apache Spark also supports Spark structured streaming with Azure Cosmos DB as a source as well as a sink.
+ * Synapse Apache Spark also supports Spark structured streaming with Azure Cosmos DB as a source and a sink.
The following sections walk you through the syntax of the above capabilities. You can also check out the Learn module on how to [Query Azure Cosmos DB with Apache Spark for Azure Synapse Analytics](/training/modules/query-azure-cosmos-db-with-apache-spark-for-azure-synapse-analytics/). Gestures in the Azure Synapse Analytics workspace are designed to provide an easy out-of-the-box experience to get started. Gestures are visible when you right-click an Azure Cosmos DB container in the **Data** tab of the Synapse workspace. With gestures, you can quickly generate code and tailor it to your needs. Gestures are also perfect for discovering data with a single click.
> [!IMPORTANT]
> You should be aware of some constraints in the analytical schema that could lead to unexpected behavior in data loading operations.
- > As an example, only first 1000 properties from transactional schema are available in the analytical schema, properties with spaces are not available, etc. If you are experiencing some unexpected results, check the [analytical store schema constraints](/azure/cosmos-db/analytical-store-introduction#schema-constraints) for more details.
+ > As an example, only the first 1,000 properties from the transactional schema are available in the analytical schema, and properties with spaces are not available. If you are experiencing unexpected results, check the [analytical store schema constraints](/azure/cosmos-db/analytical-store-introduction#schema-constraints) for more details.
## Query Azure Cosmos DB analytical store
- Before you learn about the two possible options to query Azure Cosmos DB analytical store, loading to Spark DataFrame and creating Spark table, it is worth exploring the differences in experience so you can choose the option that works for your needs.
+ Customers can load analytical store data to Spark DataFrames or create Spark tables.
The difference in experience is around whether underlying data changes in the Azure Cosmos DB container should be automatically reflected in the analysis performed in Spark. When either a Spark DataFrame is registered or a Spark table is created against a container's analytical store, metadata around the current snapshot of data in the analytical store is fetched to Spark for efficient pushdown of subsequent analysis. It is important to note that since Spark follows a lazy evaluation policy, unless an action is invoked on the Spark DataFrame or a SparkSQL query is executed against the Spark table, actual data is not fetched from the underlying container's analytical store.
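To illustrate the lazy-evaluation point, here is a hedged sketch (the option names are assumptions consistent with the Synapse connector, and the `city` column is hypothetical); no analytical store data moves until the final action:

```python
# Registering the DataFrame fetches only snapshot metadata; no documents are read yet.
df = spark.read.format("cosmos.olap")\
    .option("spark.synapse.linkedService", "<linked-service-name>")\
    .option("spark.cosmos.container", "<container-name>")\
    .load()

# Still lazy: this only builds a query plan ("city" is a hypothetical property).
filtered = df.filter(df["city"] == "Seattle")

# The action: data is now actually fetched from the analytical store.
filtered.count()
```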
@@ -43,13 +43,13 @@ Thus, you can choose between loading to Spark DataFrame and creating a Spark tab
> To query Azure Cosmos DB for MongoDB accounts, learn more about the [full fidelity schema representation](/azure/cosmos-db/analytical-store-introduction#analytical-schema) in the analytical store and the extended property names to be used.
> [!NOTE]
- > Please note that all `options` in the commands below are case sensitive.
+ > All `options` in the commands below are case sensitive.
## Authentication
- Now Spark 3.x customers can authenticate to Azure Cosmos DB analytical store using access tokens and database account keys. Access tokes are more secure as they are short lived, meaning less risk sincee it can only be generated by trusted identities, which have been approved by assigning them the required permission using Cosmos DB RBAC.
+ Spark 3.x customers can authenticate to the Azure Cosmos DB analytical store using access tokens or database account keys. Access tokens are more secure because they are short lived and can only be generated by trusted identities that have been granted the required permissions through Cosmos DB RBAC.
- The connector now supports two auth types, `MasterKey` and `AccessToken`. This can be configured using the property `spark.cosmos.auth.type`.
+ The connector now supports two auth types, `MasterKey` and `AccessToken`, configured through the `spark.cosmos.auth.type` property.
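As a minimal sketch of the two settings (only `spark.cosmos.auth.type` and its two values appear in the text above; the linked service, container, and `spark.cosmos.accessToken` option names are assumptions):

```python
# Master key auth (sketch): the key comes from the linked service definition.
df = spark.read.format("cosmos.olap")\
    .option("spark.synapse.linkedService", "<linked-service-name>")\
    .option("spark.cosmos.container", "<container-name>")\
    .option("spark.cosmos.auth.type", "MasterKey")\
    .load()

# Access token auth (sketch): pass a token generated for an approved identity.
# The "spark.cosmos.accessToken" option name is an assumption.
df = spark.read.format("cosmos.olap")\
    .option("spark.synapse.linkedService", "<linked-service-name>")\
    .option("spark.cosmos.container", "<container-name>")\
    .option("spark.cosmos.auth.type", "AccessToken")\
    .option("spark.cosmos.accessToken", "<access-token>")\
    .load()
```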
### Master key authentication
@@ -86,9 +86,9 @@ df.show(10)
#### Access token authentication requires role assignment
- To use the access token approach, you need to generate access tokens. Since access tokens are associated with azure identities, correct role-based access control (RBAC) must be assigned to the identity. This role assignment is on data plane level, and you must have minimum control plane permissions to perform the role assignment. Click [here](https://learn.microsoft.com/azure/cosmos-db/nosql/security/how-to-grant-data-plane-role-based-access) for more information.
+ To use the access token approach, you need to generate access tokens. Since access tokens are associated with Azure identities, the correct role-based access control (RBAC) role must be assigned to the identity. The role assignment is at the data plane level, and you need minimum control plane permissions to perform it. For more information, see [how to grant data plane role-based access](https://learn.microsoft.com/azure/cosmos-db/nosql/security/how-to-grant-data-plane-role-based-access).
- The Access Control (IAM) role assignments from azure portal are on control plane level and don't affect the role assignments on data plane. Data plane role assignments are only available via Azure CLI. The `readAnalytics` action is required to read data from analytical store in Cosmos DB and is not part of any pre-defined roles. As such we must create a custom role definition. In addition to the `readAnalytics` action, also add the actions required for Data Reader. These are the minimum actions required for reading data from analytical store. Create a JSON file with the following content and name it role_definition.json
+ The Identity and Access Management (IAM) role assignments in the Azure portal are at the control plane level and don't affect role assignments on the data plane. Data plane role assignments are only available via the Azure CLI. The `readAnalytics` action is required to read data from the analytical store in Cosmos DB and is not part of any predefined roles, so you must create a custom role definition. In addition to the `readAnalytics` action, add the actions required for Data Reader. Create a JSON file with the following content and name it role_definition.json.
```JSON
{
@@ -113,7 +113,7 @@ The Access Control (IAM) role assignments from azure portal are on control plane
- Set the default subscription that has your Cosmos DB account: `az account set --subscription <name or id>`
- Create the role definition in the desired Cosmos DB account: `az cosmosdb sql role definition create --account-name <cosmos-account-name> --resource-group <resource-group-name> --body @role_definition.json`
- Copy the role definition ID returned from the above command: `/subscriptions/<subscription-id>/resourceGroups/<resource-group-name>/providers/Microsoft.DocumentDB/databaseAccounts/<cosmos-account-name>/sqlRoleDefinitions/<a-random-generated-guid>`
- Get the principal id of the identity that you want to assign the role to. The identity could be an azure app registration, a virtual machine or any other supported azure resource. Assign the role to the principal using: `az cosmosdb sql role assignment create --account-name "<cosmos-account-name>" --resource-group "<resource-group>" --scope "/" --principal-id "<principal-id-of-identity>" --role-definition-id "<role-definition-id-from-previous-step>"`
+ Get the principal ID of the identity that you want to assign the role to. The identity could be an Azure app registration, a virtual machine, or any other supported Azure resource. Assign the role to the principal using: `az cosmosdb sql role assignment create --account-name "<cosmos-account-name>" --resource-group "<resource-group>" --scope "/" --principal-id "<principal-id-of-identity>" --role-definition-id "<role-definition-id-from-previous-step>"`
> [!Note]
> When using an Azure app registration, use the Object ID as the service principal ID in the step above. Also, the principal ID and the Cosmos DB account must be in the same tenant.
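As a rough sketch of generating such a token with the `azure-identity` package (the credential values and the token scope below are assumptions for illustration, not the article's exact code):

```python
from azure.identity import ClientSecretCredential

# Assumed app-registration credentials; any supported Azure identity works.
credential = ClientSecretCredential(
    tenant_id="<tenant-id>",
    client_id="<client-id>",
    client_secret="<client-secret>",
)

# Assumption: the data plane token is scoped to the Cosmos DB account endpoint.
scope = "https://<cosmos-account-name>.documents.azure.com/.default"
access_token = credential.get_token(scope).token
```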
@@ -145,7 +145,7 @@ Now you can use the access token generated in this step to read data from analyt
### Load to Spark DataFrame
- In this example, you'll create a Spark DataFrame that points to the Azure Cosmos DB analytical store. You can then perform additional analysis by invoking Spark actions against the DataFrame. This operation doesn't impact the transactional store.
+ In this example, you create a Spark DataFrame that points to the Azure Cosmos DB analytical store. You can then perform further analysis by invoking Spark actions against the DataFrame. This operation doesn't impact the transactional store.
The syntax in **Python** would be the following:
```python
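# A hedged sketch of the load (the diff hunk cuts off the article's code here).
# The "cosmos.olap" format appears in the diff context; the linked service and
# container option names are assumptions, and the values are placeholders.
df = spark.read.format("cosmos.olap")\
    .option("spark.synapse.linkedService", "<linked-service-name>")\
    .option("spark.cosmos.container", "<container-name>")\
    .load()

df.show(10)
```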
@@ -169,7 +169,7 @@ val df_olap = spark.read.format("cosmos.olap").
### Create Spark table
- In this example, you'll create a Spark table that points the Azure Cosmos DB analytical store. You can then perform additional analysis by invoking SparkSQL queries against the table. This operation neither impacts the transactional store nor does it incur any data movement. If you decide to delete this Spark table, the underlying Azure Cosmos DB container and the corresponding analytical store will not be affected.
+ In this example, you create a Spark table that points to the Azure Cosmos DB analytical store. You can then perform additional analysis by invoking SparkSQL queries against the table. This operation neither impacts the transactional store nor incurs any data movement. If you decide to delete this Spark table, the underlying Azure Cosmos DB container and the corresponding analytical store are not affected.
This scenario is convenient for reusing Spark tables through third-party tools and providing access to the underlying data at run time.
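As a hedged sketch of such a table definition (the `call_center` table name and `cosmos.olap` format come from the diff context below; the option names are assumptions):

```python
# Sketch: register a Spark table over the analytical store via SparkSQL.
spark.sql("""
create table call_center using cosmos.olap options (
    spark.synapse.linkedService '<linked-service-name>',
    spark.cosmos.container '<container-name>'
)
""")

# The table can now be queried without touching the transactional store.
spark.sql("select count(*) from call_center").show()
```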
@@ -190,7 +190,7 @@ create table call_center using cosmos.olap options (
## Write Spark DataFrame to Azure Cosmos DB container
- In this example, you'll write a Spark DataFrame into an Azure Cosmos DB container. This operation will impact the performance of transactional workloads and consume request units provisioned on the Azure Cosmos DB container or the shared database.
+ In this example, you write a Spark DataFrame into an Azure Cosmos DB container. This operation will impact the performance of transactional workloads and consume request units provisioned on the Azure Cosmos DB container or the shared database.
The syntax in **Python** would be the following:
```python
@@ -218,12 +218,12 @@ df.write.format("cosmos.oltp").
```
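The diff hunk only shows the fences around the article's Python code. As a hedged sketch of the write (the `cosmos.oltp` format appears in the hunk context above; the remaining option names are assumptions):

```python
# Sketch: ingest a DataFrame through the transactional store (cosmos.oltp).
df.write.format("cosmos.oltp")\
    .option("spark.synapse.linkedService", "<linked-service-name>")\
    .option("spark.cosmos.container", "<container-name>")\
    .option("spark.cosmos.write.upsertEnabled", "true")\
    .mode("append")\
    .save()
```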
## Load streaming DataFrame from container
- In this gesture, you'll use Spark Streaming capability to load data from a container into a dataframe. The data will be stored in the primary data lake account (and file system) you connected to the workspace.
+ In this gesture, you use the Spark streaming capability to load data from a container into a DataFrame. The data is stored in the primary data lake account (and file system) that you connected to the workspace.
> [!NOTE]
> To reference external libraries in Synapse Apache Spark, learn more [here](../spark/apache-spark-azure-portal-add-libraries.md). For instance, to ingest a Spark DataFrame into a container of Azure Cosmos DB for MongoDB, you can use the MongoDB connector for Spark [here](https://docs.mongodb.com/spark-connector/master/).
## Load streaming DataFrame from Azure Cosmos DB container
- In this example, you'll use Spark's structured streaming capability to load data from an Azure Cosmos DB container into a Spark streaming DataFrame using the change feed functionality in Azure Cosmos DB. The checkpoint data used by Spark will be stored in the primary data lake account (and file system) that you connected to the workspace.
+ In this example, you use Spark's structured streaming capability to load data from an Azure Cosmos DB container into a Spark streaming DataFrame using the change feed functionality in Azure Cosmos DB. The checkpoint data used by Spark is stored in the primary data lake account (and file system) that you connected to the workspace.
The syntax in **Python** would be the following:
```python
@@ -252,7 +252,7 @@ val dfStream = spark.readStream.
```
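The Python body is again elided by the hunk. A hedged sketch of the streaming read follows; `spark.readStream` appears in the hunk context above, while the `cosmos.oltp` format and the change feed option names are assumptions consistent with the connector:

```python
# Sketch: stream the container's change feed into a DataFrame.
# Change feed reads go through the transactional store (cosmos.oltp).
dfStream = spark.readStream.format("cosmos.oltp")\
    .option("spark.synapse.linkedService", "<linked-service-name>")\
    .option("spark.cosmos.container", "<container-name>")\
    .option("spark.cosmos.changeFeed.startFrom", "Beginning")\
    .option("spark.cosmos.changeFeed.mode", "Incremental")\
    .load()
```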
## Write streaming DataFrame to Azure Cosmos DB container
- In this example, you'll write a streaming DataFrame into an Azure Cosmos DB container. This operation will impact the performance of transactional workloads and consume Request Units provisioned on the Azure Cosmos DB container or shared database. If the folder */localWriteCheckpointFolder* isn't created (in the example below), it will be automatically created.
+ In this example, you write a streaming DataFrame into an Azure Cosmos DB container. This operation impacts the performance of transactional workloads and consumes request units provisioned on the Azure Cosmos DB container or shared database. If the folder */localWriteCheckpointFolder* (in the following example) doesn't exist, it is created automatically.
The syntax in **Python** would be the following:
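The article's code is cut off by the hunk; as a hedged sketch (`query.awaitTermination()` appears in the hunk context below, the checkpoint folder comes from the paragraph above, and the connector option names are assumptions):

```python
# Sketch: write the streaming DataFrame back through the transactional store.
query = dfStream.writeStream.format("cosmos.oltp")\
    .option("spark.synapse.linkedService", "<linked-service-name>")\
    .option("spark.cosmos.container", "<container-name>")\
    .option("checkpointLocation", "/localWriteCheckpointFolder")\
    .outputMode("append")\
    .start()

query.awaitTermination()
```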
@@ -293,4 +293,4 @@ query.awaitTermination()
* [Samples to get started with Azure Synapse Link on GitHub](https://aka.ms/cosmosdb-synapselink-samples)
* [Learn what is supported in Azure Synapse Link for Azure Cosmos DB](./concept-synapse-link-cosmos-db-support.md)
* [Connect to Synapse Link for Azure Cosmos DB](../quickstart-connect-synapse-link-cosmos-db.md)
- * Checkout the Learn module on how to [Query Azure Cosmos DB with Apache Spark for Azure Synapse Analytics](/training/modules/query-azure-cosmos-db-with-apache-spark-for-azure-synapse-analytics/).
+ * Check out the Learn module on how to [Query Azure Cosmos DB with Apache Spark for Azure Synapse Analytics](/training/modules/query-azure-cosmos-db-with-apache-spark-for-azure-synapse-analytics/).
