Commit 5c23a11

Refine wording and syntax in documentation
1 parent 31012a3 commit 5c23a11

File tree

1 file changed: +10 −10 lines

articles/synapse-analytics/synapse-link/how-to-query-analytical-store-spark-3.md

Lines changed: 10 additions & 10 deletions
@@ -21,7 +21,7 @@ The following capabilities are supported while interacting with Azure Cosmos DB:
* Synapse Apache Spark also allows you to ingest data into Azure Cosmos DB. It is important to note that data is always ingested into Azure Cosmos DB containers through the transactional store. When Synapse Link is enabled, any new inserts, updates, and deletes are then automatically synced to the analytical store.

* Synapse Apache Spark also supports Spark structured streaming with Azure Cosmos DB as a source and a sink.

-The following sections walk you through the syntax of above capabilities. You can also checkout the Learn module on how to [Query Azure Cosmos DB with Apache Spark for Azure Synapse Analytics](/training/modules/query-azure-cosmos-db-with-apache-spark-for-azure-synapse-analytics/). Gestures in Azure Synapse Analytics workspace are designed to provide an easy out-of-the-box experience to get started. Gestures are visible when you right-click on an Azure Cosmos DB container in the **Data** tab of the Synapse workspace. With gestures, you can quickly generate code and tailor it to your needs. Gestures are also perfect for discovering data with a single click.
+The following sections walk you through the syntax. You can also check out the Learn module [Query Azure Cosmos DB with Apache Spark for Azure Synapse Analytics](/training/modules/query-azure-cosmos-db-with-apache-spark-for-azure-synapse-analytics/). Gestures in the Azure Synapse Analytics workspace provide an easy out-of-the-box experience to get started. Gestures appear when you right-click an Azure Cosmos DB container on the **Data** tab of the Synapse workspace. With gestures, you can quickly generate code and tailor it to your needs. Gestures are also perfect for discovering data with a single click.

> [!IMPORTANT]
> You should be aware of some constraints in the analytical schema that could lead to unexpected behavior in data loading operations.
@@ -31,7 +31,7 @@ The following sections walk you through the syntax of above capabilities. You ca

Customers can load analytical store data to Spark DataFrames or create Spark tables.

-The difference in experience is around whether underlying data changes in the Azure Cosmos DB container should be automatically reflected in the analysis performed in Spark. When Spark DataFrames are registered, or a Spark table is created, metadata around the current snapshot of the analytical store is fetched to Spark for efficient pushdown. It is important to note that since Spark follows a lazy evaluation policy. Unless an action is invoked on the Spark DataFrame, or a SparkSQL query is executed, actual data is not fetched from analytical store.
+The difference in experience is whether underlying data changes in the Azure Cosmos DB container should be automatically reflected in the analysis performed in Spark. When Spark DataFrames are registered, or a Spark table is created, Spark fetches analytical store metadata for efficient pushdown. Keep in mind that Spark follows a lazy evaluation policy: actual data isn't fetched from the analytical store until an action is invoked on the Spark DataFrame or a SparkSQL query is executed.

In the case of **loading to Spark DataFrame**, the fetched metadata is cached through the lifetime of the Spark session, and hence subsequent actions invoked on the DataFrame are evaluated against the snapshot of the analytical store at the time of DataFrame creation.

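As a sketch of the DataFrame path, using the documented `spark.synapse.linkedService` and `spark.cosmos.container` options (the linked service name `CosmosDbLS` below is a placeholder for your own workspace artifact), a read against the analytical store looks like this; because of lazy evaluation, no data moves until an action such as `count()` runs:

```python
# Read from the analytical store via the cosmos.olap format.
# "CosmosDbLS" and "call_center" are placeholder names for your own
# linked service and container.
df = spark.read.format("cosmos.olap")\
    .option("spark.synapse.linkedService", "CosmosDbLS")\
    .option("spark.cosmos.container", "call_center")\
    .load()

# Only this action fetches data, evaluated against the analytical store
# snapshot taken when the DataFrame was created.
print(df.count())
```

This sketch assumes a Synapse Spark session where `spark` is predefined; subsequent actions on `df` keep seeing the same snapshot until the DataFrame is recreated.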
@@ -112,11 +112,11 @@ The Identity Access Management (IAM) role assignments from azure portal are on c
- Log into Azure CLI: `az login`
- Set the default subscription which has your Cosmos DB account: `az account set --subscription <name or id>`
- Create the role definition in the desired Cosmos DB account: `az cosmosdb sql role definition create --account-name <cosmos-account-name> --resource-group <resource-group-name> --body @role_definition.json`
-- Copy over the role definition id returned from the above command: `/subscriptions/<subscription-id>/resourceGroups/<resource-group-name>/providers/Microsoft.DocumentDB/databaseAccounts/< cosmos-account-name >/sqlRoleDefinitions/<a-random-generated-guid>`
+- Copy over the role definition `id` returned: `/subscriptions/<subscription-id>/resourceGroups/<resource-group-name>/providers/Microsoft.DocumentDB/databaseAccounts/<cosmos-account-name>/sqlRoleDefinitions/<a-random-generated-guid>`
- Get the principal id of the identity that you want to assign the role to. The identity could be an Azure app registration, a virtual machine, or any other supported Azure resource. Assign the role to the principal using: `az cosmosdb sql role assignment create --account-name "<cosmos-account-name>" --resource-group "<resource-group>" --scope "/" --principal-id "<principal-id-of-identity>" --role-definition-id "<role-definition-id-from-previous-step>"`
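The `--body @role_definition.json` argument expects a Cosmos DB SQL role definition document. As a minimal sketch, you could generate the file like this; the role name and the exact `DataActions` shown are illustrative examples, so tailor them to the access your workload actually needs:

```python
import json

# Illustrative custom role document for `az cosmosdb sql role definition create`.
# "AnalyticalStoreReader" is a hypothetical role name, and the DataActions
# below (account metadata + item reads) are examples; adjust to your scenario.
role_definition = {
    "RoleName": "AnalyticalStoreReader",
    "Type": "CustomRole",
    "AssignableScopes": ["/"],
    "Permissions": [
        {
            "DataActions": [
                "Microsoft.DocumentDB/databaseAccounts/readMetadata",
                "Microsoft.DocumentDB/databaseAccounts/sqlDatabases/containers/items/read",
            ]
        }
    ],
}

# Write the body file referenced by --body @role_definition.json.
with open("role_definition.json", "w") as f:
    json.dump(role_definition, f, indent=2)
```

Run the `az cosmosdb sql role definition create` command from the same directory so the `@role_definition.json` reference resolves.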

> [!Note]
-> When using an azure app registration, Use the Object Id as the service principal id in the step above. Also, the principal id and the Cosmos DB account must be in the same tenant.
+> When using an Azure app registration, use the `Object Id` as the service principal id. Also, the principal id and the Cosmos DB account must be in the same tenant.

#### Generating the access token - Synapse Notebooks
@@ -137,7 +137,7 @@ val token = mssparkutils.credentials.getSPTokenWithCertLS(
Now you can use the access token generated in this step to read data from the analytical store when the auth type is set to access token.

> [!Note]
-> When using an Azure App registration, use the application (Client Id) in the step above.
+> When using an Azure App registration, use the application (client) ID.
> [!Note]
> Currently, Synapse doesn't support generating access tokens using the azure-identity package in notebooks. Furthermore, Synapse VHDs don't include the azure-identity package and its dependencies. For more information, see [this article](https://learn.microsoft.com/azure/synapse-analytics/synapse-service-identity).
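Putting the pieces together, a read that authenticates with the generated token might look like the following sketch. The auth option names (`spark.cosmos.auth.type` and `spark.cosmos.auth.accessToken`) are assumptions about the connector's access-token mode and should be verified against your connector version; the linked service and container names are placeholders:

```python
# Sketch: read the analytical store using the access token generated above.
# Assumptions: "CosmosDbLS" and "call_center" are placeholder names, and the
# two spark.cosmos.auth.* option names are assumed from the connector's
# access-token auth mode; check them against your connector version.
# `token` is the access token produced in the previous step.
df = spark.read.format("cosmos.olap")\
    .option("spark.synapse.linkedService", "CosmosDbLS")\
    .option("spark.cosmos.container", "call_center")\
    .option("spark.cosmos.auth.type", "AccessToken")\
    .option("spark.cosmos.auth.accessToken", token)\
    .load()
```

As with the key-based path, this only fetches metadata up front; data moves when an action runs.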
@@ -169,7 +169,7 @@ val df_olap = spark.read.format("cosmos.olap").

### Create Spark table

-In this example, you create a Spark table that points the Azure Cosmos DB analytical store. You can then perform additional analysis by invoking SparkSQL queries against the table. This operation neither impacts the transactional store nor does it incur any data movement. If you decide to delete this Spark table, the underlying Azure Cosmos DB container and the corresponding analytical store will not be affected.
+In this example, you create a Spark table that points to the Azure Cosmos DB analytical store. You can then perform additional analysis by invoking SparkSQL queries against the table. This operation doesn't impact the transactional store or incur any data movement. If you delete this Spark table, the underlying Azure Cosmos DB container and the corresponding analytical store are not affected.

This scenario is convenient for reusing Spark tables through third-party tools and providing access to the underlying data at run time.

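The create-table pattern can also be issued from Python through `spark.sql`. A minimal sketch, reusing the article's `call_center` table and a placeholder linked service name `CosmosDbLS`:

```python
# Sketch: create a Spark table over the Cosmos DB analytical store, then
# query it with SparkSQL. "CosmosDbLS" is a placeholder linked service name.
spark.sql("""
create table call_center using cosmos.olap options (
    spark.synapse.linkedService 'CosmosDbLS',
    spark.cosmos.container 'call_center'
)
""")

# Queries run against the analytical store; the transactional store is untouched.
spark.sql("select count(*) from call_center").show()
```

Dropping the table later removes only the Spark metadata object, not the container or its analytical store.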
@@ -190,7 +190,7 @@ create table call_center using cosmos.olap options (

## Write Spark DataFrame to Azure Cosmos DB container

-In this example, you write a Spark DataFrame into an Azure Cosmos DB container. This operation will impact the performance of transactional workloads and consume request units provisioned on the Azure Cosmos DB container or the shared database.
+In this example, you write a Spark DataFrame into an Azure Cosmos DB container. This operation impacts the performance of transactional workloads and consumes request units provisioned on the Azure Cosmos DB container or the shared database.

The syntax in **Python** would be the following:
```python
@@ -218,12 +218,12 @@ df.write.format("cosmos.oltp").
```

## Load streaming DataFrame from container

-In this gesture, you use Spark Streaming capability to load data from a container into a dataframe. The data will be stored in the primary data lake account (and file system) you connected to the workspace.
+In this gesture, you use the Spark streaming capability to load data from a container into a DataFrame. The data is stored in the primary data lake account (and file system) that you connected to the workspace.
> [!NOTE]
-> If you are looking to reference external libraries in Synapse Apache Spark, learn more [here](../spark/apache-spark-azure-portal-add-libraries.md). For instance, if you are looking to ingest a Spark DataFrame to a container of Azure Cosmos DB for MongoDB, you can leverage the MongoDB connector for Spark [here](https://docs.mongodb.com/spark-connector/master/).
+> To reference external libraries in Synapse Apache Spark, see the [library management documentation](../spark/apache-spark-azure-portal-add-libraries.md). For instance, to ingest a Spark DataFrame into an Azure Cosmos DB for MongoDB container, you can use the [MongoDB connector for Spark](https://docs.mongodb.com/spark-connector/master/).

## Load streaming DataFrame from Azure Cosmos DB container

-In this example, you use Spark's structured streaming capability to load data from an Azure Cosmos DB container into a Spark streaming DataFrame using the change feed functionality in Azure Cosmos DB. The checkpoint data used by Spark will be stored in the primary data lake account (and file system) that you connected to the workspace.
+In this example, you use Spark structured streaming to load data from an Azure Cosmos DB container into a Spark streaming DataFrame, using the change feed functionality in Azure Cosmos DB. The checkpoint data used by Spark is stored in the primary data lake account (and file system) that you connected to the workspace.

The syntax in **Python** would be the following:
```python
