Commit 53195cc
Modified document and image filenames based on AnnaMHuff's comments
1 parent 61a2726 commit 53195cc

3 files changed: +12 −8 lines

articles/cosmos-db/mongodb/vcore/connect-from-databricks.md
Lines changed: 12 additions & 8 deletions
ms.date: 03/08/2024
---

# Connect to Azure Cosmos DB for MongoDB vCore from Azure Databricks

This article explains how to connect to Azure Cosmos DB for MongoDB vCore from Azure Databricks. It walks through basic Data Manipulation Language (DML) operations like Read, Filter, SQLs, Aggregation Pipelines, and Write Tables using Python code.

## Prerequisites

* [Provision an Azure Cosmos DB for MongoDB vCore cluster.](quickstart-portal.md)

* Provision your choice of Spark environment, such as [Azure Databricks](/azure/databricks/scenarios/quickstart-create-databricks-workspace-portal).

## Configure dependencies for connectivity

The following dependencies are required to connect to Azure Cosmos DB for MongoDB vCore from Azure Databricks:

* **Spark connector for MongoDB**

  The Spark connector is used to connect to Azure Cosmos DB for MongoDB vCore. Identify and use the version of the connector located in [Maven Central](https://mvnrepository.com/artifact/org.mongodb.spark/mongo-spark-connector) that is compatible with the Spark and Scala versions of your Spark environment. We recommend an environment that supports Spark 3.2.1 or higher, and the Spark connector available at Maven coordinates `org.mongodb.spark:mongo-spark-connector_2.12:3.0.1`.
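
As a minimal illustration of how that coordinate is composed, the Scala version suffix and the connector version plug into a fixed group and artifact name. The helper below is hypothetical (not part of the connector or this article's code), shown only to make the structure explicit:

```python
# Hypothetical helper: assembles the Maven coordinate for the MongoDB Spark
# connector. The group and artifact are fixed; the Scala version suffix and
# the connector version must match your cluster's environment.
def mongo_spark_coordinate(scala_version: str, connector_version: str) -> str:
    return (
        "org.mongodb.spark:"
        f"mongo-spark-connector_{scala_version}:{connector_version}"
    )

# The coordinate recommended in this article:
coordinate = mongo_spark_coordinate("2.12", "3.0.1")
print(coordinate)  # org.mongodb.spark:mongo-spark-connector_2.12:3.0.1
```

In Databricks, this coordinate is what you install on the cluster as a Maven library.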

### Update Spark configuration with the Azure Cosmos DB for MongoDB connection string

1. Note the connection string under **Settings** -> **Connection strings** in the Azure Cosmos DB for MongoDB vCore resource in the Azure portal. It has the form of "mongodb+srv://\<user>\:\<password>\@\<database_name>.mongocluster.cosmos.azure.com"

2. Back in Databricks in your cluster configuration, under **Advanced Options** (bottom of page), paste the connection string for both the `spark.mongodb.output.uri` and `spark.mongodb.input.uri` variables. Populate the username and password fields appropriately. This way, all the notebooks running on the cluster use this configuration.

3. Alternatively, you can explicitly set the `option` when calling APIs like: `spark.read.format("mongo").option("spark.mongodb.input.uri", connectionString).load()`. If you configure the variables in the cluster, you don't have to set the option.
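
The connection-string format from step 1 can be sketched as a small helper. The function name is hypothetical, for illustration only; in practice, keep credentials in a secret store (for example, Databricks secret scopes) rather than in notebook code:

```python
# Hypothetical helper: builds a vCore connection string in the form shown in
# step 1. Never hard-code real credentials in a notebook.
def build_connection_string(user: str, password: str, database_name: str) -> str:
    return (
        f"mongodb+srv://{user}:{password}"
        f"@{database_name}.mongocluster.cosmos.azure.com"
    )

connectionString = build_connection_string("<user>", "<password>", "<database_name>")
```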

```
database="<database_name>"
collection="<collection_name>"
```

### Data sample set

For the purpose of this lab, we're using the CSV 'Citibike2019' data set. You can import it:
[CitiBike Trip History 2019](https://citibikenyc.com/system-data).
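
Before loading the full data set, it can help to see the shape of a row. The sketch below parses a tiny inline sample with the standard library; the column names are an assumption based on the public Citi Bike trip-history schema and may differ in your export:

```python
import csv
import io

# Tiny inline sample; the column names are assumed from the public Citi Bike
# schema and may not match your download exactly.
sample = io.StringIO(
    "tripduration,starttime,start station name,end station name\n"
    "320,2019-01-01 00:01:47,Forsyth St & Broome St,Market St & Cherry St\n"
)

rows = list(csv.DictReader(sample))
print(rows[0]["start station name"])  # Forsyth St & Broome St
```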

```
display(df_vcore)
```

Output:

**Schema**
:::image type="content" source="./media/connect-from-databricks/print-schema.png" alt-text="Screenshot of the Print Schema.":::

**DataFrame**
:::image type="content" source="./media/connect-from-databricks/display-dataframe-vcore.png" alt-text="Screenshot of the Display DataFrame.":::

### Filter data from Azure Cosmos DB for MongoDB vCore

This command doesn't have an output, as it writes directly to the collection. You can cross-check whether the record is updated using a read command.
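
The cross-check read is just a filtered query against the collection you wrote to. The sketch below shows the idea with a plain MongoDB filter document evaluated against an in-memory stand-in; the field names and values are hypothetical sample data, not part of the article's code:

```python
# Hypothetical filter document for the cross-check read: look up the record
# that was just written by a field you know it contains.
read_back_filter = {"start station name": "Forsyth St & Broome St"}

# In-memory stand-in for the collection, to show what the filter matches.
written = [
    {"start station name": "Forsyth St & Broome St", "tripduration": 320},
    {"start station name": "Market St & Cherry St", "tripduration": 144},
]

matches = [
    doc for doc in written
    if all(doc.get(k) == v for k, v in read_back_filter.items())
]
print(len(matches))  # 1
```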

### Read data from the Azure Cosmos DB for MongoDB vCore collection by running an Aggregation Pipeline

> [!NOTE]
> [Aggregation Pipeline](../tutorial-aggregation.md) is a powerful capability that allows you to preprocess and transform data within Azure Cosmos DB for MongoDB. It's a great match for real-time analytics, dashboards, report generation with roll-ups, sums, and averages with 'server-side' data post-processing. (Note: there's a [whole book written about it](https://www.practical-mongodb-aggregations.com/front-cover.html).)

Azure Cosmos DB for MongoDB even supports [rich secondary/compound indexes](../indexing.md) to extract, filter, and process only the data it needs.

For example, you can analyze all customers located in a specific geography right within the database without first having to load the full data set, minimizing data movement and reducing latency.
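
That geography example translates into pipeline stages like the following. The field names (`city`, `customers`) are hypothetical placeholders for your own schema, and the snippet only builds the pipeline as data; it doesn't execute it:

```python
# Hypothetical pipeline: match documents for one geography, then count
# customers per city server-side, so only the roll-up leaves the database.
pipeline = [
    {"$match": {"city": "New York"}},
    {"$group": {"_id": "$city", "customers": {"$sum": 1}}},
]
```

With the Spark connector, a pipeline like this is typically passed to the read via an option (for example, a `pipeline` read option holding its JSON form), so the stages run in the database rather than in Spark.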

Here's an example of using the aggregate function:

```
display(df_vcore)
```

Output:

:::image type="content" source="./media/connect-from-databricks/display-aggregation-pipeline.png" alt-text="Screenshot of the Display Aggregate Data.":::

## Related content

The following articles demonstrate how to use aggregation pipelines in Azure Cosmos DB for MongoDB vCore: