`articles/cosmos-db/mongodb/vcore/connect-from-databricks.md` (ms.date: 03/08/2024)
# Connect to Azure Cosmos DB for MongoDB vCore from Azure Databricks
This article explains how to connect to Azure Cosmos DB for MongoDB vCore from Azure Databricks. It walks through basic Data Manipulation Language (DML) operations such as reading, filtering, running SQL queries, aggregation pipelines, and writing tables, using Python code.
## Prerequisites
* [Provision an Azure Cosmos DB for MongoDB vCore cluster.](quickstart-portal.md)
* Provision your choice of Spark environment: [Azure Databricks](/azure/databricks/scenarios/quickstart-create-databricks-workspace-portal).

## Configure dependencies for connectivity
The following are the dependencies required to connect to Azure Cosmos DB for MongoDB vCore from Azure Databricks:
23
23
***Spark connector for MongoDB**
24
24
The Spark connector is used to connect to Azure Cosmos DB for MongoDB vCore. Identify and use the version of the connector located in [Maven Central](https://mvnrepository.com/artifact/org.mongodb.spark/mongo-spark-connector) that is compatible with the Spark and Scala versions of your Spark environment. We recommend an environment that supports Spark 3.2.1 or later, and the Spark connector available at Maven coordinates `org.mongodb.spark:mongo-spark-connector_2.12:3.0.1`.
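Before attaching the library, it can help to confirm that the connector's Scala build matches your cluster. A minimal sketch; the helper function and the session-builder usage in the comments are illustrative assumptions, not part of the connector:

```python
# Illustrative helper (an assumption for this article, not part of the
# connector): extract the Scala build from the connector's Maven coordinates
# so you can check it against your Databricks cluster's Scala version.
COORDINATES = "org.mongodb.spark:mongo-spark-connector_2.12:3.0.1"

def scala_build(coords: str) -> str:
    artifact = coords.split(":")[1]    # "mongo-spark-connector_2.12"
    return artifact.rsplit("_", 1)[1]  # "2.12"

# Outside Databricks you could attach the connector when building a session
# (on Databricks, install it as a cluster library instead):
# spark = (SparkSession.builder
#            .config("spark.jars.packages", COORDINATES)
#            .getOrCreate())
```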
Create a Python Notebook in Databricks. Make sure to enter the right values for …
### Update Spark configuration with the Azure Cosmos DB for MongoDB connection string
1. Note the connection string under **Settings** > **Connection strings** in the Azure Cosmos DB for MongoDB vCore resource in the Azure portal. It has the form `mongodb+srv://<user>:<password>@<database_name>.mongocluster.cosmos.azure.com`.
2. Back in Databricks, in your cluster configuration under **Advanced Options** (at the bottom of the page), paste the connection string for both the `spark.mongodb.output.uri` and `spark.mongodb.input.uri` variables, populating the username and password fields appropriately. This way, all workbooks running on the cluster use this configuration.
3. Alternatively, you can explicitly set the `option` when calling APIs, for example: `spark.read.format("mongo").option("spark.mongodb.input.uri", connectionString).load()`. If you configure the variables in the cluster, you don't have to set the option.
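The connection string and read options described above can be assembled in plain Python. A hedged sketch; the helper names and the percent-encoding choice are assumptions for illustration, not part of the connector API:

```python
# Sketch: build the vCore connection string and the options dictionary
# passed to spark.read. Helper names are assumptions for illustration.
from urllib.parse import quote_plus

def vcore_connection_string(user: str, password: str, cluster: str) -> str:
    # Percent-encode credentials so special characters survive in the URI.
    return (
        f"mongodb+srv://{quote_plus(user)}:{quote_plus(password)}"
        f"@{cluster}.mongocluster.cosmos.azure.com"
    )

def read_options(uri: str, database: str, collection: str) -> dict:
    return {
        "spark.mongodb.input.uri": uri,
        "database": database,
        "collection": collection,
    }

# Usage in a notebook (not executed here):
# df = spark.read.format("mongo").options(**read_options(uri, database, collection)).load()
```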
```python
database = "<database_name>"
collection = "<collection_name>"
```
### Data sample set
For the purposes of this lab, we're using the CSV 'Citibike2019' dataset. You can import it from [CitiBike Trip History 2019](https://citibikenyc.com/system-data).
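If you load the CSV in a notebook, one small chore is normalizing the Citi Bike column headers (which contain spaces) before writing documents to a collection. A sketch, under the assumption that headers look like `Start Station Name`; the file path in the comment is illustrative:

```python
# Hypothetical helper: normalize CSV headers such as "Start Station Name"
# to snake_case field names before writing documents to the collection.
def normalize_column(name: str) -> str:
    return name.strip().lower().replace(" ", "_")

# In Databricks you might load and rename like this (path is illustrative):
# df = spark.read.csv("/FileStore/tables/201901-citibike-tripdata.csv", header=True)
# df = df.toDF(*[normalize_column(c) for c in df.columns])
```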
```python
display(df_vcore)
```
Output:
**Schema**
:::image type="content" source="./media/connect-from-databricks/print-schema.png" alt-text="Screenshot of the Print Schema.":::
**DataFrame**
:::image type="content" source="./media/connect-from-databricks/display-dataframe-vcore.png" alt-text="Screenshot of the Display DataFrame.":::
### Filter data from Azure Cosmos DB for MongoDB vCore
This command doesn't have an output, as it writes directly to the collection. You can cross-check that the record is updated using a read command.
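One way to do that cross-check read is to pass a `$match` stage through the connector's `pipeline` read option. A sketch; the field name and value are illustrative assumptions:

```python
# Sketch: build a JSON $match stage for the connector's "pipeline" read
# option, to read back the document you just updated. The field name
# "tripduration" and value are illustrative assumptions.
import json

def match_filter(field: str, value) -> str:
    return json.dumps([{"$match": {field: value}}])

# df = (spark.read.format("mongo")
#         .option("pipeline", match_filter("tripduration", 1200))
#         .load())
# display(df)
```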
### Read data from an Azure Cosmos DB for MongoDB vCore collection by running an aggregation pipeline
> [!NOTE]
> [Aggregation Pipeline](../tutorial-aggregation.md) is a powerful capability that allows you to preprocess and transform data within Azure Cosmos DB for MongoDB. It's a great match for real-time analytics, dashboards, and report generation with roll-ups, sums, and averages with 'server-side' data post-processing. There's a [whole book written about it](https://www.practical-mongodb-aggregations.com/front-cover.html).
Azure Cosmos DB for MongoDB even supports [rich secondary/compound indexes](../indexing.md) to extract, filter, and process only the data it needs.
For example, you can analyze all customers located in a specific geography right within the database, without first having to load the full dataset, minimizing data movement and reducing latency.
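The server-side idea described above can be sketched as a pipeline that groups and counts rows before any data leaves the database. The stage contents and the field name `start_station_name` are illustrative assumptions:

```python
# Sketch: an aggregation pipeline, serialized to JSON for the connector's
# "pipeline" read option, that counts trips per start station server-side.
# The field name "start_station_name" is an assumption for illustration.
import json

def count_by_field(field: str) -> str:
    pipeline = [
        {"$group": {"_id": f"${field}", "count": {"$sum": 1}}},
        {"$sort": {"count": -1}},
    ]
    return json.dumps(pipeline)

# df_vcore = (spark.read.format("mongo")
#               .option("pipeline", count_by_field("start_station_name"))
#               .load())
# display(df_vcore)
```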
Here's an example of using the aggregate function:
```python
display(df_vcore)
```
Output:
:::image type="content" source="./media/connect-from-databricks/display-aggregation-pipeline.png" alt-text="Screenshot of the Display Aggregate Data.":::
## Related content
The following articles demonstrate how to use aggregation pipelines in Azure Cosmos DB for MongoDB vCore: