# Connect to Azure Cosmos DB for MongoDB vCore from Azure Databricks
[!INCLUDE[MongoDB vCore](./introduction.md)]
This article walks you through connecting to Azure Cosmos DB for MongoDB vCore by using the Spark connector for Databricks. It covers basic Data Manipulation Language (DML) operations like reading, writing, creating views or temporary tables, filtering, and running aggregations by using Python code.
## Prerequisites
* [Provision an Azure Cosmos DB for MongoDB vCore cluster.](quickstart-portal.md)
## Dependencies for connectivity
* **Spark connector for MongoDB vCore:**
  The Spark connector is used to connect to Azure Cosmos DB for MongoDB vCore. Identify and use the version of the connector located in [Maven Central](https://mvnrepository.com/artifact/org.mongodb.spark/mongo-spark-connector) that is compatible with the Spark and Scala versions of your Spark environment. We recommend an environment that supports Spark 3.2.1 or higher, and the Spark connector available at the Maven coordinates `org.mongodb.spark:mongo-spark-connector_2.12:3.0.1`. A minimal session-configuration sketch follows this list.
* **Azure Cosmos DB for MongoDB connection string:** Your Azure Cosmos DB for MongoDB vCore connection string, user name, and password.
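On Databricks, you would normally install the Maven coordinate above through the cluster's **Libraries** tab. If you're testing against a self-managed Spark session instead, the connector can also be attached when the session is built. The following is a minimal sketch, not taken from this article; the app name is a hypothetical placeholder.

```python
# Minimal sketch: attach the MongoDB Spark connector via spark.jars.packages.
# On Databricks, installing the Maven coordinate through the cluster's
# Libraries tab is the usual route; this applies to self-managed sessions.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("cosmos-mongodb-vcore-demo")  # hypothetical app name
    .config(
        "spark.jars.packages",
        "org.mongodb.spark:mongo-spark-connector_2.12:3.0.1",
    )
    .getOrCreate()
)
```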
28
28
@@ -48,13 +48,13 @@ After that, you may create a Scala or Python notebook for migration.
48
48
49
49
## Create Python notebook to connect to Azure Cosmos DB for MongoDB vCore
Create a Python notebook in Databricks. Make sure to enter the right values for the variables before running the following code.
### Update Spark configuration with the Azure Cosmos DB for MongoDB connection string
1. Note the connection string under **Settings** > **Connection strings** in the Azure Cosmos DB for MongoDB vCore resource in the Azure portal. It has the form `mongodb+srv://<user>:<password>@<database_name>.mongocluster.cosmos.azure.com`.
2. Back in Databricks, in your cluster configuration, under **Advanced Options** (bottom of page), paste the connection string for both the `spark.mongodb.output.uri` and `spark.mongodb.input.uri` variables. Populate the username and password fields appropriately. This way, all the notebooks you run on the cluster use this configuration.
3. Alternatively, you can explicitly set the `option` when calling APIs like: `spark.read.format("mongo").option("spark.mongodb.input.uri", connectionString).load()`. If you configure the variables in the cluster, you don't have to set the option. An expanded read sketch follows this list.
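As a concrete illustration of step 3, here's a minimal read sketch. The connection string follows the form shown in step 1, and the database and collection names are hypothetical; replace all placeholders with your own values before running it.

```python
# Minimal sketch: read a collection by passing the connection string as an
# option instead of relying on the cluster-level Spark configuration.
user = "<user>"          # placeholder
password = "<password>"  # placeholder
host = "<database_name>.mongocluster.cosmos.azure.com"  # placeholder

connectionString = f"mongodb+srv://{user}:{password}@{host}"

df = (
    spark.read.format("mongo")
    .option("spark.mongodb.input.uri", connectionString)
    .option("database", "sample_db")      # hypothetical database name
    .option("collection", "sample_coll")  # hypothetical collection name
    .load()
)
df.printSchema()
```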
This command doesn't have an output, as it writes directly to the collection. You can cross-check whether the record was updated by using a read command.
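The write command itself isn't shown in this excerpt. The following is a minimal sketch of what such a write could look like with this connector, assuming `df` holds the records to persist; the database and collection names are placeholders.

```python
# Minimal sketch: append the rows of df to the target collection.
(
    df.write.format("mongo")
    .option("spark.mongodb.output.uri", connectionString)
    .option("database", "sample_db")      # hypothetical database name
    .option("collection", "sample_coll")  # hypothetical collection name
    .mode("append")                       # insert without overwriting existing data
    .save()
)
```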
### Read data from an Azure Cosmos DB for MongoDB vCore collection by running an aggregation pipeline
[Aggregation Pipeline](../tutorial-aggregation.md) is a powerful capability that allows you to preprocess and transform data within Azure Cosmos DB for MongoDB. It's a great match for real-time analytics, dashboards, and report generation with roll-ups, sums, and averages with server-side data post-processing. (Note: there's a [whole book written about it](https://www.practical-mongodb-aggregations.com/front-cover.html).)
Azure Cosmos DB for MongoDB even supports [rich secondary/compound indexes](../indexing.md) to extract, filter, and process only the data it needs. For example, you can analyze all customers located in a specific geography right within the database, without first having to load the full dataset, minimizing data movement and reducing latency. You can find the syntax in the hyperlinks above.
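As a sketch of how a pipeline can be pushed down through the connector, the example below filters and groups a hypothetical `customers` collection server-side before Spark ever sees the data. The field names and stages are illustrative assumptions, not part of this article.

```python
# Minimal sketch: run an aggregation pipeline server-side via the connector's
# pipeline option, so only the aggregated result is loaded into Spark.
pipeline = """[
  { "$match": { "location.country": "Germany" } },
  { "$group": { "_id": "$location.city", "customerCount": { "$sum": 1 } } }
]"""

agg_df = (
    spark.read.format("mongo")
    .option("spark.mongodb.input.uri", connectionString)
    .option("database", "sample_db")    # hypothetical database name
    .option("collection", "customers")  # hypothetical collection name
    .option("pipeline", pipeline)
    .load()
)
agg_df.show()
```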