
Commit 3badce4: Edit NoSQL migrate relational data (1 parent: 8d70d48)

File tree: 1 file changed (+34 −36 lines)


articles/cosmos-db/nosql/migrate-relational-data.md

---
title: Migrate one-to-few relational data
titleSuffix: Azure Cosmos DB for NoSQL
description: Learn how to perform a complex data migration for one-to-few relationships from a relational database into Azure Cosmos DB for NoSQL.
ms.author: thvankra
author: TheovanKraay
ms.service: cosmos-db
ms.subservice: nosql
ms.topic: conceptual
ms.devlang: python, scala
ms.date: 02/27/2023
ms.custom: ignite-2022
---

# Migrate one-to-few relational data into an Azure Cosmos DB for NoSQL account

[!INCLUDE[NoSQL](../includes/appliesto-nosql.md)]

To migrate from a relational database to Azure Cosmos DB for NoSQL, it can be necessary to change the data model for optimization.

One common transformation is denormalizing data by embedding related subitems within one JSON document. Here we look at a few options using Azure Data Factory or Azure Databricks. For more information on data modeling for Azure Cosmos DB, see [Data modeling in Azure Cosmos DB](modeling-data.md).

## Example scenario

Assume we have the following two tables in our SQL database, Orders and OrderDetails.

:::image type="content" source="./media/migrate-relational-data/orders.png" alt-text="Screenshot that shows the Orders and OrderDetails tables in the SQL database." border="false" :::

We want to combine this one-to-few relationship into one JSON document during migration. To build that document, we can create a T-SQL query using `FOR JSON`:

```sql
SELECT
  ...
FROM Orders o;
```
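
To see concretely what this denormalization produces, here's a plain-Python sketch (illustrative only; the rows and column names are hypothetical stand-ins for the Orders and OrderDetails tables) that embeds each order's few detail rows as a JSON array, mirroring the shape the `FOR JSON` query emits:

```python
import json

# Hypothetical relational rows, standing in for the Orders and OrderDetails tables
orders = [{"OrderId": 1, "OrderDate": "2019-12-12", "CustomerName": "Contoso"}]
order_details = [
    {"OrderId": 1, "ProductName": "Widget", "Quantity": 2},
    {"OrderId": 1, "ProductName": "Gadget", "Quantity": 1},
]

def denormalize(order, details):
    """Embed the one-to-few detail rows as a JSON array inside the order document."""
    doc = dict(order)
    doc["OrderDetails"] = [
        {k: v for k, v in d.items() if k != "OrderId"}  # drop the join key
        for d in details
        if d["OrderId"] == order["OrderId"]
    ]
    return doc

docs = [denormalize(o, order_details) for o in orders]
print(json.dumps(docs[0], indent=2))
```

Each resulting document is self-contained, which is why the embedded model suits one-to-few relationships.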

The results of this query would include data from the **Orders** table:

:::image type="content" source="./media/migrate-relational-data/for-json-query-result.png" alt-text="Screenshot of a query that results in details of various orders." lightbox="./media/migrate-relational-data/for-json-query-result.png":::

Ideally, you want to use a single Azure Data Factory (ADF) copy activity to query SQL data as the source and write the output directly to the Azure Cosmos DB sink as proper JSON objects. Currently, it isn't possible to perform the needed JSON transformation in one copy activity. If we try to copy the results of the above query into an Azure Cosmos DB for NoSQL container, we see the OrderDetails field as a string property of our document, instead of the expected JSON array.
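
To illustrate the limitation, here's a small sketch (illustrative data only, not pipeline code) of the difference between what a single copy activity produces and what we actually want in the container:

```python
import json

# What a single copy activity yields: OrderDetails arrives as an escaped string
flat_doc = {
    "OrderId": 1,
    "OrderDetails": '[{"ProductName": "Widget", "Quantity": 2}]',  # a str, not an array
}

# What we want in Azure Cosmos DB: a real embedded JSON array
fixed_doc = dict(flat_doc)
fixed_doc["OrderDetails"] = json.loads(flat_doc["OrderDetails"])

print(type(flat_doc["OrderDetails"]).__name__, "->", type(fixed_doc["OrderDetails"]).__name__)
```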

We can work around this current limitation in one of the following ways:

* **Use Azure Data Factory with two copy activities**:
  1. Get JSON-formatted data from SQL to a text file in an intermediary blob storage location.
  2. Load data from the JSON text file to a container in Azure Cosmos DB.
* **Use Azure Databricks to read from SQL and write to Azure Cosmos DB**. We present two options for this approach.

Let's look at these approaches in more detail:

## Azure Data Factory

Although we can't embed OrderDetails as a JSON array in the destination Azure Cosmos DB document, we can work around the issue by using two separate copy activities.

### Copy Activity #1: SqlJsonToBlobText

...

```sql
SELECT [value] FROM OPENJSON(
  ...
)
```

:::image type="content" source="./media/migrate-relational-data/adf1.png" alt-text="Screenshot of the preview values in the ADF copy operation.":::

For the sink of the `SqlJsonToBlobText` copy activity, we choose "Delimited Text" and point it to a specific folder in Azure Blob Storage with a dynamically generated unique file name (for example, `@concat(pipeline().RunId,'.json')`). Since our text file isn't really "delimited" and we don't want it to be parsed into separate columns using commas, we set "Column delimiter" to a Tab ("\t"), or another character not occurring in the data. We also want to preserve the double-quotes ("), so we set "Quote character" to "No quote character".

:::image type="content" source="./media/migrate-relational-data/adf2.png" alt-text="Screenshot that highlights the Column delimiter and Quote character settings.":::
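
The reasoning behind these settings can be sketched with Python's `csv` module standing in for the delimited-text format (illustrative only): with the default comma delimiter and quoting, the JSON text gets wrapped in quotes and its inner quotes doubled, while a tab delimiter with no quote character passes it through untouched:

```python
import csv
import io
import json

row_json = json.dumps({"OrderId": 1, "OrderDetails": [{"Quantity": 2}]})

# Default settings: the commas in the JSON trigger quoting, so the field is
# wrapped in quotes and every inner double-quote is doubled -- no longer raw JSON.
default_buf = io.StringIO()
csv.writer(default_buf).writerow([row_json])

# Tab delimiter and no quote character: nothing in the JSON needs escaping,
# so the line round-trips byte-for-byte.
raw_buf = io.StringIO()
csv.writer(raw_buf, delimiter="\t", quoting=csv.QUOTE_NONE, quotechar=None).writerow([row_json])
raw_line = raw_buf.getvalue().rstrip("\r\n")
```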

### Copy Activity #2: BlobJsonToCosmos

Next, we modify our ADF pipeline by adding a second copy activity that looks in Azure Blob Storage for the text file created by the first activity. It processes the file as a "JSON" source and inserts one document into the Azure Cosmos DB sink per JSON row found in the text file.

:::image type="content" source="./media/migrate-relational-data/adf3.png" alt-text="Screenshot that highlights the JSON source file and the File path fields.":::

Optionally, we also add a "Delete" activity to the pipeline so that it deletes all of the previous files remaining in the /Orders/ folder prior to each run. Our ADF pipeline now looks something like this:

:::image type="content" source="./media/migrate-relational-data/adf4.png" alt-text="Screenshot that highlights the Delete activity.":::

After we trigger the pipeline, we see a file created in our intermediary Azure Blob Storage location containing one JSON object per row:

:::image type="content" source="./media/migrate-relational-data/adf5.png" alt-text="Screenshot that shows the created file that contains the JSON objects.":::
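
What the second activity's per-row parse amounts to can be sketched in a few lines of Python (the file contents here are hypothetical): each non-empty line of the intermediary file becomes one document:

```python
import json

# Hypothetical contents of the intermediary blob: one JSON object per row
blob_text = """\
{"OrderId": 1, "OrderDetails": [{"ProductName": "Widget", "Quantity": 2}]}
{"OrderId": 2, "OrderDetails": []}
"""

# One Azure Cosmos DB document per non-empty JSON row
documents = [json.loads(line) for line in blob_text.splitlines() if line.strip()]
```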

We also see Orders documents with properly embedded OrderDetails inserted into our Azure Cosmos DB collection:

:::image type="content" source="./media/migrate-relational-data/adf6.png" alt-text="Screenshot that shows the order details as a part of the Azure Cosmos DB document.":::

## Azure Databricks

We can also use Spark in [Azure Databricks](https://azure.microsoft.com/services/databricks/) to copy the data from our SQL Database source to the Azure Cosmos DB destination without creating the intermediary text/JSON files in Azure Blob Storage.

> [!NOTE]
> For clarity and simplicity, the code snippets include dummy database passwords explicitly inline, but you should ideally use Azure Databricks secrets.

First, we create and attach the required [SQL connector](/connectors/sql/) and [Azure Cosmos DB connector](https://docs.databricks.com/data/data-sources/azure/cosmosdb-connector.html) libraries to our Azure Databricks cluster. Restart the cluster to make sure the libraries are loaded.

:::image type="content" source="./media/migrate-relational-data/databricks1.png" alt-text="Screenshot that shows where to create and attach the required SQL connector and Azure Cosmos DB connector libraries to our Azure Databricks cluster.":::

Next, we present two samples, for Scala and Python.

### Scala

Here, we get the results of the SQL query with `FOR JSON` output into a DataFrame:

```scala
...
val configMap = Map(
  ...
)
val configCosmos = Config(configMap)
```

Finally, we define our schema and use `from_json` to apply the schema to the DataFrame prior to saving it to the Cosmos DB collection.

```scala
// Convert DataFrame to proper nested schema
...
CosmosDBSpark.save(ordersWithSchema, configCosmos)
```

:::image type="content" source="./media/migrate-relational-data/databricks3.png" alt-text="Screenshot that highlights the proper array for saving to an Azure Cosmos DB collection.":::

### Python

As an alternative approach, you may need to execute JSON transformations in Spark if the source database doesn't support `FOR JSON` or a similar operation. Alternatively, you can use parallel operations for a large data set. Here we present a PySpark sample. Start by configuring the source and target database connections in the first cell:

```python
import uuid
...
writeConfig = {
    ...
}
```

Then, we query the source database (in this case, SQL Server) for both the order and order detail records, putting the results into Spark DataFrames. We also create a list containing all the order IDs, and a thread pool for parallel operations:

```python
import json
...
orderids = orders.select('OrderId').collect()
pool = ThreadPool(10)
```

Then, create a function for writing Orders into the target API for NoSQL collection. This function filters all order details for the given order ID, converts them into a JSON array, and inserts the array into a JSON document. The JSON document is then written into the target API for NoSQL container for that order:

```python
def writeOrder(orderid):
    ...
    df.write.format("com.microsoft.azure.cosmosdb.spark").mode("append").options(**writeConfig).save()
```

Finally, we call the `writeOrder` function using a map function on the thread pool to execute in parallel, passing in the list of order IDs we created earlier:

```python
# map order details to orders in parallel using the above function
pool.map(writeOrder, orderids)
```
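
The fan-out pattern in the last two cells can be mimicked locally; here's a minimal sketch using the same `ThreadPool` idea, with an in-memory dictionary standing in for the Azure Cosmos DB container (all data and names here are hypothetical):

```python
from multiprocessing.pool import ThreadPool

order_details_rows = [
    {"OrderId": 1, "ProductName": "Widget", "Quantity": 2},
    {"OrderId": 1, "ProductName": "Gadget", "Quantity": 1},
    {"OrderId": 2, "ProductName": "Sprocket", "Quantity": 5},
]
orderids = [1, 2]
sink = {}  # stands in for the target container

def write_order(orderid):
    # Filter this order's details and embed them as an array, as writeOrder does
    details = [
        {k: v for k, v in row.items() if k != "OrderId"}
        for row in order_details_rows
        if row["OrderId"] == orderid
    ]
    sink[orderid] = {"OrderId": orderid, "OrderDetails": details}

# Map the per-order writes over a thread pool to run them in parallel
pool = ThreadPool(10)
pool.map(write_order, orderids)
pool.close()
pool.join()
```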

With either approach, at the end, we should get properly saved embedded OrderDetails within each Order document in the Azure Cosmos DB collection:

:::image type="content" source="./media/migrate-relational-data/databricks4.png" alt-text="Screenshot of the resulting data after migration.":::

## Next steps

* Learn about [data modeling in Azure Cosmos DB](./modeling-data.md)
* Learn [how to model and partition data on Azure Cosmos DB](./how-to-model-partition-example.md)
