
Commit 577d0df

Acrolinx fixes
1 parent dec690b commit 577d0df

4 files changed, +48 -48 lines changed

articles/data-factory/how-to-data-flow-error-rows.md

Lines changed: 9 additions & 9 deletions
@@ -16,11 +16,11 @@ A common scenario in Data Factory when using mapping data flows, is to write you

There are two primary methods to graceful handle errors when writing data to your database sink in ADF data flows:

- * Set the sink [error row handling](./connector-azure-sql-database.md#error-row-handling) to "Continue on Error" when processing database data. This is an automated catch-all method that does not require custom logic in your data flow.
- * Alternatively, follow the steps below to provide logging of columns that won't fit into a target string column, allowing your data flow to continue.
+ * Set the sink [error row handling](./connector-azure-sql-database.md#error-row-handling) to "Continue on Error" when processing database data. This is an automated catch-all method that doesn't require custom logic in your data flow.
+ * Alternatively, use the following steps to provide logging of columns that don't fit into a target string column, allowing your data flow to continue.

> [!NOTE]
- > When enabling automatic error row handling, as opposed to the method below of writing your own error handling logic, there will be a small performance penalty incurred by and additional step taken by ADF to perform a 2-phase operation to trap errors.
+ > When enabling automatic error row handling, as opposed to the following method of writing your own error handling logic, there will be a small performance penalty incurred by an additional step taken by ADF to perform a 2-phase operation to trap errors.

## Scenario

@@ -30,28 +30,28 @@ There are two primary methods to graceful handle errors when writing data to you

:::image type="content" source="media/data-flow/error4.png" alt-text="Movie data flow 1":::

- 3. The problem is that the movie title won't all fit within a sink column that can only hold 5 characters. When you execute this data flow, you will receive an error like this one: ```"Job failed due to reason: DF-SYS-01 at Sink 'WriteToDatabase': java.sql.BatchUpdateException: String or binary data would be truncated. java.sql.BatchUpdateException: String or binary data would be truncated."```
+ 3. The problem is that the movie title doesn't all fit within a sink column that can only hold five characters. When you execute this data flow, you receive an error like this one: ```"Job failed due to reason: DF-SYS-01 at Sink 'WriteToDatabase': java.sql.BatchUpdateException: String or binary data would be truncated. java.sql.BatchUpdateException: String or binary data would be truncated."```

This video walks through an example of setting-up error row handling logic in your data flow:
> [!VIDEO https://www.microsoft.com/en-us/videoplayer/embed/RE4uOHj]

## How to design around this condition

- 1. In this scenario, the maximum length of the "name" column is five characters. So, let's add a conditional split transformation that will allow us to log rows with "titles" that are longer than five characters while also allowing the rest of the rows that can fit into that space to write to the database.
+ 1. In this scenario, the maximum length of the "name" column is five characters. So, let's add a conditional split transformation that allows us to log rows with "titles" that are longer than five characters while also allowing the rest of the rows that can fit into that space to write to the database.

:::image type="content" source="media/data-flow/error1.png" alt-text="conditional split":::

- 2. This conditional split transformation defines the maximum length of "title" to be five. Any row that is less than or equal to five will go into the ```GoodRows``` stream. Any row that is larger than five will go into the ```BadRows``` stream.
+ 2. This conditional split transformation defines the maximum length of "title" to be five. Any row that is less than or equal to five goes into the ```GoodRows``` stream. Any row that is larger than five goes into the ```BadRows``` stream.
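
The split condition itself isn't shown in this diff, but as a minimal sketch, the ```GoodRows``` condition in that conditional split could be a data flow expression along these lines, assuming the incoming string column is named ```title```:

```
length(title) <= 5
```

Rows that don't satisfy the condition fall through to the ```BadRows``` stream as the split's other output.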

- 3. Now we need to log the rows that failed. Add a sink transformation to the ```BadRows``` stream for logging. Here, we'll "auto-map" all of the fields so that we have logging of the complete transaction record. This is a text-delimited CSV file output to a single file in Blob Storage. We'll call the log file "badrows.csv".
+ 3. Now we need to log the rows that failed. Add a sink transformation to the ```BadRows``` stream for logging. Here, we "automap" all of the fields so that we have logging of the complete transaction record. This is a text-delimited CSV file output to a single file in Blob Storage. We call the log file "badrows.csv".

:::image type="content" source="media/data-flow/error3.png" alt-text="Bad rows":::

- 4. The completed data flow is shown below. We are now able to split off error rows to avoid the SQL truncation errors and put those entries into a log file. Meanwhile, successful rows can continue to write to our target database.
+ 4. The completed data flow is shown here. We're now able to split off error rows to avoid the SQL truncation errors and put those entries into a log file. Meanwhile, successful rows can continue to write to our target database.

:::image type="content" source="media/data-flow/error2.png" alt-text="complete data flow":::

- 5. If you choose the error row handling option in the sink transformation and set "Output error rows", ADF will automatically generate a CSV file output of your row data along with the driver-reported error messages. You do not need to add that logic manually to your data flow with that alternative option. There will be a small performance penalty incurred with this option so that ADF can implement a 2-phase methodology to trap errors and log them.
+ 5. If you choose the error row handling option in the sink transformation and set "Output error rows", ADF automatically generates a CSV file output of your row data along with the driver-reported error messages. You don't need to add that logic manually to your data flow with that alternative option. You incur a small performance penalty with this option so that ADF can implement a 2-phase methodology to trap errors and log them.

:::image type="content" source="media/data-flow/error-row-3.png" alt-text="complete data flow with error rows":::

articles/data-factory/how-to-sqldb-to-cosmosdb.md

Lines changed: 12 additions & 12 deletions
@@ -10,13 +10,13 @@ ms.subservice: data-flows

# Migrate normalized database schema from Azure SQL Database to Azure Cosmos DB denormalized container

- This guide will explain how to take an existing normalized database schema in Azure SQL Database and convert it into an Azure Cosmos DB denormalized schema for loading into Azure Cosmos DB.
+ This guide explains how to take an existing normalized database schema in Azure SQL Database and convert it into an Azure Cosmos DB denormalized schema for loading into Azure Cosmos DB.

SQL schemas are typically modeled using third normal form, resulting in normalized schemas that provide high levels of data integrity and fewer duplicate data values. Queries can join entities together across tables for reading. Azure Cosmos DB is optimized for super-quick transactions and querying within a collection or container via denormalized schemas with data self-contained inside a document.

- Using Azure Data Factory, we'll build a pipeline that uses a single Mapping Data Flow to read from two Azure SQL Database normalized tables that contain primary and foreign keys as the entity relationship. ADF will join those tables into a single stream using the data flow Spark engine, collect joined rows into arrays and produce individual cleansed documents for insert into a new Azure Cosmos DB container.
+ Using Azure Data Factory, we build a pipeline that uses a single Mapping Data Flow to read from two Azure SQL Database normalized tables that contain primary and foreign keys as the entity relationship. ADF will join those tables into a single stream using the data flow Spark engine, collect joined rows into arrays and produce individual cleansed documents for insert into a new Azure Cosmos DB container.

- This guide will build a new container on the fly called "orders" that will use the ```SalesOrderHeader``` and ```SalesOrderDetail``` tables from the standard SQL Server [Adventure Works sample database](/sql/samples/adventureworks-install-configure?tabs=ssms). Those tables represent sales transactions joined by ```SalesOrderID```. Each unique detail records has its own primary key of ```SalesOrderDetailID```. The relationship between header and detail is ```1:M```. We'll join on ```SalesOrderID``` in ADF and then roll each related detail record into an array called "detail".
+ This guide builds a new container on the fly called "orders" that will use the ```SalesOrderHeader``` and ```SalesOrderDetail``` tables from the standard SQL Server [Adventure Works sample database](/sql/samples/adventureworks-install-configure?tabs=ssms). Those tables represent sales transactions joined by ```SalesOrderID```. Each unique detail record has its own primary key of ```SalesOrderDetailID```. The relationship between header and detail is ```1:M```. We join on ```SalesOrderID``` in ADF and then roll each related detail record into an array called "detail".

The representative SQL query for this guide is:

@@ -33,7 +33,7 @@ The representative SQL query for this guide is:
FROM SalesLT.SalesOrderHeader o;
```

- The resulting Azure Cosmos DB container will embed the inner query into a single document and look like this:
+ The resulting Azure Cosmos DB container embeds the inner query into a single document and looks like this:

:::image type="content" source="media/data-flow/cosmosb3.png" alt-text="Collection":::

@@ -45,7 +45,7 @@ The resulting Azure Cosmos DB container will embed the inner query into a single

3. In the data flow activity, select **New mapping data flow**.

- 4. We will construct this data flow graph below
+ 4. We construct this data flow graph below:

:::image type="content" source="media/data-flow/cosmosb1.png" alt-text="Data Flow Graph":::

@@ -55,13 +55,13 @@ The resulting Azure Cosmos DB container will embed the inner query into a single

7. On the top source, add a Derived Column transformation after "SourceOrderDetails". Call the new transformation "TypeCast". We need to round the ```UnitPrice``` column and cast it to a double data type for Azure Cosmos DB. Set the formula to: ```toDouble(round(UnitPrice,2))```.

- 8. Add another derived column and call it "MakeStruct". This is where we will create a hierarchical structure to hold the values from the details table. Remember, details is a ```M:1``` relation to header. Name the new structure ```orderdetailsstruct``` and create the hierarchy in this way, setting each subcolumn to the incoming column name:
+ 8. Add another derived column and call it "MakeStruct". This is where we create a hierarchical structure to hold the values from the details table. Remember, details is a ```M:1``` relation to header. Name the new structure ```orderdetailsstruct``` and create the hierarchy in this way, setting each subcolumn to the incoming column name:

:::image type="content" source="media/data-flow/cosmosdb-9.png" alt-text="Create Structure":::
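
As a rough sketch (the article configures this step visually, and the exact subcolumns come from the screenshot above), the ```orderdetailsstruct``` expression in the "MakeStruct" derived column might use the complex-type syntax ```@(name = value, ...)``` with a few illustrative detail columns:

```
@(SalesOrderDetailID = SalesOrderDetailID,
  OrderQty = OrderQty,
  UnitPrice = UnitPrice,
  LineTotal = LineTotal)
```

Each subcolumn is simply set to the incoming column of the same name, matching the instruction in step 8.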

9. Now, let's go to the sales header source. Add a Join transformation. For the right-side select "MakeStruct". Leave it set to inner join and choose ```SalesOrderID``` for both sides of the join condition.

- 10. Click on the Data Preview tab in the new join that you added so that you can see your results up to this point. You should see all of the header rows joined with the detail rows. This is the result of the join being formed from the ```SalesOrderID```. Next, we'll combine the details from the common rows into the details struct and aggregate the common rows.
+ 10. Select the Data Preview tab in the new join that you added so that you can see your results up to this point. You should see all of the header rows joined with the detail rows. This is the result of the join being formed from the ```SalesOrderID```. Next, we combine the details from the common rows into the details struct and aggregate the common rows.

:::image type="content" source="media/data-flow/cosmosb4.png" alt-text="Join":::

@@ -73,29 +73,29 @@ The resulting Azure Cosmos DB container will embed the inner query into a single

13. Now let's again cast a currency column, this time ```TotalDue```. Like we did above in step 7, set the formula to: ```toDouble(round(TotalDue,2))```.

- 14. Here's where we will denormalize the rows by grouping by the common key ```SalesOrderID```. Add an Aggregate transformation and set the group by to ```SalesOrderID```.
+ 14. Here's where we denormalize the rows by grouping by the common key ```SalesOrderID```. Add an Aggregate transformation and set the group by to ```SalesOrderID```.

15. In the aggregate formula, add a new column called "details" and use this formula to collect the values in the structure that we created earlier called ```orderdetailsstruct```: ```collect(orderdetailsstruct)```.

- 16. The aggregate transformation will only output columns that are part of aggregate or group by formulas. So, we need to include the columns from the sales header as well. To do that, add a column pattern in that same aggregate transformation. This pattern will include all other columns in the output, excluding the columns listed below (OrderQty, UnitPrice, SalesOrderID):
+ 16. The aggregate transformation will only output columns that are part of aggregate or group by formulas. So, we need to include the columns from the sales header as well. To do that, add a column pattern in that same aggregate transformation. This pattern includes all other columns in the output, excluding the columns listed below (OrderQty, UnitPrice, SalesOrderID):

`instr(name,'OrderQty')==0&&instr(name,'UnitPrice')==0&&instr(name,'SalesOrderID')==0`

17. Use the "this" syntax ($$) in the other properties so that we maintain the same column names and use the ```first()``` function as an aggregate. This tells ADF to keep the first matching value found:

:::image type="content" source="media/data-flow/cosmosb6.png" alt-text="Aggregate":::
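
To recap steps 14 through 17 in one place, here's a consolidated summary of the aggregate transformation's settings; every expression is quoted from the steps above, and the layout is just a sketch rather than exact designer syntax:

```
Group by:         SalesOrderID
Aggregate column: details = collect(orderdetailsstruct)
Column pattern:   instr(name,'OrderQty')==0 && instr(name,'UnitPrice')==0 && instr(name,'SalesOrderID')==0
Pattern output:   $$ = first($$)
```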

- 18. We're ready to finish the migration flow by adding a sink transformation. Click "new" next to dataset and add an Azure Cosmos DB dataset that points to your Azure Cosmos DB database. For the collection, we'll call it "orders" and it will have no schema and no documents because it will be created on the fly.
+ 18. We're ready to finish the migration flow by adding a sink transformation. Select "new" next to dataset and add an Azure Cosmos DB dataset that points to your Azure Cosmos DB database. For the collection, we call it "orders" and it has no schema and no documents because it will be created on the fly.

19. In Sink Settings, Partition Key to ```/SalesOrderID``` and collection action to "recreate". Make sure your mapping tab looks like this:

:::image type="content" source="media/data-flow/cosmosb7.png" alt-text="Screenshot shows the Mapping tab.":::

- 20. Click on data preview to make sure that you are seeing these 32 rows set to insert as new documents into your new container:
+ 20. Select data preview to make sure that you're seeing these 32 rows set to insert as new documents into your new container:

:::image type="content" source="media/data-flow/cosmosb8.png" alt-text="Screenshot shows the Data preview tab.":::

- If everything looks good, you are now ready to create a new pipeline, add this data flow activity to that pipeline and execute it. You can execute from debug or a triggered run. After a few minutes, you should have a new denormalized container of orders called "orders" in your Azure Cosmos DB database.
+ If everything looks good, you're now ready to create a new pipeline, add this data flow activity to that pipeline and execute it. You can execute from debug or a triggered run. After a few minutes, you should have a new denormalized container of orders called "orders" in your Azure Cosmos DB database.

## Related content

0 commit comments
