Commit f788f60

Update how-to-data-flow-error-rows.md
1 parent 5a288cf commit f788f60

1 file changed: +15 −57 lines changed


articles/data-factory/how-to-data-flow-error-rows.md

Lines changed: 15 additions & 57 deletions
@@ -16,73 +16,31 @@ ms.author: makromer
A very common scenario in Data Factory when using mapping data flows is to write your transformed data to an Azure SQL database. In this scenario, a common error condition that you must guard against is column truncation. Follow these steps to log rows whose values won't fit into a target string column, allowing your data flow to continue in those scenarios.

-## Create a pipeline
+## Scenario

-1. Select **+New Pipeline** to create a new pipeline.
+1. We have a target Azure SQL database table that has an ```nvarchar(5)``` column called "name".
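For context, here is a minimal T-SQL sketch of such a sink table; everything except the ```nvarchar(5)``` "name" column is an assumption for illustration:

```sql
-- Hypothetical sink table: the article only specifies the nvarchar(5) "name" column.
CREATE TABLE dbo.MovieSink (
    id   INT IDENTITY(1,1) PRIMARY KEY,  -- assumed surrogate key
    name NVARCHAR(5) NOT NULL            -- movie titles longer than 5 characters won't fit
);
```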

-2. Add a data flow activity, which will be used for processing fixed-width files:
+2. Inside of our data flow, we want to map movie titles from our source to that target "name" column.

-![Fixed Width Pipeline](media/data-flow/fwpipe.png)
+![Movie data flow 1](media/data-flow/error4.png)
+
+3. The problem is that the movie titles won't all fit within a sink column that can hold only five characters. When you execute this data flow, you will receive an error like this one: ```"Job failed due to reason: DF-SYS-01 at Sink 'WriteToDatabase': java.sql.BatchUpdateException: String or binary data would be truncated. java.sql.BatchUpdateException: String or binary data would be truncated."```
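The same failure can be reproduced directly in T-SQL against the hypothetical table sketched above; the title value here is an assumption:

```sql
-- A 10-character title can't fit in nvarchar(5), so SQL Server rejects the batch
-- with "String or binary data would be truncated."
INSERT INTO dbo.MovieSink (name)
VALUES (N'Casablanca');
```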

-3. In the data flow activity, select **New mapping data flow**.
+## How to design around this condition

-4. Add a Source, Derived Column, Select, and Sink transformation:
+1. In this scenario, the maximum length of the "name" column is five characters. So let's add a conditional split transformation that lets us log rows with "title" values longer than five characters, while allowing the rest of the rows, which fit into that space, to write to the database.

-![Fixed Width Data Flow](media/data-flow/fw2.png)
+![conditional split](media/data-flow/error1.png)

-5. Configure the Source transformation to use a new dataset, which will be of the Delimited Text type.
+2. This conditional split transformation defines the maximum length of "title" as five. Any row whose title is five characters or fewer goes into the ```GoodRows``` stream. Any row whose title is longer than five characters goes into the ```BadRows``` stream.
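In the data flow expression language, the split condition is simply ```length(title) <= 5```. Below is a minimal sketch of the equivalent data flow script; the stream and transformation names (```MoviesSource```, ```TitleLengthSplit```) are assumptions, and the last stream listed acts as the default that catches the remaining, too-long rows:

```
MoviesSource split(
    length(title) <= 5,
    disjoint: false
) ~> TitleLengthSplit@(GoodRows, BadRows)
```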

-6. Don't set any column delimiter or headers.
+3. Now we need to log the rows that failed. Add a sink transformation to the ```BadRows``` stream for logging. There, we'll "auto-map" all of the fields so that we log the complete transaction record. The output is a delimited-text CSV file written to a single file in Blob Storage. We'll call the log file "badrows.csv".
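Sketching that logging sink in data flow script form, under the same assumed names; the "output to single file" option and the ```badrows.csv``` file name are configured on the sink's dataset and Settings tab rather than shown in this snippet:

```
BadRows sink(
    allowSchemaDrift: true,
    validateSchema: false,
    skipDuplicateMapInputs: true,
    skipDuplicateMapOutputs: true
) ~> LogBadRows
```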

-Now we'll set field starting points and lengths for the contents of this file:
+![Bad rows](media/data-flow/error3.png)
+
+4. The completed data flow is shown below. We can now split off error rows to avoid the SQL truncation errors and write those entries to a log file, while successful rows continue to write to our target database.

-```
-1234567813572468
-1234567813572468
-1234567813572468
-1234567813572468
-1234567813572468
-1234567813572468
-1234567813572468
-1234567813572468
-1234567813572468
-1234567813572468
-1234567813572468
-1234567813572468
-1234567813572468
-```
-
-7. On the **Projection** tab of your Source transformation, you should see a string column that's named *Column_1*.
-
-8. In the Derived column, create a new column.
-
-9. We'll give the columns simple names like *col1*.
-
-10. In the expression builder, type the following:
-
-```substring(Column_1,1,4)```
-
-![derived column](media/data-flow/fwderivedcol1.png)
-
-11. Repeat step 10 for all the columns you need to parse.
-
-12. Select the **Inspect** tab to see the new columns that will be generated:
-
-![inspect](media/data-flow/fwinspect.png)
-
-13. Use the Select transform to remove any of the columns that you don't need for transformation:
-
-![select transformation](media/data-flow/fwselect.png)
-
-14. Use Sink to output the data to a folder:
-
-![fixed width sink](media/data-flow/fwsink.png)
-
-Here's what the output looks like:
-
-![fixed width output](media/data-flow/fxdoutput.png)
-
-The fixed-width data is now split, with four characters each and assigned to Col1, Col2, Col3, Col4, and so on. Based on the preceding example, the data is split into four columns.
+![complete data flow](media/data-flow/error2.png)
## Next steps
