articles/data-factory/how-to-data-flow-error-rows.md
+15 −57 (15 additions & 57 deletions)
```diff
@@ -16,73 +16,31 @@ ms.author: makromer
 A very common scenario in Data Factory, when using mapping data flows, is to write your transformed data to an Azure SQL database. In this scenario, a common error condition that you must guard against is column truncation. Follow these steps to log the columns that won't fit into a target string column, allowing your data flow to continue in those scenarios.
 
-## Create a pipeline
+## Scenario
 
-1. Select **+New Pipeline** to create a new pipeline.
+1. We have a target Azure SQL database table that has an ```nvarchar(5)``` column called "name".
 
-2. Add a data flow activity, which will be used for processing fixed-width files:
+2. Inside our data flow, we want to map movie titles to that target "name" column in our sink.
+3. The problem is that the movie titles won't all fit within a sink column that can hold only five characters. When you execute this data flow, you receive an error like this one: ```"Job failed due to reason: DF-SYS-01 at Sink 'WriteToDatabase': java.sql.BatchUpdateException: String or binary data would be truncated. java.sql.BatchUpdateException: String or binary data would be truncated."```
 
-3. In the data flow activity, select **New mapping data flow**.
+## How to design around this condition
 
-4. Add a Source, Derived Column, Select, and Sink transformation:
+1. In this scenario, the maximum length of the "name" column is five characters. So let's add a conditional split transformation that lets us log rows whose "title" is longer than five characters, while still allowing the rows that fit into that space to be written to the database.
 
-
+
 
-5. Configure the Source transformation to use a new dataset, which will be of the Delimited Text type.
+2. This conditional split transformation defines the maximum length of "title" as five. Any row that is less than or equal to five characters goes into the ```GoodRows``` stream. Any row that is longer than five characters goes into the ```BadRows``` stream.
 
-6. Don't set any column delimiter or headers.
+3. Now we need to log the rows that failed. Add a sink transformation to the ```BadRows``` stream for logging. Here, we'll "auto-map" all of the fields so that we log the complete transaction record. The output is a delimited text (CSV) file written to a single file in Blob storage, which we'll call "badrows.csv".
 
-Now we'll set field starting points and lengths for the contents of this file:
+
+
+4. The completed data flow is shown below. We can now split off error rows to avoid the SQL truncation errors and write those entries to a log file, while successful rows continue to write to our target database.
```
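The added steps describe the Conditional Split but don't show its expression or script form. As a minimal, hypothetical sketch in data flow script — the column name `title` and the stream names `GoodRows` and `BadRows` come from the text above, while the source stream name `MoviesSource` and the transformation name `CheckTitleLength` are assumptions, not part of the diff — the split in steps 1–2 could look something like this:

```
MoviesSource split(
    length(title) <= 5,
    disjoint: false
) ~> CheckTitleLength@(GoodRows, BadRows)
```

With this shape, `GoodRows` receives rows whose title fits the five-character ```nvarchar(5)``` column, and `BadRows`, the last (default) stream, receives everything else for logging.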
```diff
-```
-1234567813572468
-1234567813572468
-1234567813572468
-1234567813572468
-1234567813572468
-1234567813572468
-1234567813572468
-1234567813572468
-1234567813572468
-1234567813572468
-1234567813572468
-1234567813572468
-1234567813572468
-```
-
-7. On the **Projection** tab of your Source transformation, you should see a string column that's named *Column_1*.
-
-8. In the Derived column, create a new column.
-
-9. We'll give the columns simple names like *col1*.
-
-10. In the expression builder, type the following:
-
-The fixed-width data is now split, with four characters each and assigned to Col1, Col2, Col3, Col4, and so on. Based on the preceding example, the data is split into four columns.
```
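The deleted walkthrough ends just before the derived-column expression it refers to, so the original expression isn't recoverable from this diff. Purely as an illustrative sketch in data flow script, using the `Column_1` and *col1*–*col4* names from the removed text (the stream names `FixedWidthSource` and `SplitFixedWidth` are invented here), splitting each 16-character row into four-character fields might have looked like:

```
FixedWidthSource derive(
    col1 = substring(Column_1, 1, 4),
    col2 = substring(Column_1, 5, 4),
    col3 = substring(Column_1, 9, 4),
    col4 = substring(Column_1, 13, 4)
) ~> SplitFixedWidth
```

The `substring` function in the data flow expression language is 1-based, which is why the starting positions are 1, 5, 9, and 13.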