Commit e9dd9b5

Merge pull request #103342 from djpmsft/docUpdates
Doc updates
2 parents 0bb6737 + 64f0fa1 commit e9dd9b5

File tree

1 file changed

+31
-82
lines changed

articles/data-factory/data-flow-troubleshoot-guide.md

Lines changed: 31 additions & 82 deletions
@@ -1,115 +1,64 @@
11
---
2-
title: Troubleshoot Data Flows
3-
description: Learn how to troubleshoot data flow issues in Azure Data Factory.
2+
title: Troubleshoot data flows
3+
description: Learn how to troubleshoot data flow issues in Azure Data Factory.
44
services: data-factory
55
ms.author: makromer
66
author: kromerm
77
manager: anandsub
88
ms.service: data-factory
99
ms.topic: troubleshooting
10-
ms.custom: seo-lt-2019
11-
ms.date: 12/19/2019
10+
ms.date: 02/04/2020
1211
---
13-
14-
# Troubleshoot Azure Data Factory Data Flows
12+
# Troubleshoot data flows in Azure Data Factory
1513

1614
This article explores common troubleshooting methods for data flows in Azure Data Factory.
1715

1816
## Common errors and messages
1917

20-
### Error message: DF-SYS-01: shaded.databricks.org.apache.hadoop.fs.azure.AzureException: com.microsoft.azure.storage.StorageException: The specified container does not exist.
21-
22-
- **Symptoms**: Data preview, debug, and pipeline data flow execution fails because container does not exist
23-
24-
- **Cause**: When dataset contains a container that does not exist in the storage
25-
26-
- **Resolution**: Make sure that the container you are referencing in your dataset exists
27-
28-
### Error message: DF-SYS-01: java.lang.AssertionError: assertion failed: Conflicting directory structures detected. Suspicious paths
29-
30-
- **Symptoms**: When using wildcards in source transformation with Parquet files
31-
32-
- **Cause**: Incorrect or invalid wildcard syntax
33-
34-
- **Resolution**: Check the wildcard syntax you are using in your source transformation options
35-
36-
### Error message: DF-SRC-002: 'container' (Container name) is required
37-
38-
- **Symptoms**: Data preview, debug, and pipeline data flow execution fails because container does not exist
39-
40-
- **Cause**: When dataset contains a container that does not exist in the storage
41-
42-
- **Resolution**: Make sure that the container you are referencing in your dataset exists
43-
44-
### Error message: DF-UNI-001: PrimaryKeyValue has incompatible types IntegerType and StringType
45-
46-
- **Symptoms**: Data preview, debug, and pipeline data flow execution fails because container does not exist
47-
48-
- **Cause**: Happens when trying to insert incorrect primary key type in database sinks
49-
50-
- **Resolution**: Use a Derived Column to cast the column that you are using for the primary key in your data flow to match the data type of your target database
51-
52-
### Error message: DF-SYS-01: com.microsoft.sqlserver.jdbc.SQLServerException: The TCP/IP connection to the host xxxxx.database.windows.net port 1433 has failed. Error: "xxxx.database.windows.net. Verify the connection properties. Make sure that an instance of SQL Server is running on the host and accepting TCP/IP connections at the port. Make sure that TCP connections to the port are not blocked by a firewall."
18+
### Error code: DF-Executor-SourceInvalidPayload
19+
- **Message**: Data preview, debug, and pipeline data flow execution failed because container does not exist
20+
- **Causes**: The dataset references a container that does not exist in storage
21+
- **Recommendation**: Make sure that the container referenced in your dataset exists and is accessible.
5322

54-
- **Symptoms**: Unable to preview data or execute pipeline with database source or sink
23+
### Error code: DF-Executor-SystemImplicitCartesian
5524

56-
- **Cause**: Database is protected by firewall
25+
- **Message**: Implicit cartesian product for INNER join is not supported, use CROSS JOIN instead. Columns used in join should create a unique key for rows.
26+
- **Causes**: Implicit cartesian product for INNER join between logical plans is not supported. The columns used in the join should create a unique key for rows.
27+
- **Recommendation**: For non-equality-based joins, use CROSS JOIN.
5728

58-
- **Resolution**: Open the firewall access to the database
29+
### Error code: DF-Executor-SystemInvalidJson
5930

60-
### Error message: DF-SYS-01: com.microsoft.sqlserver.jdbc.SQLServerException: There is already an object named 'xxxxxx' in the database.
31+
- **Message**: JSON parsing error, unsupported encoding or multiline
32+
- **Causes**: Possible issues with the JSON file: unsupported encoding, corrupt bytes, or using JSON source as single document on many nested lines
33+
- **Recommendation**: Verify the JSON file's encoding is supported. On the Source transformation that is using a JSON dataset, expand 'JSON Settings' and turn on 'Single Document'.
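To check a file before changing the source settings, a local script can help tell the two causes apart. The following is a minimal sketch, not part of Data Factory: the function name `classify_json_bytes` and the default of UTF-8 are assumptions made for illustration, and the set of encodings the service accepts is not stated here. It reports whether the content fails to decode, parses line by line, or parses only as one document spread over many lines (the case where 'Single Document' applies).

```python
import json


def _parses(text: str) -> bool:
    """Return True if the text is one valid JSON value."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


def classify_json_bytes(raw: bytes, encoding: str = "utf-8") -> str:
    """Classify raw file content (hypothetical helper, UTF-8 assumed by default).

    Returns one of:
      "undecodable"     - bytes are not valid in the given encoding
      "line_delimited"  - every non-empty line is its own JSON document
      "single_document" - only the file as a whole parses as JSON
      "invalid"         - neither interpretation parses
    """
    try:
        text = raw.decode(encoding)
    except UnicodeDecodeError:
        return "undecodable"

    lines = [line for line in text.splitlines() if line.strip()]
    if lines and all(_parses(line) for line in lines):
        return "line_delimited"
    if _parses(text):
        return "single_document"
    return "invalid"
```

A result of `"single_document"` suggests turning on 'Single Document' in the source; `"undecodable"` or `"invalid"` points to an encoding problem or corrupt bytes instead.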
34+
35+
### Error code: DF-Executor-BroadcastTimeout
6136

62-
- **Symptoms**: Sink fails to create table
37+
- **Message**: Broadcast join timeout error, make sure broadcast stream produces data within 60 secs in debug runs and 300 secs in job runs
38+
- **Causes**: Broadcast has a default timeout of 60 secs in debug runs and 300 secs in job runs. The stream chosen for broadcast seems too large to produce data within this limit.
39+
- **Recommendation**: Avoid broadcasting large data streams where the processing can take more than 60 secs. Choose a smaller stream to broadcast instead. Large SQL/DW tables and source files are typically bad candidates.
6340

64-
- **Cause**: There is already an existing table name in the target database with the same name defined in your source or in the dataset
41+
### Error code: DF-Executor-Conversion
6542

66-
- **Resolution**: Change the name of the table that you are trying to create
43+
- **Message**: Converting to a date or time failed due to an invalid character
44+
- **Causes**: Data is not in the expected format
45+
- **Recommendation**: Use the correct data type
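One way to find the offending values is to test a sample of the column against the format you expect before running the data flow. This is a rough sketch that runs outside Data Factory; the helper name `find_unparseable` and the format string `%Y-%m-%d` are assumptions for illustration, and Python's format codes are not the same as the data flow expression language's date formats.

```python
from datetime import datetime


def find_unparseable(values, fmt="%Y-%m-%d"):
    """Return the values that do not match the expected date format.

    `fmt` uses Python strptime codes and is an assumed example format.
    """
    bad = []
    for value in values:
        try:
            datetime.strptime(value, fmt)
        except ValueError:
            # Raised for wrong layouts and for stray characters alike.
            bad.append(value)
    return bad


sample = ["2020-02-04", "2020-02-3x", "02/04/2020"]
print(find_unparseable(sample))
```

Any value the helper returns is a candidate for cleaning or for a different target data type.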
6746

68-
### Error message: DF-SYS-01: com.microsoft.sqlserver.jdbc.SQLServerException: String or binary data would be truncated.
47+
### Error code: DF-Executor-InvalidColumn
6948

70-
- **Symptoms**: When writing data to a SQL sink, your data flow fails on pipeline execution with possible truncation error.
71-
72-
- **Cause**: A field from your data flow maps to a column in your SQL database is not wide enough to store the value, causing the SQL driver to throw this error
73-
74-
- **Resolution**: You can reduce the length of the data for string columns using ```left()``` in a Derived Column or implement the ["error row" pattern.](how-to-data-flow-error-rows.md)
75-
76-
### Error message: Since Spark 2.3, the queries from raw JSON/CSV files are disallowed when the referenced columns only include the internal corrupt record column.
77-
78-
- **Symptoms**: Reading from a JSON source fails
79-
80-
- **Cause**: When reading from a JSON source with a single document on many nested lines, ADF, via Spark, is unable to determine where a new document begins and the previous document ends.
81-
82-
- **Resolution**: On the Source transformation that is using a JSON dataset, expand "JSON Settings" and turn on "Single Document".
83-
84-
### Error message: Duplicate columns found in Join
85-
86-
- **Symptoms**: Join transformation resulted in columns from both the left and the right side that include duplicate column names
87-
88-
- **Cause**: The streams that are being joined have common column names
89-
90-
- **Resolution**: Add a Select transformation following the Join and select "Remove duplicate columns" for both the input and output.
91-
92-
### Error message: Possible cartesian product
93-
94-
- **Symptoms**: Join or Lookup transformation detected possible cartesian product upon execution of your data flow
95-
96-
- **Cause**: If you have not explicitly directed ADF to use a cross join, the data flow may fail
97-
98-
- **Resolution**: Change your Lookup or Join transformation to a Join using Custom cross join and enter your lookup or join condition in the expression editor. If you would like to explicitly produce a full cartesian product, use the Derived Column transformation in each of the two independent streams before the join to create a synthetic key to match on. For example, create a new column in Derived Column in each stream called ```SyntheticKey``` and set it equal to ```1```. Then use ```a.SyntheticKey == b.SyntheticKey``` as your custom join expression.
99-
100-
> [!NOTE]
101-
> Make sure to include at least one column from each side of your left and right relationship in a custom cross join. Executing cross joins with static values instead of columns from each side will result in full scans of the entire dataset, causing your data flow to perform poorly.
49+
- **Message**: Column name needs to be specified in the query, set an alias if using a SQL function
50+
- **Causes**: No column name was specified
51+
- **Recommendation**: Set an alias if using a SQL function such as min()/max(), etc.
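The effect of an alias on the result column name can be seen with any SQL engine. The sketch below uses Python's built-in `sqlite3` module purely as a stand-in, not the database a data flow source would query; the `sales` table, its columns, and the helper `result_column_names` are made up for illustration.

```python
import sqlite3


def result_column_names(query):
    """Run a query against a throwaway in-memory table and return its column names."""
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical sample table, only here so the query has something to read.
        conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
        conn.executemany(
            "INSERT INTO sales VALUES (?, ?)", [("east", 10), ("west", 25)]
        )
        cursor = conn.execute(query)
        return [column[0] for column in cursor.description]
    finally:
        conn.close()


# The alias gives the aggregate a stable, explicit column name.
print(result_column_names("SELECT max(amount) AS max_amount FROM sales"))
```

Writing `max(amount) AS max_amount` in the source query gives the aggregate an explicit name that downstream transformations can reference.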
10252

10353
## General troubleshooting guidance
10454

10555
1. Check the status of your dataset connections. In each Source and Sink transformation, visit the Linked Service for each dataset that you are using and test connections.
106-
2. Check the status of your file and table connections from the data flow designer. Switch on Debug and click on Data Preview on your Source transformations to ensure that you are able to access your data.
107-
3. If everything looks good from data preview, go into the Pipeline designer and put your data flow in a pipeline activity. Debug the pipeline for an end-to-end test.
56+
1. Check the status of your file and table connections from the data flow designer. Switch on Debug and click on Data Preview on your Source transformations to ensure that you are able to access your data.
57+
1. If everything looks good from data preview, go into the Pipeline designer and put your data flow in a pipeline activity. Debug the pipeline for an end-to-end test.
10858

10959
## Next steps
11060

11161
For more troubleshooting help, try these resources:
112-
11362
* [Data Factory blog](https://azure.microsoft.com/blog/tag/azure-data-factory/)
11463
* [Data Factory feature requests](https://feedback.azure.com/forums/270578-data-factory)
11564
* [Azure videos](https://azure.microsoft.com/resources/videos/index/?sort=newest&services=data-factory)
