You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
**Primary stream:** The incoming stream of data. This is equivalent to the left side of a join.
23
+
**Primary stream:** The incoming stream of data. This stream is equivalent to the left side of a join.
24
24
25
-
**Lookup stream:** The data which is appended to the primary stream. Which data is appended is determined by the lookup conditions. This is equivalent to the right side of a join.
25
+
**Lookup stream:** The data that is appended to the primary stream. Which data is added is determined by the lookup conditions. This stream is equivalent to the right side of a join.
26
26
27
27
**Match multiple rows:** If enabled, a row with multiple matches in the primary stream will return multiple rows. Otherwise, only a single row will be returned based upon the 'Match on' condition.
28
28
29
-
**Match on:** Only visible if 'Match multiple rows' is enabled. Choose whether to match on any row, the first match, or the last match. Any row is recommended as it executes the fastest. If first row or last row are selected, you'll be required to specify sort conditions.
29
+
**Match on:** Only visible if 'Match multiple rows' is enabled. Choose whether to match on any row, the first match, or the last match. Any row is recommended as it executes the fastest. If first row or last row is selected, you'll be required to specify sort conditions.
30
30
31
31
**Lookup conditions:** Choose which columns to match on. If the equality condition is met, then the rows will be considered a match. Hover and select 'Computed column' to extract a value using the [data flow expression language](data-flow-expression-functions).
32
32
33
-
The lookup transformation only supports equality matches. To customize the lookup expression to include other operators such as greater than, it's recommended to use a [cross join in the join transformation](data-flow-join.md#custom-cross-join). This will avoid any possible cartesian product errors on execution.
33
+
The lookup transformation only supports equality matches. To customize the lookup expression to include other operators such as greater than, it's recommended to use a [cross join in the join transformation](data-flow-join.md#custom-cross-join). A cross join will avoid any possible cartesian product errors on execution.
34
34
35
35
All columns from both streams are included in the output data. To drop duplicate or unwanted columns, add a [select transformation](data-flow-select.md) after your lookup transformation. Columns can also be dropped or renamed in a sink transformation.
36
36
@@ -40,19 +40,19 @@ After your lookup transformation, the function `isMatch()` can be used to see if
An example of this is using the conditional split transformation to split on the `isMatch()` function. In the example above, matching rows go through the top stream and non-matching rows flow through the ```NoMatch``` stream.
43
+
An example of this pattern is using the conditional split transformation to split on the `isMatch()` function. In the example above, matching rows go through the top stream and non-matching rows flow through the ```NoMatch``` stream.
44
44
45
45
## Testing lookup conditions
46
46
47
47
When testing the lookup transformation with data preview in debug mode, use a small set of known data. When sampling rows from a large dataset, you can't predict which rows and keys will be read for testing. The result is non-deterministic, meaning that your join conditions may not return any matches.
48
48
49
49
## Broadcast optimization
50
50
51
-
In Azure Data Factory mapping data flows execute in scaled-out Spark environments. If your dataset can fit into worker node memory space, your lookup performance can be optimized by enabling broadcasitng.
51
+
In Azure Data Factory mapping data flows execute in scaled-out Spark environments. If your dataset can fit into worker node memory space, your lookup performance can be optimized by enabling broadcasting.
Enabling broadcasting pushes the entire dataset into memory. For smaller datasets containing only a few thousand rows, this can greatly improve your lookup performance. For large datasets, this can lead to an out of memory exception.
55
+
Enabling broadcasting pushes the entire dataset into memory. For smaller datasets containing only a few thousand rows, broadcasting can greatly improve your lookup performance. For large datasets, this option can lead to an out of memory exception.
0 commit comments