Commit 959e733

updating lookup doc
1 parent ac3d43f commit 959e733

7 files changed

+52
-36
lines changed

@@ -1,72 +1,88 @@
11
---
2-
title: Mapping data flow Lookup Transformation
3-
description: Azure Data Factory mapping data flow Lookup Transformation
2+
title: Lookup transformation in mapping data flow
3+
description: Reference data from another source using the lookup transformation in mapping data flow.
44
author: kromerm
5+
ms.reviewer: daperlov
56
ms.author: makromer
67
ms.service: data-factory
78
ms.topic: conceptual
89
ms.custom: seo-lt-2019
9-
ms.date: 02/26/2020
10+
ms.date: 03/23/2020
1011
---
1112

12-
# Azure Data Factory mapping data flow Lookup Transformation
13+
# Lookup transformation in mapping data flow
1314

14-
Use Lookup to add reference data from another source to your Data Flow. The Lookup transform requires a defined source that points to your reference table and matches on key fields.
15+
Use the lookup transformation to reference data from another source in a data flow stream. The lookup transformation appends columns from matched data to your source data.
1516

16-
![Lookup Transformation](media/data-flow/lookup1.png "Lookup")
17+
A lookup transformation is similar to a left outer join. All rows from the primary stream will exist in the output stream with additional columns from the lookup stream.
1718

18-
Select the key fields that you wish to match on between the incoming stream fields and the fields from the reference source. You must first have created a new source on the Data Flow design canvas to use as the right-side for the lookup.
19+
## Configuration
1920

20-
When matches are found, the resulting rows and columns from the reference source will be added to your data flow. You can choose which fields of interest that you wish to include in your Sink at the end of your Data Flow. Alternatively, use a Select transformation following your Lookup to prune the field list to keep only the fields from both streams that you'd like to retain.
21+
![Lookup Transformation](media/data-flow/lookup1.png "Lookup")
2122

22-
The Lookup transformation performs the equivalent of a left outer join. So, you'll see all rows from your left source combine with matches from your right side. If you have multiple matching values on your lookup, or if you'd like to customize the lookup expression, it is preferable to switch to a Join transformation and use a cross join. This will avoid any possible cartesian product errors on execution.
23+
**Primary stream:** The incoming stream of data. This is equivalent to the left side of a join.
2324

24-
## Match / No match
25+
**Lookup stream:** The data which is appended to the primary stream. Which data is appended is determined by the lookup conditions. This is equivalent to the right side of a join.
2526

26-
After your Lookup transformation, you can use subsequent transformations to inspect the results of each match row by using the expression function `isMatch()` to make further choices in your logic based on whether or not the Lookup resulted in a row match or not.
27+
**Match multiple rows:** If enabled, a row in the primary stream with multiple matches in the lookup stream will return multiple rows. Otherwise, only a single row will be returned based upon the 'Match on' condition.
2728

28-
![Lookup pattern](media/data-flow/lookup111.png "Lookup pattern")
29+
**Match on:** Only visible if 'Match multiple rows' is not enabled. Choose whether to match on any row, the first match, or the last match. Any row is recommended as it executes the fastest. If first row or last row is selected, you'll be required to specify sort conditions.
2930

30-
After you use the Lookup transformation, you can add a Conditional Split transformation splitting on the ```isMatch()``` function. In the example above, matching rows go through the top stream and non-matching rows flow through the ```NoMatch``` stream.
31+
**Lookup conditions:** Choose which columns to match on. If the equality condition is met, then the rows will be considered a match. Hover and select 'Computed column' to extract a value using the [data flow expression language](data-flow-expression-functions).
3132

32-
## First or last value
33+
The lookup transformation only supports equality matches. To customize the lookup expression to include other operators such as greater than, it's recommended to use a [cross join in the join transformation](data-flow-join.md#custom-cross-join). This will avoid any possible cartesian product errors on execution.
3334

34-
The Lookup Transformation is implemented as a left outer join. When you have multiple matches from your Lookup, you may want to reduce the multiple matched rows by picking the first matched row, the last match, or any random row.
35+
All columns from both streams are included in the output data. To drop duplicate or unwanted columns, add a [select transformation](data-flow-select.md) after your lookup transformation. Columns can also be dropped or renamed in a sink transformation.
3536

36-
### Option 1
37+
## Analyzing matched rows
3738

38-
![Single Row Lookup](media/data-flow/singlerowlookup.png "Single row lookup")
39+
After your lookup transformation, the function `isMatch()` can be used to see if the lookup matched for individual rows.
3940

40-
* Match multiple rows: Leave it blank to return single row match
41-
* Match on: Select first, last, or any match
42-
* Sort conditions: If you select first or last, ADF requires your data to be ordered so that there is logic behind first and last
41+
![Lookup pattern](media/data-flow/lookup111.png "Lookup pattern")
4342

44-
> [!NOTE]
45-
> Only use the first or last option on your single row selector if you need to control which value to bring back from your lookup. Using "any" or multi-row lookups will perform faster.
43+
An example of this is using the conditional split transformation to split on the `isMatch()` function. In the example above, matching rows go through the top stream and non-matching rows flow through the ```NoMatch``` stream.
4644

47-
### Option 2
45+
## Testing lookup conditions
4846

49-
You can also do this using an Aggregate transformation after your Lookup. In this case, an Aggregate transformation called ```PickFirst``` is used to pick the first value from the lookup matches.
47+
When testing the lookup transformation with data preview in debug mode, use a small set of known data. When sampling rows from a large dataset, you can't predict which rows and keys will be read for testing. The result is non-deterministic, meaning that your lookup conditions may not return any matches.
5048

51-
![Lookup aggregate](media/data-flow/lookup333.png "Lookup aggregate")
49+
## Broadcast optimization
5250

53-
![Lookup first](media/data-flow/lookup444.png "Lookup first")
51+
In Azure Data Factory, mapping data flows execute in scaled-out Spark environments. If your dataset can fit into worker node memory space, your lookup performance can be optimized by enabling broadcasting.
5452

55-
## Optimizations
53+
![Broadcast Join](media/data-flow/broadcast.png "Broadcast Join")
5654

57-
In Data Factory, Data Flows execute in scaled-out Spark environments. If your dataset can fit into worker node memory space, we can optimize your Lookup performance.
55+
Enabling broadcasting pushes the entire dataset into memory. For smaller datasets containing only a few thousand rows, this can greatly improve your lookup performance. For large datasets, this can lead to an out-of-memory exception.
5856

59-
![Broadcast Join](media/data-flow/broadcast.png "Broadcast Join")
57+
## Data flow script
6058

61-
### Broadcast join
59+
### Syntax
6260

63-
Select Left and/or Right side broadcast join to request ADF to push the entire dataset from either side of the Lookup relationship into memory. For smaller datasets, this can greatly improve your lookup performance.
61+
```
62+
<leftStream>, <rightStream>
63+
lookup(
64+
<lookupConditionExpression>,
65+
multiple: { true | false },
66+
pickup: { 'first' | 'last' | 'any' }, ## Only required if false is selected for multiple
67+
{ desc | asc }( <sortColumn>, { true | false }), ## Only required if 'first' or 'last' is selected. true/false determines whether to put nulls first
68+
broadcast: { 'none' | 'left' | 'right' | 'both' }
69+
) ~> <lookupTransformationName>
70+
```
71+
### Example
6472

65-
### Data partitioning
73+
![Lookup Transformation](media/data-flow/lookup1.png "Lookup")
6674

67-
You can also specify partitioning of your data by selecting "Set Partitioning" on the Optimize tab of the Lookup transformation to create sets of data that can fit better into memory per worker.
75+
The data flow script for the above lookup configuration is in the code snippet below.
6876

69-
## Next steps
77+
```
78+
SQLProducts, DimProd lookup(ProductID == ProductKey,
79+
multiple: false,
80+
pickup: 'first',
81+
asc(ProductKey, true),
82+
broadcast: 'none')~> LookupKeys
83+
```
84+

85+
## Next steps
7086

71-
* [Join](data-flow-join.md) and [Exists](data-flow-exists.md) transformations perform similar tasks in ADF mapping data flows. Take a look at those transformations next.
72-
* Use a [Conditional Split](data-flow-conditional-split.md) with ```isMatch()``` to split rows on matching and non-matching values
87+
* The [join](data-flow-join.md) and [exists](data-flow-exists.md) transformations both take in multiple stream inputs
88+
* Use a [conditional split transformation](data-flow-conditional-split.md) with ```isMatch()``` to split rows on matching and non-matching values
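
The lookup semantics described in the updated doc (left outer join behavior, 'Match multiple rows', 'Match on', and `isMatch()`) can be illustrated with a small Python sketch. This is not how Azure Data Factory implements the transformation, which runs on Spark. The function `lookup`, its parameters, and the `is_match` column are hypothetical stand-ins for the data flow script settings `multiple` and `pickup`, the sort condition, and `isMatch()`. The sketch assumes the two streams do not share column names.

```python
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


def lookup(
    primary: List[Row],
    lookup_rows: List[Row],
    primary_key: str,
    lookup_key: str,
    multiple: bool = False,
    pickup: str = "any",
    sort_key: Optional[str] = None,
) -> List[Row]:
    """Hypothetical model of a lookup: a left outer join on one equality key."""
    if pickup not in ("any", "first", "last"):
        raise ValueError("pickup must be 'any', 'first', or 'last'")
    if not multiple and pickup != "any" and sort_key is None:
        # Mirrors the doc: 'first' and 'last' require a sort condition.
        raise ValueError("sort_key is required when pickup is 'first' or 'last'")

    # Index the lookup stream by its key so each primary row is matched quickly.
    index: Dict[Any, List[Row]] = {}
    for row in lookup_rows:
        index.setdefault(row[lookup_key], []).append(row)
    lookup_columns = {col for row in lookup_rows for col in row}

    output: List[Row] = []
    for row in primary:
        matches = index.get(row[primary_key], [])
        if matches and not multiple:
            if pickup == "any":
                matches = matches[:1]
            else:
                ordered = sorted(matches, key=lambda r: r[sort_key])
                matches = [ordered[0] if pickup == "first" else ordered[-1]]
        if matches:
            # Append the lookup columns to the primary row for every kept match.
            output.extend({**row, **m, "is_match": True} for m in matches)
        else:
            # Unmatched primary rows are kept, with nulls for the lookup columns.
            nulls = {col: None for col in lookup_columns if col not in row}
            output.append({**row, **nulls, "is_match": False})
    return output


# Made-up sample data, named after the streams in the doc's example.
products = [
    {"ProductID": 1, "Name": "bike"},
    {"ProductID": 2, "Name": "helmet"},
    {"ProductID": 3, "Name": "lock"},
]
dim_prod = [
    {"ProductKey": 1, "Price": 100},
    {"ProductKey": 1, "Price": 90},
    {"ProductKey": 2, "Price": 25},
]

single = lookup(products, dim_prod, "ProductID", "ProductKey",
                pickup="first", sort_key="Price")
multi = lookup(products, dim_prod, "ProductID", "ProductKey", multiple=True)

# Equivalent in spirit to a conditional split on isMatch().
no_match = [r for r in single if not r["is_match"]]
```

With this sample data, the single-row lookup returns one row per product, while the multi-row lookup returns two rows for product 1 because it has two entries in `dim_prod`. Product 3 has no match in either case and is still present in the output, which is the left outer join behavior the doc describes.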
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
