|
1 | 1 | ---
|
2 |
| -title: Mapping data flow Lookup Transformation |
3 |
| -description: Azure Data Factory mapping data flow Lookup Transformation |
| 2 | +title: Lookup transformation in mapping data flow |
| 3 | +description: Reference data from another source using the lookup transformation in mapping data flow. |
4 | 4 | author: kromerm
|
| 5 | +ms.reviewer: daperlov |
5 | 6 | ms.author: makromer
|
6 | 7 | ms.service: data-factory
|
7 | 8 | ms.topic: conceptual
|
8 | 9 | ms.custom: seo-lt-2019
|
9 |
| -ms.date: 02/26/2020 |
| 10 | +ms.date: 03/23/2020 |
10 | 11 | ---
|
11 | 12 |
|
12 |
| -# Azure Data Factory mapping data flow Lookup Transformation |
| 13 | +# Lookup transformation in mapping data flow |
13 | 14 |
|
14 |
| -Use Lookup to add reference data from another source to your Data Flow. The Lookup transform requires a defined source that points to your reference table and matches on key fields. |
| 15 | +Use the lookup transformation to reference data from another source in a data flow stream. The lookup transformation appends columns from matched data to your source data. |
15 | 16 |
|
16 |
| - |
| 17 | +A lookup transformation is similar to a left outer join. All rows from the primary stream will exist in the output stream with additional columns from the lookup stream. |
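
The left-outer-join behavior described above can be sketched in plain Python. This is only an illustration of the semantics, not how the service runs a lookup. The stream and key names (`sql_products`, `dim_prod`, `ProductID`, `ProductKey`) echo the example later in this article, the `Name` and `Category` columns are hypothetical, and the sketch assumes each key appears at most once in the lookup stream.

```python
def lookup_left_outer(primary, lookup_rows, primary_key, lookup_key):
    """Keep every primary row and append the lookup columns (None when unmatched)."""
    # Index the lookup stream by key; assumes keys are unique in this sketch.
    index = {row[lookup_key]: row for row in lookup_rows}
    lookup_columns = list(lookup_rows[0].keys()) if lookup_rows else []
    output = []
    for row in primary:
        match = index.get(row[primary_key])
        # Unmatched rows still appear, with null values in the lookup columns.
        appended = match if match is not None else dict.fromkeys(lookup_columns)
        output.append({**row, **appended})
    return output


sql_products = [{"ProductID": 1, "Name": "Bike"}, {"ProductID": 2, "Name": "Helmet"}]
dim_prod = [{"ProductKey": 1, "Category": "Cycling"}]
result = lookup_left_outer(sql_products, dim_prod, "ProductID", "ProductKey")
```

Both primary rows are present in `result`; only the first carries a non-null `Category`.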
17 | 18 |
|
18 |
| -Select the key fields that you wish to match on between the incoming stream fields and the fields from the reference source. You must first have created a new source on the Data Flow design canvas to use as the right-side for the lookup. |
| 19 | +## Configuration |
19 | 20 |
|
20 |
| -When matches are found, the resulting rows and columns from the reference source will be added to your data flow. You can choose which fields of interest that you wish to include in your Sink at the end of your Data Flow. Alternatively, use a Select transformation following your Lookup to prune the field list to keep only the fields from both streams that you'd like to retain. |
| 21 | + |
21 | 22 |
|
22 |
| -The Lookup transformation performs the equivalent of a left outer join. So, you'll see all rows from your left source combine with matches from your right side. If you have multiple matching values on your lookup, or if you'd like to customize the lookup expression, it is preferable to switch to a Join transformation and use a cross join. This will avoid any possible cartesian product errors on execution. |
| 23 | +**Primary stream:** The incoming stream of data. This is equivalent to the left side of a join. |
23 | 24 |
|
24 |
| -## Match / No match |
| 25 | +**Lookup stream:** The data which is appended to the primary stream. Which data is appended is determined by the lookup conditions. This is equivalent to the right side of a join. |
25 | 26 |
|
26 |
| -After your Lookup transformation, you can use subsequent transformations to inspect the results of each match row by using the expression function `isMatch()` to make further choices in your logic based on whether or not the Lookup resulted in a row match or not. |
| 27 | +**Match multiple rows:** If enabled, a row in the primary stream that has multiple matches in the lookup stream will return multiple rows. Otherwise, only a single row will be returned based upon the 'Match on' condition.
27 | 28 |
|
28 |
| - |
| 29 | +**Match on:** Only visible if 'Match multiple rows' is not enabled. Choose whether to match on any row, the first match, or the last match. Any row is recommended as it executes the fastest. If first row or last row is selected, you'll be required to specify sort conditions.
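
As a rough sketch of how 'Match multiple rows' and 'Match on' interact, the Python below picks the matches for a single primary row. The function name, its parameters, and the `ValidFrom` sort column are hypothetical and exist only for illustration. 'Any' is modeled here as taking whichever match comes first in the input, which is why it needs no sort.

```python
def matches_for_row(primary_row, lookup_rows, primary_key, lookup_key,
                    multiple=True, pick="any", sort_key=None):
    """Return the lookup rows to append to one primary row."""
    matches = [r for r in lookup_rows if r[lookup_key] == primary_row[primary_key]]
    if multiple or not matches:
        # Match multiple rows enabled: every match is returned.
        return matches
    if pick == "any":
        # Any single match is acceptable, so no ordering is required.
        return [matches[0]]
    if sort_key is None:
        raise ValueError("'first' and 'last' require a sort column")
    ordered = sorted(matches, key=lambda r: r[sort_key])
    return [ordered[0] if pick == "first" else ordered[-1]]


product = {"ProductID": 1}
dim_prod = [
    {"ProductKey": 1, "ValidFrom": "2020-01-01"},
    {"ProductKey": 1, "ValidFrom": "2019-01-01"},
]
all_rows = matches_for_row(product, dim_prod, "ProductID", "ProductKey")
first = matches_for_row(product, dim_prod, "ProductID", "ProductKey",
                        multiple=False, pick="first", sort_key="ValidFrom")
last = matches_for_row(product, dim_prod, "ProductID", "ProductKey",
                       multiple=False, pick="last", sort_key="ValidFrom")
```

The sort step is the extra work that makes 'first' and 'last' more expensive than 'any'.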
29 | 30 |
|
30 |
| -After you use the Lookup transformation, you can add a Conditional Split transformation splitting on the ```isMatch()``` function. In the example above, matching rows go through the top stream and non-matching rows flow through the ```NoMatch``` stream. |
| 31 | +**Lookup conditions:** Choose which columns to match on. If the equality condition is met, then the rows will be considered a match. Hover and select 'Computed column' to extract a value using the [data flow expression language](data-flow-expression-functions.md).
31 | 32 |
|
32 |
| -## First or last value |
| 33 | +The lookup transformation only supports equality matches. To customize the lookup expression to include other operators such as greater than, it's recommended to use a [cross join in the join transformation](data-flow-join.md#custom-cross-join). This will avoid any possible cartesian product errors on execution. |
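
To illustrate why a non-equality condition calls for a cross join, the sketch below pairs every row of one stream with every row of the other and keeps only the pairs that satisfy a custom predicate. It is a conceptual model in Python, not data flow script, and the `orders` and `tiers` data are made up for the example.

```python
from itertools import product


def cross_join_where(left, right, condition):
    """Pair every left row with every right row, keeping pairs that satisfy the condition."""
    return [{**l, **r} for l, r in product(left, right) if condition(l, r)]


orders = [{"OrderID": 1, "Amount": 50}, {"OrderID": 2, "Amount": 500}]
tiers = [{"Tier": "small", "MinAmount": 0}, {"Tier": "large", "MinAmount": 100}]

# A greater-than condition, which an equality-only lookup cannot express.
matched = cross_join_where(orders, tiers, lambda o, t: o["Amount"] > t["MinAmount"])
```

Unlike a lookup, this model drops rows that satisfy no pair, and the number of candidate pairs grows with the product of the two stream sizes.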
33 | 34 |
|
34 |
| -The Lookup Transformation is implemented as a left outer join. When you have multiple matches from your Lookup, you may want to reduce the multiple matched rows by picking the first matched row, the last match, or any random row. |
| 35 | +All columns from both streams are included in the output data. To drop duplicate or unwanted columns, add a [select transformation](data-flow-select.md) after your lookup transformation. Columns can also be dropped or renamed in a sink transformation. |
35 | 36 |
|
36 |
| -### Option 1 |
| 37 | +## Analyzing matched rows |
37 | 38 |
|
38 |
| - |
| 39 | +After your lookup transformation, the function `isMatch()` can be used to see if the lookup matched for individual rows. |
39 | 40 |
|
40 |
| -* Match multiple rows: Leave it blank to return single row match |
41 |
| -* Match on: Select first, last, or any match |
42 |
| -* Sort conditions: If you select first or last, ADF requires your data to be ordered so that there is logic behind first and last |
| 41 | + |
43 | 42 |
|
44 |
| -> [!NOTE] |
45 |
| -> Only use the first or last option on your single row selector if you need to control which value to bring back from your lookup. Using "any" or multi-row lookups will perform faster. |
| 43 | +An example of this is using the conditional split transformation to split on the `isMatch()` function. In the example above, matching rows go through the top stream and non-matching rows flow through the ```NoMatch``` stream. |
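
The split on match status can be modeled roughly as follows. In this Python sketch, a hypothetical `is_match` flag stands in for the `isMatch()` expression function, and the sample rows are invented for illustration.

```python
def lookup_with_flag(primary, lookup_rows, primary_key, lookup_key):
    """Append lookup columns and record whether each primary row found a match."""
    index = {r[lookup_key]: r for r in lookup_rows}
    output = []
    for row in primary:
        match = index.get(row[primary_key])
        output.append({**row, **(match or {}), "is_match": match is not None})
    return output


def split_on_match(rows):
    """Route rows into a matched stream and a NoMatch stream."""
    matched = [r for r in rows if r["is_match"]]
    no_match = [r for r in rows if not r["is_match"]]
    return matched, no_match


sql_products = [{"ProductID": 1}, {"ProductID": 2}, {"ProductID": 3}]
dim_prod = [{"ProductKey": 1}, {"ProductKey": 3}]
looked_up = lookup_with_flag(sql_products, dim_prod, "ProductID", "ProductKey")
matched, no_match = split_on_match(looked_up)
```

Rows in `no_match` are candidates for handling such as logging or inserting new reference records.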
46 | 44 |
|
47 |
| -### Option 2 |
| 45 | +## Testing lookup conditions |
48 | 46 |
|
49 |
| -You can also do this using an Aggregate transformation after your Lookup. In this case, an Aggregate transformation called ```PickFirst``` is used to pick the first value from the lookup matches. |
| 47 | +When testing the lookup transformation with data preview in debug mode, use a small set of known data. When sampling rows from a large dataset, you can't predict which rows and keys will be read for testing. The result is non-deterministic, meaning that your lookup conditions may not return any matches.
50 | 48 |
|
51 |
| - |
| 49 | +## Broadcast optimization |
52 | 50 |
|
53 |
| - |
| 51 | +In Azure Data Factory, mapping data flows execute in scaled-out Spark environments. If your dataset can fit into worker node memory space, your lookup performance can be optimized by enabling broadcasting.
54 | 52 |
|
55 |
| -## Optimizations |
| 53 | + |
56 | 54 |
|
57 |
| -In Data Factory, Data Flows execute in scaled-out Spark environments. If your dataset can fit into worker node memory space, we can optimize your Lookup performance. |
| 55 | +Enabling broadcasting pushes the entire dataset into memory. For smaller datasets containing only a few thousand rows, this can greatly improve your lookup performance. For large datasets, this can lead to an out of memory exception. |
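
The idea behind broadcasting can be sketched conceptually: the whole lookup dataset is held in memory and copied to each worker, so every partition of the primary stream is resolved locally. The Python below is a simplified model under that assumption, not a description of Spark's implementation, and the partition layout and column names are hypothetical.

```python
def broadcast_lookup(primary_partitions, lookup_rows, primary_key, lookup_key):
    """Resolve each partition against its own in-memory copy of the lookup data."""
    # The entire lookup side must fit in memory for this to work.
    broadcast_index = {r[lookup_key]: r for r in lookup_rows}
    results = []
    for partition in primary_partitions:
        # Each worker gets a full copy, so primary rows never move between workers.
        local_index = dict(broadcast_index)
        results.append(
            [{**row, **local_index.get(row[primary_key], {})} for row in partition]
        )
    return results


partitions = [[{"ProductID": 1}], [{"ProductID": 2}, {"ProductID": 3}]]
dim_prod = [{"ProductKey": 1, "Category": "Cycling"}, {"ProductKey": 2, "Category": "Safety"}]
looked_up = broadcast_lookup(partitions, dim_prod, "ProductID", "ProductKey")
```

Memory use in this model scales with the lookup size times the number of workers, which is why large datasets risk running out of memory.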
58 | 56 |
|
59 |
| - |
| 57 | +## Data flow script |
60 | 58 |
|
61 |
| -### Broadcast join |
| 59 | +### Syntax |
62 | 60 |
|
63 |
| -Select Left and/or Right side broadcast join to request ADF to push the entire dataset from either side of the Lookup relationship into memory. For smaller datasets, this can greatly improve your lookup performance. |
| 61 | +``` |
| 62 | +<leftStream>, <rightStream> |
| 63 | + lookup( |
| 64 | + <lookupConditionExpression>, |
| 65 | + multiple: { true | false }, |
| 66 | + pickup: { 'first' | 'last' | 'any' }, ## Only required if false is selected for multiple |
| 67 | + { desc | asc }( <sortColumn>, { true | false }), ## Only required if 'first' or 'last' is selected. true/false determines whether to put nulls first |
| 68 | + broadcast: { 'none' | 'left' | 'right' | 'both' } |
| 69 | + ) ~> <lookupTransformationName> |
| 70 | +``` |
| 71 | +### Example |
64 | 72 |
|
65 |
| -### Data partitioning |
| 73 | + |
66 | 74 |
|
67 |
| -You can also specify partitioning of your data by selecting "Set Partitioning" on the Optimize tab of the Lookup transformation to create sets of data that can fit better into memory per worker. |
| 75 | +The data flow script for the above lookup configuration is in the code snippet below. |
68 | 76 |
|
69 |
| -## Next steps |
| 77 | +``` |
| 78 | +SQLProducts, DimProd lookup(ProductID == ProductKey, |
| 79 | + multiple: false, |
| 80 | + pickup: 'first', |
| 81 | + asc(ProductKey, true), |
| 82 | + broadcast: 'none')~> LookupKeys |
| 83 | +``` |
| 84 | +
| 85 | +## Next steps
70 | 86 |
|
71 |
| -* [Join](data-flow-join.md) and [Exists](data-flow-exists.md) transformations perform similar tasks in ADF mapping data flows. Take a look at those transformations next. |
72 |
| -* Use a [Conditional Split](data-flow-conditional-split.md) with ```isMatch()``` to split rows on matching and non-matching values |
| 87 | +* The [join](data-flow-join.md) and [exists](data-flow-exists.md) transformations both take in multiple stream inputs |
| 88 | +* Use a [conditional split transformation](data-flow-conditional-split.md) with ```isMatch()``` to split rows on matching and non-matching values |