|
1 | 1 | ---
|
2 |
| -title: Column Patterns in Azure Data Factory mapping data flows |
3 |
| -description: Create generalized data transformation patterns using Azure Data Factory Column Patterns in mapping data flows |
| 2 | +title: Column patterns in Azure Data Factory mapping data flow |
| 3 | +description: Create generalized data transformation patterns using column patterns in Azure Data Factory mapping data flows |
4 | 4 | author: kromerm
|
5 | 5 | ms.author: makromer
|
| 6 | +ms.reviewer: daperlov |
6 | 7 | ms.service: data-factory
|
7 | 8 | ms.topic: conceptual
|
8 |
| -ms.date: 01/30/2019 |
| 9 | +ms.date: 10/21/2019 |
9 | 10 | ---
|
10 | 11 |
|
11 |
| -# Mapping data flows column patterns |
| 12 | +# Using column patterns in mapping data flow |
12 | 13 |
|
| 14 | +Several mapping data flow transformations allow you to reference template columns based on patterns instead of hard-coded column names. This matching is known as *column patterns*. You can define patterns to match columns based on name, data type, stream, or position instead of requiring exact field names. There are two scenarios where column patterns are useful: |
13 | 15 |
|
| 16 | +* If incoming source fields change often, as happens with changing columns in text files or NoSQL databases. This scenario is known as [schema drift](concepts-data-flow-schema-drift.md). |
| 17 | +* If you want to apply a common operation to a large group of columns, for example, casting every column that has 'total' in its name to a double. |
14 | 18 |
|
15 |
| -Several Azure Data Factory Data Flow transformations support the idea of "Columns Patterns" so that you can create template columns based on patterns instead of hard-coded column names. You can use this feature within the Expression Builder to define patterns to match columns for transformation instead of requiring exact, specific field names. Patterns are useful if incoming source fields change often, particularly in the case of changing columns in text files or NoSQL databases. This condition is sometimes referred to as "Schema Drift". |
| 19 | +Column patterns are currently available in the derived column, aggregate, select, and sink transformations. |
16 | 20 |
|
17 |
| -This "Flexible Schema" handling is currently found in the Derived Column and Aggregate transformations as well as the Select and Sink transformations as "rule-based mapping". |
| 21 | +## Column patterns in derived column and aggregate |
| 22 | + |
| 23 | +To add a column pattern in a derived column or the Aggregates tab of an aggregate transformation, click the plus icon to the right of an existing column. Select **Add column pattern**. |
| 24 | + |
| 25 | + |
| 26 | + |
| 27 | +Use the [expression builder](concepts-data-flow-expression-builder.md) to enter the match condition. Create a boolean expression that matches columns based on the `name`, `type`, `stream`, and `position` of the column. The pattern will affect any column, drifted or defined, where the condition returns true. |
| 28 | + |
| 29 | +The two expression boxes below the match condition specify the new names and values of the affected columns. Use `$$` to reference the existing value of the matched field. The left expression box defines the name and the right expression box defines the value. |
18 | 30 |
|
19 | 31 | 
|
20 | 32 |
|
21 |
| -## Column patterns |
22 |
| -Column patterns are useful for handling both Schema Drift scenarios as well as general scenarios. It is good for conditions where you are not able to fully know each column name. You can pattern match on column name and column data type and build an expression for transformation that will perform that operation against any field in the data stream that matches your `name` & `type` patterns. |
| 33 | +The above column pattern matches every column of type double and creates one aggregate column per match. The name of the new column is the matched column's name concatenated with '_total'. The value of the new column is the rounded, aggregated sum of the existing double value. |
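
The behavior described for this pattern can be modeled outside of Data Factory. The following Python sketch is only an illustration, not the service's implementation: the `Column` class, the `apply_aggregate_pattern` helper, and the sample data are all hypothetical. It assumes a match condition equivalent to "type is double", a name expression that appends '_total', and a value expression that rounds the sum.

```python
from dataclasses import dataclass


@dataclass
class Column:
    """Hypothetical stand-in for a column in a data flow stream."""
    name: str
    type: str
    position: int
    values: list


def apply_aggregate_pattern(columns, match, new_name, new_value):
    """Emit one aggregate column per input column for which match() is true."""
    output = {}
    for col in columns:
        if match(col):
            # The name expression sees the matched name; the value
            # expression sees the matched column's values.
            output[new_name(col.name)] = new_value(col.values)
    return output


# Illustrative data only: two double columns and one string column.
columns = [
    Column("price", "double", 1, [1.25, 2.5]),
    Column("tax", "double", 2, [10.4, 5.3]),
    Column("title", "string", 3, ["a", "b"]),
]

result = apply_aggregate_pattern(
    columns,
    match=lambda c: c.type == "double",           # match condition
    new_name=lambda name: name + "_total",        # left expression box
    new_value=lambda values: round(sum(values)),  # right expression box
)

print(result)  # {'price_total': 4, 'tax_total': 16}
```

The string column `title` does not satisfy the match condition, so it produces no output column.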
| 34 | + |
| 35 | +To verify your matching condition is correct, you can validate the output schema of defined columns in the **Inspect** tab or get a snapshot of the data in the **Data preview** tab. |
| 36 | + |
| 37 | + |
23 | 38 |
|
24 |
| -When adding an expression to a transform that accepts patterns, choose "Add Column Pattern". Column Patterns allows schema drift column matching patterns. |
| 39 | +## Rule-based mapping in select and sink |
25 | 40 |
|
26 |
| -When building template column patterns, use `$$` in the expression to represent a reference to each matched field from the input data stream. |
| 41 | +When mapping columns in select and sink transformations, you can add either fixed mapping or rule-based mappings. If you know the schema of your data and expect specific columns from the source dataset to always match specific static names, use fixed mapping. If you're working with flexible schemas, use rule-based mapping to build a pattern match based on the `name`, `type`, `stream`, and `position` of columns. You can have any combination of fixed and rule-based mappings. |
27 | 42 |
|
28 |
| -If you choose to use one of the Expression Builder regex functions, you can then subsequently use $1, $2, $3 ... to reference the subpatterns matched from your regex expression. |
| 43 | +To add a rule-based mapping, click **Add mapping** and select **Rule-based mapping**. |
29 | 44 |
|
30 |
| -An example of Column Pattern scenario is using SUM with a series of incoming fields. The aggregate SUM calculations are in the Aggregate transformation. You can then use SUM on every match of field types that match "integer" and then use $$ to reference each match in your expression. |
| 45 | + |
31 | 46 |
|
32 |
| -## Match columns |
33 |
| - |
| 47 | +In the left expression box, enter your boolean match condition. In the right expression box, specify what the matched column will be mapped to. Use `$$` to reference the existing name of the matched field. |
34 | 48 |
|
35 |
| -To build patterns based on columns, you can match on column name, type, stream, or position and use any combination of those with expression functions and regular expressions. |
| 49 | +If you click the downward chevron icon, you can specify a regex mapping condition. |
36 | 50 |
|
37 |
| - |
| 51 | +Click the eyeglasses icon next to a rule-based mapping to view which defined columns are matched and what they're mapped to. |
38 | 52 |
|
39 |
| -## Rule-based mapping |
40 |
| -When mapping columns in Source and Select transformations, you will have an option to choose "Fixed mapping" or "Rule-based mapping". When you know the schema of your data and expect specific columns from the Source dataset that always match specific static names, you can use fixed mapping. But when you are working with flexible schemas, use rule-based mapping. You will be able to build a pattern match using the rules described above. |
| 53 | + |
41 | 54 |
|
42 |
| - |
| 55 | +In the above example, two rule-based mappings are created. The first takes all columns not named 'movie' and maps them to their existing names. The second rule uses regex to match all columns that start with 'movie' and maps them to column 'movieId'. |
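
As a rough model of these two rules, the Python sketch below pairs each boolean match condition with an output-name function. The `apply_rules` helper, the sample column names, and the choice to evaluate rules in the order they are listed are assumptions made for illustration; they are not taken from the service.

```python
import re


def apply_rules(column_names, rules):
    """Return (input name, output name) pairs for every rule match.

    Each rule is a (match, rename) pair: match is a boolean condition on
    the column name and rename produces the mapped name.
    """
    mapping = []
    for match, rename in rules:
        for name in column_names:
            if match(name):
                mapping.append((name, rename(name)))
    return mapping


rules = [
    # Rule 1: every column not named 'movie' passes through unchanged.
    (lambda n: n != "movie", lambda n: n),
    # Rule 2: columns starting with 'movie' are mapped to 'movieId'.
    (lambda n: re.match(r"movie", n) is not None, lambda n: "movieId"),
]

# Hypothetical incoming column names.
mapping = apply_rules(["movie", "title", "year"], rules)

print(mapping)  # [('title', 'title'), ('year', 'year'), ('movie', 'movieId')]
```

A column such as `movieRating` would satisfy both rules in this model, which is the kind of overlap the duplicate-skipping options are meant to handle.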
43 | 56 |
|
44 |
| -Build your rules using the expression builder. Your expressions will return a boolean value to either match columns (true) or exclude columns (false). |
| 57 | +If your rule results in multiple identical mappings, enable **Skip duplicate inputs** or **Skip duplicate outputs** to prevent duplicates. |
45 | 58 |
|
46 |
| -## Pattern matching special columns |
| 59 | +## Pattern matching expression values |
47 | 60 |
|
48 |
| -* `$$` will translate to the name of each match at design time in debug mode and upon execution at run time |
| 61 | +* `$$` translates to the name or value of each match at run time |
49 | 62 | * `name` represents the name of each incoming column
|
50 | 63 | * `type` represents the data type of each incoming column
|
51 |
| -* `stream` represents the name associated with each stream or transformation in your flow |
| 64 | +* `stream` represents the name associated with each stream or transformation in your flow |
52 | 65 | * `position` is the ordinal position of columns in your data flow
|
53 | 66 |
|
54 | 67 | ## Next steps
|
55 |
| -* Learn more about the ADF mapping data flow [expression language](https://aka.ms/dataflowexpressions) for data transformations |
56 |
| -* Use column patterns in the [Sink transformation](data-flow-sink.md) and [Select transformation](data-flow-select.md) with rule-based mapping |
| 68 | +* Learn more about the mapping data flow [expression language](data-flow-expression-functions.md) for data transformations |
| 69 | +* Use column patterns in the [sink transformation](data-flow-sink.md) and [select transformation](data-flow-select.md) with rule-based mapping |