**articles/data-factory/concepts-data-flow-column-pattern.md** (+19 −7)

@@ -38,23 +38,35 @@ To verify your matching condition is correct, you can validate the output schema
## Rule-based mapping in select and sink
- When mapping columns in source and select transformations, you can add either fixed mapping or rule-based mappings. If you know the schema of your data and expect specific columns from the source dataset to always match specific static names, use fixed mapping. If you're working with flexible schemas, use rule-based mapping to build a pattern match based on the `name`, `type`, `stream`, and `position` of columns. You can have any combination of fixed and rule-based mappings.
+ When mapping columns in source and select transformations, you can add either fixed or rule-based mappings. Match based on the `name`, `type`, `stream`, and `position` of columns. You can have any combination of fixed and rule-based mappings. By default, any projection with more than 50 columns defaults to a rule-based mapping that matches every column and outputs the input name.
To add a rule-based mapping, click **Add mapping** and select **Rule-based mapping**.
- In the left expression box, enter your boolean match condition. In the right expression box, specify what the matched column will be mapped to. Use `$$` to reference the existing name of the matched field.
+ Each rule-based mapping requires two inputs: the condition to match on and what to name each mapped column. Both values are entered via the [expression builder](concepts-data-flow-expression-builder.md). In the left expression box, enter your boolean match condition. In the right expression box, specify what the matched column will be mapped to.
- If you click the downward chevron icon, you can specify a regex mapping condition.
- Click the eyeglasses icon next to a rule-based mapping to view which defined columns are matched and what they're mapped to.
+ Use `$$` syntax to reference the input name of a matched column. Using the above image as an example, say a user wants to match on all string columns whose names are shorter than six characters. If one incoming column is named `test`, the expression `$$ + '_short'` renames the column to `test_short`. If that's the only mapping that exists, all columns that don't meet the condition are dropped from the output data.
+ Patterns match both drifted and defined columns. To see which defined columns are mapped by a rule, click the eyeglasses icon next to the rule. Verify your output using data preview.
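
As a minimal sketch of such a rule (the two expressions go in the left and right boxes of the expression builder; the column names are hypothetical):

```
Matching condition:  type == 'string' && length(name) < 6
Name as:             $$ + '_short'
```

An incoming string column named `test` would be output as `test_short`; columns that don't satisfy the condition are handled by your other mappings, or dropped if none match.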
- In the above example, two rule-based mappings are created. The first takes all columns not named 'movie' and maps them to their existing values. The second rule uses regex to match all columns that start with 'movie' and maps them to column 'movieId'.
+ ### Regex mapping
- If your rule results in multiple identical mappings, enable **Skip duplicate inputs** or **Skip duplicate outputs** to prevent duplicates.
+ If you click the downward chevron icon, you can specify a regex-mapping condition. A regex-mapping condition matches all column names that match the specified regex. It can be used in combination with standard rule-based mappings.
+ The above example matches on the regex pattern `(r)`, that is, any column name that contains a lowercase r. As with standard rule-based mapping, all matched columns are altered by the condition on the right using `$$` syntax.
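
A sketch of that regex mapping, using the pattern from the example above (the pass-through `$$` on the right is an illustrative choice):

```
Regex pattern:  (r)
Name as:        $$
```

Every column whose name contains a lowercase r is matched and passed through under its input name.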
+ ### Rule-based hierarchies
+ If your defined projection has a hierarchy, you can use rule-based mapping to map the hierarchy's subcolumns. Specify a matching condition and the complex column whose subcolumns you wish to map. Every matched subcolumn is output using the 'Name as' rule specified on the right.
+ The above example matches on all subcolumns of complex column `a`. `a` contains two subcolumns, `b` and `c`. The output schema includes two columns, `b` and `c`, because the 'Name as' condition is `$$`.
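
A hedged sketch of that hierarchy rule, assuming the complex column `a` with subcolumns `b` and `c` from the example (the field labels approximate the UI):

```
Complex column:      a
Matching condition:  true()
Name as:             $$
```

Both subcolumns are promoted to top-level columns named `b` and `c`.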

**articles/data-factory/connector-azure-blob-storage.md** (+1 −1)

@@ -562,7 +562,7 @@ In the sink transformation, you can write to either a container or folder in Azu
* **Default**: Allow Spark to name files based on PART defaults.
* **Pattern**: Enter a pattern that enumerates your output files per partition. For example, **loans[n].csv** will create loans1.csv, loans2.csv, and so on.
* **Per partition**: Enter one file name per partition.
- * **As data in column**: Set the output file to the value of a column. The path is relative to the dataset container, not the destination folder.
+ * **As data in column**: Set the output file to the value of a column. The path is relative to the dataset container, not the destination folder. If you have a folder path in your dataset, it will be overridden.
* **Output to a single file**: Combine the partitioned output files into a single named file. The path is relative to the dataset folder. Be aware that the merge operation can fail depending on node size. This option isn't recommended for large datasets.
**Quote all:** Determines whether to enclose all values in quotes

**articles/data-factory/connector-azure-data-lake-storage.md** (+1 −1)

@@ -457,7 +457,7 @@ In the sink transformation, you can write to either a container or folder in Azu
* **Default**: Allow Spark to name files based on PART defaults.
* **Pattern**: Enter a pattern that enumerates your output files per partition. For example, **loans[n].csv** will create loans1.csv, loans2.csv, and so on.
* **Per partition**: Enter one file name per partition.
- * **As data in column**: Set the output file to the value of a column. The path is relative to the dataset container, not the destination folder.
+ * **As data in column**: Set the output file to the value of a column. The path is relative to the dataset container, not the destination folder. If you have a folder path in your dataset, it will be overridden.
* **Output to a single file**: Combine the partitioned output files into a single named file. The path is relative to the dataset folder. Be aware that the merge operation can fail depending on node size. This option isn't recommended for large datasets.
**Quote all:** Determines whether to enclose all values in quotes

**articles/data-factory/connector-azure-data-lake-store.md** (+1 −1)

@@ -400,7 +400,7 @@ In the sink transformation, you can write to either a container or folder in Azu
* **Default**: Allow Spark to name files based on PART defaults.
* **Pattern**: Enter a pattern that enumerates your output files per partition. For example, **loans[n].csv** will create loans1.csv, loans2.csv, and so on.
* **Per partition**: Enter one file name per partition.
- * **As data in column**: Set the output file to the value of a column. The path is relative to the dataset container, not the destination folder.
+ * **As data in column**: Set the output file to the value of a column. The path is relative to the dataset container, not the destination folder. If you have a folder path in your dataset, it will be overridden.
* **Output to a single file**: Combine the partitioned output files into a single named file. The path is relative to the dataset folder. Be aware that the merge operation can fail depending on node size. This option isn't recommended for large datasets.
**Quote all:** Determines whether to enclose all values in quotes

**articles/data-factory/data-flow-select.md** (+94 −28)

@@ -6,55 +6,121 @@ ms.author: makromer
ms.service: data-factory
ms.topic: conceptual
ms.custom: seo-lt-2019
- ms.date: 03/08/2020
+ ms.date: 03/18/2020
---
- # Mapping data flow select transformation
+ # Select transformation in mapping data flow
+ Use the select transformation to rename, drop, or reorder columns. This transformation doesn't alter row data, but chooses which columns are propagated downstream. This process is called column mapping.
- Use this transformation for column selectivity (reducing number of columns), alias columns and stream names, and reorder columns.
+ In a select transformation, users can specify fixed mappings, use patterns to do rule-based mapping, or enable auto mapping. Fixed and rule-based mappings can both be used within the same select transformation. If a column doesn't match one of the defined mappings, it will be dropped.
- ## How to use Select Transformation
- The Select transform allows you to alias an entire stream, or columns in that stream, assign different names (aliases) and then reference those new names later in your data flow. This transform is useful for self-join scenarios. The way to implement a self-join in ADF Data Flow is to take a stream, branch it with "New Branch", then immediately afterward, add a "Select" transform. That stream will now have a new name that you can use to join back to the original stream, creating a self-join:
+ If there are fewer than 50 columns defined in your projection, all defined columns will have a fixed mapping by default. A fixed mapping takes a defined, incoming column and maps it to an exact name.
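
A hypothetical example of a fixed mapping as it might appear in the mapping grid (column names invented for illustration):

```
Input column:  movieId
Name as:       movie_id
```

The incoming column `movieId` is passed downstream under the exact name `movie_id`.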
- In the above diagram, the Select transform is at the top. This is aliasing the original stream to "OrigSourceBatting". In the highlighted Join transform below it, you can see that we use this Select alias stream as the right-hand join, allowing us to reference the same key in both the Left & Right side of the Inner Join.
- Select can also be used as a way to de-select columns from your data flow. For example, if you have 6 columns defined in your sink, but you only wish to pick a specific 3 to transform and then flow to the sink, you can select just those 3 by using the select transform.
+ > [!NOTE]
+ > You can't map or rename a drifted column using a fixed mapping.
- * The default setting for "Select" is to include all incoming columns and keep those original names. You can alias the stream by setting the name of the Select transform.
- * To alias individual columns, deselect "Select All" and use the column mapping at the bottom.
- * Choose Skip Duplicates to eliminate duplicate columns from Input or Output metadata.
+ Fixed mappings can be used to map a subcolumn of a hierarchical column to a top-level column. If you have a defined hierarchy, use the column dropdown to select a subcolumn. The select transformation will create a new column with the value and data type of the subcolumn.
- * When you choose to skip duplicates, the results will be visible in the Inspect tab. ADF will keep the first occurrence of the column and you'll see that each subsequent occurrence of that same column has been removed from your flow.
+ If you wish to map many columns at once or pass drifted columns downstream, use rule-based mapping to define your mappings using column patterns. Match based on the `name`, `type`, `stream`, and `position` of columns. You can have any combination of fixed and rule-based mappings. By default, any projection with more than 50 columns defaults to a rule-based mapping that matches every column and outputs the input name.
- > [!NOTE]
- > To clear mapping rules, press the **Reset** button.
+ To add a rule-based mapping, click **Add mapping** and select **Rule-based mapping**.
- ## Mapping
- By default, the Select transformation will automatically map all columns, which will pass through all incoming columns to the same name on the output. The output stream name that is set in Select Settings will define a new alias name for the stream. If you keep the Select set for auto-map, then you can alias the entire stream with all columns the same.
+ Each rule-based mapping requires two inputs: the condition to match on and what to name each mapped column. Both values are entered via the [expression builder](concepts-data-flow-expression-builder.md). In the left expression box, enter your boolean match condition. In the right expression box, specify what the matched column will be mapped to.
- If you wish to alias, remove, rename, or re-order columns, you must first switch off "auto-map". By default, you will see a default rule entered for you called "All input columns". You can leave this rule in place if you intend to always allow all incoming columns to map to the same name on their output.
- However, if you wish to add custom rules, then you will click "Add mapping". Field mapping will provide you with a list of incoming and outgoing column names to map and alias. Choose "rule-based mapping" to create pattern matching rules.
+ Use `$$` syntax to reference the input name of a matched column. Using the above image as an example, say a user wants to match on all string columns whose names are shorter than six characters. If one incoming column is named `test`, the expression `$$ + '_short'` renames the column to `test_short`. If that's the only mapping that exists, all columns that don't meet the condition are dropped from the output data.
- ## Rule-based mapping
- When you choose rule-based mapping, you are instructing ADF to evaluate your matching expression to match incoming pattern rules and define the outgoing field names. You may add any combination of both field and rule-based mappings. Field names are then generated at runtime by ADF based on incoming metadata from the source. You can view the names of the generated fields during debug and using the data preview pane.
+ Patterns match both drifted and defined columns. To see which defined columns are mapped by a rule, click the eyeglasses icon next to the rule. Verify your output using data preview.
+ ### Regex mapping
+ If you click the downward chevron icon, you can specify a regex-mapping condition. A regex-mapping condition matches all column names that match the specified regex. It can be used in combination with standard rule-based mappings.
+ The above example matches on the regex pattern `(r)`, that is, any column name that contains a lowercase r. As with standard rule-based mapping, all matched columns are altered by the condition on the right using `$$` syntax.
+ If you have multiple regex matches in your column name, you can refer to specific matches using `$n`, where 'n' refers to the match number. For example, '$2' refers to the second match within a column name.
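
A hypothetical sketch of `$n` references, assuming `$n` picks out the n-th match or capture group (the pattern and column name are invented for illustration):

```
Regex pattern:  (\w+)_(\w+)
Name as:        $2
```

Under this reading, an incoming column named `sales_2020` would be output as `2020`, while `$1` would instead keep `sales`.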
+ ### Rule-based hierarchies
- More details on pattern matching is available at the [Column Pattern documentation](concepts-data-flow-column-pattern.md).
+ If your defined projection has a hierarchy, you can use rule-based mapping to map the hierarchy's subcolumns. Specify a matching condition and the complex column whose subcolumns you wish to map. Every matched subcolumn is output using the 'Name as' rule specified on the right.
- ### Use rule-based mapping to parameterize the Select transformation
- You can parameterize field mapping in the Select transformation by using rule-based mapping. Use the keyword ```name``` to check the incoming column names against a parameter. For example, if you have a data flow parameter called ```mycolumn``` you can create a single Select transformation rule that always maps whatever column name you set ```mycolumn``` to a field name this way:
+ The above example matches on all subcolumns of complex column `a`. `a` contains two subcolumns, `b` and `c`. The output schema includes two columns, `b` and `c`, because the 'Name as' condition is `$$`.
+ ### Parameterization
+ You can parameterize column names using rule-based mapping. Use the keyword ```name``` to match incoming column names against a parameter. For example, if you have a data flow parameter ```mycolumn```, you can create a rule that matches any column name that is equal to ```mycolumn```. You can rename the matched column to a hard-coded string such as 'business key' and reference it explicitly. In this example, the matching condition is ```name == $mycolumn``` and the name condition is 'business key'.
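
A minimal sketch of that rule, using the values from the example above:

```
Matching condition:  name == $mycolumn
Name as:             'business key'
```

Whatever column name is passed in through the ```mycolumn``` parameter is matched at runtime and renamed to 'business key'.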
+ ## Auto mapping
+ When adding a select transformation, you can enable **Auto mapping** by switching the Auto mapping slider. With auto mapping, the select transformation maps all incoming columns, excluding duplicates, to the same name as their input. This includes drifted columns, which means the output data may contain columns not defined in your schema. For more information on drifted columns, see [schema drift](concepts-data-flow-schema-drift.md).
+ With auto mapping on, the select transformation honors the skip duplicate settings and provides a new alias for the existing columns. Aliasing is useful when doing multiple joins or lookups on the same stream and in self-join scenarios.
+ ## Duplicate columns
+ By default, the select transformation drops duplicate columns in both the input and output projection. Duplicate input columns often come from join and lookup transformations, where column names are duplicated on each side of the join. Duplicate output columns can occur if you map two different input columns to the same name. Choose whether to drop or pass on duplicate columns by toggling the checkbox.
+ The order of mappings determines the order of the output columns. If an input column is mapped multiple times, only the first mapping is honored. For any duplicate column dropping, the first match is kept.