|
1 | 1 | ---
|
2 |
| -title: Mapping data flow Sort Transformation |
| 2 | +title: Sort transformation in mapping data flow |
3 | 3 | description: Azure Data Factory Mapping Data Sort Transformation
|
4 | 4 | author: kromerm
|
5 | 5 | ms.author: makromer
|
6 |
| -ms.reviewer: douglasl |
| 6 | +ms.reviewer: daperlov |
7 | 7 | ms.service: data-factory
|
8 | 8 | ms.topic: conceptual
|
9 | 9 | ms.custom: seo-lt-2019
|
10 |
| -ms.date: 10/08/2018 |
| 10 | +ms.date: 04/14/2020 |
11 | 11 | ---
|
12 | 12 |
|
13 |
| -# Azure Data Factory Data Flow Sort Transformations |
| 13 | +# Sort transformation in mapping data flow |
14 | 14 |
|
| 15 | +The sort transformation allows you to sort the incoming rows on the current data stream. You can choose individual columns and sort them in ascending or descending order. |
15 | 16 |
|
| 17 | +> [!NOTE] |
| 18 | +> Mapping data flows are executed on Spark clusters, which distribute data across multiple nodes and partitions. If you repartition your data in a subsequent transformation, you may lose your sorting because the data is reshuffled. |
| 19 | +
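The effect described in the note above can be sketched in plain Python. This is only an illustration, not Data Factory or Spark code: lists stand in for partitions, and the `repartition` helper and its modulo routing rule are assumptions made up for the example.

```python
# Illustrative only: plain Python lists stand in for Spark partitions.
rows = [7, 2, 9, 4, 1, 8, 3, 6]

# A global sort orders the whole stream.
globally_sorted = sorted(rows)


def repartition(values, num_partitions):
    """Hypothetical hash-style repartition: route each value to partition (value % n)."""
    partitions = [[] for _ in range(num_partitions)]
    for value in values:
        partitions[value % num_partitions].append(value)
    return partitions


# Repartitioning after the sort reshuffles rows across partitions.
partitions = repartition(globally_sorted, 3)

# Reading the partitions back in partition order no longer yields sorted output.
reshuffled = [value for part in partitions for value in part]

print(globally_sorted)  # [1, 2, 3, 4, 6, 7, 8, 9]
print(partitions)       # [[3, 6, 9], [1, 4, 7], [2, 8]]
print(reshuffled)       # [3, 6, 9, 1, 4, 7, 2, 8]
```

Each partition is still ordered internally here, but the stream as a whole is not, which is why a sort is usually best placed after any repartitioning step.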
|
| 20 | +## Configuration |
16 | 21 |
|
17 | 22 | 
|
18 | 23 |
|
19 |
| -The Sort transformation allows you to sort the incoming rows on the current data stream. The outgoing rows from the Sort Transformation will subsequently follow the ordering rules that you set. You can choose individual columns and sort them ASC or DEC, using the arrow indicator next to each field. If you need to modify the column before applying the sort, click on "Computed Columns" to launch the expression editor. This will provide with an opportunity to build an expression for the sort operation instead of simply applying a column for the sort. |
| 24 | +**Case insensitive:** Whether to ignore case when sorting string or text fields. |
| 25 | + |
| 26 | +**Sort Only Within Partitions:** Because data flows run on Spark, each data stream is divided into partitions. This setting sorts data only within the incoming partitions rather than sorting the entire data stream. |
| 27 | + |
| 28 | +**Sort conditions:** Choose which columns to sort by and whether each one sorts in ascending or descending order. The order of the conditions determines sorting priority. Choose whether nulls appear at the beginning or end of the data stream. |
| 29 | + |
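As a rough illustration of how sorting priority and null placement interact, here is a minimal plain-Python sketch. It is not data flow script: the rows, the `pts` and `age` field names, and the `sort_key` helper are hypothetical, and `None` stands in for a null value.

```python
# Hypothetical rows; None stands in for a null value.
rows = [
    {"name": "Ana", "pts": 20, "age": 31},
    {"name": "Bo", "pts": None, "age": 25},
    {"name": "Cy", "pts": 20, "age": 24},
    {"name": "Di", "pts": 35, "age": 29},
]


def sort_key(row):
    pts = row["pts"]
    # First condition: pts descending, with nulls placed last.
    # Second condition: age ascending, used only to break ties on pts.
    return (pts is None, -(pts or 0), row["age"])


ordered = [row["name"] for row in sorted(rows, key=sort_key)]
print(ordered)  # ['Di', 'Cy', 'Ana', 'Bo']
```

Ana and Cy tie on the first condition, so the second condition decides their order. Swapping the two conditions would change the result, which is what "sorting priority" refers to.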
| 30 | +### Computed columns |
20 | 31 |
|
21 |
| -## Case insensitive |
22 |
| -You can turn on "Case insensitive" if you wish to ignore case when sorting string or text fields. |
| 32 | +To modify or extract a column value before applying the sort, hover over the column and select "computed column". This opens the expression builder, where you can create an expression for the sort operation instead of using a column value. |
23 | 33 |
|
24 |
| -"Sort Only Within Partitions" leverages Spark data partitioning. By sorting incoming data only within each partition, Data Flows can sort partitioned data instead of sorting entire data stream. |
| 34 | +## Data flow script |
25 | 35 |
|
26 |
| -Each of the sort conditions in the Sort Transformation can be rearranged. So if you need to move a column higher in the sort precedence, grab that row with your mouse and move it higher or lower in the sorting list. |
| 36 | +### Syntax |
| 37 | + |
| 38 | +``` |
| 39 | +<incomingStream> |
| 40 | + sort( |
| 41 | + desc(<sortColumn1>, { true | false }), |
| 42 | + asc(<sortColumn2>, { true | false }), |
| 43 | + ... |
| 44 | + ) ~> <sortTransformationName> |
| 45 | +``` |
| 46 | + |
| 47 | +### Example |
| 48 | + |
| 49 | + |
27 | 50 |
|
28 |
| -Partitioning effects on Sort |
| 51 | +The following snippet shows the data flow script for the sort configuration above. |
29 | 52 |
|
30 |
| -ADF Data Flow is executed on big data Spark clusters with data distributed across multiple nodes and partitions. It is important to keep this in mind when architecting your data flow if you are depending on the Sort transform to keep data in that same order. If you choose to repartition your data in a subsequent transformation, you may lose your sorting due to that reshuffling of data. |
| 53 | +``` |
| 54 | +BasketballStats sort(desc(PTS, true), |
| 55 | + asc(Age, true)) ~> Sort1 |
| 56 | +``` |
31 | 57 |
|
32 | 58 | ## Next steps
|
33 | 59 |
|
|