Commit d11ccde

Merge pull request #111434 from djpmsft/docUpdates
fixing sort transformation
2 parents 83c11df + b0fdc27 commit d11ccde

2 files changed

+37
-11
lines changed

articles/data-factory/data-flow-sort.md

Lines changed: 37 additions & 11 deletions
@@ -1,33 +1,59 @@
 ---
-title: Mapping data flow Sort Transformation
+title: Sort transformation in mapping data flow
 description: Azure Data Factory Mapping Data Sort Transformation
 author: kromerm
 ms.author: makromer
-ms.reviewer: douglasl
+ms.reviewer: daperlov
 ms.service: data-factory
 ms.topic: conceptual
 ms.custom: seo-lt-2019
-ms.date: 10/08/2018
+ms.date: 04/14/2020
 ---
 
-# Azure Data Factory Data Flow Sort Transformations
+# Sort transformation in mapping data flow
 
+The sort transformation allows you to sort the incoming rows on the current data stream. You can choose individual columns and sort them in ascending or descending order.
 
+> [!NOTE]
+> Mapping data flows are executed on Spark clusters which distribute data across multiple nodes and partitions. If you choose to repartition your data in a subsequent transformation, you may lose your sorting due to reshuffling of data.
+
+## Configuration
 
 ![Sort settings](media/data-flow/sort.png "Sort")
 
-The Sort transformation allows you to sort the incoming rows on the current data stream. The outgoing rows from the Sort Transformation will subsequently follow the ordering rules that you set. You can choose individual columns and sort them ASC or DEC, using the arrow indicator next to each field. If you need to modify the column before applying the sort, click on "Computed Columns" to launch the expression editor. This will provide with an opportunity to build an expression for the sort operation instead of simply applying a column for the sort.
+**Case insensitive:** Whether or not you wish to ignore case when sorting string or text fields
+
+**Sort Only Within Partitions:** As data flows are run on Spark, each data stream is divided into partitions. This setting sorts data only within the incoming partitions rather than sorting the entire data stream.
+
+**Sort conditions:** Choose which columns you are sorting by and in which order the sort happens. The order determines sorting priority. Choose whether or not nulls will appear at the beginning or end of the data stream.
+
+### Computed columns
 
-## Case insensitive
-You can turn on "Case insensitive" if you wish to ignore case when sorting string or text fields.
+To modify or extract a column value before applying the sort, hover over the column and select "computed column". This will open the expression builder to create an expression for the sort operation instead of using a column value.
 
-"Sort Only Within Partitions" leverages Spark data partitioning. By sorting incoming data only within each partition, Data Flows can sort partitioned data instead of sorting entire data stream.
+## Data flow script
 
-Each of the sort conditions in the Sort Transformation can be rearranged. So if you need to move a column higher in the sort precedence, grab that row with your mouse and move it higher or lower in the sorting list.
+### Syntax
+
+```
+<incomingStream>
+sort(
+desc(<sortColumn1>, { true | false }),
+asc(<sortColumn2>, { true | false }),
+...
+) ~> <sortTransformationName>
+```
+
+### Example
+
+![Sort settings](media/data-flow/sort.png "Sort")
 
-Partitioning effects on Sort
+The data flow script for the above sort configuration is in the code snippet below.
 
-ADF Data Flow is executed on big data Spark clusters with data distributed across multiple nodes and partitions. It is important to keep this in mind when architecting your data flow if you are depending on the Sort transform to keep data in that same order. If you choose to repartition your data in a subsequent transformation, you may lose your sorting due to that reshuffling of data.
+```
+BasketballStats sort(desc(PTS, true),
+asc(Age, true)) ~> Sort1
+```
 
 ## Next steps

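For readers who want to reason about the sort semantics described in the added documentation (ordered sort conditions, a null-placement choice, the case-insensitive option, and sorting only within partitions), the following is a minimal Python sketch. It is not Data Factory or Spark code. The names `sort_rows`, `sort_within_partitions`, and the `(column, descending, nulls_first)` tuple shape are hypothetical, and reading the boolean in `desc(PTS, true)` as "nulls first" is an assumption based on the "Sort conditions" description, not something the diff states.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]
# Hypothetical shape for this sketch: (column name, descending?, nulls first?)
SortCondition = Tuple[str, bool, bool]


def sort_rows(rows: List[Row], conditions: List[SortCondition],
              case_insensitive: bool = False) -> List[Row]:
    """Sort rows by several conditions; earlier conditions take priority."""
    result = list(rows)
    # Python's sort is stable, so applying the lowest-priority condition first
    # and the highest-priority condition last gives a multi-column ordering.
    for column, descending, nulls_first in reversed(conditions):
        def normalize(value: Any) -> Any:
            # Optionally ignore case for string values.
            if case_insensitive and isinstance(value, str):
                return value.lower()
            return value

        nulls = [r for r in result if r.get(column) is None]
        values = [r for r in result if r.get(column) is not None]
        values.sort(key=lambda r: normalize(r[column]), reverse=descending)
        result = nulls + values if nulls_first else values + nulls
    return result


def sort_within_partitions(partitions: List[List[Row]],
                           conditions: List[SortCondition]) -> List[List[Row]]:
    """Sort each partition independently; no global order is established."""
    return [sort_rows(partition, conditions) for partition in partitions]


# Mirrors the example script: PTS descending, then Age ascending,
# with the boolean assumed to mean "nulls first".
players = [
    {"Name": "A", "PTS": 20, "Age": 30},
    {"Name": "B", "PTS": 25, "Age": 28},
    {"Name": "C", "PTS": 20, "Age": 25},
    {"Name": "D", "PTS": None, "Age": 22},
]
ordered = sort_rows(players, [("PTS", True, True), ("Age", False, True)])
print([p["Name"] for p in ordered])  # ['D', 'B', 'C', 'A']
```

Sorting within partitions leaves each partition ordered, but the concatenated stream is generally not. This is also why the note in the diff warns that a later repartition can discard the ordering.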