Commit e966184

Merge pull request #79027 from kromerm/dataflow-1

Dataflow 1

2 parents: 3e3bf4c + 2715eee

12 files changed: +77 −9 lines

articles/data-factory/TOC.yml

Lines changed: 2 additions & 0 deletions

@@ -486,6 +486,8 @@
       href: control-flow-expression-language-functions.md
     - name: System variables
       href: control-flow-system-variables.md
+    - name: Data flow parameters
+      href: parameters-data-flow.md
     - name: Security
       items:
         - name: Data movement security considerations

articles/data-factory/concepts-data-flow-optimize-tab.md

Lines changed: 7 additions & 3 deletions

@@ -1,15 +1,14 @@
 ---
 title: Azure Data Factory Mapping Data Flow Optimize Tab
-description: Optimize Azure Data Factory Mapping Data Flows with Optimize Tab Partition Settings
+description: Optimize Azure Data Factory Mapping Data Flows using the Optimize Tab with Partition Settings
 author: kromerm
 ms.author: makromer
-ms.reviewer: douglasl
 ms.service: data-factory
 ms.topic: conceptual
 ms.date: 01/31/2019
 ---

-# Mapping Data Flow Transformation Optimize Tab
+# Mapping data flow transformation optimize tab

 [!INCLUDE [notes](../../includes/data-factory-data-flow-preview.md)]

@@ -46,3 +45,8 @@ You must build an expression that provides a fixed range for values within your

 ### Key

 If you have a good understanding of the cardinality of your data, key partitioning may be a good partition strategy. Key partitioning will create partitions for each unique value in your column. You cannot set the number of partitions because the number will be based on unique values in the data.
+
+## Next steps
+
+[Mapping data flow performance guide](concepts-data-flow-performance.md)
+[Data flow monitoring](concepts-data-flow-monitoring.md)
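The key-partitioning behavior this diff documents (one partition per unique column value, with the partition count derived from the data rather than set by the user) can be sketched outside ADF. A minimal Python illustration, with hypothetical sample rows; this is not ADF's or Spark's actual implementation:

```python
from collections import defaultdict

def key_partition(rows, key):
    """Group rows into one partition per unique value of `key`.
    The number of partitions falls out of the data, mirroring how
    key partitioning fixes partition count to unique key values."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[key]].append(row)
    return dict(partitions)

# Hypothetical sample data: two unique "region" values -> two partitions.
rows = [
    {"region": "east", "amount": 10},
    {"region": "west", "amount": 20},
    {"region": "east", "amount": 30},
]
parts = key_partition(rows, "region")
print(len(parts))  # 2
```

As the doc notes, this strategy only pays off when you understand your data's cardinality: a high-cardinality key yields many tiny partitions.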

articles/data-factory/concepts-data-flow-performance.md

Lines changed: 3 additions & 3 deletions

@@ -36,7 +36,7 @@ Clicking that icon will display the execution plan and subsequent performance pr

 ## Optimizing for Azure SQL Database and Azure SQL Data Warehouse

-![Source Part](media/data-flow/sourcepart2.png "Source Part")
+![Source Part](media/data-flow/sourcepart3.png "Source Part")

 ### Partition your source data

@@ -117,8 +117,8 @@ Clicking that icon will display the execution plan and subsequent performance pr

 * To avoid exhausting compute node resources, you can keep the default or explicit partitioning scheme in ADF, which optimizes for performance, and then add a subsequent Copy Activity in the pipeline that merges all of the PART files from the output folder to a new single file. Essentially, this technique separates the action of transformation from file merging and achieves the same result as setting "output to single file".

 ## Next steps
-See the other Data Flow articles:
+See the other Data Flow articles related to performance:

-- [Data Flow overview](concepts-data-flow-overview.md)
+- [Data Flow Optimize Tab](concepts-data-flow-optimize-tab.md)
 - [Data Flow activity](control-flow-execute-data-flow-activity.md)
 - [Monitor Data Flow performance](concepts-data-flow-monitoring.md)
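The "merge PART files afterwards" technique in the hunk above keeps the transformation parallel and defers single-file output to a follow-up step. A minimal Python sketch of the file-merge idea only (in ADF this would be a Copy Activity, not custom code; the folder layout here is just the conventional Spark `part-*` naming):

```python
import glob
import os
import tempfile

def merge_part_files(folder, out_path):
    """Concatenate all part-* files from a Spark-style output folder
    into one file, in sorted (partition-order) sequence."""
    with open(out_path, "wb") as out:
        for part in sorted(glob.glob(os.path.join(folder, "part-*"))):
            with open(part, "rb") as f:
                out.write(f.read())

# Demo with hypothetical part files in a temp folder.
folder = tempfile.mkdtemp()
for i, text in enumerate([b"a,1\n", b"b,2\n"]):
    with open(os.path.join(folder, f"part-0000{i}"), "wb") as f:
        f.write(text)
out = os.path.join(folder, "merged.csv")
merge_part_files(folder, out)
print(open(out).read())  # a,1  then  b,2
```

The point of the pattern is sequencing: the expensive transform runs fully partitioned, and only the cheap concatenation is serialized.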

articles/data-factory/control-flow-execute-data-flow-activity.md

Lines changed: 1 addition & 1 deletion

@@ -69,7 +69,7 @@ You have control over the Spark execution environment for your Data Flow activit

 ### Staging area

-If you are sinking your data into Azure Data Warehouse, you must choose a staging location for your Polybase batch load.
+If you are sinking your data into Azure Data Warehouse, you must choose a staging location for your Polybase batch load. The staging settings are only applicable to Azure Data Warehouse workloads.

 ## Parameterized datasets

articles/data-factory/data-flow-source.md

Lines changed: 2 additions & 2 deletions

@@ -60,13 +60,13 @@ You can later change the column names in a select transformation. Use a derived-

 On the **Optimize** tab for the source transformation, you might see a **Source** partition type. This option is available only when your source is Azure SQL Database. This is because Data Factory tries to make connections parallel to run large queries against your SQL Database source.

-![Source partition settings](media/data-flow/sourcepart2.png "partitioning")
+![Source partition settings](media/data-flow/sourcepart3.png "partitioning")

 You don't have to partition data on your SQL Database source, but partitions are useful for large queries. You can base your partition on a column or a query.

 ### Use a column to partition data

-From your source table, select a column to partition on. Also set the maximum number of connections.
+From your source table, select a column to partition on. Also set the number of partitions.

 ### Use a query to partition data
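Column-based source partitioning, as described in the hunk above, amounts to splitting one large query into several range-bounded queries that can run over parallel connections. A hedged Python sketch of that idea only; the table, column, and bounds are hypothetical, and ADF generates its own queries rather than anything like this:

```python
def range_partition_queries(table, column, lower, upper, partitions):
    """Split the half-open range [lower, upper) into equal chunks and
    emit one bounded query per partition, each runnable on its own
    connection. Assumes a numeric partition column."""
    step = (upper - lower) // partitions
    bounds = [lower + i * step for i in range(partitions)] + [upper]
    return [
        f"SELECT * FROM {table} "
        f"WHERE {column} >= {bounds[i]} AND {column} < {bounds[i + 1]}"
        for i in range(partitions)
    ]

for q in range_partition_queries("Sales", "OrderId", 0, 1000, 4):
    print(q)
```

This also shows why the doc fix above matters: what you configure is the number of partitions (query ranges), not a connection cap.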

Plus changes to 5 binary image files (2.93 KB, 4.03 KB, 9.28 KB, 5.76 KB, 4.66 KB; previews not shown).

0 commit comments
