Commit 6ce11a3

Merge pull request #89366 from kromerm/dataflow-1
Update concepts-data-flow-performance.md
2 parents fbde14b + 0ef84d6 commit 6ce11a3

File tree

1 file changed: +12 −1 lines changed


articles/data-factory/concepts-data-flow-performance.md

Lines changed: 12 additions & 1 deletion
```diff
@@ -5,7 +5,7 @@ author: kromerm
 ms.topic: conceptual
 ms.author: makromer
 ms.service: data-factory
-ms.date: 05/16/2019
+ms.date: 09/22/2019
 ---
 
 # Mapping data flows performance and tuning guide
```
```diff
@@ -85,6 +85,13 @@ Clicking that icon will display the execution plan and subsequent performance pr
 * Inside of the Data Flow designer, use the Data Preview tab on transformations to view the results of your transformation logic.
 * Unit test your data flows from the pipeline designer by placing a Data Flow activity on the pipeline design canvas and use the "Debug" button to test.
 * Testing in debug mode will work against a live warmed cluster environment without the need to wait for a just-in-time cluster spin-up.
+* During Data Preview debugging inside of the Data Flow designer experience, you can limit the amount of data that you test with for each source by setting the row limit from the Debug Settings link on the Data Flow designer UI. Please note that you must turn on Debug Mode first.
+
+![Debug Settings](media/data-flow/debug-settings.png "Debug Settings")
+
+* When testing your data flows from a pipeline debug execution, you can limit the number of rows used for testing by setting the sampling size on each of your sources. Be sure to disable sampling when scheduling your pipelines on a regular operationalized schedule.
+
+![Row Sampling](media/data-flow/source1.png "Row Sampling")
 
 ### Disable indexes on write
 * Use an ADF pipeline stored procedure activity prior to your Data Flow activity that disables indexes on your target tables that are being written to from your Sink.
```
```diff
@@ -136,6 +143,10 @@ For example, if I have a list of data files from July 2019 that I wish to proces
 
 This will perform better than a Lookup against the Blob Store in a pipeline that then iterates across all matched files using a ForEach with an Execute Data Flow activity inside.
 
+### Increase the size of your debug cluster
+
+By default, turning on debug will use the default Azure Integration runtime that is created automatically for each data factory. This default Azure IR is set for 8 cores, 4 for a driver node and 4 for a worker node, using General Compute properties. As you test with larger data, you can increase the size of your debug cluster by creating a new Azure IR with larger configurations and choose this new Azure IR when you switch on debug. This will instruct ADF to use this Azure IR for data preview and pipeline debug with data flows.
+
 ## Next steps
 
 See the other Data Flow articles related to performance:
```
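The new "Increase the size of your debug cluster" section tells readers to create a larger Azure IR but does not show one. The JSON below is an illustrative sketch of a Managed Azure IR definition with data flow compute properties; the name `DataFlowDebugIR` and the sizing values are hypothetical examples, not part of this commit, so verify the field names against your factory's schema before use.

```json
{
    "name": "DataFlowDebugIR",
    "properties": {
        "type": "Managed",
        "typeProperties": {
            "computeProperties": {
                "location": "AutoResolve",
                "dataFlowProperties": {
                    "computeType": "General",
                    "coreCount": 16,
                    "timeToLive": 10
                }
            }
        }
    }
}
```

Selecting an IR like this when switching on debug would give the debug session 16 cores instead of the default 8 described above.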

0 commit comments
