articles/data-factory/concepts-data-flow-performance.md
12 additions & 1 deletion
@@ -5,7 +5,7 @@ author: kromerm
ms.topic: conceptual
ms.author: makromer
ms.service: data-factory
-ms.date: 05/16/2019
+ms.date: 09/22/2019
---
# Mapping data flows performance and tuning guide
@@ -85,6 +85,13 @@ Clicking that icon will display the execution plan and subsequent performance pr
* Inside of the Data Flow designer, use the Data Preview tab on transformations to view the results of your transformation logic.
* Unit test your data flows from the pipeline designer by placing a Data Flow activity on the pipeline design canvas and using the "Debug" button to test.
* Testing in debug mode will work against a live warmed cluster environment without the need to wait for a just-in-time cluster spin-up.
+* During Data Preview debugging inside the Data Flow designer, you can limit the amount of data that you test with for each source by setting the row limit from the Debug Settings link in the Data Flow designer UI. Note that you must turn on Debug Mode first.
+* When testing your data flows from a pipeline debug execution, you can limit the number of rows used for testing by setting the sampling size on each of your sources. Be sure to disable sampling when scheduling your pipelines on a regular operationalized schedule.
+* Use an ADF pipeline Stored Procedure activity before your Data Flow activity to disable indexes on the target tables that your Sink writes to.
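As a sketch of the last tip above, a pipeline definition can chain a Stored Procedure activity ahead of the Data Flow activity with a `dependsOn` condition. All names here (`AzureSqlDatabaseLS`, `dbo.DisableIndexes`, `MyDataFlow`) are hypothetical placeholders; the activity types `SqlServerStoredProcedure` and `ExecuteDataFlow` are standard ADF activity types:

```json
{
  "name": "DisableIndexesThenDataFlow",
  "properties": {
    "activities": [
      {
        "name": "DisableTargetIndexes",
        "type": "SqlServerStoredProcedure",
        "linkedServiceName": {
          "referenceName": "AzureSqlDatabaseLS",
          "type": "LinkedServiceReference"
        },
        "typeProperties": {
          "storedProcedureName": "dbo.DisableIndexes"
        }
      },
      {
        "name": "RunDataFlow",
        "type": "ExecuteDataFlow",
        "dependsOn": [
          {
            "activity": "DisableTargetIndexes",
            "dependencyConditions": [ "Succeeded" ]
          }
        ],
        "typeProperties": {
          "dataFlow": {
            "referenceName": "MyDataFlow",
            "type": "DataFlowReference"
          }
        }
      }
    ]
  }
}
```

A matching Stored Procedure activity after the Data Flow can rebuild the indexes once the Sink finishes writing.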
@@ -136,6 +143,10 @@ For example, if I have a list of data files from July 2019 that I wish to process
This will perform better than a Lookup against the Blob Store in a pipeline that then iterates across all matched files using a ForEach with an Execute Data Flow activity inside.
+
+### Increase the size of your debug cluster
+
+By default, turning on debug uses the default Azure Integration Runtime that is created automatically for each data factory. This default Azure IR is set at 8 cores, 4 for the driver node and 4 for the worker node, using general purpose compute properties. As you test with larger data, you can increase the size of your debug cluster by creating a new Azure IR with a larger configuration and choosing that IR when you switch on debug. ADF will then use that Azure IR for data preview and pipeline debug runs with data flows.
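As a minimal sketch, a larger debug Azure IR can be defined as a managed integration runtime with data flow compute properties; the IR name is hypothetical, and the core count and time-to-live values shown are illustrative, not recommendations:

```json
{
  "name": "DebugDataFlowIR",
  "properties": {
    "type": "Managed",
    "typeProperties": {
      "computeProperties": {
        "location": "AutoResolve",
        "dataFlowProperties": {
          "computeType": "General",
          "coreCount": 16,
          "timeToLive": 10
        }
      }
    }
  }
}
```

Selecting this IR in the Debug Settings replaces the default 8-core cluster for data preview and pipeline debug sessions.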
## Next steps
See the other Data Flow articles related to performance: