articles/data-factory/concepts-data-flow-performance.md
7 additions & 3 deletions
@@ -6,7 +6,7 @@ ms.topic: conceptual
ms.author: makromer
ms.service: data-factory
ms.custom: seo-lt-2019
-ms.date: 03/11/2020
+ms.date: 04/14/2020
---
# Mapping data flows performance and tuning guide
@@ -32,7 +32,7 @@ While designing mapping data flows, you can unit test each transformation by cli
## Increasing compute size in Azure Integration Runtime
-An Integration Runtime with more cores increases the number of nodes in the Spark compute environments and provides more processing power to read, write, and transform your data.
+An Integration Runtime with more cores increases the number of nodes in the Spark compute environments and provides more processing power to read, write, and transform your data. ADF Data Flows use Spark as the compute engine, and the Spark environment performs well on memory-optimized resources.
* Try a **Compute Optimized** cluster if you want your processing rate to be higher than your input rate.
* Try a **Memory Optimized** cluster if you want to cache more data in memory. Memory Optimized has a higher price point per core than Compute Optimized, but will likely result in faster transformation speeds.
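For reference, here is a minimal sketch of how this choice surfaces in a Managed Azure IR definition. JSON permits no comments, so note the hedging here instead: the IR name, location, and core count are illustrative placeholders, not part of this change; `computeType` accepts `General`, `ComputeOptimized`, or `MemoryOptimized`.

```json
{
    "name": "MemoryOptimizedIR",
    "properties": {
        "type": "Managed",
        "typeProperties": {
            "computeProperties": {
                "location": "AutoResolve",
                "dataFlowProperties": {
                    "computeType": "MemoryOptimized",
                    "coreCount": 16
                }
            }
        }
    }
}
```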
@@ -44,7 +44,11 @@ For more information how to create an Integration Runtime, see [Integration Runt
By default, turning on debug will use the default Azure Integration Runtime that is created automatically for each data factory. This default Azure IR is set for eight cores, four for a driver node and four for a worker node, using General Compute properties. As you test with larger data, you can increase the size of your debug cluster by creating an Azure IR with a larger configuration and choosing this new Azure IR when you switch on debug. This instructs ADF to use that Azure IR for data preview and pipeline debug with data flows.
-## Optimizing for Azure SQL Database and Azure SQL Data Warehouse
+### Decrease cluster compute start-up time with TTL
+
+There is a property in the Azure IR under Data Flow Properties that allows you to stand up a pool of cluster compute resources for your factory. With this pool, you can sequentially submit data flow activities for execution. Once the pool is established, each subsequent job takes 1-2 minutes for the on-demand Spark cluster to execute. The initial set-up of the resource pool takes around 6 minutes. Specify the amount of time that you wish to maintain the resource pool in the time-to-live (TTL) setting.
+
+## Optimizing for Azure SQL Database and Azure SQL Data Warehouse Synapse
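Both the larger debug cluster described earlier and the TTL pool added in this hunk map onto the same `dataFlowProperties` object: `coreCount` sizes the cluster, and `timeToLive` keeps the pool warm. A minimal sketch, with the caveat that JSON allows no comments, so every name and value below is an illustrative assumption (my understanding is that `timeToLive` is given in minutes):

```json
{
    "name": "PooledDebugIR",
    "properties": {
        "type": "Managed",
        "typeProperties": {
            "computeProperties": {
                "location": "AutoResolve",
                "dataFlowProperties": {
                    "computeType": "General",
                    "coreCount": 32,
                    "timeToLive": 10
                }
            }
        }
    }
}
```

With a setting like `"timeToLive": 10`, the Spark pool stays warm for 10 minutes after a job completes, so back-to-back data flow activities pay the 1-2 minute job start-up rather than the roughly 6-minute cold start described above.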
0 commit comments