Commit 8081ced

Update concepts-data-flow-performance.md
1 parent e64ca32 commit 8081ced

File tree

1 file changed (+7 additions, -3 deletions)


articles/data-factory/concepts-data-flow-performance.md

Lines changed: 7 additions & 3 deletions
```diff
@@ -6,7 +6,7 @@ ms.topic: conceptual
 ms.author: makromer
 ms.service: data-factory
 ms.custom: seo-lt-2019
-ms.date: 03/11/2020
+ms.date: 04/14/2020
 ---
 
 # Mapping data flows performance and tuning guide
@@ -32,7 +32,7 @@ While designing mapping data flows, you can unit test each transformation by cli
 
 ## Increasing compute size in Azure Integration Runtime
 
-An Integration Runtime with more cores increases the number of nodes in the Spark compute environments and provides more processing power to read, write, and transform your data.
+An Integration Runtime with more cores increases the number of nodes in the Spark compute environments and provides more processing power to read, write, and transform your data. ADF Data Flows utilizes Spark for the compute engine. The Spark environment works very well on memory-optimized resources.
 * Try a **Compute Optimized** cluster if you want your processing rate to be higher than your input rate.
 * Try a **Memory Optimized** cluster if you want to cache more data in memory. Memory optimized has a higher price-point per core than Compute Optimized, but will likely result in faster transformation speeds.
 
@@ -44,7 +44,11 @@ For more information how to create an Integration Runtime, see [Integration Runt
 
 By default, turning on debug will use the default Azure Integration runtime that is created automatically for each data factory. This default Azure IR is set for eight cores, four for a driver node and four for a worker node, using General Compute properties. As you test with larger data, you can increase the size of your debug cluster by creating an Azure IR with larger configurations and choose this new Azure IR when you switch on debug. This will instruct ADF to use this Azure IR for data preview and pipeline debug with data flows.
 
-## Optimizing for Azure SQL Database and Azure SQL Data Warehouse
+### Decrease cluster compute start-up time with TTL
+
+There is a property in the Azure IR under Data Flow Properties that will allow you to stand-up a pool of cluster compute resources for your factory. With this pool, you can sequentially submit data flow activities for execution. Once the pool is established, each subsequent job will take 1-2 minutes for the on-demand Spark cluster to execute your job. The initial set-up of the resource pool will take around 6 minutes. Specify the amount of time that you wish to maintain the resource pool in the time-to-live (TTL) setting.
+
+## Optimizing for Azure SQL Database and Azure SQL Data Warehouse Synapse
 
 ### Partitioning on source
```

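The TTL section added by this commit quotes two figures: roughly 6 minutes to stand up the Spark resource pool, then 1-2 minutes per subsequent job while the pool stays warm. A small back-of-the-envelope sketch makes the trade-off concrete. This is purely illustrative; the timings below are the approximate values quoted in the diff (using 1.5 minutes as a midpoint for warm starts), not measured or guaranteed cluster behavior.

```python
def estimated_startup_minutes(jobs: int, ttl_enabled: bool,
                              cold_start: float = 6.0,
                              warm_start: float = 1.5) -> float:
    """Rough estimate of total Spark cluster start-up overhead for a
    sequence of data flow activities, based on the approximate timings
    in the commit: ~6 min to establish the pool, ~1-2 min per job after.
    """
    if jobs <= 0:
        return 0.0
    if ttl_enabled:
        # The first job pays the pool set-up cost; later jobs reuse
        # the warm pool and only pay the short warm-start cost.
        return cold_start + (jobs - 1) * warm_start
    # Without TTL, every job pays the full cluster start-up cost.
    return jobs * cold_start
```

For example, five sequential data flow activities would spend roughly 6 + 4 x 1.5 = 12 minutes on start-up overhead with TTL enabled, versus about 5 x 6 = 30 minutes without it; the gap widens as more activities share the pool.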
0 commit comments