articles/data-factory/concepts-data-flow-performance.md (+6 −2)
@@ -6,7 +6,7 @@ ms.topic: conceptual
ms.author: makromer
ms.service: data-factory
ms.custom: seo-lt-2019
- ms.date: 04/27/2020
+ ms.date: 05/21/2020
---

# Mapping data flows performance and tuning guide
@@ -36,7 +36,7 @@ While designing mapping data flows, you can unit test each transformation by cli
An Integration Runtime with more cores increases the number of nodes in the Spark compute environment and provides more processing power to read, write, and transform your data. ADF Data Flows use Spark as the compute engine. The Spark environment works very well on memory-optimized resources.

* Try a **Compute Optimized** cluster if you want your processing rate to be higher than your input rate.
- * Try a **Memory Optimized** cluster if you want to cache more data in memory. Memory optimized has a higher price-point per core than Compute Optimized, but will likely result in faster transformation speeds.
+ * Try a **Memory Optimized** cluster if you want to cache more data in memory. Memory optimized has a higher price-point per core than Compute Optimized, but will likely result in faster transformation speeds. If you experience out-of-memory errors when executing your data flows, switch to a memory-optimized Azure IR configuration.

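For reference, the compute type, core count, and TTL are configured on the Azure Integration Runtime used by the data flow activity. A minimal sketch of a memory-optimized managed IR definition, assuming the standard IR JSON schema (the IR name, core count, and TTL value here are hypothetical choices, not prescribed by this change):

```json
{
  "name": "MemoryOptimizedDataFlowIR",
  "properties": {
    "type": "Managed",
    "typeProperties": {
      "computeProperties": {
        "location": "AutoResolve",
        "dataFlowProperties": {
          "computeType": "MemoryOptimized",
          "coreCount": 16,
          "timeToLive": 10
        }
      }
    }
  }
}
```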
@@ -136,6 +136,10 @@ For example, if you have a list of data files from July 2019 that you wish to pr
By using wildcarding, your pipeline will only contain one Data Flow activity. This will perform better than a Lookup against the Blob Store that then iterates across all matched files using a ForEach with an Execute Data Flow activity inside.
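For illustration, a source transformation that matches the July 2019 files with a wildcard might be expressed in data flow script roughly as follows (the folder path and stream name are hypothetical):

```
source(
    allowSchemaDrift: true,
    validateSchema: false,
    wildcardPaths: ['data/july2019/*.csv']
) ~> JulyFilesSource
```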
+ The pipeline ForEach in parallel mode spawns a separate job cluster for every executed data flow activity, which can cause Azure service throttling with high numbers of concurrent executions. Instead, using Execute Data Flow inside a ForEach with Sequential set in the pipeline avoids throttling and resource exhaustion by forcing Data Factory to execute each of your files against the data flow one at a time.
+
+ If you use ForEach with a data flow in sequence, it is recommended that you utilize the TTL setting in the Azure Integration Runtime. Otherwise, each file will incur a full 5-minute cluster startup time inside of your iterator.
+
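A rough sketch of that sequential pattern in pipeline JSON, assuming a Get Metadata activity named `ListJulyFiles` supplies the file list (all names here are hypothetical):

```json
{
  "name": "ForEachJulyFile",
  "type": "ForEach",
  "typeProperties": {
    "isSequential": true,
    "items": {
      "value": "@activity('ListJulyFiles').output.childItems",
      "type": "Expression"
    },
    "activities": [
      {
        "name": "RunDataFlowPerFile",
        "type": "ExecuteDataFlow",
        "typeProperties": {
          "dataFlow": {
            "referenceName": "TransformJulyFiles",
            "type": "DataFlowReference"
          }
        }
      }
    ]
  }
}
```

Pairing `"isSequential": true` with a TTL on the Azure IR lets each iteration reuse the warm cluster instead of paying the startup cost per file.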
### Optimizing for CosmosDB
Setting throughput and batch properties on CosmosDB sinks takes effect only during the execution of that data flow from a pipeline data flow activity. CosmosDB honors the original collection settings again after your data flow execution completes.
articles/data-factory/data-flow-conditional-split.md (+3 −1)
@@ -7,7 +7,7 @@ ms.reviewer: daperlov
ms.service: data-factory
ms.topic: conceptual
ms.custom: seo-lt-2019
- ms.date: 10/16/2019
+ ms.date: 05/21/2020
---

# Conditional split transformation in mapping data flow
@@ -16,6 +16,8 @@ ms.date: 10/16/2019
The conditional split transformation routes data rows to different streams based on matching conditions. The conditional split transformation is similar to a CASE decision structure in a programming language. The transformation evaluates expressions, and based on the results, directs the data row to the specified stream.
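To make the CASE analogy concrete, a conditional split can be expressed in data flow script roughly as follows (the incoming stream, column, conditions, and output stream names are hypothetical):

```
CleanedData split(
    visits > 1,
    visits == 1,
    disjoint: false
) ~> SplitByVisits@(repeatVisitors, firstTimeVisitors, unmatchedRows)
```

With `disjoint: false`, a row is routed to the first stream whose condition it matches; the final named stream collects rows that match no condition.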