articles/data-factory/concepts-data-flow-debug-mode.md (5 additions, 5 deletions)
@@ -5,9 +5,9 @@ description: Start an interactive debug session when building data flows with Az
 ms.author: makromer
 author: kromerm
 ms.subservice: data-flows
-ms.topic: conceptual
+ms.topic: concept-article
 ms.custom: synapse
-ms.date: 10/20/2023
+ms.date: 03/31/2025
 ---

 # Mapping data flow Debug Mode
@@ -39,7 +39,7 @@ In most cases, it's a good practice to build your Data Flows in debug mode so th
 ---

 > [!NOTE]
-> Every debug session that a user starts from their browser UI is a new session with its own Spark cluster. You can use the monitoring view for debug sessions shown in the previous images to view and manage debug sessions. You are charged for every hour that each debug session is executing including the TTL time.
+> Every debug session that a user starts from their browser UI is a new session with its own Spark cluster. You can use the monitoring view for debug sessions shown in the previous images to view and manage debug sessions. You're charged for every hour that each debug session is executing including the TTL time.

 This video clip talks about tips, tricks, and good practices for data flow debug mode.
@@ -73,9 +73,9 @@ With debug on, the Data Preview tab lights up on the bottom panel. Without debug
 You can sort columns in data preview and rearrange columns using drag and drop. Additionally, there's an export button on the top of the data preview panel that you can use to export the preview data to a CSV file for offline data exploration. You can use this feature to export up to 1,000 rows of preview data.

 > [!NOTE]
-> File sources only limit the rows that you see, not the rows being read. For very large datasets, it is recommended that you take a small portion of that file and use it for your testing. You can select a temporary file in Debug Settings for each source that is a file dataset type.
+> File sources only limit the rows that you see, not the rows being read. For very large datasets, it's recommended that you take a small portion of that file and use it for your testing. You can select a temporary file in Debug Settings for each source that is a file dataset type.

-When running in Debug Mode in Data Flow, your data won't be written to the Sink transform. A Debug session is intended to serve as a test harness for your transformations. Sinks aren't required during debug and are ignored in your data flow. If you wish to test writing the data in your Sink, execute the Data Flow from a pipeline and use the Debug execution from a pipeline.
+When running Data Flow in Debug Mode, your data won't be written to the Sink transform. A Debug session is intended to serve as a test harness for your transformations. Sinks aren't required during debug and are ignored in your data flow. If you wish to test writing the data in your Sink, execute the Data Flow from a pipeline and use the Debug execution from a pipeline.

 Data Preview is a snapshot of your transformed data using row limits and data sampling from data frames in Spark memory. Therefore, the sink drivers aren't utilized or tested in this scenario.
@@ -5,9 +5,9 @@ description: Learn about how to optimize and improve performance of the Azure Integration Runtime in Azure Data Factory and Azure Synapse Analytics.
 author: kromerm
-ms.topic: conceptual
+ms.topic: concept-article
 ms.author: makromer
 ms.subservice: data-flows
 ms.custom: synapse
-ms.date: 01/05/2024
+ms.date: 03/31/2025
 ---

 # Optimizing performance of the Azure Integration Runtime
@@ -37,7 +37,7 @@ The default cluster size is four driver cores and four worker cores (small). As
 Data flows are priced at vcore-hrs meaning that both cluster size and execution-time factor into this. As you scale up, your cluster cost per minute will increase, but your overall time will decrease.

 > [!TIP]
-> There is a ceiling on how much the size of a cluster affects the performance of a data flow. Depending on the size of your data, there is a point where increasing the size of a cluster will stop improving performance. For example, If you have more cores than partitions of data, adding additional cores won't help.
+> There's a ceiling on how much the size of a cluster affects the performance of a data flow. Depending on the size of your data, there's a point where increasing the size of a cluster will stop improving performance. For example, if you have more cores than partitions of data, adding more cores won't help.
 A best practice is to start small and scale up to meet your performance needs.

 ## Custom shuffle partition
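To make the vcore-hour pricing trade-off mentioned in the hunk above concrete, here's a purely illustrative calculation; the numbers are invented and assume the work scales roughly linearly across cores:

8 vCores × 1.0 hour = 8 vCore-hours
16 vCores × 0.5 hour = 8 vCore-hours

In other words, doubling the cluster can roughly halve wall-clock time at about the same vcore-hour charge, and once cores outnumber data partitions, the run time stops dropping while the cost keeps rising.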
@@ -46,7 +46,6 @@ Dataflow divides the data into partitions and transforms it using different proc

 While increasing the shuffle partitions, make sure data is spread across well. A rough number is to have approximately 1.5 GB of data per partition. If data is skewed, increasing the "Shuffle partitions" won't be helpful. For example, if you have 500 GB of data, having a value between 400 to 500 should work. Default limit for shuffle partitions is 200 that works well for approximately 300 GB of data.

-
 1. From ADF portal under **Manage**, select a custom integration run time and you go to edit mode.
 2. Under dataflow run time tab, go to **Compute Custom Properties** section.
 3. Select **Shuffle partitions** under Property name, input value of your choice, like 250, 500 etc.
@@ -55,12 +54,12 @@ You can do same by editing JSON file of runtime by adding an array with property

 ## Time to live

-By default, every data flow activity spins up a new Spark cluster based upon the Azure IR configuration. Cold cluster start-up time takes a few minutes and data processing can't start until it is complete. If your pipelines contain multiple **sequential** data flows, you can enable a time to live (TTL) value. Specifying a time to live value keeps a cluster alive for a certain period of time after its execution completes. If a new job starts using the IR during the TTL time, it will reuse the existing cluster and start up time will be greatly reduced. After the second job completes, the cluster will again stay alive for the TTL time.
+By default, every data flow activity spins up a new Spark cluster based upon the Azure IR configuration. Cold cluster start-up time takes a few minutes and data processing can't start until it's complete. If your pipelines contain multiple **sequential** data flows, you can enable a time to live (TTL) value. Specifying a time to live value keeps a cluster alive for a certain period of time after its execution completes. If a new job starts using the IR during the TTL time, it will reuse the existing cluster and start up time will be greatly reduced. After the second job completes, the cluster will again stay alive for the TTL time.

-However, if most of your data flows execute in parallel, it is not recommended that you enable TTL for the IR that you use for those activities. Only one job can run on a single cluster at a time. If there is an available cluster, but two data flows start, only one will use the live cluster. The second job will spin up its own isolated cluster.
+However, if most of your data flows execute in parallel, it isn't recommended that you enable TTL for the IR that you use for those activities. Only one job can run on a single cluster at a time. If there's an available cluster, but two data flows start, only one will use the live cluster. The second job will spin up its own isolated cluster.

 > [!NOTE]
-> Time to live is not available when using the auto-resolve integration runtime (default).
+> Time to live isn't available when using the auto-resolve integration runtime (default).
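The hunk above mentions setting shuffle partitions by editing the runtime's JSON and describes the time-to-live option. The following is only a rough sketch of what such a custom Azure IR definition might look like; the runtime name, the shuffle-partition property name, and the values are assumptions for illustration and aren't taken from this diff (JSON doesn't allow comments, so all caveats live in this paragraph):

```json
{
  "name": "MyDataFlowRuntime",
  "properties": {
    "type": "Managed",
    "typeProperties": {
      "computeProperties": {
        "dataFlowProperties": {
          "computeType": "General",
          "coreCount": 16,
          "timeToLive": 10,
          "customProperties": [
            {
              "name": "ShufflePartitionsCount",
              "value": "250"
            }
          ]
        }
      }
    }
  }
}
```

Here the cluster size, the 10-minute TTL, and the 250 shuffle partitions are arbitrary example values; verify the exact property names against the article before relying on them.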
articles/data-factory/concepts-nested-activities.md (8 additions, 3 deletions)
@@ -6,8 +6,8 @@ author: kromerm
 ms.author: makromer
 ms.subservice: orchestration
 ms.custom: synapse
-ms.topic: conceptual
-ms.date: 10/20/2023
+ms.topic: concept-article
+ms.date: 03/31/2025
 ---

 # Nested activities in Azure Data Factory and Azure Synapse Analytics
@@ -17,23 +17,26 @@ ms.date: 10/20/2023
 This article helps you understand nested activities in Azure Data Factory and Azure Synapse Analytics and how to use them, limitations, and best practices.

 ## Overview
+
 A Data Factory or Synapse Workspace pipeline can contain control flow activities that allow for other activities to be contained inside of them. Think of these nested activities as containers that hold one or more other activities that can execute depending on the top level control flow activity.

 See the following example with an If activity that has one activity contained.

 :::image type="content" source="media/concepts-pipelines-activities/nested-activity-example.png" alt-text="Screenshot showing an example If Condition activity with a contained activity inside.":::

 ## Control flow activities
+
 The following control flow activities support nested activities:

 Control activity | Description
 ---------------- | -----------
 [For Each](control-flow-for-each-activity.md) | ForEach Activity defines a repeating control flow in your pipeline. This activity is used to iterate over a collection and executes specified activities in a loop. The loop implementation of this activity is similar to the Foreach looping structure in programming languages.
 [If Condition Activity](control-flow-if-condition-activity.md) | The If Condition can be used to branch based on condition that evaluates to true or false. The If Condition activity provides the same functionality that an if statement provides in programming languages. It evaluates a set of activities when the condition evaluates to `true` and another set of activities when the condition evaluates to `false.`
-[Until Activity](control-flow-until-activity.md) | Implements Do-Until loop that is similar to Do-Until looping structure in programming languages. It executes a set of activities in a loop until the condition associated with the activity evaluates to true. You can specify a timeout value for the until activity.
+[Until Activity](control-flow-until-activity.md) | Implements Do-Until loop that is similar to Do-Until looping structure in programming languages. It executes a set of activities in a loop until the condition associated with the activity evaluates to true. You can specify a time-out value for the until activity.
 [Switch Activity](control-flow-switch-activity.md) | The Switch activity provides the same functionality that a switch statement provides in programming languages. It evaluates a set of activities corresponding to a case that matches the condition evaluation.

 ## Navigating nested activities
+
 There are two primary ways to navigate to the contained activities in a nested activity.

 1. Each control flow activity that supports nested activities has an activity tab. Selecting the activity tab will then give you a pencil icon you can select to drill down into the inner activities panel.
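The overview in the hunk above describes an If Condition activity holding a single contained activity. As a minimal, hypothetical sketch (the pipeline, parameter, and activity names are invented for illustration, and the contained Wait activity is just a stand-in for whatever you nest), the pipeline JSON for that shape could look roughly like this:

```json
{
  "name": "ExamplePipeline",
  "properties": {
    "parameters": {
      "runExtraStep": { "type": "bool", "defaultValue": true }
    },
    "activities": [
      {
        "name": "CheckFlag",
        "type": "IfCondition",
        "typeProperties": {
          "expression": {
            "value": "@equals(pipeline().parameters.runExtraStep, true)",
            "type": "Expression"
          },
          "ifTrueActivities": [
            {
              "name": "WaitBeforeNextStep",
              "type": "Wait",
              "typeProperties": { "waitTimeInSeconds": 30 }
            }
          ],
          "ifFalseActivities": []
        }
      }
    ]
  }
}
```

The contained activities live inside the If Condition's `ifTrueActivities` and `ifFalseActivities` arrays, which is what the authoring UI surfaces as the true and false branches.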
@@ -46,6 +49,7 @@ Your pipeline canvas will then switch to the context of the inner activity conta
 :::image type="content" source="media/concepts-pipelines-activities/nested-activity-breadcrumb.png" alt-text="Screenshot showing an example If Condition activity inside the true branch with a highlight on the breadcrumb to navigate back to the parent pipeline.":::

 ## Nested activity embedding limitations
+
 There are constraints on the activities that support nesting (ForEach, Until, Switch, and If Condition), for nesting another nested activity. Specifically:

 - If and Switch can be used inside ForEach or Until activities.
@@ -60,6 +64,7 @@ ForEach or Until supports only single level nesting
 If and Switch can't be used inside If and Switch activities.

 ## Best practices for multiple levels of nested activities
+
 In order to have logic that supports nesting more than one level deep, you can use the [Execute Pipeline Activity](control-flow-execute-pipeline-activity.md) inside of your nested activity to call another pipeline that then can have another level of nested activities. A common use case for this pattern is with the ForEach loop where you need to additionally loop based off logic in the inner activities.

 An example of this pattern would be if you had a file system that had a list of folders and each folder there are multiple files you want to process. You would accomplish this pattern, generally, by performing the following.
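A rough, hypothetical sketch of the pattern described in the hunk above (pipeline, parameter, and activity names are invented for illustration): the outer pipeline's ForEach calls an inner pipeline through an Execute Pipeline activity, and that inner pipeline can then contain its own ForEach over the files in each folder.

```json
{
  "name": "OuterPipeline",
  "properties": {
    "parameters": {
      "folderList": { "type": "Array", "defaultValue": [] }
    },
    "activities": [
      {
        "name": "ForEachFolder",
        "type": "ForEach",
        "typeProperties": {
          "items": {
            "value": "@pipeline().parameters.folderList",
            "type": "Expression"
          },
          "activities": [
            {
              "name": "ProcessOneFolder",
              "type": "ExecutePipeline",
              "typeProperties": {
                "pipeline": {
                  "referenceName": "InnerPipeline",
                  "type": "PipelineReference"
                },
                "parameters": { "folderName": "@item()" },
                "waitOnCompletion": true
              }
            }
          ]
        }
      }
    ]
  }
}
```

Because ForEach supports only single-level nesting, the second loop lives inside the hypothetical InnerPipeline rather than directly inside the outer ForEach.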