
Commit ad98dc2

Merge pull request #297356 from whhender/adf-march-freshness
Format, acrolinx, and freshness part 1
2 parents 72397ac + 999f244 commit ad98dc2

14 files changed: +96 -103 lines changed

articles/data-factory/concepts-data-flow-debug-mode.md

Lines changed: 5 additions & 5 deletions
@@ -5,9 +5,9 @@ description: Start an interactive debug session when building data flows with Az
 ms.author: makromer
 author: kromerm
 ms.subservice: data-flows
-ms.topic: conceptual
+ms.topic: concept-article
 ms.custom: synapse
-ms.date: 10/20/2023
+ms.date: 03/31/2025
 ---

 # Mapping data flow Debug Mode
@@ -39,7 +39,7 @@ In most cases, it's a good practice to build your Data Flows in debug mode so th
 ---

 > [!NOTE]
-> Every debug session that a user starts from their browser UI is a new session with its own Spark cluster. You can use the monitoring view for debug sessions shown in the previous images to view and manage debug sessions. You are charged for every hour that each debug session is executing including the TTL time.
+> Every debug session that a user starts from their browser UI is a new session with its own Spark cluster. You can use the monitoring view for debug sessions shown in the previous images to view and manage debug sessions. You're charged for every hour that each debug session is executing including the TTL time.

 This video clip talks about tips, tricks, and good practices for data flow debug mode.
 > [!VIDEO https://learn-video.azurefd.net/vod/player?id=8e101169-59fb-4371-aa88-039304f61b53]
@@ -73,9 +73,9 @@ With debug on, the Data Preview tab lights up on the bottom panel. Without debug
 You can sort columns in data preview and rearrange columns using drag and drop. Additionally, there's an export button on the top of the data preview panel that you can use to export the preview data to a CSV file for offline data exploration. You can use this feature to export up to 1,000 rows of preview data.

 > [!NOTE]
-> File sources only limit the rows that you see, not the rows being read. For very large datasets, it is recommended that you take a small portion of that file and use it for your testing. You can select a temporary file in Debug Settings for each source that is a file dataset type.
+> File sources only limit the rows that you see, not the rows being read. For very large datasets, it's recommended that you take a small portion of that file and use it for your testing. You can select a temporary file in Debug Settings for each source that is a file dataset type.

-When running in Debug Mode in Data Flow, your data won't be written to the Sink transform. A Debug session is intended to serve as a test harness for your transformations. Sinks aren't required during debug and are ignored in your data flow. If you wish to test writing the data in your Sink, execute the Data Flow from a pipeline and use the Debug execution from a pipeline.
+When running Data Flow in Debug Mode, your data won't be written to the Sink transform. A Debug session is intended to serve as a test harness for your transformations. Sinks aren't required during debug and are ignored in your data flow. If you wish to test writing the data in your Sink, execute the Data Flow from a pipeline and use the Debug execution from a pipeline.

 Data Preview is a snapshot of your transformed data using row limits and data sampling from data frames in Spark memory. Therefore, the sink drivers aren't utilized or tested in this scenario.
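To exercise the sink path described above, the data flow has to run from a pipeline rather than from a debug session. Below is a minimal sketch of what such a pipeline activity could look like; the activity name, data flow name, and core count are illustrative assumptions, not content from this commit.

```json
{
  "name": "RunMyDataFlow",
  "type": "ExecuteDataFlow",
  "typeProperties": {
    "dataFlow": {
      "referenceName": "MyDataFlow",
      "type": "DataFlowReference"
    },
    "compute": {
      "computeType": "General",
      "coreCount": 8
    }
  }
}
```

Triggering this pipeline, or running it with pipeline Debug, is what actually writes to the sink, unlike a data flow debug session.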

articles/data-factory/concepts-integration-runtime-performance.md

Lines changed: 6 additions & 7 deletions
@@ -3,11 +3,11 @@ title: Integration Runtime Performance
 titleSuffix: Azure Data Factory & Azure Synapse
 description: Learn about how to optimize and improve performance of the Azure Integration Runtime in Azure Data Factory and Azure Synapse Analytics.
 author: kromerm
-ms.topic: conceptual
+ms.topic: concept-article
 ms.author: makromer
 ms.subservice: data-flows
 ms.custom: synapse
-ms.date: 01/05/2024
+ms.date: 03/31/2025
 ---

 # Optimizing performance of the Azure Integration Runtime
@@ -37,7 +37,7 @@ The default cluster size is four driver cores and four worker cores (small). As
 Data flows are priced at vcore-hrs meaning that both cluster size and execution-time factor into this. As you scale up, your cluster cost per minute will increase, but your overall time will decrease.

 > [!TIP]
-> There is a ceiling on how much the size of a cluster affects the performance of a data flow. Depending on the size of your data, there is a point where increasing the size of a cluster will stop improving performance. For example, If you have more cores than partitions of data, adding additional cores won't help.
+> There's a ceiling on how much the size of a cluster affects the performance of a data flow. Depending on the size of your data, there's a point where increasing the size of a cluster will stop improving performance. For example, If you have more cores than partitions of data, adding more cores won't help.
 A best practice is to start small and scale up to meet your performance needs.
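As a rough, hypothetical illustration of that trade-off (the cluster sizes and run times below are invented for the sake of arithmetic, not benchmark results): a larger cluster costs more per minute but finishes sooner, so total vcore-hours can land in the same ballpark.

```latex
\text{vcore-hours} = \text{cluster vcores} \times \text{run time in hours}

16 \times \tfrac{30}{60} = 8 \text{ vcore-hours}
\qquad
32 \times \tfrac{18}{60} = 9.6 \text{ vcore-hours}
```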

 ## Custom shuffle partition
@@ -46,7 +46,6 @@ Dataflow divides the data into partitions and transforms it using different proc

 While increasing the shuffle partitions, make sure data is spread across well. A rough number is to have approximately 1.5 GB of data per partition. If data is skewed, increasing the "Shuffle partitions" won't be helpful. For example, if you have 500 GB of data, having a value between 400 to 500 should work. Default limit for shuffle partitions is 200 that works well for approximately 300 GB of data.

-
 1. From ADF portal under **Manage**, select a custom integration run time and you go to edit mode.
 2. Under dataflow run time tab, go to **Compute Custom Properties** section.
 3. Select **Shuffle partitions** under Property name, input value of your choice, like 250, 500 etc.
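For reference, here's a sketch of how such a custom property could appear in the runtime's JSON definition (the JSON edit mentioned in the next hunk). The runtime name, core count, and the exact property-name string are assumptions for illustration; confirm the property name against the JSON that the portal generates.

```json
{
  "name": "DataFlowsIR",
  "properties": {
    "type": "Managed",
    "typeProperties": {
      "computeProperties": {
        "location": "AutoResolve",
        "dataFlowProperties": {
          "computeType": "General",
          "coreCount": 16,
          "customProperties": [
            {
              "name": "ShufflePartitionsCount",
              "value": "250"
            }
          ]
        }
      }
    }
  }
}
```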
@@ -55,12 +54,12 @@ You can do same by editing JSON file of runtime by adding an array with property

 ## Time to live

-By default, every data flow activity spins up a new Spark cluster based upon the Azure IR configuration. Cold cluster start-up time takes a few minutes and data processing can't start until it is complete. If your pipelines contain multiple **sequential** data flows, you can enable a time to live (TTL) value. Specifying a time to live value keeps a cluster alive for a certain period of time after its execution completes. If a new job starts using the IR during the TTL time, it will reuse the existing cluster and start up time will be greatly reduced. After the second job completes, the cluster will again stay alive for the TTL time.
+By default, every data flow activity spins up a new Spark cluster based upon the Azure IR configuration. Cold cluster start-up time takes a few minutes and data processing can't start until it's complete. If your pipelines contain multiple **sequential** data flows, you can enable a time to live (TTL) value. Specifying a time to live value keeps a cluster alive for a certain period of time after its execution completes. If a new job starts using the IR during the TTL time, it will reuse the existing cluster and start up time will be greatly reduced. After the second job completes, the cluster will again stay alive for the TTL time.

-However, if most of your data flows execute in parallel, it is not recommended that you enable TTL for the IR that you use for those activities. Only one job can run on a single cluster at a time. If there is an available cluster, but two data flows start, only one will use the live cluster. The second job will spin up its own isolated cluster.
+However, if most of your data flows execute in parallel, it isn't recommended that you enable TTL for the IR that you use for those activities. Only one job can run on a single cluster at a time. If there's an available cluster, but two data flows start, only one will use the live cluster. The second job will spin up its own isolated cluster.

 > [!NOTE]
-> Time to live is not available when using the auto-resolve integration runtime (default).
+> Time to live isn't available when using the auto-resolve integration runtime (default).
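A minimal sketch of where a TTL value sits in a custom Azure IR definition follows; the runtime name and values are illustrative assumptions, with `timeToLive` specified in minutes.

```json
{
  "name": "DataFlowsIRWithTTL",
  "properties": {
    "type": "Managed",
    "typeProperties": {
      "computeProperties": {
        "location": "AutoResolve",
        "dataFlowProperties": {
          "computeType": "General",
          "coreCount": 8,
          "timeToLive": 10
        }
      }
    }
  }
}
```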

 ## Related content

articles/data-factory/concepts-nested-activities.md

Lines changed: 8 additions & 3 deletions
@@ -6,8 +6,8 @@ author: kromerm
 ms.author: makromer
 ms.subservice: orchestration
 ms.custom: synapse
-ms.topic: conceptual
-ms.date: 10/20/2023
+ms.topic: concept-article
+ms.date: 03/31/2025
 ---

 # Nested activities in Azure Data Factory and Azure Synapse Analytics
@@ -17,23 +17,26 @@ ms.date: 10/20/2023
 This article helps you understand nested activities in Azure Data Factory and Azure Synapse Analytics and how to use them, limitations, and best practices.

 ## Overview
+
 A Data Factory or Synapse Workspace pipeline can contain control flow activities that allow for other activities to be contained inside of them. Think of these nested activities as containers that hold one or more other activities that can execute depending on the top level control flow activity.

 See the following example with an If activity that has one activity contained.

 :::image type="content" source="media/concepts-pipelines-activities/nested-activity-example.png" alt-text="Screenshot showing an example If Condition activity with a contained activity inside.":::
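In pipeline JSON, that containment shows up as an activities array nested inside the control flow activity's `typeProperties`. A hedged sketch, with the activity names and the expression invented for illustration:

```json
{
  "name": "IfFilesExist",
  "type": "IfCondition",
  "typeProperties": {
    "expression": {
      "value": "@greater(activity('GetFileCount').output.count, 0)",
      "type": "Expression"
    },
    "ifTrueActivities": [
      {
        "name": "WaitBeforeProcessing",
        "type": "Wait",
        "typeProperties": {
          "waitTimeInSeconds": 30
        }
      }
    ]
  }
}
```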

 ## Control flow activities
+
 The following control flow activities support nested activities:

 Control activity | Description
 ---------------- | -----------
 [For Each](control-flow-for-each-activity.md) | ForEach Activity defines a repeating control flow in your pipeline. This activity is used to iterate over a collection and executes specified activities in a loop. The loop implementation of this activity is similar to the Foreach looping structure in programming languages.
 [If Condition Activity](control-flow-if-condition-activity.md) | The If Condition can be used to branch based on condition that evaluates to true or false. The If Condition activity provides the same functionality that an if statement provides in programming languages. It evaluates a set of activities when the condition evaluates to `true` and another set of activities when the condition evaluates to `false.`
-[Until Activity](control-flow-until-activity.md) | Implements Do-Until loop that is similar to Do-Until looping structure in programming languages. It executes a set of activities in a loop until the condition associated with the activity evaluates to true. You can specify a timeout value for the until activity.
+[Until Activity](control-flow-until-activity.md) | Implements Do-Until loop that is similar to Do-Until looping structure in programming languages. It executes a set of activities in a loop until the condition associated with the activity evaluates to true. You can specify a time-out value for the until activity.
 [Switch Activity](control-flow-switch-activity.md) | The Switch activity provides the same functionality that a switch statement provides in programming languages. It evaluates a set of activities corresponding to a case that matches the condition evaluation.

 ## Navigating nested activities
+
 There are two primary ways to navigate to the contained activities in a nested activity.

 1. Each control flow activity that supports nested activities has an activity tab. Selecting the activity tab will then give you a pencil icon you can select to drill down into the inner activities panel.
@@ -46,6 +49,7 @@ Your pipeline canvas will then switch to the context of the inner activity conta
 :::image type="content" source="media/concepts-pipelines-activities/nested-activity-breadcrumb.png" alt-text="Screenshot showing an example If Condition activity inside the true branch with a highlight on the breadcrumb to navigate back to the parent pipeline.":::

 ## Nested activity embedding limitations
+
 There are constraints on the activities that support nesting (ForEach, Until, Switch, and If Condition), for nesting another nested activity. Specifically:

 - If and Switch can be used inside ForEach or Until activities.
@@ -60,6 +64,7 @@ ForEach or Until supports only single level nesting
 If and Switch can't be used inside If and Switch activities.

 ## Best practices for multiple levels of nested activities
+
 In order to have logic that supports nesting more than one level deep, you can use the [Execute Pipeline Activity](control-flow-execute-pipeline-activity.md) inside of your nested activity to call another pipeline that then can have another level of nested activities. A common use case for this pattern is with the ForEach loop where you need to additionally loop based off logic in the inner activities.

 An example of this pattern would be if you had a file system that had a list of folders and each folder there are multiple files you want to process. You would accomplish this pattern, generally, by performing the following.
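A rough sketch of that pattern in pipeline JSON: an outer ForEach that calls a child pipeline once per folder, and the child pipeline can then run its own ForEach over the files. The activity, pipeline, and parameter names here are assumptions for illustration only.

```json
{
  "name": "ForEachFolder",
  "type": "ForEach",
  "typeProperties": {
    "items": {
      "value": "@activity('GetFolderList').output.childItems",
      "type": "Expression"
    },
    "activities": [
      {
        "name": "ProcessFolder",
        "type": "ExecutePipeline",
        "typeProperties": {
          "pipeline": {
            "referenceName": "ProcessFilesInFolder",
            "type": "PipelineReference"
          },
          "parameters": {
            "folderName": "@item().name"
          },
          "waitOnCompletion": true
        }
      }
    ]
  }
}
```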
