articles/data-factory/author-global-parameters.md (1 addition, 57 deletions)
@@ -6,7 +6,7 @@ ms.subservice: authoring
 ms.topic: conceptual
 author: nabhishek
 ms.author: abnarain
-ms.date: 09/26/2022
+ms.date: 05/05/2023
 ms.custom: devx-track-azurepowershell
 ---
 
@@ -65,62 +65,6 @@ We strongly recommend using the new mechanism of including global parameters in
 
-
-### Deploying using PowerShell (older mechanism)
-
-> [!NOTE]
-> This isn't required if you include global parameters using the 'Manage hub' -> 'ARM template' -> 'Include global parameters in an ARM template' option, since you can then deploy the ARM template without breaking the factory-level configurations. For backward compatibility, we will continue to support the older mechanism.
-
-The following steps outline how to deploy global parameters via PowerShell. This is useful when your target factory has a factory-level setting such as a customer-managed key.
-
-When you publish a factory or export an ARM template with global parameters, a folder called *globalParameters* is created with a file called *your-factory-name_GlobalParameters.json*. This file is a JSON object that contains each global parameter's type and value in the published factory.
-
-:::image type="content" source="media/author-global-parameters/global-parameters-adf-publish.png" alt-text="Publishing global parameters":::
-
-If you're deploying to a new environment such as TEST or PROD, it's recommended to create a copy of this global parameters file and overwrite the appropriate environment-specific values. When you republish, the original global parameters file will be overwritten, but the copy for the other environment will be untouched.
-
-For example, if you have a factory named 'ADF-DEV' and a global parameter of type string named 'environment' with a value 'dev', a file named *ADF-DEV_GlobalParameters.json* will be generated when you publish. If you're deploying to a test factory named 'ADF-TEST', create a copy of the JSON file (for example, named ADF-TEST_GlobalParameters.json) and replace the parameter values with the environment-specific values. The parameter 'environment' may now have the value 'test'.
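For the 'ADF-DEV' factory above, the exported *ADF-DEV_GlobalParameters.json* might look like the following. This is a sketch inferred from the description ("each global parameter's type and value"); the exact casing of the field names in your export may differ:

```json
{
    "environment": {
        "type": "string",
        "value": "dev"
    }
}
```

In the ADF-TEST copy of the file, only the `value` fields would change (for example, `"value": "test"`), while the parameter names and types stay identical.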
-
-:::image type="content" source="media/author-global-parameters/powershell-task.png" alt-text="Deploying global parameters":::
-
-Use the below PowerShell script to promote global parameters to additional environments. Add an Azure PowerShell DevOps task before your ARM template deployment. In the DevOps task, you must specify the location of the new parameters file, the target resource group, and the target data factory.
-
-> [!NOTE]
-> To deploy global parameters using PowerShell, you must use at least version 4.4.0 of the Az module.
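The promotion script itself is cut off in this diff. As a minimal sketch of the approach only (not the doc's original script): read the environment-specific parameters file, rebuild the factory's global parameter dictionary, and write the factory back with `Get-AzDataFactoryV2`/`Set-AzDataFactoryV2`. The parameter names below are hypothetical, and the `GlobalParameterSpecification` type is assumed to be available from the Az.DataFactory module (version 4.4.0 or later, per the note above):

```powershell
param(
    [string] $globalParametersFilePath,   # hypothetical: path to ADF-TEST_GlobalParameters.json
    [string] $resourceGroupName,          # hypothetical: target resource group
    [string] $dataFactoryName             # hypothetical: target factory, e.g. ADF-TEST
)

# Parse the environment-specific global parameters file.
$globalParametersObject = Get-Content $globalParametersFilePath -Raw | ConvertFrom-Json

# Rebuild the typed dictionary the factory object expects (assumed type name).
$newGlobalParameters = New-Object 'System.Collections.Generic.Dictionary[string, Microsoft.Azure.Management.DataFactory.Models.GlobalParameterSpecification]'
foreach ($parameter in $globalParametersObject.PSObject.Properties) {
    $spec = New-Object Microsoft.Azure.Management.DataFactory.Models.GlobalParameterSpecification
    $spec.Type  = $parameter.Value.type
    $spec.Value = $parameter.Value.value
    $newGlobalParameters.Add($parameter.Name, $spec)
}

# Fetch the target factory, replace its global parameters, and push the update.
$dataFactory = Get-AzDataFactoryV2 -ResourceGroupName $resourceGroupName -Name $dataFactoryName
$dataFactory.GlobalParameters = $newGlobalParameters
Set-AzDataFactoryV2 -InputObject $dataFactory -Force
```

Because the factory object is fetched first and only its `GlobalParameters` property is replaced, factory-level settings such as a customer-managed key are preserved, which is the scenario this older mechanism targets.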
articles/data-factory/concepts-integration-runtime-performance.md (3 additions, 4 deletions)
@@ -8,7 +8,7 @@ ms.author: makromer
 ms.service: data-factory
 ms.subservice: data-flows
 ms.custom: synapse
-ms.date: 03/10/2023
+ms.date: 04/21/2023
 ---
 
 # Optimizing performance of the Azure Integration Runtime
@@ -55,17 +55,16 @@ Dataflow divides the data into partitions and transforms it using different proc
 
 While increasing the shuffle partitions, make sure the data is spread well across them. A rough guideline is to have approximately 1.5 GB of data per partition. If the data is skewed, increasing the "Shuffle partitions" value won't help. For example, if you have 500 GB of data, a value between 400 and 500 should work. The default limit for shuffle partitions is 200, which works well for approximately 300 GB of data.
 
-Here are the steps on how it's set in a custom integration runtime. You can't set it for autoresolve integration runtime.
 
 1. From the ADF portal, under **Manage**, select a custom integration runtime to go to edit mode.
 2. Under the data flow runtime tab, go to the **Compute Custom Properties** section.
-3. Select **Shuffle Partitions** under Property name, input value of your choice, like 250, 500 etc.
+3. Select **Shuffle partitions** under Property name, input value of your choice, like 250, 500 etc.
 
 You can do the same by editing the JSON file of the runtime: add an array with the property name and value after an existing property, such as the *cleanup* property.
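As an illustrative sketch of that JSON edit, the custom property would sit in a `customProperties` array inside the data flow compute settings of the integration runtime definition. The exact `name` string below is an assumption; the safest approach is to set the property once through the UI and copy the name the portal writes:

```json
{
    "typeProperties": {
        "computeProperties": {
            "dataFlowProperties": {
                "computeType": "General",
                "coreCount": 8,
                "cleanup": false,
                "customProperties": [
                    {
                        "name": "ShufflePartitionsCount",
                        "value": "250"
                    }
                ]
            }
        }
    }
}
```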
 
 ## Time to live
 
-By default, every data flow activity spins up a new Spark cluster based upon the Azure IR configuration. Cold cluster start-up time takes a few minutes and data processing can't start until it is complete. If your pipelines contain multiple **sequential** data flows, you can enable a time to live (TTL) value. Specifying a time to live value keeps a cluster alive for a certain period of time after its execution completes. If a new job starts using the IR during the TTL time, it will reuse the existing cluster and start up time will greatly reduced. After the second job completes, the cluster will again stay alive for the TTL time.
+By default, every data flow activity spins up a new Spark cluster based upon the Azure IR configuration. Cold cluster start-up time takes a few minutes and data processing can't start until it is complete. If your pipelines contain multiple **sequential** data flows, you can enable a time to live (TTL) value. Specifying a time to live value keeps a cluster alive for a certain period of time after its execution completes. If a new job starts using the IR during the TTL time, it will reuse the existing cluster and start up time will be greatly reduced. After the second job completes, the cluster will again stay alive for the TTL time.
 
 However, if most of your data flows execute in parallel, it is not recommended that you enable TTL for the IR that you use for those activities. Only one job can run on a single cluster at a time. If there is an available cluster, but two data flows start, only one will use the live cluster. The second job will spin up its own isolated cluster.