articles/reliability/reliability-data-factory.md
Lines changed: 25 additions & 25 deletions
@@ -1,6 +1,6 @@
1
1
---
2
2
title: Reliability in Azure Data Factory
3
-
description: Learn about reliability in Azure Data Factory by using availability zones, multiple-region deployments, and resilient pipeline practices.
3
+
description: Learn about reliability in Azure Data Factory, including availability zones, multiple-region deployments, and resilient pipeline practices.
4
4
author: jonburchel
5
5
ms.author: jburchel
6
6
ms.topic: reliability-article
@@ -22,15 +22,15 @@ You can use Azure Data Factory to create flexible and powerful data pipelines fo
22
22
23
23
- *Integration runtimes (IRs)*, which connect to data stores and perform activities defined in your pipeline.
24
24
25
-
- *Data stores that are connected to the data factory.* To help ensure that data stores meet your business continuity requirements, consult their product reliability documentation and guidance.
25
+
- *Data stores that connect to the data factory.* To help ensure that data stores meet your business continuity requirements, consult their product reliability documentation and guidance.
26
26
27
27
## Reliability architecture overview
28
28
29
29
Azure Data Factory consists of multiple infrastructure components. Each component supports infrastructure reliability in various ways.
30
30
31
31
The components of Azure Data Factory include:
32
32
33
-
- **Core Azure Data Factory service**, which manages pipeline triggers and oversees the coordination of pipeline activities. The core service also manages metadata for each component in the data factory. Microsoft manages the core service.
33
+
- **The core Azure Data Factory service**, which manages pipeline triggers and oversees the coordination of pipeline activities. The core service also manages metadata for each component in the data factory. Microsoft manages the core service.
34
34
35
35
- **[IRs](../data-factory/concepts-integration-runtime.md#integration-runtime-types)**, which perform specific activities within a pipeline. There are different types of IRs.
36
36
@@ -50,26 +50,26 @@ Your pipeline activities should be *idempotent*, which means that they can be re
50
50
51
51
To prevent duplicate record insertion because of a transient fault, implement the following best practices:
52
52
53
-
- *Use unique identifiers* to each record before you write to the database. This approach can help you find and eliminate duplicate records.
53
+
- *Use unique identifiers* for each record before you write to the database. This approach can help you find and eliminate duplicate records.
54
54
55
55
- *Use an upsert strategy* for connectors that support upsert. Before duplicate record insertion occurs, use this approach to check whether a record already exists. If it does exist, update it. If it doesn't exist, insert it. For example, SQL commands like `MERGE` or `ON DUPLICATE KEY UPDATE` use this upsert approach.
56
56
57
-
- *Use copy action strategies* that are described in [Data consistency verification in copy activity](../data-factory/copy-activity-data-consistency.md).
57
+
- *Use copy action strategies.* For more information, see [Data consistency verification in copy activity](../data-factory/copy-activity-data-consistency.md).
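
The upsert practice in the list above can be illustrated outside of Azure Data Factory. The following is a minimal sketch, not a Data Factory connector configuration. It assumes a local SQLite database (version 3.24 or later for `ON CONFLICT ... DO UPDATE`), and the `orders` table, the `record_id` column, and the `upsert_records` helper are hypothetical names chosen for illustration. It shows how a unique identifier plus an upsert keeps a repeated write from creating duplicate rows.

```python
import sqlite3


def upsert_records(conn, records):
    """Insert each (record_id, value) pair, or update the value if the ID already exists."""
    conn.executemany(
        """
        INSERT INTO orders (record_id, value)
        VALUES (?, ?)
        ON CONFLICT(record_id) DO UPDATE SET value = excluded.value
        """,
        records,
    )
    conn.commit()


# Hypothetical sink table. The primary key is the unique identifier for each record.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (record_id TEXT PRIMARY KEY, value INTEGER)")

batch = [("a-1", 10), ("a-2", 20)]
upsert_records(conn, batch)
upsert_records(conn, batch)  # Simulates the same activity rerunning after a transient fault.

row_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(row_count)
```

Because the second call updates the existing rows instead of inserting new ones, rerunning the write leaves the table in the same state as running it once.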
58
58
59
59
### Retry policies
60
60
61
61
You can use retry policies to configure parts of your pipeline to retry if there's a problem, like transient faults in connected resources. In Azure Data Factory, you can configure retry policies on the following pipeline object types:
For more information about how to change or disable retry policies for your data factory triggers and activities, see [Pipeline execution and triggers](../data-factory/concepts-pipeline-execution-triggers.md).
66
+
For more information about how to change or disable retry policies for your data factory triggers and activities, see [Pipeline runs and triggers](../data-factory/concepts-pipeline-execution-triggers.md).
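
To make the behavior of a retry policy concrete, the following sketch shows a generic retry loop with a fixed wait between attempts. It isn't how Azure Data Factory implements retries. The `run_with_retry` helper, its `retry` and `retry_interval_seconds` parameters, and the `flaky_copy` function are hypothetical names used only for illustration.

```python
import time


def run_with_retry(activity, retry=3, retry_interval_seconds=30, sleep=time.sleep):
    """Call activity(). If it raises, retry up to `retry` more times with a fixed wait."""
    attempt = 0
    while True:
        try:
            return activity()
        except Exception:
            if attempt >= retry:
                raise  # Retries are exhausted, so surface the last failure.
            attempt += 1
            sleep(retry_interval_seconds)


# Hypothetical activity that fails twice with a transient fault and then succeeds.
calls = {"count": 0}


def flaky_copy():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("transient fault")
    return "succeeded"


result = run_with_retry(flaky_copy, retry=3, retry_interval_seconds=0)
print(result, calls["count"])
```

A retry like this is only safe when the activity is idempotent, because the same work can run more than once before it succeeds.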
67
67
68
68
## Availability zone support
69
69
70
70
[!INCLUDE [AZ support description](includes/reliability-availability-zone-description-include.md)]
71
71
72
-
Azure Data Factory supports *zone redundancy*, which provides resiliency to failures in [availability zones](availability-zones-overview.md). This section describes how each part of the Azure Data Factory service supports zone redundancy.
72
+
Azure Data Factory supports zone redundancy, which provides resiliency to failures in [availability zones](availability-zones-overview.md). This section describes how each part of the Azure Data Factory service supports zone redundancy.
73
73
74
74
### Regions supported
75
75
@@ -101,7 +101,7 @@ Zone-redundant Azure Data Factory resources can be deployed in [any region that
101
101
102
102
### Configure availability zone support
103
103
104
-
**Azure Data Factory core service:** No configuration required. Azure Data Factory core service automatically supports zone redundancy.
104
+
**Core service:** No configuration required. The Data Factory core service automatically supports zone redundancy.
105
105
106
106
**IRs:**
107
107
@@ -119,7 +119,7 @@ Zone-redundant Azure Data Factory resources can be deployed in [any region that
119
119
120
120
- *Azure IR* scales automatically based on demand, and you don't need to plan or manage capacity.
121
121
122
-
- *Azure-SSIS IR* requires you to specifically configure the number of nodes that you use. To prepare for availability zone failure, consider *over-provisioning* the capacity of your IR. Over-provisioning allows the solution to tolerate some degree of capacity loss and still continue to function without degraded performance. For more information, see [Manage capacity with over-provisioning](./concept-redundancy-replication-backup.md#manage-capacity-with-over-provisioning).
122
+
- *Azure-SSIS IR* requires you to specifically configure the number of nodes that you use. To prepare for availability zone failure, consider over-provisioning the capacity of your IR. Over-provisioning allows the solution to tolerate some degree of capacity loss and still continue to function without degraded performance. For more information, see [Manage capacity by over-provisioning](./concept-redundancy-replication-backup.md#manage-capacity-with-over-provisioning).
123
123
124
124
- *SHIR* requires you to configure your own capacity and scaling. Consider over-provisioning when you deploy a SHIR.
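
The over-provisioning guidance above comes down to simple arithmetic. The following sketch estimates how many nodes to provision so that the required capacity survives the loss of a zone. It rests on an assumption that the source doesn't state: that nodes are spread evenly across zones. Actual node placement might differ, and the `nodes_to_provision` helper is a hypothetical name.

```python
import math


def nodes_to_provision(required_nodes, zone_count, zones_lost=1):
    """Return a node count that still leaves `required_nodes` after `zones_lost` zones fail.

    Assumes that nodes are spread evenly across zones, which is an illustrative simplification.
    """
    surviving_zones = zone_count - zones_lost
    if required_nodes < 0 or surviving_zones <= 0:
        raise ValueError("Need a non-negative node count and at least one surviving zone.")
    # Each surviving zone must carry its share of the required capacity.
    nodes_per_zone = math.ceil(required_nodes / surviving_zones)
    return nodes_per_zone * zone_count


# Example: a workload needs 4 nodes, and the region has 3 availability zones.
total = nodes_to_provision(required_nodes=4, zone_count=3)
print(total)
```

In this example, each zone hosts two nodes, so four nodes remain available if one zone fails.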
125
125
@@ -131,15 +131,15 @@ During normal operations, Azure Data Factory automatically distributes pipeline
131
131
132
132
**Detection and response:** The Azure Data Factory platform is responsible for detecting a failure in an availability zone and responding. You don't need to do anything to initiate a zone failover in your pipelines or other components.
133
133
134
-
**Active requests:** Any pipelines and triggers in progress continue to run, and you won't notice a zone failure. However, activities in progress during a zone failure might fail and be restarted. It's important to design activities to be idempotent, which helps them to recover from zone failures and other faults. For more information, see [Transient faults](#transient-faults).
134
+
**Active requests:** Any pipelines and triggers in progress continue to run, and you don't experience any immediate disruption from a zone failure. However, activities in progress during a zone failure might fail and be restarted. It's important to design activities to be idempotent, which helps them recover from zone failures and other faults. For more information, see [Transient faults](#transient-faults).
135
135
136
136
### Failback
137
137
138
138
When the availability zone recovers, Azure Data Factory automatically fails back to the original zone. You don't need to do anything to initiate a zone failback in your pipelines or other components.
139
139
140
140
However, if you use the SHIR, you might need to restart your compute resources if they've been stopped.
141
141
142
-
### Testing for zone failures
142
+
### Test for zone failures
143
143
144
144
For the core service, and for Azure and Azure-SSIS IRs, Azure Data Factory manages traffic routing, failover, and failback for zone-redundant resources. Because this feature is fully managed, you don't need to initiate or validate availability zone failure processes.
145
145
@@ -151,11 +151,11 @@ Azure Data Factory resources are deployed into a single Azure region. If the reg
151
151
152
152
### Microsoft-managed failover to a paired region
153
153
154
-
Azure Data Factory supports Microsoft-managed failover for data factories in *paired regions*, except for Brazil South and Southeast Asia. In the unlikely event of a prolonged region failure, Microsoft might initiate a regional failover of your Azure Data Factory instance.
154
+
Azure Data Factory supports Microsoft-managed failover for data factories in paired regions, except for Brazil South and Southeast Asia. In the unlikely event of a prolonged region failure, Microsoft might initiate a regional failover of your Azure Data Factory instance.
155
155
156
-
Because of data residency requirements in Brazil South and Southeast Asia, Azure Data Factory data is stored in the local region only by using [Azure Storage zone-redundant storage](../storage/common/storage-redundancy.md#zone-redundant-storage). For Southeast Asia, all data is stored in Singapore. For Brazil South, all data is stored in Brazil.
156
+
Because of data residency requirements in Brazil South and Southeast Asia, Azure Data Factory data is stored only in the local region by using [Azure Storage zone-redundant storage](../storage/common/storage-redundancy.md#zone-redundant-storage). For Southeast Asia, all data is stored in Singapore. For Brazil South, all data is stored in Brazil.
157
157
158
-
For data factories in *nonpaired regions*, or in Brazil South or Southeast Asia, Microsoft doesn't perform regional failover on your behalf.
158
+
For data factories in nonpaired regions, or in Brazil South or Southeast Asia, Microsoft doesn't perform regional failover on your behalf.
159
159
160
160
> [!IMPORTANT]
161
161
> Microsoft triggers Microsoft-managed failover. It's likely to occur after a significant delay and is done on a best-effort basis. There are also some exceptions to this process. You might experience some loss of your data factory metadata. The failover of Azure Data Factory resources might occur at a different time than the failover of other Azure services.
@@ -166,37 +166,37 @@ For data factories in *nonpaired regions*, or in Brazil South or Southeast Asia,
166
166
167
167
To prepare for a failover, there might be some extra considerations, depending on the IR that you use.
168
168
169
-
- You can configure *Azure IR* to automatically resolve the region that it uses. If the region is set to *auto resolve* and there's an outage in the primary region, the Azure IR automatically fails over to the paired region. This failover is subject to the limitations described in [Microsoft-managed failover to a paired region](#microsoft-managed-failover-to-a-paired-region). To configure the Azure IR region for your activity execution or dispatch in the IR setup, set the region to *auto resolve*.
169
+
- You can configure *Azure IR* to automatically resolve the region that it uses. If the region is set to *auto resolve* and there's an outage in the primary region, the Azure IR automatically fails over to the paired region. This failover is subject to [limitations](#microsoft-managed-failover-to-a-paired-region). To configure the Azure IR region for your activity implementation or dispatch in the IR setup, set the region to *auto resolve*.
170
170
171
-
- *Azure-SSIS IR* failover is managed separately from Microsoft-managed failover of the data factory. For more information, see [Alternative multiple-region approaches](#alternative-multiple-region-approaches).
171
+
- *Azure-SSIS IR* failover is managed separately from a Microsoft-managed failover of the data factory. For more information, see [Alternative multiple-region approaches](#alternative-multiple-region-approaches).
172
172
173
-
- *SHIR* runs on infrastructure that you're responsible for, and so Microsoft-managed failover doesn't apply to SHIRs. For more information, see [Alternative multiple-region approaches](#alternative-multiple-region-approaches).
173
+
- *SHIR* runs on infrastructure that you're responsible for, so a Microsoft-managed failover doesn't apply to SHIRs. For more information, see [Alternative multiple-region approaches](#alternative-multiple-region-approaches).
174
174
175
175
#### Post-failover reconfiguration
176
176
177
-
After a Microsoft-managed failover is complete, you can then access your Azure Data Factory pipeline in the paired region. However, after the failover completes, you might need to perform some reconfiguration for IRs or other components. This process includes re-establishing the networking configuration.
177
+
After a Microsoft-managed failover is complete, you can access your Azure Data Factory pipeline in the paired region. However, after the failover completes, you might need to perform some reconfiguration for IRs or other components. This process includes re-establishing the networking configuration.
178
178
179
179
### Alternative multiple-region approaches
180
180
181
181
If you need your pipelines to be resilient to regional outages and you need control over the failover process, consider using a metadata-driven pipeline.
182
182
183
-
- **Set up source control for your Azure Data Factory** to track and audit any changes made to your metadata. You can use this approach to access your metadata JSON files for pipelines, datasets, linked services, and triggers. Azure Data Factory supports different Git repository types, like Azure DevOps and GitHub. For more information, see [Source control in Azure Data Factory](../data-factory/source-control.md).
183
+
- **Set up source control for Azure Data Factory** to track and audit any changes to your metadata. You can use this approach to access your metadata JSON files for pipelines, datasets, linked services, and triggers. Azure Data Factory supports different Git repository types, like Azure DevOps and GitHub. For more information, see [Source control in Azure Data Factory](../data-factory/source-control.md).
184
184
185
-
- **Use a continuous integration and delivery (CI/CD) system**, such as Azure DevOps, to manage your pipeline metadata and deployments. You can use CI/CD to quickly restore operations to an instance in another region. If a region is unavailable, you can provision a new data factory manually or through automation. After the new data factory is created, you can restore your pipelines, datasets, and linked services JSON from the existing Git repository. For more information, see [Business continuity and disaster recovery (BCDR) for Azure Data Factory and Azure Synapse Analytics pipelines](/azure/architecture/example-scenario/analytics/pipelines-disaster-recovery).
185
+
- **Use a continuous integration and continuous delivery (CI/CD) system**, such as Azure DevOps, to manage your pipeline metadata and deployments. You can use CI/CD to quickly restore operations to an instance in another region. If a region is unavailable, you can provision a new data factory manually or through automation. After the new data factory is created, you can restore your pipelines, datasets, and linked services JSON from the existing Git repository. For more information, see [Business continuity and disaster recovery (BCDR) for Azure Data Factory and Azure Synapse Analytics pipelines](/azure/architecture/example-scenario/analytics/pipelines-disaster-recovery).
186
186
187
187
Depending on the IR that you use, there might be other considerations.
188
188
189
-
- *Azure-SSIS IR* uses a database stored in Azure SQL Database or Azure SQL Managed Instance. You can configure geo-replication or a failover group for this database. The Azure-SSIS database is then located in a primary Azure region with read-write access (the *primary role*) and is continuously replicated to a secondary region with read-only access (the *secondary role*). If the primary region is lost, a failover is triggered, which causes the primary and secondary databases to swap roles.
189
+
- *Azure-SSIS IR* uses a database stored in Azure SQL Database or Azure SQL Managed Instance. You can configure geo-replication or a failover group for this database. The Azure-SSIS database is located in a primary Azure region that has read-write access. The database is continuously replicated to a secondary region that has read-only access. If the primary region is unavailable, a failover triggers, which causes the primary and secondary databases to swap roles.
190
190
191
-
You can also configure a dual standby Azure SSIS IR pair that works in sync with Azure SQL Database or Azure SQL Managed Instance failover group.
191
+
You can also configure a dual standby Azure SSIS IR pair that works in sync with a SQL Database or SQL Managed Instance failover group.
192
192
193
193
For more information, see [Configure Azure-SSIS IR for BCDR](../data-factory/configure-bcdr-azure-ssis-integration-runtime.md).
194
194
195
195
- *SHIR* runs on infrastructure that you manage. If the SHIR is deployed to an Azure VM, you can use [Azure Site Recovery](../site-recovery/site-recovery-overview.md) to trigger [VM failover](../site-recovery/azure-to-azure-architecture.md) to another region.
196
196
197
197
## Backup and restore
198
198
199
-
Azure Data Factory enables CI/CD by integrating with source control, which allows you to back up metadata from a data factory instance. This metadata can then be deployed seamlessly into a new environment. For more information, see [Continuous integration and delivery in Azure Data Factory](../data-factory/continuous-integration-delivery.md).
199
+
Data Factory supports CI/CD through source control integration, so that you can back up the metadata of a data factory instance. CI/CD pipelines deploy this metadata seamlessly into a new environment. For more information, see [CI/CD in Azure Data Factory](../data-factory/continuous-integration-delivery.md).