Commit c16d62c

Merge pull request #300583 from msmbaldwin/akv-reliability
Reliability guide for Azure Key Vault
2 parents 71f4b11 + 6787ccb commit c16d62c

5 files changed: 222 additions and 5 deletions

articles/reliability/TOC.yml

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -284,7 +284,7 @@
284284
- name: Azure Firewall
285285
href: ../firewall/deploy-availability-zone-powershell.md?toc=/azure/reliability/toc.json&bc=/azure/reliability/breadcrumb/toc.json
286286
- name: Azure Key Vault
287-
href: /azure/key-vault/general/disaster-recovery-guidance?toc=/azure/reliability/toc.json&bc=/azure/reliability/breadcrumb/toc.json
287+
href: reliability-key-vault.md
288288
- name: Microsoft Defender for Cloud DevOps security
289289
href: reliability-defender-devops.md
290290
- name: Storage

articles/reliability/availability-zones-service-support.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -65,7 +65,7 @@ Some Azure services are *nonregional*, which means that you don't deploy the ser
6565
| [Azure HPC Cache](../hpc-cache/hpc-cache-overview.md) | | :::image type="content" source="media/icon-checkmark.svg" alt-text="Yes" border="false"::: |
6666
| [Azure IoT Hub](reliability-iot-hub.md) | :::image type="content" source="media/icon-checkmark.svg" alt-text="Yes" border="false"::: | |
6767
| [Azure IoT Hub Device Provisioning Service](../iot-dps/about-iot-dps.md) | :::image type="content" source="media/icon-checkmark.svg" alt-text="Yes" border="false"::: | |
68-
| [Azure Key Vault](/azure/key-vault/general/disaster-recovery-guidance) | :::image type="content" source="media/icon-checkmark.svg" alt-text="Yes" border="false"::: | |
68+
| [Azure Key Vault](./reliability-key-vault.md#availability-zone-support) | :::image type="content" source="media/icon-checkmark.svg" alt-text="Yes" border="false"::: | |
6969
| [Azure Kubernetes Service (AKS)](reliability-aks.md#availability-zone-support) | :::image type="content" source="media/icon-checkmark.svg" alt-text="Yes" border="false"::: | :::image type="content" source="media/icon-checkmark.svg" alt-text="Yes" border="false"::: |
7070
| [Azure Load Balancer](reliability-load-balancer.md#availability-zone-support) | :::image type="content" source="media/icon-checkmark.svg" alt-text="Yes" border="false"::: | :::image type="content" source="media/icon-checkmark.svg" alt-text="Yes" border="false"::: |
7171
| [Azure Logic Apps](./reliability-logic-apps.md#availability-zone-support) | :::image type="content" source="media/icon-checkmark.svg" alt-text="Yes" border="false"::: | |

articles/reliability/concept-business-continuity-high-availability-disaster-recovery.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -314,7 +314,7 @@ Many Azure data and storage services support backups, such as the following:
314314

315315
- [Azure Backup](/azure/reliability/reliability-backup) provides automated backups for virtual machine disks, storage accounts, AKS, and a variety of other sources.
316316
- Many Azure database services, including [Azure SQL Database](/azure/azure-sql/database/high-availability-sla-local-zone-redundancy) and [Azure Cosmos DB](/azure/reliability/reliability-cosmos-db-nosql), have an automated backup capability for your databases.
317-
- [Azure Key Vault](/azure/key-vault/general/disaster-recovery-guidance) provides features to back up your secrets, certificates, and keys.
317+
- [Azure Key Vault](./reliability-key-vault.md) provides features to back up your secrets, certificates, and keys.
318318

319319
#### Automated deployments
320320

articles/reliability/overview-reliability-guidance.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -83,8 +83,8 @@ This section provides links to reliability guidance for many Azure services. Eac
8383
|Azure Health Data Services: De-identification service (preview)|[Reliability in Azure Health Data Services: De-Identification service](reliability-health-data-services-deidentification.md)||
8484
|Azure Health Data Services: Workspace services (FHIR®, DICOM®, MedTech) | | [Business continuity and disaster recovery considerations](/azure/healthcare-apis/business-continuity-disaster-recovery?toc=/azure/reliability/toc.json&bc=/azure/reliability/breadcrumb/toc.json) |
8585
|Azure HDInsight| [Reliability in Azure HDInsight](reliability-hdinsight.md)||
86-
|Azure IoT Hub|| [Reliability in Azure IoT Hub](reliability-iot-hub.md). |
87-
|Azure Key Vault|| [Azure Key Vault availability and redundancy](/azure/key-vault/general/disaster-recovery-guidance?toc=/azure/reliability/toc.json&bc=/azure/reliability/breadcrumb/toc.json) |
86+
|Azure IoT Hub|| [Reliability in Azure IoT Hub](reliability-iot-hub.md) |
87+
|Azure Key Vault|| [Reliability in Azure Key Vault](./reliability-key-vault.md) |
8888
|Azure Kubernetes Service (AKS)| [Reliability in Azure Kubernetes Service (AKS)](reliability-aks.md)||
8989
|Azure Load Balancer| [Reliability in Azure Load Balancer](reliability-load-balancer.md )||
9090
|Azure Logic Apps|[Reliability in Azure Logic Apps](reliability-logic-apps.md) ||

articles/reliability/reliability-key-vault.md

Lines changed: 217 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,217 @@
1+
---
2+
title: Reliability in Azure Key Vault
3+
description: Find out about reliability in Azure Key Vault, including availability zones and multi-region deployments.
4+
author: msmbaldwin
5+
ms.author: mbaldwin
6+
ms.topic: reliability-article
7+
ms.custom: subject-reliability, references_regions
8+
ms.service: azure-key-vault
9+
ms.date: 06/20/2025
10+
#Customer intent: As an engineer responsible for business continuity, I want to understand the details of how Azure Key Vault works from a reliability perspective and plan disaster recovery strategies in alignment with the exact processes that Azure services follow during different kinds of situations.
11+
---
12+
13+
# Reliability in Azure Key Vault
14+
15+
This article describes reliability support in Azure Key Vault, covering intra-regional resiliency via [availability zones](#availability-zone-support) and [multi-region deployments](#multi-region-support).
16+
17+
[!INCLUDE [Shared responsibility description](includes/reliability-shared-responsibility-include.md)]
18+
19+
Azure Key Vault is a cloud service that provides a secure store for secrets, such as keys, passwords, certificates, and other sensitive information. Key Vault offers a range of built-in reliability features to help ensure that your secrets remain available, including automatic region replication, data redundancy, and the ability to back up and restore your secrets.
20+
21+
## Production deployment recommendations
22+
23+
For production deployments of Azure Key Vault, we recommend that you:
24+
25+
- Use Standard or Premium tier key vaults
26+
- Enable soft delete and purge protection to prevent accidental or malicious deletion
27+
- For critical workloads, consider implementing multi-region strategies as described in this guide
28+
29+
## Reliability architecture overview
30+
31+
Key Vault provides multiple layers of redundancy to keep your keys, secrets, and certificates durable and available during:
32+
33+
- Hardware failures
34+
- Network outages
35+
- Localized disasters
36+
- Maintenance activities
37+
38+
By default, Azure Key Vault achieves redundancy by replicating your key vault and its contents within the region.
39+
40+
In addition, if the region has a [paired region](./regions-list.md) and that paired region is in the same geography as the primary region, the contents are also replicated to the paired region. This approach ensures high durability of your keys and secrets, protecting against hardware failures, network outages, or localized disasters.
41+
## Transient faults
42+
43+
[!INCLUDE [Transient fault description](includes/reliability-transient-fault-description-include.md)]
44+
45+
To handle any transient failures that might occur, your client applications should implement retry logic when interacting with Key Vault. Some best practices include:
46+
47+
- Use the [Azure SDKs](https://azure.microsoft.com/downloads/), which typically include built-in retry mechanisms.
48+
- If your clients connect directly to Key Vault, implement exponential backoff retry policies.
49+
- Cache secrets in memory when possible to reduce direct requests to Key Vault.
50+
- Monitor for throttling errors, as exceeding Key Vault service limits will cause throttling.
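
The exponential backoff practice listed above can be sketched as follows. This is a minimal illustration and not the retry policy that the Azure SDKs implement: `TransientError`, `call_with_backoff`, and the default delay values are hypothetical names and numbers chosen for the example, and `operation` stands in for whatever call your client makes to Key Vault.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical stand-in for a retryable failure (for example, HTTP 429 or 503)."""


def call_with_backoff(operation, max_attempts=5, base_delay=0.5,
                      max_delay=30.0, sleep=time.sleep):
    """Call operation(), retrying transient failures with exponential backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the error to the caller.
            # Double the delay ceiling on each attempt, capped at max_delay.
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Full jitter: wait a random time up to the ceiling so that
            # many clients don't retry in lockstep.
            sleep(random.uniform(0, ceiling))
```

The `sleep` parameter is injectable only so that the behavior can be exercised without real waiting. If you use an Azure SDK client, its built-in retry behavior would normally make a wrapper like this unnecessary.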
51+
52+
If you're using Key Vault in high-throughput scenarios, consider distributing your operations across multiple key vaults to avoid throttling limits. Azure Key Vault has specific guidance for these scenarios:
53+
54+
- A high-throughput scenario is one that approaches or exceeds the [service limits](/azure/key-vault/general/service-limits) for Key Vault operations (for example, 200 operations per second for software-protected keys).
55+
- For high-throughput workloads, divide your Key Vault traffic among multiple vaults and different regions.
56+
- The subscription-wide limit for all transaction types is five times the individual key vault limit.
57+
- Use a separate vault for each security/availability domain (for example, if you have five apps in two regions, consider using 10 vaults).
58+
- For public-key operations such as encryption, wrapping, and verification, perform these operations locally by caching the public key material.
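
Caching, which the guidance above recommends for both secrets and public key material, might look like the following in-memory sketch. `TtlCache`, the `loader` callable, and the 300-second default lifetime are assumptions made for illustration. In a real application, `loader` would wrap your Key Vault read, and the lifetime would depend on how quickly rotated values need to be picked up.

```python
import time


class TtlCache:
    """Keep loaded values in memory and reload them after ttl_seconds."""

    def __init__(self, loader, ttl_seconds=300.0, clock=time.monotonic):
        self._loader = loader      # Hypothetical callable: name -> value.
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}         # name -> (value, expiry time)

    def get(self, name):
        now = self._clock()
        entry = self._entries.get(name)
        if entry is not None and now < entry[1]:
            return entry[0]        # Still fresh: no request to the vault.
        value = self._loader(name)  # Missing or expired: load and remember.
        self._entries[name] = (value, now + self._ttl)
        return value
```

A shorter lifetime picks up rotated secrets sooner at the cost of more requests. This sketch isn't thread-safe and doesn't handle loader failures, so treat it as a starting point only.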
59+
60+
For comprehensive throttling guidance, see [Azure Key Vault throttling guidance](/azure/key-vault/general/overview-throttling).
61+
62+
## Availability zone support
63+
64+
[!INCLUDE [AZ support description](includes/reliability-availability-zone-description-include.md)]
65+
66+
Azure Key Vault is automatically zone redundant in [regions that support availability zones](./regions-list.md), which gives you high availability within a region without requiring any specific configuration.
67+
68+
69+
When an availability zone becomes unavailable, Azure Key Vault automatically redirects your requests to other healthy availability zones to ensure high availability.
70+
71+
### Region support
72+
73+
Azure Key Vault enables zone redundancy by default in [all Azure regions that support availability zones](./regions-list.md).
74+
75+
### Requirements
76+
77+
All Key Vault SKUs (Standard and Premium) support the same level of availability and resiliency. There aren't any tier-specific requirements to achieve zone resilience.
78+
79+
### Cost
80+
81+
There are no additional costs associated with Key Vault's zone redundancy. The pricing is based on the SKU (Standard or Premium) and the number of operations performed.
82+
83+
### Normal operations
84+
85+
The following section describes what to expect when key vaults are in a region with availability zones, and all availability zones are operational:
86+
87+
- **Traffic routing between zones:** Azure Key Vault automatically manages traffic routing between availability zones. During normal operations, requests are distributed across zones transparently.
88+
89+
- **Data replication between zones:** Key Vault data is synchronously replicated across availability zones in regions that support zones. This ensures that your keys, secrets, and certificates remain consistent and available even if a zone becomes unavailable.
90+
91+
### Zone-down experience
92+
93+
The following section describes what to expect when key vaults are in a region with availability zones, and one or more availability zones are unavailable:
94+
95+
- **Detection and response:** The Key Vault service is responsible for detecting zone failures and automatically responding to them. You don't need to take any action during a zone failure.
96+
97+
- **Notification:** You can monitor the status of your key vault through Azure Resource Health and Azure Service Health. These services provide notifications about any service degradation.
98+
99+
- **Active requests:** During a zone failure, any in-flight requests to the affected zone might fail and need to be retried by client applications. Client applications should follow [transient fault handling practices](#transient-faults) to ensure they can retry requests in the event of a zone failure.
100+
101+
- **Expected data loss:** No data loss is expected during a zone failure due to the synchronous replication between zones.
102+
103+
- **Expected downtime:** Read operations are expected to remain available during a zone failure, with minimal to no downtime. Write operations might be temporarily unavailable while the service adjusts to the zone failure.
104+
105+
- **Traffic rerouting:** Key Vault automatically reroutes traffic away from the affected zone to healthy zones without requiring any customer intervention.
106+
107+
108+
For more information on the zone-down experience, see [Failover within a region](/azure/key-vault/general/disaster-recovery-guidance#failover-within-a-region) in the Key Vault availability and redundancy documentation.
109+
110+
### Failback
111+
112+
When the affected availability zone recovers, Azure Key Vault automatically restores operations to that zone. This process is fully managed by the Azure platform and doesn't require any customer intervention.
113+
114+
## Multi-region support
115+
116+
Azure Key Vault resources are deployed into a single Azure region. If the region becomes unavailable, your key vault is also unavailable. However, there are approaches that you can use to help ensure resilience to region outages. These approaches depend on whether the key vault is in a paired or nonpaired region and on your specific requirements and configuration.
117+
118+
### Microsoft-managed failover to a paired region
119+
120+
Azure Key Vault supports Microsoft-managed replication and failover for key vaults in most paired regions. The contents of your key vault are automatically replicated both within the region and, asynchronously, to the paired region. This approach ensures high durability of your keys and secrets. In the unlikely event of a prolonged region failure, Microsoft might initiate a regional failover of your key vault.
121+
122+
The following regions don't support Microsoft-managed replication or failover across regions:
123+
- Brazil South
124+
- Brazil Southeast
125+
- West US 3
126+
- Any region that doesn't have a paired region
127+
128+
> [!IMPORTANT]
129+
> Microsoft decides whether and when to trigger a Microsoft-managed failover. The failover is likely to occur after a significant delay and is done on a best-effort basis. There are also some exceptions to this process. The failover of key vaults might occur at a time that's different from the failover time of other Azure services.
130+
>
131+
> If you need to be resilient to region outages, consider using one of the [alternative multi-region approaches](#alternative-multi-region-approaches).
132+
133+
For detailed information about how Key Vault replicates data across regions, see [Data replication](/azure/key-vault/general/disaster-recovery-guidance#data-replication) in the Key Vault availability and redundancy guide.
134+
135+
#### Considerations
136+
137+
While the failover is in progress, your key vault might be unavailable for a few minutes. After failover completes, your key vault operates in read-only mode and supports only a limited set of operations. While the vault operates in the secondary region, you can't change key vault properties, access policies, or firewall configurations.
138+
139+
#### Cost
140+
141+
There are no additional costs for the built-in multi-region replication capabilities of Azure Key Vault.
142+
143+
#### Normal operations
144+
145+
The following section describes what to expect when a key vault is located in a region that supports Microsoft-managed replication and failover, and the primary region is operational:
146+
147+
- **Traffic routing between regions:** During normal operations, all requests are routed to the primary region where your key vault is deployed.
148+
149+
- **Data replication between regions:** Key Vault replicates data asynchronously to the paired region. When you make changes to your key vault contents, those changes are first committed to the primary region and then replicated to the secondary region.
150+
151+
#### Region-down experience
152+
153+
The following section describes what to expect when a key vault is located in a region that supports Microsoft-managed replication and failover, and there's an outage in the primary region:
154+
155+
- **Detection and response:** Microsoft can decide to perform a failover if the primary region is lost. This process can take several hours after the loss of the primary region, or even longer in some scenarios. Failover of key vaults might not occur at the same time as other Azure services.
156+
157+
- **Notification:** You can monitor the status of your key vault through Azure Resource Health and Azure Service Health notifications.
158+
159+
- **Active requests:** During a region failover, active requests might fail and need to be retried by client applications after failover completes.
160+
161+
- **Expected data loss:** There might be some data loss if changes haven't been replicated to the secondary region before the primary region fails.
162+
163+
- **Expected downtime:** During a major outage of the primary region, your key vault might be unavailable for several hours or until Microsoft initiates failover to the secondary region.
164+
165+
- **Traffic rerouting:** After a region failover is completed, requests are automatically routed to the paired region without requiring any customer intervention.
166+
167+
For a complete description of the failover process and behavior, see [Failover across regions](/azure/key-vault/general/disaster-recovery-guidance#failover-across-regions) in the Key Vault availability and redundancy guide.
168+
169+
### Alternative multi-region approaches
170+
171+
There are situations where the Microsoft-managed cross-region failover capabilities of Azure Key Vault aren't suitable, such as:
172+
173+
- Your key vault is in a nonpaired region.
174+
- Your key vault is in a paired region that doesn't support Microsoft-managed cross-region replication and failover (Brazil South, Brazil Southeast, West US 3).
175+
- Your business uptime goals aren't satisfied by the recovery time or data loss that Microsoft-managed cross-region failover provides.
176+
- You need to fail over to a region that isn't your primary region's pair.
177+
178+
You can design a custom cross-region failover solution. One approach is to:
179+
180+
1. Create separate key vaults in different regions.
181+
1. Use the backup and restore functionality to maintain consistent secrets across regions.
182+
1. Implement application-level logic to fail over between key vaults.
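
The application-level failover logic in the last step can be sketched as follows. This is one possible shape under stated assumptions, not a prescribed design: `VaultUnavailableError`, `read_with_failover`, and the `readers` list are hypothetical, and each reader stands in for a client that's bound to one regional key vault.

```python
class VaultUnavailableError(Exception):
    """Hypothetical error raised by a reader when its vault can't be reached."""


def read_with_failover(name, readers):
    """Try each (label, reader) pair in order and return the first success.

    readers is an ordered list such as [("primary", f), ("secondary", g)],
    where each callable takes a secret name and returns its value.
    """
    errors = []
    for label, read in readers:
        try:
            return label, read(name)
        except VaultUnavailableError as exc:
            # Record the failure and fall through to the next region.
            errors.append(f"{label}: {exc}")
    raise VaultUnavailableError("all vaults failed: " + "; ".join(errors))
```

Returning the label along with the value lets the caller log which vault served the request. Because backups are point-in-time copies, a value read from a secondary vault might be older than the value in the primary.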
183+
184+
## Backups
185+
186+
Azure Key Vault provides the ability to back up and restore individual secrets, keys, and certificates. Backups are intended to provide you with an offline copy of your secrets in the unlikely event that you lose access to your key vault.
187+
188+
Key points about the backup functionality:
189+
190+
- Backups create encrypted blobs that can't be decrypted outside of Azure.
191+
- Backups can only be restored to a key vault within the same Azure subscription and Azure geography.
192+
- A backup can include no more than 500 past versions of a key, secret, or certificate object.
193+
- Backups are point-in-time snapshots and don't automatically update when secrets change.
194+
195+
For most solutions, you shouldn't rely exclusively on backups. Instead, use the other capabilities described in this guide to support your resiliency requirements. However, backups protect against some risks that other approaches don't, such as accidental deletion of specific secrets.
196+
197+
For detailed instructions, guidance on when to use backups, and important limitations, see [Azure Key Vault backup](/azure/key-vault/general/backup).
198+
199+
## Recovery features
200+
201+
Azure Key Vault provides two main recovery features that protect against accidental or malicious deletion:
202+
203+
- **Soft delete:** When enabled, soft delete allows you to recover deleted vaults and objects during a configurable retention period (default 90 days). Think of soft delete like a recycle bin for your key vault resources.
204+
205+
- **Purge protection:** When enabled, purge protection prevents permanent deletion of your key vault and its objects until the retention period elapses. This prevents malicious actors from permanently destroying your secrets.
206+
207+
Both features are strongly recommended for production environments. For a detailed explanation of these features, see [What are soft-delete and purge protection](/azure/key-vault/general/key-vault-recovery#what-are-soft-delete-and-purge-protection) in the Key Vault recovery management documentation.
208+
209+
## Service-level agreement
210+
211+
The service-level agreement (SLA) for Azure Key Vault describes the expected availability of the service, and the conditions that must be met to achieve that availability expectation. For more information, see the [SLAs for online services](https://www.microsoft.com/licensing/docs/view/Service-Level-Agreements-SLA-for-Online-Services).
212+
213+
## Related content
214+
- [Azure Key Vault availability and redundancy](/azure/key-vault/general/disaster-recovery-guidance)
215+
- [Azure Key Vault backup](/azure/key-vault/general/backup)
216+
- [Azure Key Vault recovery management](/azure/key-vault/general/key-vault-recovery)
217+
- [Reliability in Azure](/azure/reliability/overview)
