| 1 | +--- |
| 2 | +title: Reliability in Azure Key Vault |
| 3 | +description: Find out about reliability in Azure Key Vault, including availability zones and multi-region deployments. |
| 4 | +author: msmbaldwin |
| 5 | +ms.author: mbaldwin |
| 6 | +ms.topic: reliability-article |
| 7 | +ms.custom: subject-reliability, references_regions |
| 8 | +ms.service: azure-key-vault |
| 9 | +ms.date: 06/20/2025 |
| 10 | +#Customer intent: As an engineer responsible for business continuity, I want to understand the details of how Azure Key Vault works from a reliability perspective and plan disaster recovery strategies in alignment with the exact processes that Azure services follow during different kinds of situations. |
| 11 | +--- |
| 12 | + |
| 13 | +# Reliability in Azure Key Vault |
| 14 | + |
| 15 | +This article describes reliability support in Azure Key Vault, covering intra-regional resiliency via [availability zones](#availability-zone-support) and [multi-region deployments](#multi-region-support). |
| 16 | + |
| 17 | +[!INCLUDE [Shared responsibility description](includes/reliability-shared-responsibility-include.md)] |
| 18 | + |
| 19 | +Azure Key Vault is a cloud service that provides a secure store for secrets, such as keys, passwords, certificates, and other sensitive information. Key Vault offers a range of built-in reliability features to help ensure that your secrets remain available, including automatic region replication, data redundancy, and the ability to back up and restore your secrets. |
| 20 | + |
| 21 | +## Production deployment recommendations |
| 22 | + |
| 23 | +For production deployments of Azure Key Vault, we recommend that you: |
| 24 | + |
| 25 | +- Use Standard or Premium tier key vaults |
| 26 | +- Enable soft delete and purge protection to prevent accidental or malicious deletion |
| 27 | +- For critical workloads, consider implementing multi-region strategies as described in this guide |
| 28 | + |
| 29 | +## Reliability architecture overview |
| 30 | + |
| 31 | +Key Vault provides multiple layers of redundancy so that your keys, secrets, and certificates remain durable and available during: |
| 32 | + |
| 33 | +- Hardware failures |
| 34 | +- Network outages |
| 35 | +- Localized disasters |
| 36 | +- Maintenance activities |
| 37 | + |
| 38 | +By default, Azure Key Vault achieves redundancy by replicating your key vault and its contents within the region. |
| 39 | + |
| 40 | +In addition, if the region has a [paired region](./regions-list.md) and that paired region is in the same geography as the primary region, the contents are also replicated to the paired region. This approach ensures high durability of your keys and secrets, protecting against hardware failures, network outages, or localized disasters. |
| 41 | +## Transient faults |
| 42 | + |
| 43 | +[!INCLUDE [Transient fault description](includes/reliability-transient-fault-description-include.md)] |
| 44 | + |
| 45 | +To handle any transient failures that might occur, your client applications should implement retry logic when interacting with Key Vault. Some best practices include: |
| 46 | + |
| 47 | +- Use the [Azure SDKs](https://azure.microsoft.com/downloads/), which typically include built-in retry mechanisms. |
| 48 | +- If your clients connect directly to Key Vault, implement exponential backoff retry policies. |
| 49 | +- Cache secrets in memory when possible to reduce direct requests to Key Vault. |
| 50 | +- Monitor for throttling errors, as exceeding Key Vault service limits will cause throttling. |
| 51 | + |
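The exponential backoff practice listed above can be sketched as follows. This is a minimal illustration, not the Azure SDK's retry policy: `TransientError`, `backoff_delays`, and `call_with_retry` are hypothetical names, and the base delay, cap, and retry count are assumed values that you would tune against your own throttling behavior.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for retryable failures (for example, a throttled request)."""


def backoff_delays(retries, base=1.0, cap=30.0):
    """Return the un-jittered delay ceiling, in seconds, before each retry."""
    return [min(cap, base * 2 ** attempt) for attempt in range(retries)]


def call_with_retry(operation, retries=5, base=1.0, cap=30.0, sleep=time.sleep):
    """Run operation(), retrying on TransientError with jittered exponential backoff."""
    delays = backoff_delays(retries, base, cap)
    for attempt in range(retries + 1):
        try:
            return operation()
        except TransientError:
            if attempt == retries:
                raise  # retries exhausted; surface the error to the caller
            # Full jitter: wait a random time up to the exponential ceiling
            # so that many clients don't retry in lockstep.
            sleep(random.uniform(0, delays[attempt]))
```

With the assumed defaults, the delay ceiling doubles from 1 second up to the 30-second cap. Passing `sleep` as a parameter keeps the function testable without real waiting.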
| 52 | +If you're using Key Vault in high-throughput scenarios, consider distributing your operations across multiple key vaults to avoid throttling limits. Azure Key Vault has specific guidance for these scenarios: |
| 53 | + |
| 54 | +- A high-throughput scenario is one that approaches or exceeds the [service limits](/azure/key-vault/general/service-limits) for Key Vault operations (for example, 200 operations per second for software-protected keys). |
| 55 | +- For high-throughput workloads, divide your Key Vault traffic among multiple vaults and different regions. |
| 56 | +- A subscription-wide limit for all transaction types is five times the individual key vault limit. |
| 57 | +- Use a separate vault for each security/availability domain (for example, if you have five apps in two regions, consider using 10 vaults). |
| 58 | +- For public-key operations such as encryption, wrapping, and verification, perform these operations locally by caching the public key material. |
| 59 | + |
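One way to reduce request volume, in line with the caching advice above, is a small in-memory cache with a time-to-live. The sketch below is an assumption-laden illustration: `SecretCache` and its `fetch` callable are hypothetical, the 300-second default is arbitrary, and the `fetch` function stands in for whatever client call your application uses to read from Key Vault.

```python
import time


class SecretCache:
    """Cache values in memory and refetch them only after the TTL expires."""

    def __init__(self, fetch, ttl_seconds=300.0, clock=time.monotonic):
        self._fetch = fetch        # hypothetical callable: name -> value
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}         # name -> (value, expiry time)

    def get(self, name):
        now = self._clock()
        entry = self._entries.get(name)
        if entry is not None and now < entry[1]:
            return entry[0]        # still fresh: no request is made
        value = self._fetch(name)  # missing or expired: fetch once and store
        self._entries[name] = (value, now + self._ttl)
        return value
```

A shorter TTL picks up rotated secrets sooner at the cost of more requests, so the value is a trade-off to settle per workload.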
| 60 | +For comprehensive throttling guidance, see [Azure Key Vault throttling guidance](/azure/key-vault/general/overview-throttling). |
| 61 | + |
| 62 | +## Availability zone support |
| 63 | + |
| 64 | +[!INCLUDE [AZ support description](includes/reliability-availability-zone-description-include.md)] |
| 65 | + |
| 66 | +Azure Key Vault automatically provides zone redundancy in [regions that support availability zones](./regions-list.md), providing high availability within a region without requiring any specific configuration. |
| 67 | + |
| 68 | + |
| 69 | +When an availability zone becomes unavailable, Azure Key Vault automatically redirects your requests to other healthy availability zones to ensure high availability. |
| 70 | + |
| 71 | +### Region support |
| 72 | + |
| 73 | +Azure Key Vault enables zone redundancy by default in [all Azure regions that support availability zones](./regions-list.md). |
| 74 | + |
| 75 | +### Requirements |
| 76 | + |
| 77 | +All Key Vault SKUs (Standard and Premium) support the same level of availability and resiliency. There aren't any tier-specific requirements to achieve zone resilience. |
| 78 | + |
| 79 | +### Cost |
| 80 | + |
| 81 | +There are no additional costs associated with Key Vault's zone redundancy. The pricing is based on the SKU (Standard or Premium) and the number of operations performed. |
| 82 | + |
| 83 | +### Normal operations |
| 84 | + |
| 85 | +The following section describes what to expect when key vaults are in a region with availability zones, and all availability zones are operational: |
| 86 | + |
| 87 | +- **Traffic routing between zones:** Azure Key Vault automatically manages traffic routing between availability zones. During normal operations, requests are distributed across zones transparently. |
| 88 | + |
| 89 | +- **Data replication between zones:** Key Vault data is synchronously replicated across availability zones in regions that support zones. This ensures that your keys, secrets, and certificates remain consistent and available even if a zone becomes unavailable. |
| 90 | + |
| 91 | +### Zone-down experience |
| 92 | + |
| 93 | +The following section describes what to expect when key vaults are in a region with availability zones, and one or more availability zones are unavailable: |
| 94 | + |
| 95 | +- **Detection and response:** The Key Vault service is responsible for detecting zone failures and automatically responding to them. You don't need to take any action during a zone failure. |
| 96 | + |
| 97 | +- **Notification:** You can monitor the status of your key vault through Azure Resource Health and Azure Service Health. These services provide notifications about any service degradation. |
| 98 | + |
| 99 | +- **Active requests:** During a zone failure, any in-flight requests to the affected zone might fail and need to be retried by client applications. Client applications should follow [transient fault handling practices](#transient-faults) to ensure they can retry requests in the event of a zone failure. |
| 100 | + |
| 101 | +- **Expected data loss:** No data loss is expected during a zone failure due to the synchronous replication between zones. |
| 102 | + |
| 103 | +- **Expected downtime:** Read operations are expected to remain available during a zone failure, with minimal to no downtime. Write operations might experience temporary unavailability while the service adjusts to the zone failure. |
| 104 | + |
| 105 | +- **Traffic rerouting:** Key Vault automatically reroutes traffic away from the affected zone to healthy zones without requiring any customer intervention. |
| 106 | + |
| 107 | + |
| 108 | +For more information on the zone-down experience, see [Failover within a region](/azure/key-vault/general/disaster-recovery-guidance#failover-within-a-region) in the Key Vault availability and redundancy documentation. |
| 109 | + |
| 110 | +### Failback |
| 111 | + |
| 112 | +When the affected availability zone recovers, Azure Key Vault automatically restores operations to that zone. This process is fully managed by the Azure platform and doesn't require any customer intervention. |
| 113 | + |
| 114 | +## Multi-region support |
| 115 | + |
| 116 | +Azure Key Vault resources are deployed into a single Azure region. If the region becomes unavailable, your key vault is also unavailable. However, there are approaches that you can use to help ensure resilience to region outages. These approaches depend on whether the key vault is in a paired or nonpaired region and on your specific requirements and configuration. |
| 117 | + |
| 118 | +### Microsoft-managed failover to a paired region |
| 119 | + |
| 120 | +Azure Key Vault supports Microsoft-managed replication and failover for key vaults in most paired regions. The contents of your key vault are automatically replicated both within the region and, asynchronously, to the paired region. This approach ensures high durability of your keys and secrets. In the unlikely event of a prolonged region failure, Microsoft might initiate a regional failover of your key vault. |
| 121 | + |
| 122 | +The following regions don't support Microsoft-managed replication or failover across regions: |
| 123 | +- Brazil South |
| 124 | +- Brazil Southeast |
| 125 | +- West US 3 |
| 126 | +- Any region that doesn't have a paired region |
| 127 | + |
| 128 | +> [!IMPORTANT] |
| 129 | +> Microsoft triggers Microsoft-managed failover. It's likely to occur after a significant delay and is done on a best-effort basis. There are also some exceptions to this process. The failover of key vaults might occur at a time that's different from the failover time of other Azure services. |
| 130 | +> |
| 131 | +> If you need to be resilient to region outages, consider using one of the [alternative multi-region approaches](#alternative-multi-region-approaches). |
| 132 | + |
| 133 | +For detailed information about how Key Vault replicates data across regions, see [Data replication](/azure/key-vault/general/disaster-recovery-guidance#data-replication) in the Key Vault availability and redundancy guide. |
| 134 | + |
| 135 | +#### Considerations |
| 136 | + |
| 137 | +While the failover is in progress, your key vault might be unavailable for a few minutes. After failover has completed, your key vault operates in read-only mode with limited operations supported. While the vault operates in the secondary region, you can't change key vault properties or modify access policy and firewall configurations. |
| 138 | + |
| 139 | +#### Cost |
| 140 | + |
| 141 | +There are no additional costs for the built-in multi-region replication capabilities of Azure Key Vault. |
| 142 | + |
| 143 | +#### Normal operations |
| 144 | + |
| 145 | +The following section describes what to expect when a key vault is located in a region that supports Microsoft-managed replication and failover, and the primary region is operational: |
| 146 | + |
| 147 | +- **Traffic routing between regions:** During normal operations, all requests are routed to the primary region where your key vault is deployed. |
| 148 | + |
| 149 | +- **Data replication between regions:** Key Vault replicates data asynchronously to the paired region. When you make changes to your key vault contents, those changes are first committed to the primary region and then replicated to the secondary region. |
| 150 | + |
| 151 | +#### Region-down experience |
| 152 | + |
| 153 | +The following section describes what to expect when a key vault is located in a region that supports Microsoft-managed replication and failover, and there's an outage in the primary region: |
| 154 | + |
| 155 | +- **Detection and response:** Microsoft can decide to perform a failover if the primary region is lost. This process can take several hours after the loss of the primary region, or even longer in some scenarios. Failover of key vaults might not occur at the same time as other Azure services. |
| 156 | + |
| 157 | +- **Notification:** You can monitor the status of your key vault through Azure Resource Health and Azure Service Health notifications. |
| 158 | + |
| 159 | +- **Active requests:** During a region failover, active requests might fail and need to be retried by client applications after failover completes. |
| 160 | + |
| 161 | +- **Expected data loss:** There might be some data loss if changes haven't been replicated to the secondary region before the primary region fails. |
| 162 | + |
| 163 | +- **Expected downtime:** During a major outage of the primary region, your key vault might be unavailable for several hours or until Microsoft initiates failover to the secondary region. |
| 164 | + |
| 165 | +- **Traffic rerouting:** After a region failover is completed, requests are automatically routed to the paired region without requiring any customer intervention. |
| 166 | + |
| 167 | +For a complete description of the failover process and behavior, see [Failover across regions](/azure/key-vault/general/disaster-recovery-guidance#failover-across-regions) in the Key Vault availability and redundancy guide. |
| 168 | + |
| 169 | +### Alternative multi-region approaches |
| 170 | + |
| 171 | +There are situations where the Microsoft-managed cross-region failover capabilities of Azure Key Vault aren't suitable, such as: |
| 172 | + |
| 173 | +- Your key vault is in a nonpaired region. |
| 174 | +- Your key vault is in a paired region that doesn't support Microsoft-managed cross-region replication and failover (Brazil South, Brazil Southeast, West US 3). |
| 175 | +- Your business uptime goals aren't satisfied by the recovery time or data loss that Microsoft-managed cross-region failover provides. |
| 176 | +- You need to fail over to a region that isn't your primary region's pair. |
| 177 | + |
| 178 | +You can design a custom cross-region failover solution. One approach is to: |
| 179 | + |
| 180 | +1. Create separate key vaults in different regions. |
| 181 | +1. Use the backup and restore functionality to maintain consistent secrets across regions. |
| 182 | +1. Implement application-level logic to fail over between key vaults. |
| 183 | + |
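The application-level failover logic in the last step might look like the following sketch. It assumes that you already keep the same secrets in each vault. `VaultUnavailableError`, `get_secret_with_failover`, and the per-region reader callables are hypothetical names for illustration, not part of any Azure SDK.

```python
class VaultUnavailableError(Exception):
    """Hypothetical error raised when a vault can't be reached."""


def get_secret_with_failover(name, readers):
    """Try each (region, reader) pair in order and return the first success.

    `readers` is an ordered list such as [("primary", read_primary),
    ("secondary", read_secondary)], where each reader takes a secret name.
    """
    errors = []
    for region, read in readers:
        try:
            return region, read(name)
        except VaultUnavailableError as exc:
            # Record the failure and fall through to the next region.
            errors.append(f"{region}: {exc}")
    raise VaultUnavailableError("all vaults failed: " + "; ".join(errors))
```

Returning the region alongside the value lets the caller log or alert when it is being served from a secondary vault.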
| 184 | +## Backups |
| 185 | + |
| 186 | +Azure Key Vault provides the ability to back up and restore individual secrets, keys, and certificates. Backups are intended to provide you with an offline copy of your secrets in the unlikely event that you lose access to your key vault. |
| 187 | + |
| 188 | +Key points about the backup functionality: |
| 189 | + |
| 190 | +- Backups create encrypted blobs that can't be decrypted outside of Azure. |
| 191 | +- Backups can only be restored to a key vault within the same Azure subscription and Azure geography. |
| 192 | +- There's a limitation of backing up no more than 500 past versions of a key, secret, or certificate object. |
| 193 | +- Backups are point-in-time snapshots and don't automatically update when secrets change. |
| 194 | + |
| 195 | +For most solutions, you shouldn't rely exclusively on backups. Instead, use the other capabilities described in this guide to support your resiliency requirements. However, backups protect against some risks that other approaches don't, such as accidental deletion of specific secrets. |
| 196 | + |
| 197 | +For detailed instructions, guidance on when to use backups, and important limitations, see [Azure Key Vault backup](/azure/key-vault/general/backup). |
| 198 | + |
| 199 | +## Recovery features |
| 200 | + |
| 201 | +Azure Key Vault provides two key recovery features to prevent accidental or malicious deletion: |
| 202 | + |
| 203 | +- **Soft delete:** When enabled, soft delete allows you to recover deleted vaults and objects during a configurable retention period (default 90 days). Think of soft delete like a recycle bin for your key vault resources. |
| 204 | + |
| 205 | +- **Purge protection:** When enabled, purge protection prevents permanent deletion of your key vault and its objects until the retention period elapses. This prevents malicious actors from permanently destroying your secrets. |
| 206 | + |
| 207 | +Both features are strongly recommended for production environments. For a detailed explanation of these features, see [What are soft-delete and purge protection](/azure/key-vault/general/key-vault-recovery#what-are-soft-delete-and-purge-protection) in the Key Vault recovery management documentation. |
| 208 | + |
| 209 | +## Service-level agreement |
| 210 | + |
| 211 | +The service-level agreement (SLA) for Azure Key Vault describes the expected availability of the service, and the conditions that must be met to achieve that availability expectation. For more information, see the [SLAs for online services](https://www.microsoft.com/licensing/docs/view/Service-Level-Agreements-SLA-for-Online-Services). |
| 212 | + |
| 213 | +## Related content |
| 214 | +- [Azure Key Vault availability and redundancy](/azure/key-vault/general/disaster-recovery-guidance) |
| 215 | +- [Azure Key Vault backup](/azure/key-vault/general/backup) |
| 216 | +- [Azure Key Vault recovery management](/azure/key-vault/general/key-vault-recovery) |
| 217 | +- [Reliability in Azure](/azure/reliability/overview) |