|
| 1 | +--- |
| 2 | +title: Reliability in Azure Key Vault |
| 3 | +description: Find out about reliability in Azure Key Vault, including availability zones and multi-region deployments. |
| 4 | +author: anaharris-ms |
| 5 | +ms.author: anaharris |
| 6 | +ms.topic: reliability-article |
| 7 | +ms.custom: subject-reliability, references_regions |
| 8 | +ms.service: key-vault |
| 9 | +ms.date: 11/12/2024 |
| 10 | +#Customer intent: As an engineer responsible for business continuity, I want to understand the details of how Azure Key Vault works from a reliability perspective and plan disaster recovery strategies in alignment with the exact processes that Azure services follow during different kinds of situations. |
| 11 | +--- |
| 12 | + |
| 13 | +# Reliability in Azure Key Vault |
| 14 | + |
| 15 | +> This article describes reliability support in Azure Key Vault, covering intra-regional resiliency via [availability zones](#availability-zone-support) and [multi-region deployments](#multi-region-support). |
| 16 | +> |
| 17 | +> Resiliency is a shared responsibility between you and Microsoft and so this article also covers ways for you to create a resilient solution that meets your needs. |
| 18 | +
|
| 19 | +Azure Key Vault is a cloud service that provides a secure store for secrets, such as keys, passwords, certificates, and other sensitive information. Key Vault offers a range of built-in reliability features to help ensure that your secrets remain available, including automatic region replication, data redundancy, and the ability to back up and restore your secrets. |
| 20 | + |
| 21 | +## Production deployment recommendations |
| 22 | + |
| 23 | +For production deployments of Azure Key Vault, we recommend: |
| 24 | + |
| 25 | +- Using Standard or Premium tier key vaults |
| 26 | +- Enabling soft delete and purge protection to prevent accidental or malicious deletion |
| 27 | +- For critical workloads, consider implementing multi-region strategies as described in this guide |
| 28 | + |
| 29 | +## Reliability architecture overview |
| 30 | + |
| 31 | +Azure Key Vault achieves redundancy by replicating your key vault and its contents within the region to ensure high durability and availability of your keys, secrets, and certificates. |
| 32 | + |
| 33 | +By default, the contents of your key vault are replicated both within the region and to a paired region located at least 150 miles away but within the same geography. This approach ensures high durability of your keys and secrets, protecting against hardware failures, network outages, or localized disasters. |
| 34 | + |
| 35 | +Key Vault provides multiple layers of redundancy to maintain availability during: |
| 36 | +- Hardware failures |
| 37 | +- Network outages |
| 38 | +- Localized disasters |
| 39 | +- Maintenance activities |
| 40 | + |
| 41 | +## Transient faults |
| 42 | + |
| 43 | + [!INCLUDE [Transient fault description](includes/reliability-transient-fault-description-include.md)] |
| 44 | + |
| 45 | +Transient faults are temporary errors that occur in distributed systems like Azure. They can happen during network connectivity issues, service reconfiguration, or temporary service unavailability. |
| 46 | + |
| 47 | +Azure Key Vault is designed to handle most transient errors automatically. However, client applications should implement retry logic when interacting with Key Vault to handle any transient failures that might occur. Some best practices include: |
| 48 | + |
| 49 | +- Implement exponential backoff retry policies in your client applications |
| 50 | +- Use the Azure SDK libraries which typically include built-in retry mechanisms |
| 51 | +- Monitor for throttling errors, as exceeding Key Vault service limits will cause throttling |
| 52 | + |
| 53 | +If you're using Key Vault in high-throughput scenarios, consider distributing your operations across multiple key vaults to avoid throttling limits. |
| 54 | + |
| 55 | +## Availability zone support |
| 56 | + |
| 57 | + [!INCLUDE [AZ support description](includes/reliability-availability-zone-description-include.md)] |
| 58 | + |
| 59 | +Azure availability zones are physically separate locations within an Azure region. Each availability zone is made up of one or more datacenters equipped with independent power, cooling, and networking infrastructure. |
| 60 | + |
| 61 | +### Region support |
| 62 | + |
| 63 | +Azure Key Vault is available in all Azure regions that support availability zones. Key Vault automatically leverages the availability zone infrastructure where available, providing high availability within a region. |
| 64 | + |
| 65 | +Key Vault is designed to be resilient to zone failures without any specific configuration required by customers. The service automatically manages the redundancy across availability zones in regions where zones are available. |
| 66 | + |
| 67 | +### Requirements |
| 68 | + |
| 69 | +All Key Vault SKUs (Standard and Premium) support the same level of availability and resiliency. There are no specific tier requirements to achieve zone resilience with Azure Key Vault. |
| 70 | + |
| 71 | +### Considerations |
| 72 | + |
| 73 | +While Azure Key Vault is resilient to zone failures, certain aspects should be considered: |
| 74 | + |
| 75 | +- During a zone failure, some write operations might be temporarily unavailable |
| 76 | +- Read operations typically remain available during zone failures |
| 77 | +- You should monitor your key vault's availability using Azure Monitor metrics and alerts |
| 78 | + |
| 79 | +### Cost |
| 80 | + |
| 81 | +There are no additional costs associated with Key Vault's zone resilience. The pricing is based on the SKU (Standard or Premium) and the number of operations performed. |
| 82 | + |
| 83 | +### Normal operations |
| 84 | + |
| 85 | +- **Traffic routing between zones:** Azure Key Vault automatically manages traffic routing between availability zones. During normal operations, requests are distributed across zones transparently. |
| 86 | + |
| 87 | +- **Data replication between zones:** Key Vault data is synchronously replicated across availability zones in regions that support zones. This ensures that your keys, secrets, and certificates remain consistent and available even if a zone becomes unavailable. |
| 88 | + |
| 89 | +### Zone-down experience |
| 90 | + |
| 91 | +- **Detection and response:** The Key Vault service is responsible for detecting zone failures and automatically responding to them. You don't need to take any action during a zone failure. |
| 92 | + |
| 93 | +- **Notification:** You can monitor the status of your key vault through Azure Resource Health and Azure Service Health. These services provide notifications about any service degradation. |
| 94 | + |
| 95 | +- **Active requests:** During a zone failure, any in-flight requests to the affected zone might fail and need to be retried by client applications. |
| 96 | + |
| 97 | +- **Expected data loss:** No data loss is expected during a zone failure due to the synchronous replication between zones. |
| 98 | + |
| 99 | +- **Expected downtime:** For read operations, there should be minimal to no downtime during a zone failure. Write operations might experience temporary unavailability while the service adjusts to the zone failure. |
| 100 | + |
| 101 | +- **Traffic rerouting:** Key Vault automatically reroutes traffic away from the affected zone to healthy zones without requiring any customer intervention. |
| 102 | + |
| 103 | +### Failback |
| 104 | + |
| 105 | +When the affected availability zone recovers, Azure Key Vault automatically restores operations to that zone. This process is fully managed by the Azure platform and doesn't require any customer intervention. |
| 106 | + |
| 107 | +## Multi-region support |
| 108 | + |
| 109 | +Azure Key Vault provides built-in support for replicating your key vault and its contents to a secondary region. This feature is useful for disaster recovery and ensuring high availability of your secrets. |
| 110 | + |
| 111 | +### Data replication |
| 112 | + |
| 113 | +For most Azure regions that are paired with another region, the contents of your key vault are replicated both within the region and to the paired region. The paired region is typically at least 150 miles away, but within the same geography. This approach ensures high durability of your keys and secrets. |
| 114 | + |
| 115 | +Exceptions to cross-region replication include: |
| 116 | +- Brazil South region |
| 117 | +- Brazil Southeast region |
| 118 | +- West US 3 region |
| 119 | + |
| 120 | +When you create key vaults in these regions, they aren't replicated across regions. |
| 121 | + |
| 122 | +### Region support |
| 123 | + |
| 124 | +Key Vault's multi-region capabilities depend on Azure region pairs. The replication is only supported between designated paired regions. For more information about Azure region pairs, see [Azure paired regions](/azure/reliability/cross-region-replication-azure). |
| 125 | + |
| 126 | +### Requirements |
| 127 | + |
| 128 | +There are no additional requirements to enable multi-region replication for Key Vault. It's a built-in feature of the service for supported regions. |
| 129 | + |
| 130 | +### Considerations |
| 131 | + |
| 132 | +- Key vaults in Brazil South, Brazil Southeast, and West US 3 don't have cross-region replication |
| 133 | +- During failover, your key vault is in read-only mode with limited operations supported |
| 134 | +- You can't change key vault properties during failover |
| 135 | +- Access policy and firewall configurations can't be modified during failover |
| 136 | + |
| 137 | +### Cost |
| 138 | + |
| 139 | +There are no additional costs for the built-in multi-region replication capabilities of Azure Key Vault. |
| 140 | + |
| 141 | +### Normal operations |
| 142 | + |
| 143 | +- **Traffic routing between regions:** During normal operations, all requests are routed to the primary region where your key vault is deployed. |
| 144 | + |
| 145 | +- **Data replication between regions:** Key Vault replicates data asynchronously to the paired region. When you make changes to your key vault contents, those changes are first committed to the primary region and then replicated to the secondary region. |
| 146 | + |
| 147 | +### Region-down experience |
| 148 | + |
| 149 | +- **Detection and response:** The Key Vault service is responsible for detecting a region failure and automatically failing over to the secondary region. |
| 150 | + |
| 151 | +- **Notification:** You can monitor the status of your key vault through Azure Resource Health and Azure Service Health notifications. |
| 152 | + |
| 153 | +- **Active requests:** During a region failover, active requests might fail and need to be retried by client applications. |
| 154 | + |
| 155 | +- **Expected data loss:** There might be some data loss if changes haven't been replicated to the secondary region before the primary region fails. |
| 156 | + |
| 157 | +- **Expected downtime:** Your key vault might be unavailable for a few minutes during the failover process. |
| 158 | + |
| 159 | +- **Traffic rerouting:** In the event of a region failover, requests are automatically routed to the paired region without requiring any customer intervention. |
| 160 | + |
| 161 | +### Failback |
| 162 | + |
| 163 | +When the primary region becomes available again, Azure Key Vault automatically fails back operations to that region. This process is fully managed by the Azure platform and doesn't require any customer intervention. |
| 164 | + |
| 165 | +During failback, all request types (including read and write requests) become available again once the process is complete. |
| 166 | + |
| 167 | +### Alternative multi-region approaches |
| 168 | + |
| 169 | +If you need a multi-region strategy for regions that don't support cross-region replication (Brazil South, Brazil Southeast, West US 3) or need more control over your multi-region deployment, consider: |
| 170 | + |
| 171 | +1. Creating separate key vaults in different regions |
| 172 | +2. Using the backup and restore functionality to maintain consistent secrets across regions |
| 173 | +3. Implementing application-level logic to failover between key vaults |
| 174 | + |
| 175 | +For example approaches to multi-region architectures, see: |
| 176 | +- [Highly available multi-region web application](/azure/architecture/web-apps/app-service/architectures/multi-region) |
| 177 | + |
| 178 | +## Backups |
| 179 | + |
| 180 | +Azure Key Vault provides the ability to back up and restore individual secrets, keys, and certificates. Backups are intended to provide you with an offline copy of your secrets in the unlikely event that you lose access to your key vault. |
| 181 | + |
| 182 | +Key points about the backup functionality: |
| 183 | + |
| 184 | +- Backups create encrypted blobs that can't be decrypted outside of Azure |
| 185 | +- Backups can only be restored to a key vault within the same Azure subscription and Azure geography |
| 186 | +- There's a limitation of backing up no more than 500 past versions of a key, secret, or certificate object |
| 187 | +- Backups are point-in-time snapshots and don't automatically update when secrets change |
| 188 | + |
| 189 | +> For most solutions, you shouldn't rely exclusively on backups. Instead, use the other capabilities described in this guide to support your resiliency requirements. However, backups protect against some risks that other approaches don't, such as accidental deletion of specific secrets. |
| 190 | +
|
| 191 | +For detailed instructions on how to back up and restore Key Vault objects, see [Azure Key Vault backup](/azure/key-vault/general/backup). |
| 192 | + |
| 193 | +## Recovery features |
| 194 | + |
| 195 | +Azure Key Vault provides two key recovery features to prevent accidental or malicious deletion: |
| 196 | + |
| 197 | +1. **Soft delete:** When enabled, soft delete allows you to recover deleted vaults and objects during a configurable retention period (default 90 days). Think of soft delete like a recycle bin for your key vault resources. |
| 198 | + |
| 199 | +2. **Purge protection:** When enabled, purge protection prevents permanent deletion of your key vault and its objects until the retention period elapses. This prevents malicious actors from permanently destroying your secrets. |
| 200 | + |
| 201 | +Both features are strongly recommended for production environments. For more information, see [Azure Key Vault recovery management with soft delete and purge protection](/azure/key-vault/general/key-vault-recovery). |
| 202 | + |
| 203 | +## Service-level agreement |
| 204 | + |
| 205 | +The service-level agreement (SLA) for Azure Key Vault describes the expected availability of the service, and the conditions that must be met to achieve that availability expectation. For more information, see [SLA for Azure Key Vault](https://azure.microsoft.com/support/legal/sla/key-vault/). |
| 206 | + |
| 207 | +## Related content |
| 208 | +- [Azure Key Vault availability and redundancy](/azure/key-vault/general/disaster-recovery-guidance) |
| 209 | +- [Azure Key Vault backup](/azure/key-vault/general/backup) |
| 210 | +- [Azure Key Vault recovery management](/azure/key-vault/general/key-vault-recovery) |
| 211 | +- [Reliability in Azure](/azure/reliability/overview) |
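Review bot: In the Reliability architecture overview, the sentence "By default, the contents of your key vault are replicated both within the region and to a paired region located at least 150 miles away" contradicts the Data replication section of the same file, which lists Brazil South, Brazil Southeast and West US 3 as regions where vaults aren't replicated across regions; qualify the overview ("In most paired regions ...") or reference the exceptions there, since a reader planning disaster recovery for a vault in one of those regions would otherwise assume cross-region protection the article later says they don't have. Two smaller points: the two `[!INCLUDE ...]` directives (transient faults, availability zones) each start with a leading space and are immediately followed by a hand-written paragraph that restates the same definition, so the description will appear twice once the include renders, and the indentation should be checked against the other reliability articles in the repo; and "Using Standard or Premium tier key vaults" is not a recommendation, because the Requirements subsection itself names Standard and Premium as the only SKUs, so either drop the bullet or say what the tier choice changes. Finally, the Considerations bullet "During failover, your key vault is in read-only mode with limited operations supported" deserves one sentence on the operational consequence for defenders: secret rotation, certificate renewal and any other write-dependent automation must tolerate a read-only vault until failback completes, which is also when the failback text says write requests become available again.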