articles/ai-services/openai/how-to/business-continuity-disaster-recovery.md
description: Considerations for implementing Business Continuity and Disaster Recovery
manager: nitinme
ms.service: azure-ai-openai
ms.topic: how-to
ms.date: 12/03/2024
author: mrbullwinkle
ms.author: mbullwin
recommendations: false
## Standard Deployments

1. For Standard Deployments default to Data Zone deployment (US/EU options).
    - If you can use Global Standard deployments, you should. Data Zone deployments are the next best option for organizations requiring data processing to happen entirely within a geographic boundary.
1. You should deploy two Azure OpenAI Service resources in the Azure Subscription. One resource should be deployed in your preferred region and the other should be deployed in your secondary/failover region. The Azure OpenAI service allocates quota at the subscription + region level, so they can live in the same subscription with no impact on quota.
1. You should have one deployment for each model you plan to use deployed to the Azure OpenAI Service resource in your preferred Azure region, and you should duplicate these model deployments in the secondary/failover region. Allocate the full quota available in your Standard deployment to each of these endpoints. This provides the highest throughput rate when compared to splitting quota across multiple deployments.
1. Select the deployment region based on your network topology. You can deploy an Azure OpenAI Service resource to any supported region and then create a Private Endpoint for that resource in your preferred region.
    - Once within the Azure OpenAI Service boundary, the Azure OpenAI Service optimizes routing and processing across available compute in the data zone.
    - Using data zones is more efficient and simpler than self-managed load balancing across multiple regional deployments.
1. If there's a regional outage where the deployment is in an unusable state, you can use the other deployment in the secondary/passive region within the same subscription.
    - Because both the primary and secondary deployments are Zone deployments, they draw from the same Zone capacity pool, which draws from all available regions in the Zone. The secondary deployment protects against the primary Azure OpenAI endpoint being unreachable.
    - Use a Generative AI Gateway that supports load balancing and the circuit breaker pattern, such as API Management, in front of the Azure OpenAI Service endpoints so disruption during a regional outage is minimized for consuming applications.
    - If the quota within a given subscription is exhausted, a new subscription can be deployed in the same manner as above and its endpoint deployed behind the Generative AI Gateway.
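The primary/secondary failover behavior described above can be sketched as a small client-side wrapper. This is an illustrative sketch only, not an Azure SDK surface: the class name and the two callables standing in for regional deployment clients are hypothetical, and in production a gateway such as API Management would typically own this logic.

```python
class RegionalFailoverClient:
    """Illustrative sketch: fail over from a primary to a secondary/passive
    regional deployment. Not a real Azure SDK class."""

    def __init__(self, primary, secondary):
        # Each argument is a callable that sends a request to the model
        # deployment in one region and raises an exception on failure.
        self.primary = primary
        self.secondary = secondary

    def complete(self, request):
        try:
            return self.primary(request)
        except Exception:
            # Primary region unreachable or in an unusable state: retry
            # against the secondary deployment in the same subscription.
            return self.secondary(request)
```

Because quota is allocated per subscription + region, both callables can point at resources in the same subscription, as the steps above note.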
### Create an Enterprise PTU Pool

1. For provisioned deployments, we recommend having a single Data Zone PTU deployment (available 12/04/2024) that serves as an enterprise pool of PTU. You can use API Management to manage traffic from multiple applications to set throughput limits, logging, priority, and failover logic.
    - Think of this Enterprise PTU Pool as a "Private pay-as-you-go" resource that protects against the noisy-neighbors problem that can occur on Standard deployments when service demand is high. Your organization will have guaranteed, dedicated access to a pool of capacity that is only available to you and therefore independent of demand spikes from other customers.
    - This gives you control over which applications experience increases in latency first, allowing you to prioritize traffic to your mission-critical applications.
    - Provisioned Deployments are backed by latency SLAs that make them preferable to Standard (pay-as-you-go) deployments for latency-sensitive workloads.
    - Enterprise PTU Deployment also enables higher utilization rates as traffic is smoothed out across application workloads, whereas individual workloads tend to be more prone to spikes.
1. Your primary Enterprise PTU deployment should be in a different region than your primary Standard Zone deployment, so that if there's a regional outage you don't lose access to both your PTU deployment and your Standard Zone deployment at the same time.
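The per-application throughput limits a gateway enforces in front of the shared PTU pool can be sketched as a sliding-window budget per application. This is a simplified stand-in for API Management rate-limit policies; the application names and limits are invented for illustration.

```python
import time

class AppQuota:
    """Per-application request budget over a sliding one-second window
    (a simplified stand-in for gateway rate-limit policies)."""

    def __init__(self, limits):
        self.limits = limits  # app name -> max requests per second
        self.windows = {app: [] for app in limits}

    def allow(self, app, now=None):
        now = time.monotonic() if now is None else now
        window = self.windows[app]
        # Drop timestamps that have aged out of the one-second window.
        window[:] = [t for t in window if now - t < 1.0]
        if len(window) < self.limits[app]:
            window.append(now)
            return True
        return False  # throttle this app to protect the shared pool
```

Giving a mission-critical application a higher limit than a batch workload is one way to control which applications experience increased latency first, as described above.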
### Workload Dedicated PTU Deployment

1. Certain workloads may need their own dedicated provisioned deployment. If so, you can create a dedicated PTU deployment for that application.
1. The workload and enterprise PTU pool deployments should protect against regional failures. You could do this by placing the workload PTU pool in Region A and the enterprise PTU pool in Region B.
1. This deployment should fail over first to the Enterprise PTU Pool and then to the Standard deployment. This implies that when utilization of the workload PTU deployment exceeds 100%, requests would still be serviced by PTU endpoints, enabling a higher latency SLA for that application.
{bcdr_diagram_one}

The additional benefit of this architecture is that it allows you to stack Standard deployments with Provisioned Deployments so that you can dial in your preferred level of performance and resiliency. This allows you to use PTU for your baseline demand across workloads and leverage pay-as-you-go for spikes in traffic.

{bcdr_diagram_two}
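The failover order in the Workload Dedicated PTU steps (workload PTU first, then the Enterprise PTU Pool, then the Standard deployment) amounts to a priority chain. A minimal sketch, assuming each endpoint is a callable that raises when it is unreachable or at capacity; the endpoint names are hypothetical:

```python
def route_with_fallback(request, endpoints):
    """Try each (name, call) endpoint in priority order, falling through
    to the next when one fails or is at capacity. Sketch only."""
    last_error = None
    for name, call in endpoints:
        try:
            return name, call(request)
        except Exception as err:  # unreachable, throttled, or over 100% utilization
            last_error = err
    raise RuntimeError("all endpoints exhausted") from last_error
```

Ordering the list as dedicated workload PTU, then the enterprise PTU pool, then Standard pay-as-you-go keeps requests on PTU endpoints as long as possible, which is what enables the higher latency SLA described above.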
The infrastructure that supports the Azure OpenAI architecture needs to be considered.

Organizations consuming the service through the Microsoft public backbone should consider the following design elements:

1. The Generative AI Gateway should be deployed in a manner that ensures it's available in the event of an Azure regional outage. If using APIM (Azure API Management), this can be done by deploying separate APIM instances in multiple regions or using the [multi-region gateway feature of APIM](https://learn.microsoft.com/en-us/azure/api-management/api-management-howto-deploy-multi-region).
1. A public global server load balancer should be used to load balance across the multiple Generative AI Gateway instances in either an active/active or active/passive manner. [Azure Front Door](https://learn.microsoft.com/en-us/azure/architecture/web-apps/app-service/architectures/multi-region) or [Azure Traffic Manager](https://learn.microsoft.com/en-us/azure/traffic-manager/traffic-manager-routing-methods) can be used to fulfill this role depending on the organization's requirements.
{bcdr_diagram_three}

Organizations consuming the service through a private network should consider the following design elements:

1. Hybrid connectivity should be deployed in a way that protects against the failure of an Azure region. The underlying components supporting hybrid connectivity consist of the organization's on-premises network infrastructure and [Microsoft ExpressRoute](https://learn.microsoft.com/en-us/azure/expressroute/designing-for-high-availability-with-expressroute) or [VPN](https://learn.microsoft.com/en-us/azure/vpn-gateway/vpn-gateway-highlyavailable).
1. The Generative AI Gateway should be deployed in a manner that ensures it's available in the event of an Azure regional outage. If using APIM (Azure API Management), this can be done by deploying separate APIM instances in multiple regions or using the [multi-region gateway feature of APIM](https://learn.microsoft.com/en-us/azure/api-management/api-management-howto-deploy-multi-region).
1. Azure Private Link Private Endpoints should be deployed for each Azure OpenAI Service instance in each Azure region. For Azure Private DNS, a split-brain DNS approach can be used if all application access to the Azure OpenAI Service is done through the Generative AI Gateway, providing additional protection against a regional failure. If not, Private DNS records will need to be manually modified in the event of the [loss of an Azure region](https://learn.microsoft.com/en-us/azure/azure-resource-manager/management/overview#what-location-should-i-use-for-my-resource-group).
1. A private global server load balancer should be used to load balance across the multiple Generative AI Gateway instances in either an active/active or active/passive manner. Azure doesn't have a [native service for global server load balancing](https://github.com/adstuart/azure-crossregion-private-lb) for workloads that require private DNS resolution. In lieu of a global server load balancer, organizations can achieve an active/passive pattern by toggling the DNS record for the Generative AI Gateway.
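The active/passive DNS toggle in the last step amounts to repointing the single gateway record at the first healthy region. A hypothetical sketch of that decision, assuming ordered (region, address) pairs and a health-check predicate; in a real deployment the record update would be made in Azure Private DNS, not in memory:

```python
def resolve_gateway(records, healthy):
    """Return the address the gateway DNS record should point at.

    `records` is a priority-ordered list of (region, address) pairs;
    `healthy` is a health-check predicate for a region. Sketch only.
    """
    for region, address in records:
        if healthy(region):
            return address
    raise RuntimeError("no healthy gateway region available")
```

Keep the record's TTL short so that clients pick up the repointed record quickly after a failover.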