Commit 09987fe

Merge pull request #248682 from mrbullwinkle/mrb_08_17_2023_BCDR
[Azure AI} [Azure OpenAI] update BCDR
2 parents 6149780 + 248e3a0 commit 09987fe

1 file changed, +17 -25 lines changed

articles/ai-services/openai/how-to/business-continuity-disaster-recovery.md

Lines changed: 17 additions & 25 deletions
@@ -7,7 +7,7 @@ manager: nitinme
 ms.service: cognitive-services
 ms.subservice: openai
 ms.topic: how-to
-ms.date: 6/21/2023
+ms.date: 8/17/2023
 author: mrbullwinkle
 ms.author: mbullwin
 recommendations: false
@@ -17,44 +17,36 @@ keywords:
 
 # Business Continuity and Disaster Recovery (BCDR) considerations with Azure OpenAI Service
 
-Azure OpenAI is available in multiple regions. Since subscription keys are region bound, when a customer acquires a key, they select the region in which their deployments will reside and from then on, all operations stay associated with that Azure server region.
+Azure OpenAI is available in multiple regions. When you create an Azure OpenAI resource, you specify a region. From then on, your resource and all its operations stay associated with that Azure server region.
 
-It's rare, but not impossible, to encounter a network issue that hits an entire region. If your service needs to always be available, then you should design it to either fail-over into another region or split the workload between two or more regions. Both approaches require at least two Azure OpenAI resources in different regions. This article provides general recommendations for how to implement Business Continuity and Disaster Recovery (BCDR) for your Azure OpenAI applications.
+It's rare, but not impossible, to encounter a network issue that hits an entire region. If your service needs to always be available, then you should design it to either failover into another region or split the workload between two or more regions. Both approaches require at least two Azure OpenAI resources in different regions. This article provides general recommendations for how to implement Business Continuity and Disaster Recovery (BCDR) for your Azure OpenAI applications.
 
-## Best practices
-
-Today customers will call the endpoint provided during deployment for both deployments and inference. These operations are stateless, so no data is lost in the case that a region becomes unavailable.
-
-If a region is non-operational customers must take steps to ensure service continuity.
+## BCDR requires custom code
 
-## Business continuity
+Today customers will call the endpoint provided during deployment for inferencing. Inferencing operations are stateless, so no data is lost if a region becomes unavailable.
 
-The following set of instructions applies both customers using default endpoints and those using custom endpoints.
+If a region is nonoperational customers must take steps to ensure service continuity.
 
-### Default endpoint recovery
+## BCDR for base model & customized model
 
-If you're using a default endpoint, you should configure your client code to monitor errors, and if the errors persist, be prepared to redirect to another region of your choice where you have an Azure OpenAI subscription.
+If you're using the base models, you should configure your client code to monitor errors, and if the errors persist, be prepared to redirect to another region of your choice where you have an Azure OpenAI subscription.
 
 Follow these steps to configure your client to monitor errors:
 
-1. Use the [models page](../concepts/models.md) to identify the list of available regions for Azure OpenAI.
+1. Use the [models](/azure/ai-services/openai/concepts/models#model-summary-table-and-region-availability) page to choose the datacenters and regions that are right for you.
 
-2. Select a primary and one secondary/backup regions from the list.
+2. Select a primary and one (or more) secondary/backup regions from the list.
 
-3. Create Azure OpenAI resources for each region selected.
+3. Create Azure OpenAI resources for each region(s) selected.
 
 4. For the primary region and any backup regions your code will need to know:
 
-a. Base URI for the resource
-
-b. Regional access key or Azure Active Directory access
+- Base URI for the resource
+- Regional access key or Azure Active Directory access
 
-5. Configure your code so that you monitor connectivity errors (typically connection timeouts and service unavailability errors).
+5. Configure your code so that you monitor connectivity errors (typically connection timeouts and service unavailability errors).
 
-a. Given that networks yield transient errors, for single connectivity issue occurrences, the suggestion is to retry.
-
-b. For persistence redirect traffic to the backup resource in the region you've created.
-
-## BCDR requires custom code
+- Given that networks yield transient errors, for single connectivity issue occurrences, the suggestion is to retry.
+- For persistent connectivity issues, redirect traffic to the backup resource in the region(s) you've created.
 
-The recovery from regional failures for this usage type can be performed instantaneously and at a very low cost. This does however, require custom development of this functionality on the client side of your application.
+If you have fine-tuned a model in your primary region, you will need to retrain the base model in the secondary region(s) using the same training data. And then follow the above steps.
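
The client-side steps in the updated article text (retry a single connectivity error, redirect to a backup region when errors persist) could be sketched roughly as follows. This is a minimal Python sketch and not code from the commit: `call_with_failover`, `RegionUnavailable`, the `send` callable, the region dictionaries, the placeholder URIs and keys, and the retry and backoff defaults are all hypothetical names and values chosen for illustration. The `send` callable stands in for whatever HTTP client or SDK call the application really uses, and is assumed to raise `RegionUnavailable` on a connection timeout or service-unavailable error.

```python
import time
from typing import Callable, Optional, Sequence


class RegionUnavailable(Exception):
    """Hypothetical error: `send` raises it on a timeout or service-unavailable response."""


# Hypothetical configuration: one entry per Azure OpenAI resource, primary first.
# Step 4 of the article says the code needs a base URI and a key (or Azure AD access).
REGIONS = [
    {"name": "primary", "base_uri": "https://primary.example.invalid", "key": "<primary-key>"},
    {"name": "secondary", "base_uri": "https://secondary.example.invalid", "key": "<secondary-key>"},
]


def call_with_failover(
    regions: Sequence[dict],
    send: Callable[[dict], str],
    retries_per_region: int = 2,
    backoff_seconds: float = 1.0,
) -> str:
    """Try each region in order; retry transient errors, then move to the next region."""
    last_error: Optional[Exception] = None
    for region in regions:
        # One initial attempt plus `retries_per_region` retries for transient errors.
        for attempt in range(1 + retries_per_region):
            try:
                return send(region)
            except RegionUnavailable as err:
                last_error = err
                # Linear backoff before the next attempt (assumed policy, not from the article).
                time.sleep(backoff_seconds * (attempt + 1))
        # Errors persisted for this region: fall through to the next (backup) region.
    raise RuntimeError("all configured regions failed") from last_error
```

Passing `send` in as a parameter keeps the retry and failover policy separate from the transport, so the policy can be exercised with a stub before it is wired to a real client.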
