articles/ai-services/openai/how-to/business-continuity-disaster-recovery.md
Lines changed: 17 additions & 25 deletions
Original file line number
Diff line number
Diff line change
@@ -7,7 +7,7 @@ manager: nitinme
7
7
ms.service: cognitive-services
8
8
ms.subservice: openai
9
9
ms.topic: how-to
10
-
ms.date: 6/21/2023
10
+
ms.date: 8/17/2023
11
11
author: mrbullwinkle
12
12
ms.author: mbullwin
13
13
recommendations: false
@@ -17,44 +17,36 @@ keywords:
17
17
18
18
# Business Continuity and Disaster Recovery (BCDR) considerations with Azure OpenAI Service
19
19
20
-
Azure OpenAI is available in multiple regions. Since subscription keys are region bound, when a customer acquires a key, they select the region in which their deployments will reside and from then on, all operations stay associated with that Azure server region.
20
+
Azure OpenAI is available in multiple regions. When you create an Azure OpenAI resource, you specify a region. From then on, your resource and all its operations stay associated with that Azure server region.
21
21
22
-
It's rare, but not impossible, to encounter a network issue that hits an entire region. If your service needs to always be available, then you should design it to either fail-over into another region or split the workload between two or more regions. Both approaches require at least two Azure OpenAI resources in different regions. This article provides general recommendations for how to implement Business Continuity and Disaster Recovery (BCDR) for your Azure OpenAI applications.
22
+
It's rare, but not impossible, to encounter a network issue that hits an entire region. If your service needs to always be available, then you should design it to either failover into another region or split the workload between two or more regions. Both approaches require at least two Azure OpenAI resources in different regions. This article provides general recommendations for how to implement Business Continuity and Disaster Recovery (BCDR) for your Azure OpenAI applications.
23
23
24
-
## Best practices
25
-
26
-
Today customers will call the endpoint provided during deployment for both deployments and inference. These operations are stateless, so no data is lost in the case that a region becomes unavailable.
27
-
28
-
If a region is non-operational customers must take steps to ensure service continuity.
24
+
## BCDR requires custom code
29
25
30
-
## Business continuity
26
+
Today customers will call the endpoint provided during deployment for inferencing. Inferencing operations are stateless, so no data is lost if a region becomes unavailable.
31
27
32
-
The following set of instructions applies both customers using default endpoints and those using custom endpoints.
28
+
If a region is nonoperational, customers must take steps to ensure service continuity.
33
29
34
-
### Default endpoint recovery
30
+
## BCDR for base model & customized model
35
31
36
-
If you're using a default endpoint, you should configure your client code to monitor errors, and if the errors persist, be prepared to redirect to another region of your choice where you have an Azure OpenAI subscription.
32
+
If you're using the base models, you should configure your client code to monitor errors, and if the errors persist, be prepared to redirect to another region of your choice where you have an Azure OpenAI subscription.
37
33
38
34
Follow these steps to configure your client to monitor errors:
39
35
40
-
1. Use the [models page](../concepts/models.md)to identify the list of available regions for Azure OpenAI.
36
+
1. Use the [models](/azure/ai-services/openai/concepts/models#model-summary-table-and-region-availability) page to choose the datacenters and regions that are right for you.
41
37
42
-
2. Select a primary and one secondary/backup regions from the list.
38
+
2. Select a primary and one (or more) secondary/backup regions from the list.
43
39
44
-
3. Create Azure OpenAI resources for each region selected.
40
+
3. Create an Azure OpenAI resource for each region selected.
45
41
46
42
4. For the primary region and any backup regions your code will need to know:
47
43
48
-
a. Base URI for the resource
49
-
50
-
b. Regional access key or Azure Active Directory access
44
+
- Base URI for the resource
45
+
- Regional access key or Azure Active Directory access
51
46
52
-
5. Configure your code so that you monitor connectivity errors (typically connection timeouts and service unavailability errors).
47
+
5. Configure your code so that you monitor connectivity errors (typically connection timeouts and service unavailability errors).
53
48
54
-
a. Given that networks yield transient errors, for single connectivity issue occurrences, the suggestion is to retry.
55
-
56
-
b. For persistence redirect traffic to the backup resource in the region you've created.
57
-
58
-
## BCDR requires custom code
49
+
- Given that networks yield transient errors, for single connectivity issue occurrences, the suggestion is to retry.
50
+
- For persistent connectivity issues, redirect traffic to the backup resource in the region(s) you've created.
59
51
60
-
The recovery from regional failures for this usage type can be performed instantaneously and at a very low cost. This does however, require custom development of this functionality on the client side of your application.
52
+
If you have fine-tuned a model in your primary region, you will need to retrain the base model in the secondary region(s) using the same training data, and then follow the above steps.
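
The retry-then-redirect behavior from step 5 could be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not code from the article: `Region`, `AllRegionsFailed`, `call_with_failover`, and the `send` callback are hypothetical names, and `send` stands in for whatever inference request your client makes against a region's base URI and key. The sketch treats Python's built-in `ConnectionError` and `TimeoutError` as the connectivity errors to monitor; a real client would likely map its HTTP library's timeout and service-unavailable errors onto the same handling.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Region:
    """Connection details for one Azure OpenAI resource (hypothetical shape)."""
    base_uri: str  # base URI for the resource
    api_key: str   # regional access key


class AllRegionsFailed(Exception):
    """Raised when every configured region kept returning connectivity errors."""


def call_with_failover(
    regions: Sequence[Region],
    send: Callable[[Region], str],
    retries_per_region: int = 2,
    delay_seconds: float = 0.0,
) -> str:
    """Try regions in order: retry transient errors, then redirect to the next."""
    last_error: Optional[Exception] = None
    for region in regions:
        # One initial attempt plus `retries_per_region` retries in this region.
        for _ in range(1 + retries_per_region):
            try:
                return send(region)
            except (ConnectionError, TimeoutError) as err:
                # Assumed to be transient: remember it, pause, and try again.
                last_error = err
                time.sleep(delay_seconds)
        # Errors persisted in this region; fall through to the next one.
    raise AllRegionsFailed(f"all {len(regions)} regions failed") from last_error
```

Listing the primary region first and the backup region(s) after it gives the ordering the steps describe: single failures are retried in place, and only persistent failures move traffic to a backup. Keeping `send` as a parameter keeps the sketch independent of any particular SDK or HTTP client.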