articles/energy-data-services/reliability-energy-data-services.md
Lines changed: 19 additions & 17 deletions
Original file line number
Diff line number
Diff line change
@@ -60,10 +60,7 @@ The "X" part should identify the product or service.
60
60
Required: Provide an introduction. Use the following placeholder as a suggestion, but elaborate.
61
61
-->
62
62
63
-
This article describes reliability support in Microsoft Energy Data Services, and covers regional resiliency with [availability zones](../reliability/reliability-functions?toc=%2Fazure%2Fazure-functions%2FTOC.json&tabs=azure-portal#availability-zone-support). For a more detailed overview of reliability in Azure, see [Azure reliability](https://docs.microsoft.com/azure/architecture/framework/resiliency/overview.md).
64
-
65
-
[Introduction]
66
-
TODO: Add your introduction
63
+
This article describes reliability support in Microsoft Energy Data Services and covers regional resiliency with [availability zones](#availability-zone-support). For a more detailed overview of reliability in Azure, see [Azure reliability](https://docs.microsoft.com/azure/architecture/framework/resiliency/overview.md).
67
64
68
65
## Availability zone support
69
66
<!-- IF (AZ SUPPORTED) -->
@@ -95,14 +92,20 @@ N/A
95
92
<!-- END IF (SERVICE IS ZONAL) -->
96
93
97
94
### Fault tolerance
98
-
N/A
95
+
To prepare for availability zone failure, Microsoft Energy Data Services over-provisions service capacity so that the solution can tolerate the loss of ⅓ of its capacity and continue to function without degraded performance during zone-wide outages.
To prepare for availability zone failure, customers should over-provision capacity of service to ensure that the solution can tolerate ⅓ loss of capacity and continue to function without degraded performance during zone-wide outages. Provide any information as to how customers should achieve this.
102
99
-->
103
100
104
101
### Zone down experience
105
-
In a zone-wide outage scenario, users should experience no impact on provisioned resources in a zone-redundant deployment. During a zone-wide outage , customers should be prepared to experience brief interruption for communication to provisioned resources; typically, this is manifested by client receiving 409 error code; this prompts re-try logic with appropriate intervals. New requests will be directed to healthy nodes with zero impact on user. During zone-wide outages, users will be able to create new offering resources and successfully scale existing ones.
102
+
- During a zone-wide outage, no action is required during zone recovery. The offering self-heals and rebalances itself automatically to take advantage of the healthy zone.
103
+
104
+
- During a zone-wide outage, customers should expect a brief degradation of performance until service self-healing rebalances the underlying capacity across healthy zones. This doesn't depend on zone restoration; the Microsoft-managed self-healing is expected to compensate for a lost zone by using capacity from other zones.
105
+
106
+
- In a zone-wide outage scenario, users should experience no impact on provisioned resources in a zone-redundant deployment. However, customers should be prepared for a brief interruption in communication with provisioned resources; this calls for retry logic with appropriate intervals. New requests will be directed to healthy nodes with zero impact on users. During zone-wide outages, users will be able to create new offering resources; however, capacity constraints could apply, in which case the underlying resources will be scaled on a best-effort basis.
107
+
108
+
- All Microsoft Energy Data Services APIs may need to be retried for 5XX errors.
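The retry guidance above can be sketched on the client side as follows. This is a minimal illustration, not an official client: the `call_with_retry` helper, its parameters, and the choice of exponential backoff with jitter are assumptions made for the example, and `send_request` stands in for whatever function issues the actual API call and returns a status code and body.

```python
import random
import time
from typing import Callable, Tuple


def call_with_retry(
    send_request: Callable[[], Tuple[int, str]],  # hypothetical request function
    max_attempts: int = 5,       # assumed limit, tune for your workload
    base_delay: float = 0.5,     # assumed initial backoff in seconds
    max_delay: float = 8.0,      # assumed upper bound on a single wait
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call send_request, retrying with exponential backoff on 5XX responses."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        status, body = send_request()
        # Only server-side errors (500-599) are treated as transient here.
        # Any other status, or the final attempt, is returned to the caller.
        if not 500 <= status <= 599 or attempt == max_attempts:
            return status, body
        # Exponential backoff capped at max_delay, with full jitter so that
        # many clients don't retry in lockstep after a zone-wide outage.
        delay = min(max_delay, base_delay * 2 ** (attempt - 1))
        sleep(random.uniform(0, delay))
    raise AssertionError("unreachable")
```

The caller receives the last response even when every attempt fails, so it can still inspect the final 5XX status and decide how to surface the error.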
106
109
107
110
<!-- IF (SERVICE IS ZONE REDUNDANT) -->
108
111
@@ -136,7 +139,7 @@ List the following:
136
139
<!-- END IF (SERVICE IS ZONE REDUNDANT) -->
137
140
138
141
#### Zone outage preparation and recovery
139
-
TODO: Add your zone outage preparation and recovery
142
+
<!--TODO: Add your zone outage preparation and recovery-->
140
143
141
144
<!-- 3G. Zone outage preparation and recovery ------------------------------------------
142
145
The table below lists alerts that can trigger an action to compensate for a loss of capacity or a state for your resources. It also provides information regarding actions for recovery, as well as how to prepare for such alerts prior to the outage.
@@ -146,7 +149,7 @@ The table below lists alerts that can trigger an action to compensate for a loss
146
149
-->
147
150
148
151
### Low-latency design
149
-
TODO: Add your low-latency design
152
+
Microsoft guarantees inter-zone communication latency of less than 2 ms, and all underlying Microsoft Energy Data Services resources support it.
@@ -161,30 +164,29 @@ TODO: Add your low-latency design
161
164
162
165
<!-- END IF (SERVICE IS ZONE REDUNDANT AND ZONAL) -->
163
166
164
-
>[!IMPORTANT]
165
-
>By opting out of zone-aware deployment, you forego protection from isolation of underlying faults. Use of SKUs that don't support availability zones or opting out from availability zone configuration forces reliance on resources that don't obey zone placement and separation (including underlying dependencies of these resources). These resources shouldn't be expected to survive zone-down scenarios. Solutions that leverage such resources should define a disaster recovery strategy and configure a recovery of the solution in another region.
If application safe deployment is not relevant for this resource type, explain why and how the service manages availability zones for the customer behind the scenes.
172
173
-->
173
174
174
-
When you opt for availability zones isolation, you should utilize safe deployment techniques for application code, as well as application upgrades. Describe techniques that the customer should use to target one-zone-at-a-time for deployment and upgrades (for example, virtual machine scale sets). If something is strictly recommended, call it out below.
175
+
<!--When you opt for availability zones isolation, you should utilize safe deployment techniques for application code, as well as application upgrades. Describe techniques that the customer should use to target one-zone-at-a-time for deployment and upgrades (for example, virtual machine scale sets). If something is strictly recommended, call it out below.-->
175
176
176
177
<!-- List health signals that the customer should monitor, before proceeding with upgrading next set of nodes in another zone, to contain a potential impact of an unhealthy deployment. -->
177
-
[Health signals]
178
-
TODO: Add your health signals
178
+
<!--[Health signals]-->
179
+
<!--TODO: Add your health signals-->
179
180
180
-
### Availability zone redeployment and migration
181
-
TODO: Add your availability zone redeployment and migration
181
+
<!--### Availability zone redeployment and migration-->
182
+
<!--TODO: Add your availability zone redeployment and migration-->
182
183
183
184
<!-- 3J. Availability zone redeployment and migration ----------------------------------------------------
184
185
Link to a document that provides step-by-step procedures, using Portal, ARM, CLI, for migrating existing resources to a zone redundant configuration. If such a document doesn't exist, please start the process of creating that document. The template for AZ migration is: