|
| 1 | +--- |
| 2 | +title: Resiliency in Microsoft Energy Data Services #Required; Must be "Resiliency in *your official service name*" |
| 3 | +description: Find out about reliability in Microsoft Energy Data Services #Required; |
| 4 | +author: bharathim #Required; your GitHub user alias, with correct capitalization. |
| 5 | +ms.author: bselvaraj #Required; Microsoft alias of author; optional team alias. |
| 6 | +ms.topic: overview |
| 7 | +ms.custom: subject-reliability |
| 8 | +ms.prod: non-product-specific |
| 9 | +ms.date: 12/05/2022 #Required; mm/dd/yyyy format. |
| 10 | +--- |
| 11 | + |
| 12 | +<!--#Customer intent: As a customer, I want to understand reliability support for Microsoft Energy Data Services so that I can respond to and/or avoid failures in order to minimize downtime and data loss. --> |
| 13 | + |
| 14 | +<!-- |
| 15 | +
|
| 16 | +Template for the main reliability article for Azure services. |
| 17 | +Keep the required sections and add/modify any content for any information specific to your service. |
| 18 | +This article should live in the reliability content area of azure-docs-pr. |
| 19 | +This article should be linked to in your TOC. Under a Reliability node or similar. The name of this page should be *reliability-Microsoft Energy Data Services.md* and the TOC title should be "Reliability in Microsoft Energy Data Services". |
| 20 | +Keep the headings in this order. |
| 21 | +
|
| 22 | +This template uses comment pseudo code to indicate where you must choose between two options or more. |
| 23 | +
|
| 24 | +Conditions are used in this document in the following manner and can be easily searched for: |
| 25 | +--> |
| 26 | + |
| 27 | +<!-- IF (AZ SUPPORTED) --> |
| 28 | +<!-- some text --> |
| 29 | +<!-- END IF (AZ SUPPORTED)--> |
| 30 | + |
| 31 | +<!-- BEGIN IF (SLA INCREASE) --> |
| 32 | +<!-- some text --> |
| 33 | +<!-- END IF (SLA INCREASE) --> |
| 34 | + |
| 35 | +<!-- IF (SERVICE IS ZONAL) --> |
| 36 | +<!-- some text --> |
| 37 | +<!-- END IF (SERVICE IS ZONAL) --> |
| 38 | + |
| 39 | +<!-- IF (SERVICE IS ZONE REDUNDANT) --> |
| 40 | +<!-- some text --> |
| 41 | +<!-- END IF (SERVICE IS ZONE REDUNDANT) --> |
| 42 | + |
| 43 | +<!-- |
| 44 | +
|
| 45 | +IMPORTANT: |
| 46 | +- Do a search and replace of TODO-service-name with the name of your service. That will make the template easier to read. |
| 47 | +- ALL sections are required unless noted otherwise. |
| 48 | +- MAKE SURE YOU REMOVE ALL COMMENTS BEFORE PUBLISH!!!!!!!! |
| 49 | +
|
| 50 | +--> |
| 51 | + |
| 52 | +<!-- 1. H1 ----------------------------------------------------------------------------- |
| 53 | +Required: Uses the format "What is reliability in X?" |
| 54 | +The "X" part should identify the product or service. |
| 55 | +--> |
| 56 | + |
| 57 | +# What is reliability in Microsoft Energy Data Services? |
| 58 | + |
| 59 | +<!-- 2. Introductory paragraph --------------------------------------------------------- |
| 60 | +Required: Provide an introduction. Use the following placeholder as a suggestion, but elaborate. |
| 61 | +--> |
| 62 | + |
| 63 | +This article describes reliability support in Microsoft Energy Data Services, and covers regional resiliency with [availability zones](../reliability/reliability-functions?toc=%2Fazure%2Fazure-functions%2FTOC.json&tabs=azure-portal#availability-zone-support). For a more detailed overview of reliability in Azure, see [Azure reliability](https://docs.microsoft.com/azure/architecture/framework/resiliency/overview.md). |
| 64 | + |
| 65 | +[Introduction] |
| 66 | +TODO: Add your introduction |
| 67 | + |
| 68 | +## Availability zone support |
| 69 | +<!-- IF (AZ SUPPORTED) --> |
| 70 | +Azure availability zones are at least three physically separate groups of datacenters within each Azure region. Datacenters within each zone are equipped with independent power, cooling, and networking infrastructure. Availability zones are designed so that if one zone is affected by a local failure, regional services, capacity, and high availability are supported by the remaining two zones. Failures can range from software and hardware failures to events such as earthquakes, floods, and fires. Tolerance to failures is achieved with redundancy and logical isolation of Azure services. For more detailed information on availability zones in Azure, see [Regions and availability zones](/azure/availability-zones/az-overview.md). |
| 71 | + |
| 72 | +Azure availability zones-enabled services are designed to provide the right level of reliability and flexibility. They can be configured in two ways: zone redundant, with automatic replication across zones, or zonal, with instances pinned to a specific zone. You can also combine these approaches. For more information on zonal vs. zone-redundant architecture, see [Build solutions with availability zones](/azure/architecture/high-availability/building-solutions-for-high-availability). |
| 73 | + |
| 74 | +Microsoft Energy Data Services Preview supports zone-redundant instances by default, and no setup is required by the customer. |
| 75 | + |
| 76 | +### Prerequisites |
| 77 | + |
| 78 | +Microsoft Energy Data Services Preview supports availability zones in the following regions: |
| 79 | + |
| 80 | +| Americas | Europe | Middle East | Africa | Asia Pacific | |
| 81 | +|------------------|----------------------|---------------|--------------------|----------------| |
| 82 | +| South Central US | North Europe | | | | |
| 83 | +| East US | West Europe | | | | |
| 84 | + |
| 85 | +### Zonal failover support |
| 86 | +N/A |
| 87 | + |
| 88 | +<!-- 3D. Zonal failover support -------------------------------------------------------- |
| 89 | +--> |
| 90 | + |
| 91 | +<!-- IF (SERVICE IS ZONAL) --> |
| 92 | + |
| 93 | +<!-- Indicate here whether the customer can set up resources of the service to failover to another zone. If they can set up failover resources, provide a link to documentation for this procedure. If such documentation doesn't exist, create the document, and then link to it from here. --> |
| 94 | + |
| 95 | +<!-- END IF (SERVICE IS ZONAL) --> |
| 96 | + |
| 97 | +### Fault tolerance |
| 98 | +N/A |
| 99 | + |
| 100 | +<!-- 3E. Fault tolerance --------------------------------------------------------------- |
| 101 | +To prepare for availability zone failure, customers should over-provision capacity of service to ensure that the solution can tolerate ⅓ loss of capacity and continue to function without degraded performance during zone-wide outages. Provide any information as to how customers should achieve this. |
| 102 | +--> |
| 103 | + |
| 104 | +### Zone down experience |
| 105 | +In a zone-wide outage scenario, users should experience no impact on provisioned resources in a zone-redundant deployment. However, customers should be prepared for a brief interruption in communication with provisioned resources; typically, the client receives a 409 error code, which should trigger retry logic with appropriate intervals. New requests are directed to healthy nodes with zero impact on users. During zone-wide outages, users can still create new offering resources and successfully scale existing ones. |
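The retry behavior described above can be sketched as follows. This is a minimal illustration, not the service's client library: `call_with_retry`, its parameters, and the `send` callable are hypothetical names, and the delay values are assumptions chosen only to show exponential backoff with jitter when a 409 is returned.

```python
import random
import time
from typing import Callable, Tuple

# Status code named in the text above; adjust for your own client (assumption).
RETRYABLE_STATUS = {409}


def call_with_retry(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call send() until it returns a non-retryable status or attempts run out.

    send is a hypothetical callable that performs one request and returns
    (status_code, body). The last response is returned either way.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        status, body = send()
        if status not in RETRYABLE_STATUS or attempt == max_attempts:
            return status, body
        # Exponential backoff capped at max_delay, with full jitter so that
        # many clients do not retry in lockstep after a zone outage.
        delay = min(max_delay, base_delay * 2 ** (attempt - 1))
        sleep(random.uniform(0, delay))
    raise AssertionError("unreachable")
```

A caller would wrap a single request in `send` and inspect the returned status; if the final attempt still returns 409, the caller decides whether to surface the error or queue the work for later.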
| 106 | + |
| 107 | +<!-- IF (SERVICE IS ZONE REDUNDANT) --> |
| 108 | + |
| 109 | +<!-- 3F. Zone down experience ---------------------------------------------------------- |
| 110 | +Select the scenario that best describes customer experience or combine/provide your own description: |
| 111 | +
|
| 112 | +- During a zone-wide outage, no action is required during zone recovery. The offering self-heals and rebalances itself to take advantage of the healthy zone automatically. |
| 113 | + |
| 114 | +- During a zone-wide outage, the customer should expect brief degradation of performance, until the service self-healing re-balances underlying capacity to adjust to healthy zones. This is not dependent on zone restoration; it is expected that the Microsoft-managed service self-healing state will compensate for a lost zone, leveraging capacity from other zones. |
| 115 | + |
| 116 | +- In a zone-wide outage scenario, users should experience no impact on provisioned resources in a zone-redundant deployment. However, customers should be prepared for a brief interruption in communication with provisioned resources; typically, the client receives a 409 error code, which should trigger retry logic with appropriate intervals. New requests are directed to healthy nodes with zero impact on users. During zone-wide outages, users can still create new offering resources and successfully scale existing ones. |
| 117 | + |
| 118 | +The table may contain: |
| 119 | +
|
| 120 | +- CRUD and Scale-out operations (Create Read Update Delete) |
| 121 | +- Application communication scenarios – data plane operations (for example, insert/update/delete for a database). |
| 122 | +
|
| 123 | +| Operation name | Outage | Availability Impact | Durability Impact | Error code | What to do | |
| 124 | +|--|--|--|--|--|--| |
| 125 | +
|
| 126 | +The table below lists all error codes that may be thrown by the Microsoft Energy Data Services and resources of that service during zone down outages. |
| 127 | +
|
| 128 | +List the following: |
| 129 | +
|
| 130 | +- CRUD and Scale-out operations (Create Read Update Delete) |
| 131 | +- Application communication scenarios – data plane operations (for example, insert/update/delete for a database). |
| 132 | +
|
| 133 | +| Error code | Operation | Description | |
| 134 | +|---|---|---| |
| 135 | +--> |
| 136 | +<!-- END IF (SERVICE IS ZONE REDUNDANT) --> |
| 137 | + |
| 138 | +#### Zone outage preparation and recovery |
| 139 | +TODO: Add your zone outage preparation and recovery |
| 140 | + |
| 141 | +<!-- 3G. Zone outage preparation and recovery ------------------------------------------ |
| 142 | +The table below lists alerts that can trigger an action to compensate for a loss of capacity or a state for your resources. It also provides information regarding actions for recovery, as well as how to prepare for such alerts prior to the outage. |
| 143 | +
|
| 144 | +| Alert type | Actions for recovery | How to prepare prior to outage | |
| 145 | +|--|--|--| |
| 146 | +--> |
| 147 | + |
| 148 | +### Low-latency design |
| 149 | +TODO: Add your low-latency design |
| 150 | + |
| 151 | +<!-- 3H. Low-latency design ------------------------------------------------------------ |
| 152 | +--> |
| 153 | + |
| 154 | +<!-- IF (SERVICE IS ZONE REDUNDANT AND ZONAL) --> |
| 155 | + |
| 156 | +<!-- Describe scenarios in which the customer will opt for zonal vs. zone-redundant version of your offering.--> |
| 157 | + |
| 158 | +<!-- Microsoft guarantees communication between zones of < 2ms. In scenarios in which your solution is sensitive to such spikes, you should configure all components of the solution to align to a zone. This section is intended to explain how your service enables low-latency design, including which SKUs of the service support it. --> |
| 159 | + |
| 160 | +<!-- OPTIONAL SECTION. If your service supports active-passive model, share an approach to control active component to a desired zone and align passive component with next zone. Make an explicit call-out for functionality where a resource is flagged as zone redundant but offers active-passive/primary-replica model of functionality--> |
| 161 | + |
| 162 | +<!-- END IF (SERVICE IS ZONE REDUNDANT AND ZONAL) --> |
| 163 | + |
| 164 | +>[!IMPORTANT] |
| 165 | +>By opting out of zone-aware deployment, you forgo protection from isolation of underlying faults. Use of SKUs that don't support availability zones or opting out from availability zone configuration forces reliance on resources that don't obey zone placement and separation (including underlying dependencies of these resources). These resources shouldn't be expected to survive zone-down scenarios. Solutions that leverage such resources should define a disaster recovery strategy and configure a recovery of the solution in another region. |
| 166 | +
|
| 167 | +### Safe deployment techniques |
| 168 | +TODO: Add your safe deployment techniques |
| 169 | + |
| 170 | +<!-- 3I. Safe deployment techniques ---------------------------------------------------- |
| 171 | +If application safe deployment is not relevant for this resource type, explain why and how the service manages availability zones for the customer behind the scenes. |
| 172 | +--> |
| 173 | + |
| 174 | +When you opt for availability zones isolation, you should utilize safe deployment techniques for application code, as well as application upgrades. Describe techniques that the customer should use to target one-zone-at-a-time for deployment and upgrades (for example, virtual machine scale sets). If something is strictly recommended, call it out below. |
| 175 | + |
| 176 | +<!-- List health signals that the customer should monitor, before proceeding with upgrading next set of nodes in another zone, to contain a potential impact of an unhealthy deployment. --> |
| 177 | +[Health signals] |
| 178 | +TODO: Add your health signals |
| 179 | + |
| 180 | +### Availability zone redeployment and migration |
| 181 | +TODO: Add your availability zone redeployment and migration |
| 182 | + |
| 183 | +<!-- 3J. Availability zone redeployment and migration ---------------------------------------------------- |
| 184 | +Link to a document that provides step-by-step procedures, using Portal, ARM, CLI, for migrating existing resources to a zone redundant configuration. If such a document doesn't exist, please start the process of creating that document. The template for AZ migration is: |
| 185 | +
|
| 186 | +` [!INCLUDE [AZ migration template](az-migration-template.md)] ` |
| 187 | +--> |
| 188 | +<!-- END IF (AZ SUPPORTED)--> |
| 189 | + |
| 190 | +## Next steps |
| 191 | +> [!div class="nextstepaction"] |
| 192 | +> [Resiliency in Azure](/azure/availability-zones/overview.md) |