articles/reliability/overview-reliability-guidance.md: 1 addition & 1 deletion
@@ -139,7 +139,7 @@ Each service guide generally contains information on how the service supports:
 | Product| Guidance |
 |----------|---------|
 |Azure Device Registry |[Reliability in Azure Device Registry](reliability-device-registry.md)|
-|Azure IoT Hub|[IoT Hub high availability and disaster recovery](../iot-hub/iot-hub-ha-dr.md?toc=/azure/reliability/toc.json&bc=/azure/reliability/breadcrumb/toc.json#disable-disaster-recovery)|
+|Azure IoT Hub|[Reliability in IoT Hub](reliability-iot-hub.md)|
 |Azure Notification Hubs|[Reliability in Azure Notification Hubs](reliability-notification-hubs.md)|
This article describes reliability support in Azure IoT Hub, covering intra-regional resiliency via [availability zones](#availability-zone-support) and [multi-region deployments](#multi-region-support).

 Resiliency is a shared responsibility between you and Microsoft and so this article also covers ways for you to create a resilient solution that meets your needs.
@@ -37,13 +31,6 @@ Depending on the uptime goals you define for your IoT solutions, you should dete
 This section opens with an include that contains a brief explanation of production deployment recommendations such as SKUs and whether to enable zone redundancy in all production environments.
 -->

-Here's a summary of the HA/DR options presented in this article that can be used as a frame of reference to choose the right option that works for your solution.
| Manual failover |10 min - 2 hours|Yes|Very low. You only need to trigger this operation from the portal.|None|
-| Cross region HA |< 1 min|No|High|> 1x the cost of 1 IoT hub|

 ## Redundancy
@@ -52,16 +39,7 @@ Here's a summary of the HA/DR options presented in this article that can be used
 This section is generally for database and storage services. Describe how your service achieves redundancy by default in the primary region. Describe how it protects against data loss in the case of a data center outage or power failure.
 -->

-The IoT Hub service provides intra-region high availability (HA) by implementing redundancies in almost all layers of the service. The [SLA published by the IoT Hub service](https://azure.microsoft.com/support/legal/sla/iot-hub) is achieved by making use of these redundancies. No extra work is required by the developers of an IoT solution to take advantage of these HA features. Although IoT Hub offers a reasonably high uptime guarantee, transient failures can still be expected as with any distributed computing platform. If you're just getting started with migrating your solutions to the cloud from an on-premises solution, your focus needs to shift from optimizing "mean time between failures" to "mean time to recover". In other words, transient failures are to be considered normal while operating with the cloud in the mix. Appropriate [retry patterns](../iot/concepts-manage-device-reconnections.md#retry-patterns) must be built in to the components interacting with a cloud application to deal with transient failures.
-
-**Example:**
-
-By default, \[service-name\] achieves redundancy by spreading compute nodes and data throughout a single datacenter in the primary region. This approach protects your data in the event of a localized failure, such as a small-scale network or power failure, and even during the following events:
-
-- Customer initiated management operations that result in a brief downtime.
-- Service maintenance operations.
-
-*etc...*
+The IoT Hub service provides intra-region high availability (HA) by implementing redundancies in almost all layers of the service. The [SLA published by the IoT Hub service](https://azure.microsoft.com/support/legal/sla/iot-hub) is achieved by making use of these redundancies. No extra steps are required to take advantage of these HA features.

 ## Transient faults
@@ -74,6 +52,8 @@ By default, \[service-name\] achieves redundancy by spreading compute nodes and
 If your service hosts the customer's code or applications, it might also be capable of causing or propagating transient faults. If you have guidance to help to avoid these situations, provide it here. For example, App Service supports deployment slots, which avoid application downtime during deployments.
 -->

+Although IoT Hub offers a reasonably high uptime guarantee, transient failures can still be expected as with any distributed computing platform. If you're just getting started with migrating your solutions to the cloud from an on-premises solution, your focus needs to shift from optimizing "mean time between failures" to "mean time to recover". In other words, transient failures are to be considered normal while operating with the cloud in the mix. Appropriate [retry patterns](../iot/concepts-manage-device-reconnections.md#retry-patterns) must be built in to the components interacting with a cloud application to deal with transient failures.
+
 ## Availability zone support

 [!INCLUDE [AZ support description](includes/reliability-availability-zone-description-include.md)]
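
The retry guidance added under Transient faults in the hunk above can be illustrated with a short sketch. This is a minimal Python example of retrying with capped exponential backoff and jitter. It is not taken from the IoT Hub SDKs: `TransientError`, `retry_with_backoff`, and every default value below are hypothetical and chosen only for illustration.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying."""


def retry_with_backoff(operation, max_attempts=5, base_delay=1.0,
                       max_delay=60.0, sleep=time.sleep, rng=random.random):
    """Call operation() until it succeeds or max_attempts is exhausted.

    sleep and rng are injectable so the behaviour can be tested
    without real waiting or real randomness.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last failure
            # Exponential backoff capped at max_delay, scaled by a random
            # factor (jitter) so many clients don't retry in lockstep.
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(rng() * ceiling)
```

A caller would wrap whatever operation talks to the cloud service, for example `retry_with_backoff(send_telemetry)`, where `send_telemetry` is a placeholder for application code that raises `TransientError` on a recoverable failure.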
@@ -92,25 +72,25 @@ Availability zone support for IoT Hub is enabled automatically for new IoT Hub r

 | Region | Data resiliency | Smoother deployments |
 | ------ | --------------- | ------------ |
-| Australia East | :::image type="icon" source="./media/icons/yes-icon.png"::: | :::image type="icon" source="./media/icons/yes-icon.png"::: |
-| Brazil South | :::image type="icon" source="./media/icons/yes-icon.png"::: | :::image type="icon" source="./media/icons/yes-icon.png"::: |
| UK South | :::image type="icon" source="./media/yes-icon.svg"::: | :::image type="icon" source="./media/yes-icon.svg"::: |
+| West Europe | :::image type="icon" source="./media/icon-unsupported.svg"::: | :::image type="icon" source="./media/yes-icon.svg"::: |
+| West US 2 | :::image type="icon" source="./media/yes-icon.svg"::: | :::image type="icon" source="./media/yes-icon.svg"::: |
+| West US 3 | :::image type="icon" source="./media/icon-unsupported.svg"::: | :::image type="icon" source="./media/yes-icon.svg"::: |

 ### Requirements
@@ -131,9 +111,6 @@ This section should describe how data replication is performed during regular da

 -->

->[!IMPORTANT]
->The data replication approach across zones is usually different to the approach used across regions.
-
 <!--
 Most Azure services replicate data across zones synchronously, which means that changes are applied to multiple (or all) zones simultaneously, and the change isn't considered to be completed until multiple/all zones have acknowledged the change. Use wording similar to the following to explain this approach and its tradeoffs.
@@ -224,34 +201,24 @@ TODO: Add your failback
 -->

 ### Testing for zone failures
-TODO: Add your testing for zone failures
-
-<!-- 6H. Testing for zone failures ----------------------------------------------------
-
-For zonal services, can you trigger a fault in an availability zone, such as by using Azure Chaos Studio? If so, link to the specific fault types that simulate the appropriate failure.
-
--->
-
-**Example:**
-
-> You can simulate a zone failure by using Azure Chaos Studio. Inject the XXX fault to simulate the loss of an availability zone. Regularly test your responses to zone failures so that you can be ready for unexpected availability zone outages.
-
-<!--
-For zone-redundant services, is there a way for the customer to test a zone failover? Usually that's not possible, so use wording like this:
--->
-
-**Example:**
-
-> The Azure IoT Hub platform manages traffic routing, failover, and failback for zone-redundant X resources. You don't need to initiate anything. Because this feature is fully managed, you don't need to validate availability zone failure processes.

+Azure IoT Hub manages traffic routing, failover, and failback for zone failures. You don't need to initiate anything. Because this feature is fully managed, you don't need to validate availability zone failure processes.

 ## Multi-region support

 Azure IoT Hub uses [Azure region pairs](../reliability/regions-paired.md) to provide resiliency in the rare situation where a datacenter experiences extended outages. The recovery options available in such a situation are [Microsoft-initiated failover](#microsoft-initiated-failover) and [manual failover](#manual-failover) from the IoT hub's primary region to its geo-paired region. The fundamental difference between the two is that Microsoft initiates the former and the user initiates the latter. Also, manual failover provides a lower recovery time objective (RTO) compared to the Microsoft-initiated failover option.

+Here's a summary of the HA/DR options presented in this article that can be used as a frame of reference to choose the right option that works for your solution.
| Manual failover |10 min - 2 hours|Yes|Very low. You only need to trigger this operation from the portal.|None|
+| Cross region HA |< 1 min|No|High|> 1x the cost of 1 IoT hub|
+
 ### Region support

-Failover is available in all regions that Azure IoT Hub supports. Only users deploying IoT hubs to the Brazil South and Southeast Asia (Singapore) regions can opt out of Microsoft-initiated failover. For more information, see [Disable disaster recovery](#disable-disaster-recovery).
+Failover is available in all regions that Azure IoT Hub supports. Only users deploying IoT hubs to the Brazil South and Southeast Asia (Singapore) regions can opt out of Microsoft-initiated failover. For more information, see [Disable disaster recovery](../iot-hub/iot-hub-ha-dr.md#disable-disaster-recovery).

 >[!NOTE]
 >Azure IoT Hub doesn't store or process customer data outside of the geography where you deploy the service instance. For more information, see [Azure region pairs](../reliability/regions-paired.md).
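
As a rough aid for reading the HA/DR summary rows in the hunk above, the following sketch filters the options by a target RTO. The upper bounds come from figures quoted in this diff: 2-26 hours for Microsoft-initiated failover, 10 min - 2 hours for manual failover, and under 1 minute for cross-region HA. The `HaDrOption` type and `options_meeting_rto` function are hypothetical, and treating "< 1 min" as a 1-minute bound is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HaDrOption:
    """Hypothetical record for one HA/DR option and its worst-case RTO."""
    name: str
    worst_case_rto_minutes: int


# Upper RTO bounds as quoted in the diff above; "< 1 min" is rounded up
# to 1 minute here as a simplifying assumption.
OPTIONS = [
    HaDrOption("Microsoft-initiated failover", 26 * 60),
    HaDrOption("Manual failover", 2 * 60),
    HaDrOption("Cross region HA", 1),
]


def options_meeting_rto(target_minutes):
    """Return the names of options whose worst-case RTO fits the target."""
    return [o.name for o in OPTIONS
            if o.worst_case_rto_minutes <= target_minutes]
```

For example, `options_meeting_rto(120)` keeps manual failover and cross-region HA but drops Microsoft-initiated failover. RTO is only one input: the summary rows also weigh implementation complexity and cost, which this sketch ignores.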
@@ -282,7 +249,7 @@ Azure IoT Hub failover options offer the following recovery point objectives:

 <sup>1</sup>Cloud-to-device messages and parent jobs aren't recovered as a part of manual failover.

-#### Microosft-initiated failover
+#### Microsoft-initiated failover

 Microsoft-initiated failover is exercised by Microsoft in rare situations to fail over all of the IoT hubs from an affected region to the corresponding geo-paired region. This process is a default option and requires no intervention from the user. Microsoft reserves the right to make a determination of when this option will be exercised. This mechanism doesn't involve a user consent before the user's hub is failed over. Microsoft-initiated failover has a recovery time objective (RTO) of 2-26 hours.
@@ -296,7 +263,7 @@ The manual failover option is always available for use whether the primary regio

 For step-by-step instructions, see [Tutorial: Perform manual failover for an IoT hub](tutorial-manual-failover.md)

-### Configure multi-region support
+### Configure multi-region support

 <!-- 7E. Configure multi-region support ----------------------
@@ -307,12 +274,6 @@ For step-by-step instructions, see [Tutorial: Perform manual failover for an IoT
 Provide links to documents that show how to create a resource or instance with multi-region support. Ideally, the documents should contain examples using the Azure portal, Azure CLI, Azure PowerShell, and Bicep.
 -->

-**Example:**
-
-> To deploy a new multi-region IoT Hub resource, see [Create an IoT Hub resource with multi-region support].
->
-> To enable multi-region support for an existing IoT Hub resource, see [Enable multi-region support in an IoT Hub resource].
-
 <!--
 If your service does NOT support enabling multi-region support after deployment, add an explicit statement to indicate that.