Commit 7e4a56e

Reliability cont'd
1 parent 5d39ab8 commit 7e4a56e

4 files changed

+81
-78
lines changed

articles/reliability/TOC.yml

Lines changed: 1 addition & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -267,7 +267,7 @@
267267
- name: Azure Device Registry
268268
href: reliability-device-registry.md
269269
- name: Azure IoT Hub
270-
href: ../iot-hub/iot-hub-ha-dr.md?toc=/azure/reliability/toc.json&bc=/azure/reliability/breadcrumb/toc.json#disable-disaster-recovery
270+
href: reliability-iot-hub.md
271271
- name: Azure Notification Hubs
272272
href: reliability-notification-hubs.md
273273
- name: Media
Lines changed: 45 additions & 3 deletions

articles/reliability/overview-reliability-guidance.md

Lines changed: 1 addition & 1 deletion
@@ -139,7 +139,7 @@ Each service guide generally contains information on how the service supports:
139139
| Product| Guidance |
140140
|----------|---------|
141141
|Azure Device Registry |[Reliability in Azure Device Registry](reliability-device-registry.md)|
142-
|Azure IoT Hub| [IoT Hub high availability and disaster recovery](../iot-hub/iot-hub-ha-dr.md?toc=/azure/reliability/toc.json&bc=/azure/reliability/breadcrumb/toc.json#disable-disaster-recovery) |
142+
|Azure IoT Hub| [Reliability in IoT Hub](reliability-iot-hub.md) |
143143
|Azure Notification Hubs| [Reliability in Azure Notification Hubs](reliability-notification-hubs.md)|
144144

145145

articles/reliability/reliability-iot-hub.md

Lines changed: 34 additions & 73 deletions
@@ -14,12 +14,6 @@ ms.date: 02/20/2025
1414

1515
# Reliability in Azure IoT Hub
1616

17-
<!-- 2. Introductory paragraph ---------------------------------------------------------
18-
Required: Provide an introduction.
19-
20-
Use the following as the introduction:
21-
-->
22-
2317
This article describes reliability support in Azure IoT Hub, covering intra-regional resiliency via [availability zones](#availability-zone-support) and cross-region resiliency via [multi-region deployments](#multi-region-support).
2418

2519
Resiliency is a shared responsibility between you and Microsoft, so this article also covers ways for you to create a resilient solution that meets your needs.
@@ -37,13 +31,6 @@ Depending on the uptime goals you define for your IoT solutions, you should dete
3731
This section opens with an include that contains a brief explanation of production deployment recommendations such as SKUs and whether to enable zone redundancy in all production environments.
3832
-->
3933

40-
Here's a summary of the HA/DR options presented in this article that can be used as a frame of reference to choose the right option that works for your solution.
41-
42-
| HA/DR option | Recovery time | Requires manual intervention? | Implementation complexity | Cost impact|
43-
| --- | --- | --- | --- | --- |
44-
| Microsoft-initiated failover |2 - 26 hours|No|None|None|
45-
| Manual failover |10 min - 2 hours|Yes|Very low. You only need to trigger this operation from the portal.|None|
46-
| Cross region HA |< 1 min|No|High|> 1x the cost of 1 IoT hub|
4734

4835
## Redundancy
4936

@@ -52,16 +39,7 @@ Here's a summary of the HA/DR options presented in this article that can be used
5239
This section is generally for database and storage services. Describe how your service achieves redundancy by default in the primary region. Describe how it protects against data loss in the case of a data center outage or power failure.
5340
-->
5441

55-
The IoT Hub service provides intra-region high availability (HA) by implementing redundancies in almost all layers of the service. The [SLA published by the IoT Hub service](https://azure.microsoft.com/support/legal/sla/iot-hub) is achieved by making use of these redundancies. No extra work is required by the developers of an IoT solution to take advantage of these HA features. Although IoT Hub offers a reasonably high uptime guarantee, transient failures can still be expected as with any distributed computing platform. If you're just getting started with migrating your solutions to the cloud from an on-premises solution, your focus needs to shift from optimizing "mean time between failures" to "mean time to recover". In other words, transient failures are to be considered normal while operating with the cloud in the mix. Appropriate [retry patterns](../iot/concepts-manage-device-reconnections.md#retry-patterns) must be built in to the components interacting with a cloud application to deal with transient failures.
56-
57-
**Example:**
58-
59-
By default, \[service-name\] achieves redundancy by spreading compute nodes and data throughout a single datacenter in the primary region. This approach protects your data in the event of a localized failure, such as a small-scale network or power failure, and even during the following events:
60-
61-
- Customer initiated management operations that result in a brief downtime.
62-
- Service maintenance operations.
63-
64-
*etc...*
42+
The IoT Hub service provides intra-region high availability (HA) by implementing redundancies in almost all layers of the service. The [SLA published by the IoT Hub service](https://azure.microsoft.com/support/legal/sla/iot-hub) is achieved by making use of these redundancies. No extra steps are required to take advantage of these HA features.
6543

6644
## Transient faults
6745

@@ -74,6 +52,8 @@ By default, \[service-name\] achieves redundancy by spreading compute nodes and
7452
If your service hosts the customer's code or applications, it might also be capable of causing or propagating transient faults. If you have guidance to help to avoid these situations, provide it here. For example, App Service supports deployment slots, which avoid application downtime during deployments.
7553
-->
7654

55+
Although IoT Hub offers a reasonably high uptime guarantee, transient failures can still occur, as with any distributed computing platform. If you're just starting to migrate solutions from on-premises to the cloud, shift your focus from optimizing "mean time between failures" to optimizing "mean time to recover". In other words, treat transient failures as normal when the cloud is part of your solution. Build appropriate [retry patterns](../iot/concepts-manage-device-reconnections.md#retry-patterns) into the components that interact with a cloud application so that they can handle transient failures.
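
As a minimal sketch of one common retry pattern (exponential backoff with jitter), the following Python example shows how a component might wrap a call that can fail transiently. The names `TransientError`, `backoff_delays`, and `call_with_retry` are hypothetical and aren't part of any Azure SDK. The Azure IoT device SDKs ship their own retry policies, so treat this as an illustration of the idea rather than the recommended implementation.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, throttling)."""


def backoff_delays(max_attempts, base_delay=1.0, max_delay=60.0, rng=random.random):
    """Yield one jittered delay (in seconds) for each retry after the first attempt."""
    for attempt in range(max_attempts - 1):
        ceiling = min(max_delay, base_delay * (2 ** attempt))
        # Full jitter: pick a random delay between 0 and the capped exponential ceiling.
        yield rng() * ceiling


def call_with_retry(operation, max_attempts=5, base_delay=1.0, max_delay=60.0,
                    sleep=time.sleep):
    """Run operation(), retrying on TransientError with exponential backoff and jitter."""
    delays = backoff_delays(max_attempts, base_delay, max_delay)
    while True:
        try:
            return operation()
        except TransientError:
            delay = next(delays, None)
            if delay is None:
                raise  # Retry budget exhausted: surface the last error to the caller.
            sleep(delay)
```

The jitter spreads reconnection attempts out so that many devices recovering from the same outage don't all retry at the same instant. The `sleep` parameter is injectable only to make the helper easy to exercise without waiting.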
56+
7757
## Availability zone support
7858

7959
[!INCLUDE [AZ support description](includes/reliability-availability-zone-description-include.md)]
@@ -92,25 +72,25 @@ Availability zone support for IoT Hub is enabled automatically for new IoT Hub r
9272

9373
| Region | Data resiliency | Smoother deployments |
9474
| ------ | --------------- | ------------ |
95-
| Australia East | :::image type="icon" source="./media/icons/yes-icon.png"::: | :::image type="icon" source="./media/icons/yes-icon.png"::: |
96-
| Brazil South | :::image type="icon" source="./media/icons/yes-icon.png"::: | :::image type="icon" source="./media/icons/yes-icon.png"::: |
97-
| Canada Central | :::image type="icon" source="./media/icons/yes-icon.png"::: | :::image type="icon" source="./media/icons/yes-icon.png"::: |
98-
| Central India | :::image type="icon" source="./media/icons/no-icon.png"::: | :::image type="icon" source="./media/icons/yes-icon.png"::: |
99-
| Central US | :::image type="icon" source="./media/icons/yes-icon.png"::: | :::image type="icon" source="./media/icons/yes-icon.png"::: |
100-
| East US | :::image type="icon" source="./media/icons/no-icon.png"::: | :::image type="icon" source="./media/icons/yes-icon.png"::: |
101-
| France Central | :::image type="icon" source="./media/icons/yes-icon.png"::: | :::image type="icon" source="./media/icons/yes-icon.png"::: |
102-
| Germany West Central | :::image type="icon" source="./media/icons/yes-icon.png"::: | :::image type="icon" source="./media/icons/yes-icon.png"::: |
103-
| Japan East | :::image type="icon" source="./media/icons/yes-icon.png"::: | :::image type="icon" source="./media/icons/yes-icon.png"::: |
104-
| Korea Central | :::image type="icon" source="./media/icons/no-icon.png"::: | :::image type="icon" source="./media/icons/yes-icon.png"::: |
105-
| North Europe | :::image type="icon" source="./media/icons/yes-icon.png"::: | :::image type="icon" source="./media/icons/yes-icon.png"::: |
106-
| Norway East | :::image type="icon" source="./media/icons/no-icon.png"::: | :::image type="icon" source="./media/icons/yes-icon.png"::: |
107-
| Qatar Central | :::image type="icon" source="./media/icons/no-icon.png"::: | :::image type="icon" source="./media/icons/yes-icon.png"::: |
108-
| Southcentral US | :::image type="icon" source="./media/icons/no-icon.png"::: | :::image type="icon" source="./media/icons/yes-icon.png"::: |
109-
| Southeast Asia | :::image type="icon" source="./media/icons/yes-icon.png"::: | :::image type="icon" source="./media/icons/yes-icon.png"::: |
110-
| UK South | :::image type="icon" source="./media/icons/yes-icon.png"::: | :::image type="icon" source="./media/icons/yes-icon.png"::: |
111-
| West Europe | :::image type="icon" source="./media/icons/no-icon.png"::: | :::image type="icon" source="./media/icons/yes-icon.png"::: |
112-
| West US 2 | :::image type="icon" source="./media/icons/yes-icon.png"::: | :::image type="icon" source="./media/icons/yes-icon.png"::: |
113-
| West US 3 | :::image type="icon" source="./media/icons/no-icon.png"::: | :::image type="icon" source="./media/icons/yes-icon.png"::: |
75+
| Australia East | :::image type="icon" source="./media/yes-icon.svg"::: | :::image type="icon" source="./media/yes-icon.svg"::: |
76+
| Brazil South | :::image type="icon" source="./media/yes-icon.svg"::: | :::image type="icon" source="./media/yes-icon.svg"::: |
77+
| Canada Central | :::image type="icon" source="./media/yes-icon.svg"::: | :::image type="icon" source="./media/yes-icon.svg"::: |
78+
| Central India | :::image type="icon" source="./media/icon-unsupported.svg"::: | :::image type="icon" source="./media/yes-icon.svg"::: |
79+
| Central US | :::image type="icon" source="./media/yes-icon.svg"::: | :::image type="icon" source="./media/yes-icon.svg"::: |
80+
| East US | :::image type="icon" source="./media/icon-unsupported.svg"::: | :::image type="icon" source="./media/yes-icon.svg"::: |
81+
| France Central | :::image type="icon" source="./media/yes-icon.svg"::: | :::image type="icon" source="./media/yes-icon.svg"::: |
82+
| Germany West Central | :::image type="icon" source="./media/yes-icon.svg"::: | :::image type="icon" source="./media/yes-icon.svg"::: |
83+
| Japan East | :::image type="icon" source="./media/yes-icon.svg"::: | :::image type="icon" source="./media/yes-icon.svg"::: |
84+
| Korea Central | :::image type="icon" source="./media/icon-unsupported.svg"::: | :::image type="icon" source="./media/yes-icon.svg"::: |
85+
| North Europe | :::image type="icon" source="./media/yes-icon.svg"::: | :::image type="icon" source="./media/yes-icon.svg"::: |
86+
| Norway East | :::image type="icon" source="./media/icon-unsupported.svg"::: | :::image type="icon" source="./media/yes-icon.svg"::: |
87+
| Qatar Central | :::image type="icon" source="./media/icon-unsupported.svg"::: | :::image type="icon" source="./media/yes-icon.svg"::: |
88+
| South Central US | :::image type="icon" source="./media/icon-unsupported.svg"::: | :::image type="icon" source="./media/yes-icon.svg"::: |
89+
| Southeast Asia | :::image type="icon" source="./media/yes-icon.svg"::: | :::image type="icon" source="./media/yes-icon.svg"::: |
90+
| UK South | :::image type="icon" source="./media/yes-icon.svg"::: | :::image type="icon" source="./media/yes-icon.svg"::: |
91+
| West Europe | :::image type="icon" source="./media/icon-unsupported.svg"::: | :::image type="icon" source="./media/yes-icon.svg"::: |
92+
| West US 2 | :::image type="icon" source="./media/yes-icon.svg"::: | :::image type="icon" source="./media/yes-icon.svg"::: |
93+
| West US 3 | :::image type="icon" source="./media/icon-unsupported.svg"::: | :::image type="icon" source="./media/yes-icon.svg"::: |
11494

11595
### Requirements
11696

@@ -131,9 +111,6 @@ This section should describe how data replication is performed during regular da
131111
132112
-->
133113

134-
>[!IMPORTANT]
135-
>The data replication approach across zones is usually different to the approach used across regions.
136-
137114
<!--
138115
Most Azure services replicate data across zones synchronously, which means that changes are applied to multiple (or all) zones simultaneously, and the change isn't considered to be completed until multiple/all zones have acknowledged the change. Use wording similar to the following to explain this approach and its tradeoffs.
139116
@@ -224,34 +201,24 @@ TODO: Add your failback
224201
-->
225202

226203
### Testing for zone failures
227-
TODO: Add your testing for zone failures
228-
229-
<!-- 6H. Testing for zone failures ----------------------------------------------------
230-
231-
For zonal services, can you trigger a fault in an availability zone, such as by using Azure Chaos Studio? If so, link to the specific fault types that simulate the appropriate failure.
232-
233-
-->
234-
235-
**Example:**
236-
237-
> You can simulate a zone failure by using Azure Chaos Studio. Inject the XXX fault to simulate the loss of an availability zone. Regularly test your responses to zone failures so that you can be ready for unexpected availability zone outages.
238-
239-
<!--
240-
For zone-redundant services, is there a way for the customer to test a zone failover? Usually that's not possible, so use wording like this:
241-
-->
242-
243-
**Example:**
244-
245-
> The Azure IoT Hub platform manages traffic routing, failover, and failback for zone-redundant X resources. You don't need to initiate anything. Because this feature is fully managed, you don't need to validate availability zone failure processes.
246204

205+
Azure IoT Hub manages traffic routing, failover, and failback for zone failures. You don't need to initiate anything. Because this feature is fully managed, you don't need to validate availability zone failure processes.
247206

248207
## Multi-region support
249208

250209
Azure IoT Hub uses [Azure region pairs](../reliability/regions-paired.md) to provide resiliency in the rare situation where a datacenter experiences extended outages. The recovery options available in such a situation are [Microsoft-initiated failover](#microsoft-initiated-failover) and [manual failover](#manual-failover) from the IoT hub's primary region to its geo-paired region. The fundamental difference between the two is that Microsoft initiates the former and the user initiates the latter. Also, manual failover provides a lower recovery time objective (RTO) compared to the Microsoft-initiated failover option.
251210

211+
The following table summarizes the HA/DR options presented in this article. Use it as a frame of reference to choose the option that works for your solution.
212+
213+
| HA/DR option | Recovery time | Requires manual intervention? | Implementation complexity | Cost impact|
214+
| --- | --- | --- | --- | --- |
215+
| Microsoft-initiated failover |2 - 26 hours|No|None|None|
216+
| Manual failover |10 min - 2 hours|Yes|Very low. You only need to trigger this operation from the portal.|None|
217+
| Cross region HA |< 1 min|No|High|> 1x the cost of 1 IoT hub|
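
One hedged way to read the table is to compare the upper bound of each option's recovery time against the recovery time objective (RTO) your solution needs. The Python sketch below encodes the table's values for that comparison. The `RecoveryOption` type and the `options_meeting_rto` helper are hypothetical names introduced for illustration, and the "< 1 min" entry is treated as a 1-minute bound, which is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveryOption:
    name: str
    worst_case_minutes: float  # Upper bound of the recovery time listed in the table.
    manual: bool               # Whether the option requires manual intervention.


OPTIONS = [
    RecoveryOption("Microsoft-initiated failover", 26 * 60, manual=False),
    RecoveryOption("Manual failover", 2 * 60, manual=True),
    # The table lists "< 1 min"; 1 minute is used here as a conservative bound.
    RecoveryOption("Cross region HA", 1, manual=False),
]


def options_meeting_rto(target_minutes):
    """Return the names of options whose worst-case recovery time fits the target RTO."""
    return [o.name for o in OPTIONS if o.worst_case_minutes <= target_minutes]


# Example: a solution that can tolerate at most three hours of downtime.
print(options_meeting_rto(180))
```

Recovery time is only one column of the table. Implementation complexity and cost still need to be weighed separately.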
218+
252219
### Region support
253220

254-
Failover is available in all regions that Azure IoT Hub supports. Only users deploying IoT hubs to the Brazil South and Southeast Asia (Singapore) regions can opt out of Microsoft-initiated failover. For more information, see [Disable disaster recovery](#disable-disaster-recovery).
221+
Failover is available in all regions that Azure IoT Hub supports. Only users deploying IoT hubs to the Brazil South and Southeast Asia (Singapore) regions can opt out of Microsoft-initiated failover. For more information, see [Disable disaster recovery](../iot-hub/iot-hub-ha-dr.md#disable-disaster-recovery).
255222

256223
>[!NOTE]
257224
>Azure IoT Hub doesn't store or process customer data outside of the geography where you deploy the service instance. For more information, see [Azure region pairs](../reliability/regions-paired.md).
@@ -282,7 +249,7 @@ Azure IoT Hub failover options offer the following recovery point objectives:
282249

283250
<sup>1</sup>Cloud-to-device messages and parent jobs aren't recovered as a part of manual failover.
284251

285-
#### Microosft-initiated failover
252+
#### Microsoft-initiated failover
286253

287254
Microsoft-initiated failover is exercised by Microsoft in rare situations to fail over all of the IoT hubs from an affected region to the corresponding geo-paired region. This process is the default option and requires no intervention from the user. Microsoft reserves the right to determine when this option is exercised, and the mechanism doesn't require user consent before the user's hub is failed over. Microsoft-initiated failover has a recovery time objective (RTO) of 2-26 hours.
288255

@@ -296,7 +263,7 @@ The manual failover option is always available for use whether the primary regio
296263

297264
For step-by-step instructions, see [Tutorial: Perform manual failover for an IoT hub](tutorial-manual-failover.md).
298265

299-
### Configure multi-region support
266+
### Configure multi-region support
300267

301268
<!-- 7E. Configure multi-region support ----------------------
302269
@@ -307,12 +274,6 @@ For step-by-step instructions, see [Tutorial: Perform manual failover for an IoT
307274
Provide links to documents that show how to create a resource or instance with multi-region support. Ideally, the documents should contain examples using the Azure portal, Azure CLI, Azure PowerShell, and Bicep.
308275
-->
309276

310-
**Example:**
311-
312-
> To deploy a new multi-region IoT Hub resource, see [Create an IoT Hub resource with multi-region support].
313-
>
314-
> To enable multi-region support for an existing IoT Hub resource, see [Enable multi-region support in an IoT Hub resource].
315-
316277
<!--
317278
If your service does NOT support enabling multi-region support after deployment, add an explicit statement to indicate that.
318279
