---
title: How to manage connectivity and reliable messaging using Azure IoT Hub device SDKs
description: Learn how to improve your device connectivity and messaging when using the Azure IoT Hub device SDKs
services: iot-hub
keywords:
author: yzhong94
ms.author: yizhon
ms.date: 07/07/2018
ms.topic: article
ms.service: iot-hub

documentationcenter: ''
manager: timlt
ms.devlang: na
ms.custom: mvc
---

# How to manage connectivity and reliable messaging using Azure IoT Hub device SDKs

This article provides high-level guidance for designing resilient device applications by taking advantage of the connectivity and reliable messaging features in the Azure IoT device SDKs. Its goal is to help you answer questions about, and handle, the following scenarios:

- Managing a dropped network connection
- Managing switching between different network connections
- Managing reconnection due to transient service connection errors

Implementation details may vary by language. For more details, see the linked API documentation or the specific SDK:

- [C/Python/iOS SDK](https://github.com/azure/azure-iot-sdk-c)
- [.NET SDK](https://github.com/Azure/azure-iot-sdk-csharp/blob/master/iothub/device/devdoc/requirements/retrypolicy.md)
- [Java SDK](https://github.com/Azure/azure-iot-sdk-java/blob/master/device/iot-device-client/devdoc/requirement_docs/com/microsoft/azure/iothub/retryPolicy.md)
- [Node SDK](https://github.com/Azure/azure-iot-sdk-node/wiki/Connectivity-and-Retries#types-of-errors-and-how-to-detect-them)


## Designing for resiliency

IoT devices often rely on non-continuous or unstable network connections, such as GSM or satellite. In addition, when devices interact with cloud-based services, errors can occur because of temporary conditions such as intermittent service availability and infrastructure-level faults (commonly referred to as transient faults). An application running on a device needs to manage the connection and reconnection mechanisms, as well as the retry logic for sending and receiving messages. Furthermore, the retry strategy requirements depend heavily on the IoT scenario the device participates in and on the device’s context and capabilities.

The Azure IoT Hub device SDKs aim to simplify connecting to Azure IoT Hub and exchanging device-to-cloud and cloud-to-device messages by providing a robust and comprehensive way to connect and to send and receive messages. Developers can also modify the existing implementation to build the right retry strategy for a given scenario.

The following sections cover the SDK features that support connectivity and reliable messaging.

## Connection and retry

This section provides an overview of the reconnection and retry patterns available for managing connections, implementation guidance for using different retry policies in your device application, and the relevant APIs in the device SDKs.

### Error patterns

Connection failures can happen at many levels:

- Network errors, such as a disconnected socket or a name resolution error
- Protocol-level errors for the HTTP, AMQP, and MQTT transports, such as detached links or expired sessions
- Application-level errors that result from either local mistakes, such as invalid credentials, or service behavior, such as exceeding a quota or throttling

The device SDKs detect errors at all three levels. OS-related errors and hardware errors are not detected or handled by the device SDKs. The design is based on [The Transient Fault Handling Guidance](https://docs.microsoft.com/azure/architecture/best-practices/transient-faults#general-guidelines) from the Azure Architecture Center.
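
As a minimal sketch, assuming hypothetical exception types for the protocol and application levels, the following C# code shows how an error could be sorted into one of these three levels. Only `SocketException` is a real .NET type; `ProtocolException`, `ApplicationLevelException`, `ErrorLevel`, and `ErrorClassifier` are illustrative names, not types from the device SDKs, which use their own exception hierarchies.

```csharp
using System;
using System.Net.Sockets;

// The three levels at which the device SDKs detect errors, plus errors they don't handle.
public enum ErrorLevel { Network, Protocol, Application, NotHandled }

// Hypothetical exception for protocol errors, such as a detached link or an expired session.
public class ProtocolException : Exception
{
    public ProtocolException(string message) : base(message) { }
}

// Hypothetical exception for application errors, such as invalid credentials or throttling.
public class ApplicationLevelException : Exception
{
    public ApplicationLevelException(string message) : base(message) { }
}

public static class ErrorClassifier
{
    // Maps an exception to the level at which it occurred.
    public static ErrorLevel Classify(Exception ex) => ex switch
    {
        SocketException => ErrorLevel.Network,               // disconnected socket, name resolution failure
        ProtocolException => ErrorLevel.Protocol,            // HTTP, AMQP, or MQTT protocol error
        ApplicationLevelException => ErrorLevel.Application, // local mistake or service behavior
        _ => ErrorLevel.NotHandled                           // anything else, e.g., OS or hardware errors
    };
}
```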

### Retry patterns

The overall retry process when connection errors are detected is:

1. The SDK detects the error and whether it occurred at the network, protocol, or application level.
2. Based on the error type, the SDK uses an error filter to decide whether a retry needs to be performed. If the SDK identifies an **unrecoverable error**, operations (connection and send/receive) are stopped and the SDK notifies the user. An unrecoverable error is one that the SDK can identify and determine cannot be recovered, for example, an authentication error or a bad endpoint error.
3. If the SDK identifies a **recoverable error**, it begins to retry using the specified retry policy until a defined timeout expires.
4. When the defined timeout expires, the SDK stops trying to connect or send, and notifies the user.
5. The SDK allows the user to attach a callback to receive connection status changes.
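
The five steps above can be sketched as a single C# loop. This is a minimal illustration, not the SDK's internal implementation: `RetryRunner`, `RunAsync`, and its delegate parameters (`isRecoverable` as the error filter, `nextDelay` as the retry policy, and `onStatusChanged` as the user's status callback) are hypothetical names.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

public static class RetryRunner
{
    // Runs an operation (for example, connect or send) following the retry process above.
    // Returns true on success, false if the operation was abandoned.
    public static async Task<bool> RunAsync(
        Func<Task> operation,
        Func<Exception, bool> isRecoverable,   // error filter (step 2)
        Func<int, TimeSpan> nextDelay,         // retry policy: attempt number -> delay (step 3)
        TimeSpan timeout,                      // overall timeout (step 4)
        Action<string> onStatusChanged)        // user callback for status changes (step 5)
    {
        var clock = Stopwatch.StartNew();
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                await operation();
                onStatusChanged("Succeeded");
                return true;
            }
            catch (Exception ex) when (!isRecoverable(ex))
            {
                // Unrecoverable error, such as bad credentials: stop and notify the user.
                onStatusChanged($"Stopped: unrecoverable error ({ex.Message})");
                return false;
            }
            catch (Exception)
            {
                // Recoverable error: wait as the policy says, unless that would pass the timeout.
                TimeSpan delay = nextDelay(attempt);
                if (clock.Elapsed + delay >= timeout)
                {
                    onStatusChanged("Stopped: retry timeout expired");
                    return false;
                }
                onStatusChanged($"Retrying (attempt {attempt})");
                await Task.Delay(delay);
            }
        }
    }
}
```

Keeping the error filter separate from the delay calculation mirrors the process above: the filter decides whether to retry, and the policy decides when.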

The SDKs provide three retry policies:

- **Exponential back-off with jitter**: This is the default retry policy. It tends to be aggressive at the start, slows down, and then reaches a maximum delay that it does not exceed. The design is based on [Retry guidance from Azure Architecture Center](https://docs.microsoft.com/azure/architecture/best-practices/retry-service-specific).
- **Custom retry**: You can implement a custom retry policy suited to your scenario and inject it through the retry policy API of the SDK language you choose. Custom retry policies are not available in the C SDK.
- **No retry**: You can set the retry policy to "no retry," which disables the retry logic. The SDK tries to connect once and, assuming the connection is established, sends each message once. This policy is typically used where there are bandwidth or cost concerns. If you choose this option, messages that fail to send are lost and cannot be recovered.
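
For intuition about how the default policy behaves, the following C# class is a minimal sketch of exponential back-off with jitter. The formula is an assumption for illustration, not the exact formula any of the SDKs uses: the delay starts at a minimum, doubles on each attempt, is capped at a maximum, and receives a random jitter of up to `jitterFraction` of the delay, without ever exceeding the maximum. `BackoffWithJitter` and `GetDelay` are hypothetical names.

```csharp
using System;

public sealed class BackoffWithJitter
{
    private readonly TimeSpan _minDelay;
    private readonly TimeSpan _maxDelay;
    private readonly double _jitterFraction;
    private readonly Random _random;

    public BackoffWithJitter(TimeSpan minDelay, TimeSpan maxDelay, double jitterFraction, Random random)
    {
        _minDelay = minDelay;
        _maxDelay = maxDelay;
        _jitterFraction = jitterFraction;
        _random = random;
    }

    // Delay before retry number `attempt` (1-based).
    public TimeSpan GetDelay(int attempt)
    {
        // Exponential growth: min, 2*min, 4*min, ... (exponent capped to avoid overflow).
        double exponential = _minDelay.TotalMilliseconds * Math.Pow(2, Math.Min(attempt - 1, 30));
        double capped = Math.Min(exponential, _maxDelay.TotalMilliseconds);
        // Random jitter so that many devices don't retry in lockstep.
        double jitter = capped * _jitterFraction * _random.NextDouble();
        // The maximum delay is never exceeded.
        return TimeSpan.FromMilliseconds(Math.Min(capped + jitter, _maxDelay.TotalMilliseconds));
    }
}
```

For example, with a 100 ms minimum, a 10-second maximum, and a jitter fraction of 0.2, the first delay falls between 100 and 120 ms, and from the eighth attempt on every delay equals the 10-second maximum. The jitter spreads out retries from many devices that lost connectivity at the same moment.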

### Retry policy APIs

| SDK | SetRetryPolicy method | Policy implementations | Implementation guidance |
|-----|----------------------|--|--|
| C/Python/iOS | [IOTHUB_CLIENT_RESULT IoTHubClient_SetRetryPolicy](https://github.com/Azure/azure-iot-sdk-c/blob/2018-05-04/iothub_client/inc/iothub_client.h#L188) | **Default**: [IOTHUB_CLIENT_RETRY_EXPONENTIAL_BACKOFF](https://github.com/Azure/azure-iot-sdk-c/blob/master/doc/connection_and_messaging_reliability.md#connection-retry-policies)<BR>**Custom:** choose from the available [retry policies](https://github.com/Azure/azure-iot-sdk-c/blob/master/doc/connection_and_messaging_reliability.md#connection-retry-policies)<BR>**No retry:** [IOTHUB_CLIENT_RETRY_NONE](https://github.com/Azure/azure-iot-sdk-c/blob/master/doc/connection_and_messaging_reliability.md#connection-retry-policies) | [C/Python/iOS implementation](https://github.com/Azure/azure-iot-sdk-c/blob/master/doc/connection_and_messaging_reliability.md#) |
| Java | [SetRetryPolicy](https://docs.microsoft.com/en-us/java/api/com.microsoft.azure.sdk.iot.device._device_client_config.setretrypolicy?view=azure-java-stable) | **Default**: [ExponentialBackoffWithJitter class](https://github.com/Azure/azure-iot-sdk-java/blob/master/device/iot-device-client/src/main/java/com/microsoft/azure/sdk/iot/device/transport/ExponentialBackoffWithJitter.java)<BR>**Custom:** implement [RetryPolicy interface](https://github.com/Azure/azure-iot-sdk-java/blob/master/device/iot-device-client/src/main/java/com/microsoft/azure/sdk/iot/device/transport/RetryPolicy.java)<BR>**No retry:** [NoRetry class](https://github.com/Azure/azure-iot-sdk-java/blob/master/device/iot-device-client/src/main/java/com/microsoft/azure/sdk/iot/device/transport/NoRetry.java) | [Java implementation](https://github.com/Azure/azure-iot-sdk-java/blob/master/device/iot-device-client/devdoc/requirement_docs/com/microsoft/azure/iothub/retryPolicy.md) |
| .NET | [DeviceClient.SetRetryPolicy](/dotnet/api/microsoft.azure.devices.client.deviceclient.setretrypolicy?view=azure-dotnet#Microsoft_Azure_Devices_Client_DeviceClient_SetRetryPolicy_Microsoft_Azure_Devices_Client_IRetryPolicy) | **Default**: [ExponentialBackoff class](/dotnet/api/microsoft.azure.devices.client.exponentialbackoff?view=azure-dotnet)<BR>**Custom:** implement [IRetryPolicy interface](https://docs.microsoft.com/dotnet/api/microsoft.azure.devices.client.iretrypolicy?view=azure-dotnet)<BR>**No retry:** [NoRetry class](/dotnet/api/microsoft.azure.devices.client.noretry?view=azure-dotnet) | [C# implementation](https://github.com/Azure/azure-iot-sdk-csharp/blob/master/iothub/device/devdoc/requirements/retrypolicy.md) |
| Node | [setRetryPolicy](/javascript/api/azure-iot-device/client?view=azure-iot-typescript-latest#azure_iot_device_Client_setRetryPolicy) | **Default**: [ExponentialBackoffWithJitter class](/javascript/api/azure-iot-common/exponentialbackoffwithjitter?view=azure-iot-typescript-latest)<BR>**Custom:** implement [RetryPolicy interface](/javascript/api/azure-iot-common/retrypolicy?view=azure-iot-typescript-latest)<BR>**No retry:** [NoRetry class](/javascript/api/azure-iot-common/noretry?view=azure-iot-typescript-latest) | [Node implementation](https://github.com/Azure/azure-iot-sdk-node/wiki/Connectivity-and-Retries#types-of-errors-and-how-to-detect-them) |

The following code samples illustrate this flow.

#### .NET implementation guidance

The following code sample shows how to define and set the default retry policy:

```csharp
using System;
using Microsoft.Azure.Devices.Client;

// Define and set the default retry policy.
// deviceClient is an existing DeviceClient instance.
IRetryPolicy retryPolicy = new ExponentialBackoff(int.MaxValue, TimeSpan.FromMilliseconds(100), TimeSpan.FromSeconds(10), TimeSpan.FromMilliseconds(100));
deviceClient.SetRetryPolicy(retryPolicy);
```

To avoid high CPU usage, retries are throttled if the operation fails immediately (for example, when there is no network or no route to the destination), so that at least 1 second passes before the next retry is executed.
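
A minimal sketch of that throttling rule follows. It assumes one possible reading of the behavior, namely that consecutive attempts should start at least 1 second apart, so an attempt that failed almost immediately gets its wait stretched while a slow failure keeps the policy's delay. `RetryThrottle` and `EnforceMinimumInterval` are hypothetical names; the .NET SDK applies this rule internally.

```csharp
using System;

public static class RetryThrottle
{
    // Assumed minimum spacing between the starts of consecutive attempts.
    private static readonly TimeSpan MinimumInterval = TimeSpan.FromSeconds(1);

    // policyDelay: the delay suggested by the retry policy.
    // attemptDuration: how long the failed attempt ran before it failed.
    // Returns the delay to wait so that attempts start at least MinimumInterval apart.
    public static TimeSpan EnforceMinimumInterval(TimeSpan policyDelay, TimeSpan attemptDuration)
    {
        TimeSpan remaining = MinimumInterval - attemptDuration;
        return policyDelay >= remaining ? policyDelay : remaining;
    }
}
```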

If the service responds with a throttling error, the SDK applies a different retry policy, which cannot be changed through the public API:

```csharp
// Throttling retry policy, applied internally by the SDK when the service
// returns a throttling error. RetryCount is an internal SDK value.
IRetryPolicy throttlingRetryPolicy = new ExponentialBackoff(RetryCount, TimeSpan.FromSeconds(10), TimeSpan.FromSeconds(60), TimeSpan.FromSeconds(5));
```

The retry mechanism stops after `DefaultOperationTimeoutInMilliseconds`, which is currently set to 4 minutes.
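
If the default policy doesn't fit your scenario, you can implement the `IRetryPolicy` interface and pass an instance to `DeviceClient.SetRetryPolicy`. The sketch below shows a hypothetical fixed-interval policy, `FixedIntervalRetryPolicy`, that gives up after a set number of retries. To keep it self-contained, it declares a local interface with the same `ShouldRetry(int currentRetryCount, Exception lastException, out TimeSpan retryInterval)` shape as `Microsoft.Azure.Devices.Client.IRetryPolicy`; in a real application, implement the SDK's interface instead and check the signature against the SDK version you use.

```csharp
using System;

// Local stand-in with the same shape as Microsoft.Azure.Devices.Client.IRetryPolicy,
// so that this sketch compiles without the SDK package.
public interface IRetryPolicy
{
    bool ShouldRetry(int currentRetryCount, Exception lastException, out TimeSpan retryInterval);
}

// Hypothetical custom policy: retry at a fixed interval, up to a maximum number of retries.
public sealed class FixedIntervalRetryPolicy : IRetryPolicy
{
    private readonly int _maxRetries;
    private readonly TimeSpan _interval;

    public FixedIntervalRetryPolicy(int maxRetries, TimeSpan interval)
    {
        _maxRetries = maxRetries;
        _interval = interval;
    }

    public bool ShouldRetry(int currentRetryCount, Exception lastException, out TimeSpan retryInterval)
    {
        if (currentRetryCount < _maxRetries)
        {
            // Keep retrying at the same pace, regardless of the error.
            retryInterval = _interval;
            return true;
        }

        // Retry budget exhausted: stop retrying.
        retryInterval = TimeSpan.Zero;
        return false;
    }
}
```

With the SDK's own interface, such a policy would be applied with `deviceClient.SetRetryPolicy(new FixedIntervalRetryPolicy(5, TimeSpan.FromSeconds(30)));`. A fixed interval is simpler to reason about on metered links, at the cost of the load spreading that exponential back-off with jitter provides.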

#### Implementation guidance for other languages

For other languages, review the following implementation documentation. Samples that demonstrate the use of the retry policy APIs are provided in each repository.

- [C/Python/iOS SDK](https://github.com/azure/azure-iot-sdk-c)
- [.NET SDK](https://github.com/Azure/azure-iot-sdk-csharp/blob/master/iothub/device/devdoc/requirements/retrypolicy.md)
- [Java SDK](https://github.com/Azure/azure-iot-sdk-java/blob/master/device/iot-device-client/devdoc/requirement_docs/com/microsoft/azure/iothub/retryPolicy.md)
- [Node SDK](https://github.com/Azure/azure-iot-sdk-node/wiki/Connectivity-and-Retries#types-of-errors-and-how-to-detect-them)
