Commit b5ebe13

Merge pull request #112953 from kgremban/apr27-troubleshoot
Restructure troubleshooting article and add new content
2 parents e3e5833 + a8f66db commit b5ebe13

5 files changed, +410 -285 lines changed

articles/iot-edge/TOC.yml

Lines changed: 5 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -206,7 +206,11 @@
206206
- name: IoT Edge plugin for Jenkins
207207
href: how-to-devops-plugins.md
208208
- name: Troubleshoot
209-
href: troubleshoot.md
209+
items:
210+
- name: Standard diagnostic steps
211+
href: troubleshoot.md
212+
- name: Common error resolutions
213+
href: troubleshoot-common-errors.md
210214
- name: Resources
211215
items:
212216
- name: Support and help options

articles/iot-edge/offline-capabilities.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -103,7 +103,7 @@ One way to create this trust relationship is described in detail in the followin
103103

104104
## Specify DNS servers
105105

106-
To improve robustness, it is highly recommended you specify the DNS server addresses used in your environment. To set your DNS server for IoT Edge, see the resolution for [Edge Agent module continually reports 'empty config file' and no modules start on device](troubleshoot.md#edge-agent-module-continually-reports-empty-config-file-and-no-modules-start-on-the-device) in the troubleshooting article.
106+
To improve robustness, it is highly recommended you specify the DNS server addresses used in your environment. To set your DNS server for IoT Edge, see the resolution for [Edge Agent module continually reports 'empty config file' and no modules start on device](troubleshoot-common-errors.md#edge-agent-module-reports-empty-config-file-and-no-modules-start-on-the-device) in the troubleshooting article.
107107

108108
## Optional offline settings
109109

articles/iot-edge/production-checklist.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -100,7 +100,7 @@ The IoT Edge hub is optimized for performance by default, so it attempts to allo
100100

101101
When **OptimizeForPerformance** is set to **true**, the MQTT protocol head uses the PooledByteBufferAllocator, which has better performance but allocates more memory. The allocator does not work well on 32-bit operating systems or on devices with low memory. Additionally, when optimized for performance, RocksDb allocates more memory for its role as the local storage provider.
102102

103-
For more information, see [Stability issues on resource constrained devices](troubleshoot.md#stability-issues-on-resource-constrained-devices).
103+
For more information, see [Stability issues on smaller devices](troubleshoot-common-errors.md#stability-issues-on-smaller-devices).
104104

105105
#### Disable unused protocols
106106

@@ -193,7 +193,7 @@ Next, be sure to update the image references in the deployment.template.json fil
193193

194194
### Review outbound/inbound configuration
195195

196-
Communication channels between Azure IoT Hub and IoT Edge are always configured to be outbound. For most IoT Edge scenarios, only three connections are necessary. The container engine needs to connect with the container registry (or registries) that holds the module images. The IoT Edge runtime needs to connect with IoT Hub to retrieve device configuration information, and to send messages and telemetry. And if you use automatic provisioning, the IoT Edge daemon needs to connect to the Device Provisioning Service. For more information, see [Firewall and port configuration rules](troubleshoot.md#firewall-and-port-configuration-rules-for-iot-edge-deployment).
196+
Communication channels between Azure IoT Hub and IoT Edge are always configured to be outbound. For most IoT Edge scenarios, only three connections are necessary. The container engine needs to connect with the container registry (or registries) that holds the module images. The IoT Edge runtime needs to connect with IoT Hub to retrieve device configuration information, and to send messages and telemetry. And if you use automatic provisioning, the IoT Edge daemon needs to connect to the Device Provisioning Service. For more information, see [Firewall and port configuration rules](troubleshoot.md#check-your-firewall-and-port-configuration-rules).
197197

198198
### Allow connections from IoT Edge devices
199199

Lines changed: 332 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,332 @@
1+
---
2+
title: Common errors - Azure IoT Edge | Microsoft Docs
3+
description: Use this article to resolve common issues encountered when deploying an IoT Edge solution
4+
author: kgremban
5+
manager: philmea
6+
ms.author: kgremban
7+
ms.date: 04/27/2020
8+
ms.topic: conceptual
9+
ms.service: iot-edge
10+
services: iot-edge
11+
ms.custom: [amqp, mqtt]
12+
---
13+
14+
# Common issues and resolutions for Azure IoT Edge
15+
16+
Use this article to find steps to resolve common issues that you may experience when deploying IoT Edge solutions. If you need to learn how to find logs and errors from your IoT Edge device, see [Troubleshoot your IoT Edge device](troubleshoot.md).
17+
18+
## IoT Edge agent stops after about a minute
19+
20+
**Observed behavior:**
21+
22+
The edgeAgent module starts and runs successfully for about a minute, then stops. The logs indicate that the IoT Edge agent attempts to connect to IoT Hub over AMQP, and then attempts to connect using AMQP over WebSocket. When that fails, the IoT Edge agent exits.
23+
24+
Example edgeAgent logs:
25+
26+
```output
27+
2017-11-28 18:46:19 [INF] - Starting module management agent.
28+
2017-11-28 18:46:19 [INF] - Version - 1.0.7516610 (03c94f85d0833a861a43c669842f0817924911d5)
29+
2017-11-28 18:46:19 [INF] - Edge agent attempting to connect to IoT Hub via AMQP...
30+
2017-11-28 18:46:49 [INF] - Edge agent attempting to connect to IoT Hub via AMQP over WebSocket...
31+
```
32+
33+
**Root cause:**
34+
35+
A networking configuration on the host network is preventing the IoT Edge agent from reaching the network. The agent attempts to connect over AMQP (port 5671) first. If the connection fails, it tries WebSockets (port 443).
36+
37+
The IoT Edge runtime sets up a network for each of the modules to communicate on. On Linux, this network is a bridge network. On Windows, it uses NAT. This issue is more common on Windows devices using Windows containers that use the NAT network.
38+
39+
**Resolution:**
40+
41+
Ensure that there is a route to the internet for the IP addresses assigned to this bridge/NAT network. Sometimes a VPN configuration on the host overrides the IoT Edge network.
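As a rough way to check the two ports named in the root cause, the following Python sketch attempts a plain TCP connection to each one. The function name and the hub hostname are hypothetical placeholders, and a successful TCP connection only shows that a route exists; it does not prove that AMQP or WebSocket traffic is allowed. Running it on the host tests the host's route only, so to test the module network you would need to run it from a container attached to that network.

```python
import socket
from typing import Callable, Dict, Iterable

# Ports from the root cause above: AMQP first, then AMQP over WebSocket.
IOT_HUB_PORTS = (5671, 443)


def check_outbound_ports(
    host: str,
    ports: Iterable[int] = IOT_HUB_PORTS,
    timeout: float = 5.0,
    connect: Callable = socket.create_connection,
) -> Dict[int, bool]:
    """Return {port: reachable} after one TCP connection attempt per port."""
    results = {}
    for port in ports:
        try:
            # create_connection resolves the name and opens a TCP socket.
            conn = connect((host, port), timeout=timeout)
        except OSError:
            # Covers DNS failures, timeouts, and refused connections.
            results[port] = False
        else:
            conn.close()
            results[port] = True
    return results


if __name__ == "__main__":
    # Hypothetical hub name; replace it with your own IoT hub hostname.
    print(check_outbound_ports("example-hub.azure-devices.net"))
```

The `connect` parameter exists only so the check can be exercised without network access.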
42+
43+
## IoT Edge agent can't access a module's image (403)
44+
45+
**Observed behavior:**
46+
47+
A container fails to run, and the edgeAgent logs show a 403 error.
48+
49+
**Root cause:**
50+
51+
The IoT Edge agent doesn't have permissions to access a module's image.
52+
53+
**Resolution:**
54+
55+
Make sure that your registry credentials are correctly specified in your deployment manifest.
56+
57+
## Edge Agent module reports 'empty config file' and no modules start on the device
58+
59+
**Observed behavior:**
60+
61+
The device has trouble starting the modules defined in the deployment. Only the edgeAgent module is running, and it continually reports 'empty config file...'.
62+
63+
**Root cause:**
64+
65+
By default, IoT Edge starts modules in their own isolated container network. The device may be having trouble with DNS name resolution within this private network.
66+
67+
**Resolution:**
68+
69+
**Option 1: Set DNS server in container engine settings**
70+
71+
Specify the DNS server for your environment in the container engine settings, which will apply to all container modules started by the engine. Create a file named `daemon.json` specifying the DNS server to use. For example:
72+
73+
```json
74+
{
75+
"dns": ["1.1.1.1"]
76+
}
77+
```
78+
79+
The above example sets the DNS server to a publicly accessible DNS service. If the edge device can't access this IP address from its environment, replace it with a DNS server address that is accessible.
80+
81+
Place `daemon.json` in the right location for your platform:
82+
83+
| Platform | Location |
84+
| --------- | -------- |
85+
| Linux | `/etc/docker` |
86+
| Windows host with Windows containers | `C:\ProgramData\iotedge-moby\config` |
87+
88+
If the location already contains a `daemon.json` file, add the **dns** key to it and save the file.
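If you prefer to script the change rather than edit the file by hand, a minimal Python sketch along these lines adds the **dns** key while keeping any settings that are already in the file. The function name is hypothetical, and writing to the real location normally requires root or administrator rights.

```python
import json
from pathlib import Path
from typing import List


def set_docker_dns(daemon_json: Path, dns_servers: List[str]) -> dict:
    """Add or replace the "dns" key in daemon.json, keeping other settings."""
    config = {}
    if daemon_json.exists():
        text = daemon_json.read_text(encoding="utf-8").strip()
        if text:
            config = json.loads(text)
    config["dns"] = list(dns_servers)
    daemon_json.write_text(json.dumps(config, indent=4) + "\n", encoding="utf-8")
    return config


if __name__ == "__main__":
    # Linux location from the table above; needs elevated permissions.
    print(set_docker_dns(Path("/etc/docker/daemon.json"), ["1.1.1.1"]))
```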
89+
90+
Restart the container engine for the updates to take effect.
91+
92+
| Platform | Command |
93+
| --------- | -------- |
94+
| Linux | `sudo systemctl restart docker` |
95+
| Windows (Admin PowerShell) | `Restart-Service iotedge-moby -Force` |
96+
97+
**Option 2: Set DNS server in IoT Edge deployment per module**
98+
99+
You can set the DNS server in each module's *createOptions* in the IoT Edge deployment. For example:
100+
101+
```json
102+
"createOptions": {
103+
"HostConfig": {
104+
"Dns": [
105+
"x.x.x.x"
106+
]
107+
}
108+
}
109+
```
110+
111+
Be sure to set this configuration for the *edgeAgent* and *edgeHub* modules as well.
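Because the same setting has to be repeated for every module, it can help to apply it programmatically. The Python sketch below assumes that `createOptions` is stored as a JSON string, as in the edgeHub example later in this article. The function name and the address `10.0.0.2` are hypothetical.

```python
import json
from typing import List


def add_dns_to_create_options(create_options: str, dns_servers: List[str]) -> str:
    """Return a createOptions JSON string with HostConfig.Dns set."""
    # An empty string is treated as "no create options yet".
    options = json.loads(create_options) if create_options.strip() else {}
    host_config = options.setdefault("HostConfig", {})
    host_config["Dns"] = list(dns_servers)
    return json.dumps(options)


if __name__ == "__main__":
    existing = '{"HostConfig":{"PortBindings":{"8883/tcp":[{"HostPort":"8883"}]}}}'
    # Existing HostConfig entries such as PortBindings are preserved.
    print(add_dns_to_create_options(existing, ["10.0.0.2"]))
```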
112+
113+
## IoT Edge hub fails to start
114+
115+
**Observed behavior:**
116+
117+
The edgeHub module fails to start. You may see a message like one of the following errors in the logs:
118+
119+
```output
120+
One or more errors occurred.
121+
(Docker API responded with status code=InternalServerError, response=
122+
{\"message\":\"driver failed programming external connectivity on endpoint edgeHub (6a82e5e994bab5187939049684fb64efe07606d2bb8a4cc5655b2a9bad5f8c80):
123+
Error starting userland proxy: Bind for 0.0.0.0:443 failed: port is already allocated\"}\n)
124+
```
125+
126+
Or
127+
128+
```output
129+
info: edgelet_docker::runtime -- Starting module edgeHub...
130+
warn: edgelet_utils::logging -- Could not start module edgeHub
131+
warn: edgelet_utils::logging -- caused by: failed to create endpoint edgeHub on network nat: hnsCall failed in Win32:
132+
The process cannot access the file because it is being used by another process. (0x20)
133+
```
134+
135+
**Root cause:**
136+
137+
Some other process on the host machine has bound a port that the edgeHub module is trying to bind. The IoT Edge hub maps ports 443, 5671, and 8883 for use in gateway scenarios. The module fails to start if another process has already bound one of those ports.
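One way to see whether any of these ports is already taken is to try binding it yourself. The following Python sketch does that for the three ports listed above. The function name is hypothetical, and the check is only an approximation: binding ports below 1024 usually requires elevated permissions on Linux, and the sketch re-raises any error other than "address in use" instead of guessing.

```python
import errno
import socket

# Ports that the IoT Edge hub maps for gateway scenarios.
EDGE_HUB_PORTS = (443, 5671, 8883)


def port_is_bound(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if another socket already holds host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return True
            raise  # for example, permission denied on a privileged port
    return False


if __name__ == "__main__":
    for port in EDGE_HUB_PORTS:
        print(port, "in use" if port_is_bound(port) else "free")
```

The sketch reports only that a port is taken. To find the owning process, use your platform's own tooling.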
138+
139+
**Resolution:**
140+
141+
You can resolve this issue in two ways:
142+
143+
If the IoT Edge device is functioning as a gateway device, then you need to find and stop the process that is using port 443, 5671, or 8883. An error for port 443 usually means that the other process is a web server.
144+
145+
If you don't need to use the IoT Edge device as a gateway, then you can remove the port bindings from edgeHub's module create options. You can change the create options in the Azure portal or directly in the deployment.json file.
146+
147+
In the Azure portal:
148+
149+
1. Navigate to your IoT hub and select **IoT Edge**.
150+
151+
2. Select the IoT Edge device that you want to update.
152+
153+
3. Select **Set Modules**.
154+
155+
4. Select **Runtime Settings**.
156+
157+
5. In the **Edge Hub** module settings, delete everything from the **Create Options** text box.
158+
159+
6. Save your changes and create the deployment.
160+
161+
In the deployment.json file:
162+
163+
1. Open the deployment.json file that you applied to your IoT Edge device.
164+
165+
2. Find the `edgeHub` settings in the edgeAgent desired properties section:
166+
167+
```json
168+
"edgeHub": {
169+
"settings": {
170+
"image": "mcr.microsoft.com/azureiotedge-hub:1.0",
171+
"createOptions": "{\"HostConfig\":{\"PortBindings\":{\"8883/tcp\":[{\"HostPort\":\"8883\"}],\"443/tcp\":[{\"HostPort\":\"443\"}]}}}"
172+
},
173+
"type": "docker",
174+
"status": "running",
175+
"restartPolicy": "always"
176+
}
177+
```
178+
179+
3. Remove the `createOptions` line, and the trailing comma at the end of the `image` line before it:
180+
181+
```json
182+
"edgeHub": {
183+
"settings": {
184+
"image": "mcr.microsoft.com/azureiotedge-hub:1.0"
185+
},
186+
"type": "docker",
187+
"status": "running",
188+
"restartPolicy": "always"
189+
}
190+
```
191+
192+
4. Save the file and apply it to your IoT Edge device again.
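If you edit the file with a script instead of by hand, a JSON parser takes care of the trailing comma for you. The Python sketch below removes `createOptions` from an `edgeHub` entry shaped like the one in step 2. The function name is hypothetical, and locating the `edgeHub` entry inside your own deployment.json file is left to the caller.

```python
import copy
import json


def remove_edge_hub_port_bindings(edge_hub: dict) -> dict:
    """Return a copy of the edgeHub entry without settings.createOptions."""
    updated = copy.deepcopy(edge_hub)
    # pop() with a default does nothing if the key is already absent.
    updated.get("settings", {}).pop("createOptions", None)
    return updated


if __name__ == "__main__":
    edge_hub = {
        "settings": {
            "image": "mcr.microsoft.com/azureiotedge-hub:1.0",
            "createOptions": "{\"HostConfig\":{\"PortBindings\":{}}}",
        },
        "type": "docker",
        "status": "running",
        "restartPolicy": "always",
    }
    print(json.dumps(remove_edge_hub_port_bindings(edge_hub), indent=2))
```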
193+
194+
## IoT Edge security daemon fails with an invalid hostname
195+
196+
**Observed behavior:**
197+
198+
Attempting to [check the IoT Edge security manager logs](troubleshoot.md#check-the-status-of-the-iot-edge-security-manager-and-its-logs) fails and prints the following message:
199+
200+
```output
201+
Error parsing user input data: invalid hostname. Hostname cannot be empty or greater than 64 characters
202+
```
203+
204+
**Root cause:**
205+
206+
The IoT Edge runtime can only support hostnames that are shorter than 64 characters. Physical machines usually don't have long hostnames, but the issue is more common on a virtual machine. The automatically generated hostnames for Windows virtual machines hosted in Azure, in particular, tend to be long.
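To check a machine before installing the runtime, you can compare its hostname against the limit quoted in the error message. This Python sketch encodes the rule as the message states it (not empty, not greater than 64 characters). It is an illustration of that wording, not the daemon's actual validation code, and the names are hypothetical.

```python
import socket

# Limit taken from the error message text above.
MAX_HOSTNAME_LENGTH = 64


def hostname_is_acceptable(hostname: str) -> bool:
    """Return True if the hostname is non-empty and at most 64 characters."""
    return 0 < len(hostname) <= MAX_HOSTNAME_LENGTH


if __name__ == "__main__":
    name = socket.gethostname()
    print(name, len(name), hostname_is_acceptable(name))
```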
207+
208+
**Resolution:**
209+
210+
When you see this error, you can resolve it by configuring the DNS name of your virtual machine, and then setting the DNS name as the hostname in the setup command.
211+
212+
1. In the Azure portal, navigate to the overview page of your virtual machine.
213+
2. Select **configure** under DNS name. If your virtual machine already has a DNS name configured, you don't need to configure a new one.
214+
215+
![Configure DNS name of virtual machine](./media/troubleshoot/configure-dns.png)
216+
217+
3. Provide a value for **DNS name label** and select **Save**.
218+
4. Copy the new DNS name, which should be in the format **\<DNSnamelabel\>.\<vmlocation\>.cloudapp.azure.com**.
219+
5. Inside the virtual machine, use the following command to open the IoT Edge configuration file, and set the **hostname** value to your DNS name:
220+
221+
* On Linux:
222+
223+
```bash
224+
sudo nano /etc/iotedge/config.yaml
225+
```
226+
227+
* On Windows:
228+
229+
```cmd
230+
notepad C:\ProgramData\iotedge\config.yaml
231+
```
232+
233+
## Can't get the IoT Edge daemon logs on Windows
234+
235+
**Observed behavior:**
236+
237+
You get an EventLogException when using `Get-WinEvent` on Windows.
238+
239+
**Root cause:**
240+
241+
The `Get-WinEvent` PowerShell command relies on a registry entry being present to find logs by a specific `ProviderName`.
242+
243+
**Resolution:**
244+
245+
Set a registry entry for the IoT Edge daemon. Create an **iotedge.reg** file with the following content, and import it into the Windows Registry by double-clicking it or by using the `reg import iotedge.reg` command:
246+
247+
```reg
248+
Windows Registry Editor Version 5.00
249+
250+
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\EventLog\Application\iotedged]
251+
"CustomSource"=dword:00000001
252+
"EventMessageFile"="C:\\ProgramData\\iotedge\\iotedged.exe"
253+
"TypesSupported"=dword:00000007
254+
```
255+
256+
## Stability issues on smaller devices
257+
258+
**Observed behavior:**
259+
260+
You may experience stability problems on resource constrained devices like the Raspberry Pi, especially when used as a gateway. Symptoms include out of memory exceptions in the IoT Edge hub module, downstream devices failing to connect, or the device failing to send telemetry messages after a few hours.
261+
262+
**Root cause:**
263+
264+
The IoT Edge hub, which is part of the IoT Edge runtime, is optimized for performance by default and attempts to allocate large chunks of memory. This optimization is not ideal for constrained edge devices and can cause stability problems.
265+
266+
**Resolution:**
267+
268+
For the IoT Edge hub, set an environment variable **OptimizeForPerformance** to **false**. There are two ways to set environment variables:
269+
270+
In the Azure portal:
271+
272+
In your IoT hub, select your IoT Edge device, and from the device details page select **Set Modules** > **Runtime Settings**. Create an environment variable for the IoT Edge hub module called *OptimizeForPerformance* that is set to *false*.
273+
274+
![OptimizeForPerformance set to false](./media/troubleshoot/optimizeforperformance-false.png)
275+
276+
In the deployment manifest:
277+
278+
```json
279+
"edgeHub": {
280+
"type": "docker",
281+
"settings": {
282+
"image": "mcr.microsoft.com/azureiotedge-hub:1.0",
283+
"createOptions": <snipped>
284+
},
285+
"env": {
286+
"OptimizeForPerformance": {
287+
"value": "false"
288+
}
289+
},
290+
```
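The same change can be made with a script. The Python sketch below adds an environment variable to a module entry in the shape shown above, where each variable is an object with a `value` field. The function name is hypothetical, and the caller is assumed to have already loaded the `edgeHub` entry from the manifest.

```python
import copy


def set_module_env(module: dict, name: str, value: str) -> dict:
    """Return a copy of the module entry with env[name] = {"value": value}."""
    updated = copy.deepcopy(module)
    updated.setdefault("env", {})[name] = {"value": value}
    return updated


if __name__ == "__main__":
    edge_hub = {
        "type": "docker",
        "settings": {"image": "mcr.microsoft.com/azureiotedge-hub:1.0"},
    }
    print(set_module_env(edge_hub, "OptimizeForPerformance", "false"))
```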
291+
292+
## IoT Edge module fails to send a message to edgeHub with 404 error
293+
294+
**Observed behavior:**
295+
296+
A custom IoT Edge module fails to send a message to the IoT Edge hub with a 404 `Module not found` error. The IoT Edge daemon prints the following message to the logs:
297+
298+
```output
299+
Error: Time:Thu Jun 4 19:44:58 2018 File:/usr/sdk/src/c/provisioning_client/adapters/hsm_client_http_edge.c Func:on_edge_hsm_http_recv Line:364 executing HTTP request fails, status=404, response_buffer={"message":"Module not found"}u, 04 )
300+
```
301+
302+
**Root cause:**
303+
304+
The IoT Edge daemon enforces process identification for all modules connecting to the edgeHub for security reasons. It verifies that all messages being sent by a module come from the main process ID of the module. If a message is being sent by a module from a different process ID than initially established, it will reject the message with a 404 error message.
305+
306+
**Resolution:**
307+
308+
As of version 1.0.7, all module processes are authorized to connect. For more information, see the [1.0.7 release changelog](https://github.com/Azure/iotedge/blob/master/CHANGELOG.md#iotedged-1).
309+
310+
If upgrading to 1.0.7 isn't possible, make sure that the custom IoT Edge module always uses the same process ID to send messages to the edgeHub. For instance, make sure to use `ENTRYPOINT` instead of `CMD` in your Dockerfile. The `CMD` instruction leads to one process ID for the module and another process ID for the bash command running the main program, but `ENTRYPOINT` leads to a single process ID.
311+
312+
## IoT Edge module deploys successfully then disappears from device
313+
314+
**Observed behavior:**
315+
316+
After you set modules for an IoT Edge device, the modules are deployed successfully, but after a few minutes they disappear from the device and from the device details in the Azure portal. Modules other than the ones you defined might also appear on the device.
317+
318+
**Root cause:**
319+
320+
If an automatic deployment targets a device, it takes priority over manually setting the modules for a single device. The **Set modules** functionality in Azure portal or **Create deployment for single device** functionality in Visual Studio Code will take effect for a moment. You see the modules that you defined start on the device. Then the automatic deployment's priority kicks in and overwrites the device's desired properties.
321+
322+
**Resolution:**
323+
324+
Only use one type of deployment mechanism per device, either an automatic deployment or individual device deployments. If you have multiple automatic deployments targeting a device, you can change priority or target descriptions to make sure the correct one applies to a given device. You can also update the device twin to no longer match the target description of the automatic deployment.
325+
326+
For more information, see [Understand IoT Edge automatic deployments for single devices or at scale](module-deployment-monitoring.md).
327+
328+
## Next steps
329+
330+
Do you think that you found a bug in the IoT Edge platform? [Submit an issue](https://github.com/Azure/iotedge/issues) so that we can continue to improve.
331+
332+
If you have more questions, create a [Support request](https://portal.azure.com/#create/Microsoft.Support) for help.
