Commit cdb8ee9

Merge pull request #206747 from fcabrera23/eflow-diagnose-vm
Eflow diagnose vm
2 parents a1ac21b + 5f4de77 commit cdb8ee9

2 files changed

+228
-0
lines changed

articles/iot-edge/TOC.yml

Lines changed: 2 additions & 0 deletions
@@ -229,6 +229,8 @@
229229
href: how-to-configure-iot-edge-for-linux-on-windows-iiot-dmz.md
230230
- name: Troubleshoot
231231
items:
232+
- name: Diagnose virtual machine
233+
href: troubleshoot-iot-edge-for-linux-on-windows.md
232234
- name: Resolve common errors
233235
href: troubleshoot-iot-edge-for-linux-on-windows-common-errors.md
234236
- name: Develop custom modules
Lines changed: 226 additions & 0 deletions
@@ -0,0 +1,226 @@
1+
---
2+
title: Troubleshoot your IoT Edge for Linux on Windows device | Microsoft Docs
3+
description: Learn standard diagnostic skills for troubleshooting Azure IoT Edge for Linux on Windows (EFLOW) like retrieving component status and logs.
4+
author: PatAltimore
5+
ms.author: fcabrera
6+
ms.date: 08/03/2022
7+
ms.topic: conceptual
8+
ms.service: iot-edge
9+
services: iot-edge
10+
---
11+
12+
# Troubleshoot your IoT Edge for Linux on Windows device
13+
14+
[!INCLUDE [iot-edge-version-201806-or-202011](../../includes/iot-edge-version-201806-or-202011.md)]
15+
16+
If you experience issues running Azure IoT Edge for Linux on Windows (EFLOW) in your environment, use this article as a guide for troubleshooting and diagnostics.
17+
18+
You can also check [IoT Edge for Linux on Windows GitHub issues](https://github.com/Azure/iotedge-eflow/issues?q=is%3Aissue) for a similar reported issue.
19+
20+
## Isolate the issue
21+
22+
Your first step when troubleshooting IoT Edge for Linux on Windows should be to understand which component is causing the issue. There are three main components of an EFLOW solution:
23+
- Windows components: PowerShell module, WSSDAgent & EFLOWProxy
24+
- CBL Mariner Linux virtual machine
25+
- Azure IoT Edge
26+
27+
For more information about EFLOW architecture, see [What is Azure IoT Edge for Linux on Windows](iot-edge-for-linux-on-windows.md).
28+
29+
If your issue is installing or deploying the EFLOW virtual machine, make sure that all the prerequisites are met, and verify your networking and VM configurations. If installation and deployment were successful and you're facing issues with post-deployment VM management, the problems are generally related to VM lifecycle, networking, or Azure IoT Edge. Finally, if the issue is related to modules or IoT Edge features, check [Troubleshoot your IoT Edge device](troubleshoot.md).
30+
31+
For more information about common errors related to *installation and deployment*, *provisioning*, *interaction with the VM*, and *networking*, see [Common issues and resolutions for Azure IoT Edge for Linux on Windows](troubleshoot-iot-edge-for-linux-on-windows-common-errors.md).
32+
33+
## Gather debug information
34+
35+
When you need to gather logs from an IoT Edge for Linux on Windows device, the most convenient way is to use the `Get-EflowLogs` PowerShell cmdlet. By default, this command collects the following logs:
36+
- **eflowlogs-summary.txt**: contains the status of all log collection steps.
37+
- **EFLOW VM configuration**: includes the VM, networking, and passthrough configurations and additional information.
38+
- **EFLOW Events**: Windows events related to the VM lifecycle and the *EFLOWProxy* service.
39+
- **IoT Edge logs**: includes the output of `iotedge check` and the IoT Edge runtime support bundle.
40+
- **WSSDAgent logs**: includes all the logs related to the *WSSDAgent* service.
41+
42+
After the cmdlet gathers all the required logs, the files are compressed into a single file named _eflowlogs.zip_ under the EFLOW installation path (for example, _C:\Program Files\Azure IoT Edge_).
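
The log collection described above can be sketched as follows. This is a minimal sketch, not an official procedure: it assumes EFLOW is installed under the default _Program Files_ location mentioned above, and the `$zipPath` variable is introduced here only for illustration.

```powershell
# Run from an elevated PowerShell session.
# Collect the default set of EFLOW logs.
Get-EflowLogs

# Assumption: EFLOW is installed in the default location, so the archive
# is written to "<Program Files>\Azure IoT Edge\eflowlogs.zip".
$zipPath = Join-Path -Path $env:ProgramFiles -ChildPath 'Azure IoT Edge\eflowlogs.zip'

if (Test-Path -Path $zipPath) {
    # Show where the archive is, how large it is, and when it was written.
    Get-Item -Path $zipPath | Select-Object -Property FullName, Length, LastWriteTime
}
else {
    Write-Warning "eflowlogs.zip was not found at '$zipPath'. Check the EFLOW installation path."
}
```

Checking the `LastWriteTime` value is a quick way to confirm that the archive comes from the run you just started rather than from an earlier collection.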
43+
44+
## Check your IoT Edge version
45+
46+
If you're running an older version of IoT Edge for Linux on Windows, then upgrading may resolve your issue. To check the EFLOW version installed on your device, use the following steps:
47+
48+
1. Open **Settings** on Windows.
49+
1. Select **Add or Remove Programs**.
50+
1. Depending on the EFLOW release train being used (Continuous Release or LTS), choose **Azure IoT Edge LTS** or **Azure IoT Edge**.
51+
1. Check the version under the EFLOW app name.
52+
53+
For release notes for specific versions, see [Azure IoT Edge for Linux on Windows release notes](https://aka.ms/AzEFLOW-Releases).
54+
55+
For instructions on how to update your device, see [Update IoT Edge for Linux on Windows](iot-edge-for-linux-on-windows-updates.md).
56+
57+
## Check the EFLOW VM status
58+
59+
You can verify the EFLOW VM status and information by using the `Get-EflowVm` PowerShell cmdlet. If the EFLOW VM is running, the **VmPowerState** output should be _Running_; if the VM is stopped, the **VmPowerState** output is _Off_. To start or stop the EFLOW VM, use the `Start-EflowVm` and `Stop-EflowVm` cmdlets.
60+
61+
If the VM is _Running_ but you can't interact with or access the VM, there's probably a networking issue between the VM and the Windows host OS. Also, make sure that the EFLOW VM has enough memory and storage available to continue with normal execution. Run the `Get-EflowVm` cmdlet to see the memory (_TotalMemMb_, _UsedMemMb_, _AvailableMemMb_) and storage (_TotalStorageMb_, _UsedStorageMb_, _AvailableStorageMb_) information.
62+
63+
Finally, if the VM is _Off_ and you can't start it using the `Start-EflowVm` cmdlet, there may be several reasons why the VM can't be started.
64+
65+
First, the issue could be related to the VM lifecycle management service (_WSSDAgent_) not running. Ensure that the _WSSDAgent_ service is running using the following steps:
66+
67+
1. Start an elevated _PowerShell_ session using **Run as Administrator**.
68+
1. Check the service status:
69+
```powershell
70+
Get-Service -Name WSSDAgent
71+
```
72+
1. If the service is **Stopped**, start the service using the following command:
73+
```powershell
74+
Start-Service -Name WSSDAgent
75+
```
76+
1. If the service is **Running**, the issue is probably related to a networking misconfiguration or lack of resources to create the VM.
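
The service check above can be combined into one short script. This is a sketch that uses only the built-in `Get-Service` and `Start-Service` cmdlets; the `$service` variable name is illustrative.

```powershell
# Run from an elevated PowerShell session.
$service = Get-Service -Name WSSDAgent

if ($service.Status -eq 'Stopped') {
    # Start the VM lifecycle management service if it isn't running.
    Start-Service -Name WSSDAgent
}

# Query the service again to confirm its current state.
Get-Service -Name WSSDAgent | Select-Object -Property Name, Status
```

If the final status is **Running** and the VM still doesn't start, continue with the resource and networking checks that follow.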
77+
78+
Second, the issue could be related to lack of resources. You can set the _EflowVmAssignedMemory_ (`-memoryInMb`) and _EflowVmAssignedCPUcores_ (`-cpuCount`) assigned to the VM during deployment using the `Deploy-Eflow` PowerShell cmdlet, or after deployment using the `Set-EflowVm` cmdlet. If these resources aren't available when trying to start the VM, the VM fails to start. To check the resources assigned and available, use the following steps:
79+
80+
1. Start an elevated _PowerShell_ session using **Run as Administrator**.
81+
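1. Check the available memory. Ensure that the _FreePhysicalMemory_ is greater than the _EflowVmAssignedMemory_. Note that _FreePhysicalMemory_ is reported in kilobytes, whereas the memory assigned to the VM is specified in megabytes.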
82+
```powershell
83+
Get-CIMInstance Win32_OperatingSystem | Select FreePhysicalMemory
84+
```
85+
1. Check the available CPU cores. Ensure that _NumberOfLogicalProcessors_ is greater than _EflowVmAssignedCPUcores_.
86+
```powershell
87+
wmic cpu get NumberOfLogicalProcessors
88+
```
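
The two resource checks above can be sketched as a single comparison. In this sketch, `$assignedMemoryMb` and `$assignedCpuCores` are hypothetical placeholders that you replace with the values you passed to `-memoryInMb` and `-cpuCount`. The CPU count is read through `Get-CimInstance` from the `Win32_Processor` class, which is assumed here to be an equivalent of the `wmic cpu` query.

```powershell
# Hypothetical values: replace with the memory (MB) and CPU cores assigned to the EFLOW VM.
$assignedMemoryMb = 1024
$assignedCpuCores = 1

# FreePhysicalMemory is reported in kilobytes; convert it to megabytes.
$os = Get-CimInstance -ClassName Win32_OperatingSystem
$freeMemoryMb = [math]::Floor($os.FreePhysicalMemory / 1KB)

# Sum the logical processors across all physical CPUs in the host.
$logicalCpus = (Get-CimInstance -ClassName Win32_Processor |
    Measure-Object -Property NumberOfLogicalProcessors -Sum).Sum

[pscustomobject]@{
    FreeMemoryMb     = $freeMemoryMb
    AssignedMemoryMb = $assignedMemoryMb
    MemoryOk         = $freeMemoryMb -gt $assignedMemoryMb
    LogicalCpus      = $logicalCpus
    AssignedCpuCores = $assignedCpuCores
    CpuOk            = $logicalCpus -gt $assignedCpuCores
}
```

If either `MemoryOk` or `CpuOk` is `False`, free up host resources or lower the VM assignment before trying `Start-EflowVm` again.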
89+
90+
Finally, the issue could be related to networking. For more information about EFLOW VM networking issues, see [How to troubleshoot Azure IoT Edge for Linux on Windows networking](./troubleshoot-common-errors.md).
91+
92+
## Check the status of the IoT Edge runtime
93+
94+
The [IoT Edge runtime](./iot-edge-runtime.md) is responsible for receiving the code to run at the edge and communicating the results. If the IoT Edge runtime and modules aren't running, no code runs at the edge. You can check the runtime and module status using the following steps:
95+
96+
1. Start an elevated _PowerShell_ session using **Run as Administrator**.
97+
1. Check the IoT Edge runtime status. In particular, check if the service is **Loaded** and **Active**.
98+
```powershell
99+
(Get-EflowVm).EdgeRuntimeStatus.SystemCtlStatus | Format-List
100+
```
101+
1. Check the IoT Edge module status. Check that all modules are running.
102+
```powershell
103+
(Get-EflowVm).EdgeRuntimeStatus.ModuleList | Format-List
104+
```
105+
106+
For more information about IoT Edge runtime troubleshooting, see [Troubleshoot your IoT Edge device](./troubleshoot.md).
107+
108+
## Check TPM passthrough
109+
110+
If you're using TPM provisioning by following the guide [Create and provision an IoT Edge for Linux on Windows device at scale by using a TPM](./how-to-provision-devices-at-scale-linux-on-windows-tpm.md), you must enable TPM passthrough. In order to access the physical TPM connected to the Windows host OS, all the EFLOW VM TPM commands are forwarded to the host OS using a Windows service called _EFLOWProxy_. If you experience issues using _DpsTpm_ provisioning, or accessing TPM indexes from the EFLOW VM, check the service status using the following steps:
111+
112+
1. Start an elevated _PowerShell_ session using **Run as Administrator**.
113+
1. Check the status of the _EFLOWProxy_ service.
114+
```powershell
115+
Get-Service -Name EFLOWProxy
116+
```
117+
1. If the service is **Stopped**, start the service using the following command:
118+
```powershell
119+
Start-Service -Name EFLOWProxy
120+
```
121+
If the service won't start, check the _EFLOWProxy_ logs. Go to **Apps** > **Event Viewer** > **Applications and Services Logs** > **Microsoft** > **EFLOW** > **EFLOWProxy** and check the logs.
122+
123+
1. If the service is **Running**, check the EFLOW VM proxy services. Start by connecting to the EFLOW VM.
124+
```powershell
125+
Connect-EflowVm
126+
```
127+
1. From inside the EFLOW VM, check that the TPM services are up and running.
128+
```bash
129+
sudo systemctl status tpm*
130+
```
131+
You should see the status and logs of four different services. The four services should be up and running.
132+
1. **tpm2-netns.service** - TPM2 Network Namespace
133+
1. **[email protected]** - TPM2 Sandbox Service on Port 2322
134+
1. **[email protected]** - TPM2 Sandbox Service on Port 2321
135+
1. **tpm2-abrmd.service** - TPM2 Access Broker and Resource Management Daemon
136+
137+
If any of these services is **stopped** or **failed**, restart all services using the following command:
138+
139+
```bash
140+
sudo systemctl restart tpm*
141+
```
142+
143+
1. Check the communication between the EFLOW VM and the *EFLOWProxy* service. If communication is working, you should see the _RegistrationId_ and the TPM _Endorsement Key_ as output from the following command:
144+
```bash
145+
sudo /usr/bin/tpm_device_provision
146+
```
147+
148+
## Check GPU Assignment
149+
150+
If you're using GPU passthrough, make sure you meet all the prerequisites and follow the configurations outlined in [GPU acceleration for Azure IoT Edge for Linux on Windows](./gpu-acceleration.md). If you experience issues using the GPU passthrough feature, use the following steps:
151+
152+
First, check that your device is available on the Windows host OS.
153+
154+
1. Open **Apps** > **Device Manager**.
155+
1. Go to **Display Adapters** and check that your GPU is in the list.
156+
1. Right-click the GPU name and select **Properties**.
157+
1. Check that the driver is correctly installed.
158+
159+
Second, if the GPU is correctly assigned but you still can't use it inside the EFLOW VM, use the following steps:
160+
1. Start an elevated _PowerShell_ session using **Run as Administrator**.
161+
1. Connect to the EFLOW VM:
162+
```powershell
163+
Connect-EflowVm
164+
```
165+
1. If you're using an **NVIDIA GPU**, check the passthrough status using the following command:
166+
```bash
167+
sudo nvidia-smi
168+
```
169+
You should be able to see the GPU card information, driver version, CUDA version, and the GPU system and processes information.
170+
171+
1. If you're using an **Intel iGPU** passthrough, check the passthrough status using the following command:
172+
```bash
173+
sudo ls -al /dev/dxg
174+
```
175+
The expected output should be similar to:
176+
```Output
177+
crw-rw-rw- 1 root 10, 60 Sep 8 06:20 /dev/dxg
178+
```
179+
For more Intel iGPU performance and debugging information, see [Witness the power of Intel® iGPU with Azure IoT Edge for Linux on Windows(EFLOW) & OpenVINO™ Toolkit](https://community.intel.com/t5/Blogs/Tech-Innovation/Artificial-Intelligence-AI/Witness-the-power-of-Intel-iGPU-with-Azure-IoT-Edge-for-Linux-on/post/1382405).
180+
181+
## Check WSSDAgent logs for issues
182+
183+
The first step before checking *WSSDAgent* logs is to check if the VM was created and is running.
184+
185+
1. Start an elevated _PowerShell_ session using **Run as Administrator**.
186+
1. On Windows Client SKUs, check the [HCS](/virtualization/community/team-blog/2017/20170127-introducing-the-host-compute-service-hcs.md) virtual machines.
187+
```powershell
188+
hcsdiag list
189+
```
190+
If the EFLOW VM is running, you should see a line that contains a GUID followed by *wssdagent*. For example:
191+
192+
```Output
193+
2bd841e4-126a-11ed-9a91-f01dbca16d1e
194+
VM, Running, 2BD841E4-126A-11ED-9A91-F01DBCA16D1E, wssdagent
195+
196+
88d7aa8c-0d1f-4786-b4cb-62eff1decd92
197+
VM, SavedAsTemplate, 88D7AA8C-0D1F-4786-B4CB-62EFF1DECD92, CmService
198+
```
199+
200+
1. On Windows Server SKUs, check the [VMMS](/windows-server/virtualization/hyper-v/hyper-v-technology-overview.md) virtual machines.
201+
```powershell
202+
Get-VM
203+
```
204+
If the EFLOW VM is running, you should see a line whose name is \<WindowsHostname>-EFLOW. For example:
205+
```Output
206+
Name State CPUUsage(%) MemoryAssigned(M) Uptime Status Version
207+
---- ----- ----------- ----------------- ------ ------ -------
208+
NUC-EFLOW Running 0 1024 00:01:34.1280000 Operating normally 9.0
209+
```
210+
211+
If the VM isn't listed, the VM isn't running or the *WSSDAgent* service wasn't able to create it. Use the following steps to check the *WSSDAgent* logs:
212+
213+
1. Open **File Explorer**.
214+
1. Go to `C:\ProgramData\wssdagent\log`
215+
1. Open the _wssdagent.log_ file.
216+
1. Look for the words **Error** or **Fail**.
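
As an alternative to opening the file manually, the search above can be sketched with the built-in `Select-String` cmdlet. The log path is the one given in the steps above; the `$logPath` variable and the choice to show only the last 20 matches are illustrative.

```powershell
# Path taken from the steps above.
$logPath = 'C:\ProgramData\wssdagent\log\wssdagent.log'

if (Test-Path -Path $logPath) {
    # Select-String is case-insensitive by default, so this also matches
    # "Error", "ERROR", "Failed", and similar variants.
    Select-String -Path $logPath -Pattern 'error', 'fail' |
        Select-Object -Property LineNumber, Line -Last 20
}
else {
    Write-Warning "Log file not found at '$logPath'."
}
```

Limiting the output to the most recent matches keeps the focus on the latest start attempt; remove `-Last 20` to see every match.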
217+
218+
## Reinstall EFLOW
219+
220+
Sometimes, a system might require significant special modification to work with existing networking or operating system constraints. For example, a system could require complex networking configurations (firewall, Windows policies, proxy settings) and custom Windows OS configurations. If you tried all previous troubleshooting steps and still have EFLOW issues, it's possible that there's some misconfiguration that is causing the issue. In this case, the final option is to uninstall and reinstall EFLOW.
221+
222+
## Next steps
223+
224+
Do you think that you found a bug in IoT Edge for Linux on Windows? [Submit an issue](https://github.com/Azure/iotedge-eflow/issues) so that we can continue to improve.
225+
226+
If you have more questions, create a [Support request](https://portal.azure.com/#create/Microsoft.Support) for help.
