
Commit 10a39a9

Merge pull request #296588 from SoniaLopezBravo/known-issue/helm-stuck
Adding Helm stuck state to known issues
2 parents eb5a213 + 4c0758a


articles/iot-operations/troubleshoot/known-issues.md

Lines changed: 63 additions & 3 deletions
@@ -4,9 +4,7 @@ description: Known issues for the MQTT broker, Layered Network Management (previ
 author: dominicbetts
 ms.author: dobett
 ms.topic: troubleshooting-known-issue
-ms.custom:
-  - ignite-2023
-ms.date: 03/05/2025
+ms.date: 03/24/2025
 ---

 # Known issues: Azure IoT Operations
@@ -21,6 +19,64 @@ This article lists the known issues for Azure IoT Operations.

- If you deploy Azure IoT Operations in GitHub Codespaces, shutting down and restarting the Codespace causes a `This codespace is currently running in recovery mode due to a configuration error.` issue. Currently, there's no workaround for the issue. If you need a cluster that supports shutting down and restarting, choose one of the options in [Prepare your Azure Arc-enabled Kubernetes cluster](../deploy-iot-ops/howto-prepare-cluster.md).

## Update issues

The following issues might occur when you update Azure IoT Operations.

### Helm package enters a stuck state

When you update Azure IoT Operations, the Helm package might enter a stuck state that prevents any `helm install` or `helm upgrade` operation from proceeding. The result is the following error message, which blocks further upgrades:

```output
Message: Update failed for this resource, as there is a conflicting operation in progress. Please try after sometime.
```

Follow these steps to resolve the issue:

1. Identify the stuck components by running the following command:

    ```sh
    helm list -n azure-iot-operations --pending
    ```

    In the output, look for the release names of the components, `<component-release-name>`, that have a status of `pending-upgrade` or `pending-install`. The following components might be affected by this issue:

    - `-adr`
    - `-akri`
    - `-connectors`
    - `-mqttbroker`
    - `-dataflows`
    - `-schemaregistry`

1. Using the `<component-release-name>` values from step 1, retrieve the revision history of each stuck release. Run the following command for **each component from step 1**. For example, if the `-adr` and `-mqttbroker` components are stuck, you run the command twice, once for each component:

    ```sh
    helm history <component-release-name> -n azure-iot-operations
    ```

    Make sure to replace `<component-release-name>` with the release name of the stuck component. In the output, look for the last revision that has a status of `Deployed` or `Superseded` and note the revision number.

1. Using the **revision number from step 2**, roll back the Helm release to the last successful revision. Run the following command for each component, `<component-release-name>`, and its revision number, `<revision-number>`, from steps 1 and 2:

    ```sh
    helm rollback <component-release-name> <revision-number> -n azure-iot-operations
    ```

    > [!IMPORTANT]
    > Repeat steps 2 and 3 for each component that is stuck. Reattempt the upgrade only after all components are rolled back to the last successful revision. For a scripted version of steps 1 through 3, see the sketch after this procedure.

1. After the rollback of each component is complete, reattempt the upgrade by using the following command:

    ```sh
    az iot ops update
    ```

    If you receive a message stating `Nothing to upgrade or upgrade complete`, force the upgrade by appending the release train and version parameters, for example:

    ```sh
    az iot ops upgrade ....... --release-train stable --version 1.0.15
    ```
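
The steps above can also be scripted. The following is a minimal, unofficial sketch that automates steps 1 through 3: it finds the pending releases, looks up the last `deployed` or `superseded` revision for each, and rolls it back. It assumes `helm` and `jq` are installed and that the releases live in the `azure-iot-operations` namespace; review what it finds before relying on it.

```sh
#!/usr/bin/env bash
# Sketch only: roll back every Helm release stuck in a pending state
# to its last good revision, then reattempt the Azure IoT Operations upgrade.
set -euo pipefail

NS=azure-iot-operations

# Step 1: releases stuck in pending-install or pending-upgrade.
for release in $(helm list -n "$NS" --pending -q); do
  # Step 2: last revision whose status was deployed or superseded.
  rev=$(helm history "$release" -n "$NS" -o json \
    | jq '[.[] | select(.status == "deployed" or .status == "superseded")][-1].revision')
  echo "Rolling back $release to revision $rev"
  # Step 3: roll back to that revision.
  helm rollback "$release" "$rev" -n "$NS"
done

# Step 4: once every release shows as deployed in 'helm list', reattempt:
# az iot ops update
```

Before reattempting the upgrade, `helm list -n azure-iot-operations` should show every release with a `deployed` status.
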
## MQTT broker

- Sometimes, the MQTT broker's memory usage can become unexpectedly high due to internal certificate rotation retries. This results in errors like 'failed to connect trace upload task to diagnostics service endpoint' in the logs. The issue is expected to be addressed in the next patch update. In the meantime, as a workaround, restart each broker pod one by one (including the diagnostic service, probe, and authentication service), making sure each backend recovers before moving on. Alternatively, [redeploy Azure IoT Operations with higher internal certificate duration](../manage-mqtt-broker/howto-encrypt-internal-traffic.md#internal-certificates), `1500h` or more.
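
  As a rough illustration of that one-by-one restart (the `aio-broker` pod name prefix is an assumption; confirm the actual pod names in your cluster before deleting anything):

  ```sh
  # List the broker-related pods; adjust the filter to match your deployment's naming.
  kubectl get pods -n azure-iot-operations | grep aio-broker

  # Delete one pod at a time and watch until its replacement is Running and Ready
  # before moving on to the next pod.
  kubectl delete pod <broker-pod-name> -n azure-iot-operations
  kubectl get pods -n azure-iot-operations -w
  ```
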
@@ -135,3 +191,7 @@ kubectl delete pod aio-opc-opc.tcp-1-f95d76c54-w9v9c -n azure-iot-operations
If you see both log entries from the two *kubectl logs* commands, the cert-manager wasn't ready or running.
1. Run `kubectl delete pod aio-dataflow-operator-0 -n azure-iot-operations` to delete the data flow operator pod. Deleting the pod clears the crash status and restarts the pod.
1. Wait for the operator pod to restart and deploy the data flow.
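
To confirm the recovery described in the last two steps, you can watch the operator pod until it reports `Running` again; a small check along these lines, using the pod name from the step above:

```sh
# Watch the data flow operator pod come back up after deletion (Ctrl+C to stop watching).
kubectl get pod aio-dataflow-operator-0 -n azure-iot-operations -w

# Optionally inspect its recent logs once it's running again.
kubectl logs aio-dataflow-operator-0 -n azure-iot-operations --tail=50
```
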
