Commit 1d11874

mergify[bot] and intxgo authored and committed
Doc: Add known issue 9.0.7: Agent stuck on failed upgrade (elastic#10042) (elastic#10070)
1 parent b88c164 commit 1d11874

1 file changed: +31 -0 lines changed

docs/release-notes/known-issues.md

Lines changed: 31 additions & 0 deletions
@@ -23,6 +23,37 @@ Known issues are significant defects or limitations that may impact your impleme
% Workaround description.
% :::

:::{dropdown} Failed upgrades leave {{agent}} stuck until restart

**Applies to: {{agent}} 8.18.7, 9.0.7**

On September 17, 2025, an issue was discovered that can leave {{agent}} stuck if an upgrade attempt fails under specific conditions. When the attempt fails, the coordinator’s `overrideState` is not cleared, so the agent continues to report that it is upgrading.

**Conditions**

This issue is triggered when the upgrade fails during one of the early checks inside `Coordinator.Upgrade`, for example:

- The agent is not upgradeable
- The capabilities check denies the upgrade
- Signature validation fails on a tamper-protected {{agent}}. When {{agent}} is tamper-protected, Endpoint must validate that the upgrade action was correctly signed by Kibana before allowing the upgrade. If the signature is missing or invalid, or if the connection between {{agent}} and Endpoint is interrupted, the validation fails and the coordinator's override state remains set until the agent is restarted.

**Symptoms**

- {{fleet}} shows the upgrade action as in progress, even though the upgrade is stuck
- No further upgrade attempts succeed
- {{agent}} status shows an override state indicating an upgrade

**Workaround**

Restart {{agent}} to clear the coordinator’s `overrideState` and allow new upgrade attempts to proceed.

**Resolution**
This issue was fixed in [#9992](https://github.com/elastic/elastic-agent/pull/9992), which ensures that the coordinator clears its override state whenever an early failure occurs.

The fix is included in versions 9.1.4 and 8.19.4, and is planned for versions 9.0.8 and 8.18.8.
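
The failure mode and the shape of the fix can be illustrated with a minimal, self-contained Go sketch. This is not the actual {{agent}} code: apart from the `overrideState` name mentioned above, every type, field, and function here (`Coordinator`, `upgradeable`, `upgradeStuck`, `upgradeFixed`, `ReportedState`) is a hypothetical stand-in, and only the "not upgradeable" check is modeled.

```go
package main

import (
	"errors"
	"fmt"
)

// State is a simplified, hypothetical stand-in for the agent's reported state.
type State string

const (
	StateHealthy   State = "HEALTHY"
	StateUpgrading State = "UPGRADING"
)

// Coordinator is a toy model. Only the overrideState name comes from the
// text above; the rest is assumed for illustration.
type Coordinator struct {
	overrideState *State
	upgradeable   bool
}

var errNotUpgradeable = errors.New("agent is not upgradeable")

// ReportedState returns the override if one is set, otherwise healthy.
func (c *Coordinator) ReportedState() State {
	if c.overrideState != nil {
		return *c.overrideState
	}
	return StateHealthy
}

// upgradeStuck models the defect: an early check fails and the function
// returns without clearing the override it set on entry.
func (c *Coordinator) upgradeStuck() error {
	s := StateUpgrading
	c.overrideState = &s
	if !c.upgradeable {
		return errNotUpgradeable // override stays set
	}
	return nil // upgrade continues; override intentionally remains
}

// upgradeFixed models the general idea of the fix: a deferred function
// clears the override whenever the function returns an error.
func (c *Coordinator) upgradeFixed() (err error) {
	s := StateUpgrading
	c.overrideState = &s
	defer func() {
		if err != nil {
			c.overrideState = nil
		}
	}()
	if !c.upgradeable {
		return errNotUpgradeable
	}
	return nil
}

func main() {
	stuck := &Coordinator{upgradeable: false}
	_ = stuck.upgradeStuck()
	fmt.Printf("stuck variant: %s\n", stuck.ReportedState()) // prints "stuck variant: UPGRADING"

	fixed := &Coordinator{upgradeable: false}
	_ = fixed.upgradeFixed()
	fmt.Printf("fixed variant: %s\n", fixed.ReportedState()) // prints "fixed variant: HEALTHY"
}
```

In the stuck variant, the only way to reset the in-memory override is to recreate the coordinator, which corresponds to restarting the agent in the workaround.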
:::

:::{dropdown} [Windows] {{agent}} does not process Windows security events

**Applies to: {{agent}} 8.19.0, 9.1.0 (Windows only)**
