diff --git a/troubleshoot/ingest/fleet/common-problems.md b/troubleshoot/ingest/fleet/common-problems.md index 0b87e8bfc0..985e3fc2e7 100644 --- a/troubleshoot/ingest/fleet/common-problems.md +++ b/troubleshoot/ingest/fleet/common-problems.md @@ -59,6 +59,7 @@ Find troubleshooting information for {{fleet}}, {{fleet-server}}, and {{agent}} * [Hosted {{agent}} is offline](#hosted-agent-offline) * [APM & {{fleet}} fails to upgrade to 8.x on {{ecloud}}](#hosted-agent-8-x-upgrade-fail) * [Air-gapped {{agent}} upgrade can fail due to an inaccessible PGP key](#pgp-key-download-fail) +* [{{agent}} upgrade fails on Windows with exit status `0xc0000142`](#agent-upgrade-fail-windows) * [{{agents}} are unable to connect after removing the {{fleet-server}} integration](#fleet-server-integration-removed) * [{{agent}} Out of Memory errors on Kubernetes](#agent-oom-k8s) * [Error when running {{agent}} commands with `sudo`](#agent-sudo-error) @@ -652,6 +653,50 @@ curl -u elastic: --request POST \ In versions 8.9 and above, an {{agent}} upgrade may fail when the upgrader can’t access a PGP key required to verify the binary signature. For details and a workaround, refer to the [PGP key download fails in an air-gapped environment](https://www.elastic.co/guide/en/fleet/8.9/release-notes-8.9.0.html#known-issue-3375) known issue in the version 8.9.0 Release Notes or to the [workaround documentation](https://github.com/elastic/elastic-agent/blob/main/docs/pgp-workaround.md) in the elastic-agent GitHub repository. +## {{agent}} upgrade fails on Windows with exit status `0xc0000142` [agent-upgrade-fail-windows] + +During an {{agent}} upgrade on Windows, {{agent}} spawns a "watcher" process that monitors the upgrade process. Windows attempts to create a temporary console for this process. If Windows can't create this console, the watcher process initialization fails with error code `0xc0000142` (`STATUS_DLL_INIT_FAILED`), resulting in an upgrade failure. {{agent}} logs this error at the `info` level. + +The error is caused by Windows [desktop heap exhaustion](https://learn.microsoft.com/en-us/troubleshoot/windows-server/performance/desktop-heap-limitation-out-of-memory). When {{agent}} runs as a [Windows service application](https://learn.microsoft.com/en-us/dotnet/framework/windows-services/introduction-to-windows-service-applications), it uses the service desktop, and shares the desktop heap with other running services. If a service process is using windowing resources, but is failing to release them, this may exhaust the desktop heap and affect {{agent}}. + +:::{note} +Interactively-run instances of `elastic-agent.exe` are not subject to this limitation. Only instances running as a service are potentially affected. +::: + +To resolve the issue, you can try the following: + +- **Update {{agent}} immediately after a system reboot** + + A system reboot destroys and recreates the desktop heap, resolving any prior exhaustion. + Because many memory leaks are gradual, updating {{agent}} immediately after a system reboot may allow {{agent}} to upgrade before the memory leaking application exhausts the desktop heap. + + :::{tip} + A [cold startup](https://learn.microsoft.com/en-us/windows-hardware/drivers/kernel/distinguishing-fast-startup-from-wake-from-hibernation) resets kernel memory, but a fast startup or a wake from hibernation does not. + A regular reboot (for example, `shutdown /r /t 0`) results in a cold startup, and resets the desktop heap. + ::: + +- **Update third-party service applications** + + As standard Windows tools such as Task Manager and Process Explorer do not attribute desktop heap usage by application, you have to consider updating all third-party processes that are running as a service. To list these applications, use the following PowerShell command: + + ```powershell + PS C:\> Get-Process | Where {$_.SI -eq 0} | Where {$_.MainModule.FileVersionInfo.ProductName -and (-not (($_.MainModule.FileVersionInfo.CompanyName -eq "Microsoft Corporation") -and ($_.MainModule.FileVersionInfo.ProductName -like "*Windows*"))) } | ForEach-Object { $_.MainModule.FileVersionInfo.ProductName + ' - ' + $_.Path } + ``` + + You can then install any updates from the listed applications' manufacturers. + +- **Terminate or uninstall third-party service applications** + + You can try terminating or uninstalling non-critical third-party service applications before updating {{agent}}. + Terminating a process releases its desktop heap resources. + + Note that the {{agent}} update process does not require a significant amount of desktop heap resources, so a successful {{agent}} update following the termination or uninstallation of a service application does not necessarily mean that the application was exhausting the desktop heap. + +- **Resize the desktop heap** + + As a short-term solution, follow the steps described in the [Microsoft guide](https://learn.microsoft.com/en-us/troubleshoot/windows-server/performance/desktop-heap-limitation-out-of-memory) to increase the size of the desktop heap. Note that if a service application is causing a memory leak, increasing the size of the desktop heap may only postpone the desktop heap exhaustion. + + ## {{agents}} are unable to connect after removing the {{fleet-server}} integration [fleet-server-integration-removed] When you use {{fleet}}-managed {{agent}}, at least one {{agent}} needs to be running the [{{fleet-server}} integration](https://docs.elastic.co/integrations/fleet_server). In case the policy containing this integration is accidentally removed from {{agent}}, all other agents will not be able to be managed. However, the {{agents}} will continue to send data to their configured output.