[8.19] (backport #10650) fix: zombie processes during restart #10815
+82
−12
What does this PR do?
This PR fixes zombie/defunct processes that are left behind when Elastic Agent re-executes itself during restart. The fix involves:
… Wait() is called
Why is it important?
Root Cause
When the Elastic Agent re-executes itself during restart, the following sequence occurs:
… execve itself
… execve, all threads other than the calling thread are destroyed
… PDeathSig mechanism we enable for subprocesses
Why This Affects EDOT More Than Beats
Beats subprocesses typically terminate almost immediately (within the 5-second window), so they don't become zombies. However, the EDOT collector's shutdown time seemed to be affected by:
Impact
This fix ensures proper process cleanup regardless of shutdown duration while maintaining graceful termination when possible.
Checklist
… ./changelog/fragments using the changelog tool
Disruptive User Impact
Users may notice:
How to test this PR locally
Run TestMetricsMonitoringCorrectBinaries integration test
Related issues
This is an automatic backport of pull request "fix: zombie processes during restart" #10650, done by Mergify.