@michalpristas michalpristas commented Sep 1, 2025

This PR is large, but it does only a few things.

It no longer uses elastic-agent-libs to call ProcessWindowsControlEvents. ProcessWindowsControlEvents creates an object that serves as the service instance and communicates with the service manager. Keeping it in libs means the init sections of the whole elastic-agent-libs dependency tree run before we register with the service manager.

It splits the agent/cmd package into sub-packages, isolating each command. Service manager communication is then moved out to internal/pkg/agent/agentservice, as described in the first step. With this split, and with the service code in a deliberately named package, the init section of agentservice is called sooner during initialization.
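To illustrate why the package name can matter, here is a rough sketch that simulates the initialization rule from the Go specification (Go 1.21 and later): packages are sorted by import path, and at each step the first uninitialized package whose imports are all initialized is initialized next. The import graph and the paths in `toyGraph` are hypothetical and heavily simplified; this is not the agent's real dependency tree.

```go
package main

import (
	"fmt"
	"sort"
)

// initOrder simulates the Go 1.21+ package initialization rule:
// sort all packages by import path, then repeatedly initialize the first
// uninitialized package whose imports have all been initialized.
func initOrder(imports map[string][]string) []string {
	paths := make([]string, 0, len(imports))
	for p := range imports {
		paths = append(paths, p)
	}
	sort.Strings(paths)

	done := make(map[string]bool, len(paths))
	order := make([]string, 0, len(paths))
	for len(order) < len(paths) {
		picked := false
		for _, p := range paths {
			if done[p] || !allDone(imports[p], done) {
				continue
			}
			done[p] = true
			order = append(order, p)
			picked = true
			break // rescan from the top of the sorted list
		}
		if !picked {
			break // import cycle or unknown package: not a valid Go program
		}
	}
	return order
}

// allDone reports whether every dependency has already been initialized.
func allDone(deps []string, done map[string]bool) bool {
	for _, d := range deps {
		if !done[d] {
			return false
		}
	}
	return true
}

// toyGraph builds a hypothetical import graph. servicePath is the import
// path of the service package; its own imports are left out of the model.
func toyGraph(servicePath string) map[string][]string {
	return map[string][]string{
		"main":             {servicePath, "agent/cmd"},
		servicePath:        {},
		"agent/cmd":        {"agent/composable", "k8s", "otel"},
		"agent/composable": {},
		"k8s":              {},
		"otel":             {},
	}
}

func main() {
	fmt.Println(initOrder(toyGraph("agent/agentservice")))
	// [agent/agentservice agent/composable k8s otel agent/cmd main]
	fmt.Println(initOrder(toyGraph("agent/service")))
	// [agent/composable agent/service k8s otel agent/cmd main]
}
```

In this toy graph the service package initializes first only when its import path sorts ahead of the other packages that are ready. A package with many imports, such as the hypothetical `agent/cmd`, has to wait for its whole dependency tree regardless of its name.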

In the agentservice init section we add a WaitGroup when spinning up the ProcessWindowsControlEvents goroutine. This blocks loading of subsequent packages and reduces the chance of starving ourselves of resources (for example when new goroutines or subprocesses are started later). We cannot guarantee goroutine ordering or that this goroutine will be up in time; this is a best effort at achieving that.
It is essential to have the package named like this because of how init section loading is implemented (alphabetical sorting of import paths plays a significant role).
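A minimal sketch of the WaitGroup-in-init pattern described above, under stated assumptions: `processControlEvents` is a hypothetical stand-in for ProcessWindowsControlEvents (the real function is Windows-specific and talks to the service manager), and the `started` flag and `stop` channel exist only for illustration. Where exactly the PR calls `Done` is not stated in the description, so this placement is an assumption.

```go
package main

import (
	"fmt"
	"sync"
)

// started records that the handler goroutine has begun running.
// It is written before wg.Done and read after wg.Wait, so access is ordered.
var started bool

// stop ends the stand-in handler; illustrative only.
var stop = make(chan struct{})

// processControlEvents is a hypothetical stand-in for
// ProcessWindowsControlEvents: it blocks until asked to stop.
func processControlEvents(stop <-chan struct{}) {
	<-stop
}

func init() {
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		started = true
		// Assumption: signal as soon as the goroutine is scheduled.
		// This shows the goroutine is running, not that the service
		// manager has been contacted, hence "best effort".
		wg.Done()
		processControlEvents(stop)
	}()
	// Block this package's init, and with it every package that
	// initializes afterwards, until the goroutine has started.
	wg.Wait()
}

func main() {
	fmt.Println("handler goroutine started before main:", started)
	// prints: handler goroutine started before main: true
	close(stop)
}
```

Because package initialization runs sequentially in a single goroutine, blocking in `init` delays every later package's initialization, which is the effect the description relies on.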

The result is that we not only moved service manager communication much earlier in the init phase (normally it happens at the end of init because of the big dependency tree), but also moved it ahead of the initialization of:

  • composables
  • Azure and AWS SDKs
  • k8s
  • Prometheus and CockroachDB
  • >95% of otel dependencies
  • beats

This means that separating otel into its own binary is no longer critical for issues related to Windows service manager communication timeouts.

Related #4971

@michalpristas michalpristas self-assigned this Sep 1, 2025
@michalpristas michalpristas added the bug, Team:Elastic-Agent-Control-Plane, backport-skip, and skip-changelog labels Sep 1, 2025
Quality Gate failed

Failed conditions
13.8% Coverage on New Code (required ≥ 40%)

See analysis details on SonarQube

@elasticmachine

💛 Build succeeded, but was flaky

cc @michalpristas

@michalpristas michalpristas marked this pull request as ready for review September 2, 2025 10:25
@michalpristas michalpristas requested a review from a team as a code owner September 2, 2025 10:25
@elasticmachine

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

@ebeahan ebeahan removed the request for review from nkvoll September 2, 2025 15:42