[beats receivers] Collector sub-process shutdown timeout needs to be configurable and contextual #10786

@cmacknz

Description

The coordinator waits up to 5s for the managers to exit:

// managerShutdownTimeout is how long the coordinator will wait during shutdown
// to receive termination states from its managers.
const managerShutdownTimeout = time.Second * 5

In #10650 a new collector stop timeout of 3s was introduced to guarantee that the collector sub-process is killed and still waited on, which avoids leaving a defunct (zombie) process.

Similarly, the default stop timeout for other sub-processes is 30s, and it is by coincidence that existing sub-processes exit faster than this:

StopTimeout: 30 * time.Second,

The amount of time to wait before killing sub-processes needs to be configurable by the user, and in most cases it should be set to a longer value such as the existing 30s (which the collector timeout originally was) to allow for graceful shutdown and for final data to be shipped. At the time of writing, the collector is special-cased to a shorter timeout because it has to restart whenever its configuration changes.

However, in certain circumstances waiting this long is inconvenient. For example, when an Elastic Agent is enrolled into Fleet it restarts itself and would incur this shutdown delay. In this case the existing configuration is being discarded, so the agent can restart immediately.

Acceptance Criteria

  • The shutdown timeout for all sub-processes needs to be configurable by the user in elastic-agent.yml.
  • Cases where graceful shutdown is not necessary should use the lowest timeout possible by default, for example when enrolling Elastic Agent into a new agent policy.
  • The increased shutdown delay must not cause unexpected impact to other parts of agent that re-execute as part of its implementation, particularly the upgrade process.
