Liveness endpoint does not consider overall agent state, only component state #9576

@cmacknz

In a container whose config references an undefined output, the agent reports a FAILED state:

elastic-agent status
┌─ fleet
│  └─ status: (STOPPED) Not enrolled into Fleet
└─ elastic-agent
   └─ status: (FAILED) Invalid component model: failed to render components: invalid 'inputs.0.use_output', references an unknown output 'iam'

However, the /liveness endpoint reports healthy (HTTP 200) for every failon value:

# curl -w 'HTTP %{http_code}' 'http://localhost:6791/liveness?failon=heartbeat'
HTTP 200
# curl -w 'HTTP %{http_code}' 'http://localhost:6791/liveness?failon=degraded'
HTTP 200
# curl -w 'HTTP %{http_code}' 'http://localhost:6791/liveness?failon=failed'
HTTP 200

It looks like we aren't considering the overall agent status, only the component status:

unhealthyComponent := false
for _, comp := range state.Components {
	if (failConfig.Failed && comp.State.State == client.UnitStateFailed) || (failConfig.Degraded && comp.State.State == client.UnitStateDegraded) {
		unhealthyComponent = true
	}
}
if state.Collector != nil {
	if (failConfig.Failed && (otelhelpers.HasStatus(state.Collector, componentstatus.StatusFatalError) || otelhelpers.HasStatus(state.Collector, componentstatus.StatusPermanentError))) || (failConfig.Degraded && otelhelpers.HasStatus(state.Collector, componentstatus.StatusRecoverableError)) {
		unhealthyComponent = true
	}
}
// bias towards the coordinator check, since it can be otherwise harder to diagnose
if unhealthyComponent {
	w.WriteHeader(http.StatusInternalServerError)
}
