Skip to content

fix(control-api): exclude deleted upstream nodes from health check API#13572

Open
shreemaan-abhishek wants to merge 2 commits into
apache:masterfrom
shreemaan-abhishek:fix/healthcheck-stale-nodes-control-api
Open

fix(control-api): exclude deleted upstream nodes from health check API#13572
shreemaan-abhishek wants to merge 2 commits into
apache:masterfrom
shreemaan-abhishek:fix/healthcheck-stale-nodes-control-api

Conversation

@shreemaan-abhishek

Copy link
Copy Markdown
Contributor

Description

The health check Control API (GET /v1/healthcheck and /v1/healthcheck/{src_type}/{src_id}) reads its node list straight from the health checker's shared dict via healthcheck.get_target_list(). That dict is keyed only by the upstream id, not by the node set, so when a node is removed from an upstream its target can linger there:

  • during the delayed_clear grace window after the checker is recreated, and
  • potentially indefinitely, since the cleanup sweep only runs for a checker that is present in the periodic-lock-holding worker's in-memory checker set.

As a result the Control API keeps reporting nodes that are no longer part of the upstream, which is what #13385 reports.

This patch reconciles the reported nodes against the upstream's current configuration in extra_checker_info, dropping any target whose ip:port is not in the current node set (taking the active-check port override into account). Matching is fail-open: if the configured node set cannot be determined, the original list is returned unchanged so a live node is never hidden.

Fixes #13385

Reproduce / verify

Added t/control/healthcheck-stale-nodes.t: create an upstream with two nodes and an active health check, start the checker, delete one node via the Admin API, then query the Control API. Before the fix the deleted node is still returned; after the fix only the remaining node is reported.

Checklist

  • I have explained the need for this PR and the problem it solves
  • I have explained the changes or the new features added to this PR
  • I have added tests corresponding to this change
  • I have updated the documentation to reflect this change
  • I have verified that this change is backward compatible

The health check Control API (/v1/healthcheck) reads its node list
straight from the health checker's shared dict, which is keyed only by
the upstream id. When a node is removed from an upstream, its target can
linger in that dict (during the delayed-clear window, or indefinitely if
the cleanup sweep never runs for that checker), so the Control API keeps
reporting nodes that are no longer part of the upstream.

Reconcile the reported nodes against the upstream's current config and
drop any target that is no longer configured.

Fixes apache#13385
@dosubot dosubot Bot added size:L This PR changes 100-499 lines, ignoring generated files. bug Something isn't working labels Jun 18, 2026

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes the Control API healthcheck endpoints (/v1/healthcheck and /v1/healthcheck/{src_type}/{src_id}) so they no longer report stale targets that were removed from an upstream but are still lingering in the healthchecker shared dict.

Changes:

  • Reconciles healthcheck.get_target_list() results against the upstream’s current configured node set and filters out deleted ip:port targets (with active-check port override support).
  • Adds a regression test that deletes an upstream node via Admin API and verifies the Control API no longer returns it.
  • Updates Control API documentation to describe the new filtering behavior.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File Description
apisix/control/v1.lua Filters stale healthcheck targets by comparing returned nodes against the current upstream configuration.
t/control/healthcheck-stale-nodes.t Regression test proving deleted upstream nodes stop appearing in the Control API response.
docs/en/latest/control-api.md Documents that removed upstream nodes are not reported by the healthcheck Control API.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread apisix/control/v1.lua
Comment on lines +82 to +85
local active = up_conf.checks and up_conf.checks.active
local override_port = active and active.port
local targets = core.table.new(0, #nodes)

* name: resource id, where the health checker is reporting from.
* type: health check type: `["http", "https", "tcp"]`.
* nodes: target nodes of the health checker.
* nodes: target nodes of the health checker. Nodes removed from the upstream are not reported, even if their checker state has not been cleaned up yet.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working size:L This PR changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

bug: After a node is deleted from the upstream, its information can still be queried from the health check interface.

2 participants