fix(control-api): exclude deleted upstream nodes from health check API#13572
Open
shreemaan-abhishek wants to merge 2 commits into
Open
fix(control-api): exclude deleted upstream nodes from health check API#13572shreemaan-abhishek wants to merge 2 commits into
shreemaan-abhishek wants to merge 2 commits into
Conversation
The health check Control API (/v1/healthcheck) reads its node list straight from the health checker's shared dict, which is keyed only by the upstream id. When a node is removed from an upstream, its target can linger in that dict (during the delayed-clear window, or indefinitely if the cleanup sweep never runs for that checker), so the Control API keeps reporting nodes that are no longer part of the upstream. Reconcile the reported nodes against the upstream's current config and drop any target that is no longer configured. Fixes apache#13385
There was a problem hiding this comment.
Pull request overview
Fixes the Control API healthcheck endpoints (/v1/healthcheck and /v1/healthcheck/{src_type}/{src_id}) so they no longer report stale targets that were removed from an upstream but are still lingering in the healthchecker shared dict.
Changes:
- Reconciles
healthcheck.get_target_list()results against the upstream’s current configured node set and filters out deletedip:porttargets (with active-check port override support). - Adds a regression test that deletes an upstream node via Admin API and verifies the Control API no longer returns it.
- Updates Control API documentation to describe the new filtering behavior.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
apisix/control/v1.lua |
Filters stale healthcheck targets by comparing returned nodes against the current upstream configuration. |
t/control/healthcheck-stale-nodes.t |
Regression test proving deleted upstream nodes stop appearing in the Control API response. |
docs/en/latest/control-api.md |
Documents that removed upstream nodes are not reported by the healthcheck Control API. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comment on lines
+82
to
+85
| local active = up_conf.checks and up_conf.checks.active | ||
| local override_port = active and active.port | ||
| local targets = core.table.new(0, #nodes) | ||
|
|
| * name: resource id, where the health checker is reporting from. | ||
| * type: health check type: `["http", "https", "tcp"]`. | ||
| * nodes: target nodes of the health checker. | ||
| * nodes: target nodes of the health checker. Nodes removed from the upstream are not reported, even if their checker state has not been cleaned up yet. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
The health check Control API (
GET /v1/healthcheckand/v1/healthcheck/{src_type}/{src_id}) reads its node list straight from the health checker's shared dict viahealthcheck.get_target_list(). That dict is keyed only by the upstream id, not by the node set, so when a node is removed from an upstream its target can linger there:delayed_cleargrace window after the checker is recreated, andAs a result the Control API keeps reporting nodes that are no longer part of the upstream, which is what #13385 reports.
This patch reconciles the reported nodes against the upstream's current configuration in
extra_checker_info, dropping any target whoseip:portis not in the current node set (taking the active-check port override into account). Matching is fail-open: if the configured node set cannot be determined, the original list is returned unchanged so a live node is never hidden.Fixes #13385
Reproduce / verify
Added
t/control/healthcheck-stale-nodes.t: create an upstream with two nodes and an active health check, start the checker, delete one node via the Admin API, then query the Control API. Before the fix the deleted node is still returned; after the fix only the remaining node is reported.Checklist