fix: (ai-proxy-multi) health check not work #12968
base: master
@@ -35,7 +35,7 @@ local endpoint_regex = "^(https?)://([^:/]+):?(%d*)/?.*$"

 local pickers = {}
 local lrucache_server_picker = core.lrucache.new({
-    ttl = 300, count = 256
+    ttl = 10, count = 256
 })

 local plugin_name = "ai-proxy-multi"

@@ -107,13 +107,6 @@ function _M.check_schema(conf)
             core.log.warn("fail to require ai provider: ", instance.provider, ", err", err)
             return false, "ai provider: " .. instance.provider .. " is not supported."
         end
-        local sa_json = core.table.try_read_attr(instance, "auth", "gcp", "service_account_json")
-        if sa_json then
-            local _, err = core.json.decode(sa_json)
-            if err then
-                return false, "invalid gcp service_account_json: " .. err
-            end
-        end
     end
     local algo = core.table.try_read_attr(conf, "balancer", "algorithm")
     local hash_on = core.table.try_read_attr(conf, "balancer", "hash_on")

@@ -177,7 +170,9 @@ local function parse_domain_for_node(node)
 end


 local function resolve_endpoint(instance_conf)
+    -- Calculate DNS node from instance config without modifying the input
+    -- Returns a node table with host, port, scheme fields

Comment on lines +173 to +174
Suggested change:
-    -- Calculate DNS node from instance config without modifying the input
-    -- Returns a node table with host, port, scheme fields
+    -- Calculate DNS node from instance config without modifying the input.
+    -- Intended for use when _dns_value is not available (e.g., when called
+    -- from timer context) to recompute the target node.
+    -- Returns a node table with host, port, scheme fields.

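For context, the `endpoint_regex` shown in the first hunk header is an ordinary Lua pattern, so the kind of node table these comments describe can be sketched with `string.match`. The helper name `parse_endpoint` and the default-port rule (443 for https, 80 for http) are assumptions made for illustration; this is not the plugin's actual `resolve_endpoint` logic.

```lua
-- Pattern copied from the diff context; everything else is illustrative.
local endpoint_regex = "^(https?)://([^:/]+):?(%d*)/?.*$"

-- Hypothetical helper: builds a fresh node table and never mutates its input.
local function parse_endpoint(endpoint)
    local scheme, host, port = endpoint:match(endpoint_regex)
    if not scheme then
        return nil, "invalid endpoint: " .. tostring(endpoint)
    end
    if port == "" then
        -- Assumed default: derive the port from the scheme.
        port = (scheme == "https") and 443 or 80
    else
        port = tonumber(port)
    end
    return { scheme = scheme, host = host, port = port }
end

local node = parse_endpoint("https://api.example.com/v1/chat")
print(node.scheme, node.host, node.port)
```

Returning a new table on every call keeps the function safe to use from contexts that share the instance configuration, which is the property the comment under review emphasizes.
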
Copilot AI, Feb 13, 2026
The removed code handled ai_driver.get_node() which appears to be a valid driver interface. This removal could break custom AI drivers that implement get_node(). Consider preserving this logic or documenting why it was removed.
Suggested change:
-    -- built-in ai driver always use https
-    scheme = "https"
-    host = ai_driver.host
-    port = ai_driver.port
+    -- built-in ai driver always use https; custom drivers may implement get_node()
+    if ai_driver.get_node then
+        local driver_node = ai_driver.get_node(instance_conf)
+        if driver_node then
+            scheme = driver_node.scheme or "https"
+            host = driver_node.host
+            port = driver_node.port
+        else
+            scheme = "https"
+            host = ai_driver.host
+            port = ai_driver.port
+        end
+    else
+        scheme = "https"
+        host = ai_driver.host
+        port = ai_driver.port
+    end

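The suggested change repeats the https fallback in two branches. A possible way to express the same behavior with a single fallback is sketched here; `node_from_driver` and both stub drivers are hypothetical names used only to make the example runnable, and the real driver interface may differ.

```lua
-- Stub drivers for illustration; real drivers come from the plugin.
local builtin_driver = { host = "api.example.com", port = 443 }
local custom_driver = {
    host = "fallback.example.com",
    port = 443,
    get_node = function(conf)
        return { scheme = "http", host = conf.host, port = conf.port }
    end,
}

-- Hypothetical helper: prefer the driver's get_node() result when one
-- exists, otherwise fall back to the driver's static https endpoint.
local function node_from_driver(ai_driver, instance_conf)
    local driver_node = ai_driver.get_node and ai_driver.get_node(instance_conf)
    if driver_node then
        return {
            scheme = driver_node.scheme or "https",
            host = driver_node.host,
            port = driver_node.port,
        }
    end
    return { scheme = "https", host = ai_driver.host, port = ai_driver.port }
end

local node = node_from_driver(custom_driver, { host = "10.0.0.1", port = 8080 })
print(node.scheme, node.host, node.port)
```

Because `ai_driver.get_node and ...` evaluates to nil when the driver has no `get_node`, both "no method" and "method returned nothing" land on the same fallback line.
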
Copilot AI, Feb 13, 2026
Defaulting to healthy when state is not found contradicts the stated goal of excluding unhealthy instances. This could allow requests to uninitialized instances. Consider defaulting to unhealthy until the first health check completes, or document why healthy is the correct default.
Suggested change:
-    -- State not found in SHM (checker not yet created), default to healthy
-    core.log.warn("[SHM-DIRECT] state not found for instance=", instance_name, ", defaulting to healthy")
-    return true, nil
+    -- State not found in SHM (checker not yet created), default to unhealthy to avoid routing to uninitialized instances
+    core.log.warn("[SHM-DIRECT] state not found for instance=", instance_name, ", defaulting to unhealthy")
+    return false, nil

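To make the trade-off in this comment concrete, here is a minimal sketch in which the missing-state default is an explicit parameter. The plain table standing in for shared memory, the `"healthy"`/`"unhealthy"` string encoding, and the function name `is_healthy` are all assumptions; the plugin's real SHM layout is not shown in this diff.

```lua
-- A plain table stands in for the shared-memory zone in this sketch.
local fake_shm = { ["instance-a"] = "healthy", ["instance-b"] = "unhealthy" }

-- Hypothetical lookup: `default_healthy` decides what happens when no state
-- has been recorded yet (for example, before the first probe completes).
local function is_healthy(shm, instance_name, default_healthy)
    local state = shm[instance_name]
    if state == nil then
        return default_healthy, "state not found"
    end
    return state == "healthy", nil
end

print(is_healthy(fake_shm, "instance-a", false))
print(is_healthy(fake_shm, "instance-new", false))
print(is_healthy(fake_shm, "instance-new", true))
```

Defaulting to unhealthy keeps traffic away from instances that have never been probed, but if every instance is still unknown right after startup it may leave nothing to route to; defaulting to healthy has the opposite failure mode. Whichever default is chosen, documenting it next to the lookup addresses the reviewer's concern.
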
Copilot AI, Feb 13, 2026
next(checkers) returns (key, value), so checker_ref is actually the key (instance name), not the checker object. This code attempts to access checker_ref.shm which would fail. Should be: local _, checker_ref = next(checkers)
Suggested change:
-    local checker_ref = checkers and next(checkers)
+    local _, checker_ref = checkers and next(checkers)

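One caveat about the suggested line: in Lua, a function call used as an operand of `and` is adjusted to a single value, so `checkers and next(checkers)` still yields only the key and the second variable would be nil. The sketch below demonstrates this with a stand-in `checkers` table (its contents are made up) and shows one possible way to obtain the value.

```lua
-- Stand-in data: one checker object keyed by instance name.
local checkers = { ["instance-a"] = { shm = "dummy-shm" } }

-- Original line: only the first return value of next() (the key) is kept.
local key_only = checkers and next(checkers)

-- Suggested line: `and` still truncates next() to one value, so the
-- second variable receives nil.
local k, truncated = checkers and next(checkers)

-- One possible fix: guard first, then call next() as the sole expression
-- on the right-hand side so both return values survive.
local _, checker_ref
if checkers then
    _, checker_ref = next(checkers)
end

print(type(key_only), truncated, type(checker_ref))
```

The guard-then-call form keeps the nil check on `checkers` while letting the assignment receive both the key and the checker object.
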
Copilot AI, Feb 13, 2026
The array index calculation (i - 1) suggests Lua's 1-based indexing is being converted to 0-based. Add a comment explaining this is for compatibility with the checker naming convention, as it's not immediately obvious why the adjustment is needed.
Suggested change:
-    if checker_ref and checker_ref.shm then
+    if checker_ref and checker_ref.shm then
+        -- Note: (i - 1) converts Lua's 1-based index to 0-based to match the
+        -- existing checker naming convention used when the health checker is created.

Copilot AI, Feb 13, 2026
The variable name upstream_node shadows the outer scope variable node. While functionally correct, this could be confusing. Consider renaming to something like node_config to distinguish the constructed upstream node from the DNS-resolved node.
The TTL reduction from 300 to 10 seconds significantly increases SHM access frequency. Consider documenting why this aggressive TTL is needed, especially since status_ver changes should already invalidate the cache when health states change.
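The interaction between a version key and a TTL that this comment relies on can be illustrated with a small stand-alone cache. This is a simplified sketch, not APISIX's `core.lrucache`; `new_cache`, `build_picker`, and the injected `now` argument are hypothetical, and whether `status_ver` actually feeds the picker cache version in this plugin is the reviewer's premise rather than something visible in the diff.

```lua
-- Minimal cache: an entry is reused only while its version matches and its
-- TTL has not expired. `now` is injected so the behavior is easy to exercise.
local function new_cache(ttl)
    local entries = {}
    return function(key, version, now, create)
        local e = entries[key]
        if e and e.version == version and now < e.expires then
            return e.value, "hit"
        end
        local value = create()
        entries[key] = { value = value, version = version, expires = now + ttl }
        return value, "miss"
    end
end

local builds = 0
local function build_picker()
    builds = builds + 1
    return { id = builds }
end

local get = new_cache(10)
get("route-1", "v1", 0, build_picker)   -- first lookup builds the picker
get("route-1", "v1", 5, build_picker)   -- same version, within TTL: reused
get("route-1", "v2", 6, build_picker)   -- version changed: rebuilt at once
get("route-1", "v2", 17, build_picker)  -- TTL elapsed: rebuilt
print(builds)
```

In this model a version bump invalidates the entry regardless of TTL, so a shorter TTL only matters for changes that do not bump the version. If health-state changes are fully reflected in the version, the longer TTL would be sufficient; if they are not, that gap is worth stating in a code comment next to the new value.
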