
feat(ai-proxy-multi): support client-driven model selection via models request body field #13084

Open
kjprice wants to merge 1 commit into apache:master from kjprice:feat/client-model-preference

Conversation


@kjprice commented Mar 10, 2026

Description

When allow_client_model_preference is enabled on the plugin config, clients can include a models array in the request body to specify their preferred model/instance ordering. This enables multiple teams sharing a single gateway to express different model preferences without requiring separate routes.

Changes

Schema (apisix/plugins/ai-proxy/schema.lua):

  • Added allow_client_model_preference boolean field (default: false) to ai_proxy_multi_schema

Plugin logic (apisix/plugins/ai-proxy-multi.lua):

  • match_client_models() — matches client models entries against configured instances by model name and optionally provider
  • pick_preferred_instance() — sequential picker that respects client ordering with rate-limiting awareness
  • Modified access() to read request body, extract models, reorder instances, and strip models before forwarding
  • Modified retry_on_error() to fall back through client-preferred order on HTTP 429/5xx

Request body models field supports:

  • String shorthand: ["gpt-4", "deepseek-chat"]
  • Object form: [{"provider": "openai", "model": "gpt-4"}]
  • Mixed: both in the same array
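An illustrative request body using the mixed form (the messages payload is just an example; only the models field is interpreted by the plugin):

```json
{
  "messages": [
    { "role": "user", "content": "Hello" }
  ],
  "models": [
    "deepseek-chat",
    { "provider": "openai", "model": "gpt-4" }
  ]
}
```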

Behavior:

  • Unrecognized model entries are silently ignored
  • Instances not in the client's list are appended in original priority order
  • models field is always stripped before forwarding upstream
  • When disabled (default), models field is ignored — fully backward compatible
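A minimal route configuration enabling the feature could look like this (instance names, providers, and priorities are illustrative; auth and other required instance fields are omitted for brevity):

```json
{
  "plugins": {
    "ai-proxy-multi": {
      "allow_client_model_preference": true,
      "instances": [
        { "name": "openai-gpt4", "provider": "openai", "priority": 10 },
        { "name": "deepseek", "provider": "deepseek", "priority": 5 }
      ]
    }
  }
}
```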

Docs (docs/en/latest/plugins/ai-proxy-multi.md):

  • Added allow_client_model_preference to attributes table
  • Added models to request format table
  • Added "Client-Driven Model Selection" example section

Tests (t/plugin/ai-proxy-multi.client-model-preference.t):

  • Schema validation (default false, explicit true)
  • String shorthand model preference
  • Object form model preference
  • Fallback to server priority without models field
  • Unrecognized models ignored
  • models field stripped from forwarded request
  • Feature disabled when allow_client_model_preference is false

Resolves #13083

…s request body field

When allow_client_model_preference is enabled, clients can include a
models array in the request body to specify preferred model ordering.
Each element can be a model name string or an object with provider and
model fields. The plugin matches entries against configured instances
and reorders instance selection accordingly.

Resolves apache#13083
@dosubot dosubot bot added size:XL This PR changes 500-999 lines, ignoring generated files. enhancement New feature or request labels Mar 10, 2026
@Baoyuantop Baoyuantop requested a review from Copilot March 11, 2026 03:17

Copilot AI left a comment


Pull request overview

This PR adds an opt-in feature to ai-proxy-multi that allows clients to influence AI instance selection by providing a models array in the JSON request body, enabling per-client model/provider preference ordering while keeping instance configuration and auth server-controlled.

Changes:

  • Adds allow_client_model_preference (default false) to the ai-proxy-multi plugin schema.
  • Implements client-driven instance reordering and preference-aware retry/fallback behavior in ai-proxy-multi.
  • Documents the new configuration and request format, and adds a dedicated test suite for the feature.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

File Description
apisix/plugins/ai-proxy/schema.lua Adds the allow_client_model_preference schema field to opt into the feature.
apisix/plugins/ai-proxy-multi.lua Implements request-body parsing for models, preferred-instance picking, and retry handling.
docs/en/latest/plugins/ai-proxy-multi.md Documents the new attribute and request models field with examples.
t/plugin/ai-proxy-multi.client-model-preference.t Adds test coverage for schema validation, ordering, ignore behavior, and stripping of models.


Comment on lines +303 to +309
local function match_client_models(instances, models)
    local ordered_names = {}
    local matched = {}

    for _, model_pref in ipairs(models) do
        local target_model, target_provider
        if type(model_pref) == "string" then

Copilot AI Mar 11, 2026


match_client_models() assumes models is an array and iterates it with ipairs(models). If a client sends a non-array JSON value (e.g. "models":"gpt-4"), ipairs will raise an error and the request will 500.

Suggestion: validate request_body.models is a table/array before calling match_client_models (and/or harden match_client_models to return the default ordering when type(models) ~= "table"). Also consider skipping elements that are not a string or object.
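The hardened behavior this comment asks for can be sketched as follows — modeled in Python for brevity rather than the plugin's Lua, with illustrative instance/field names:

```python
def match_client_models(instances, models):
    """Order instance names by client preference, hardened against bad input.

    `instances` is a list of dicts with illustrative keys: name, provider,
    model. A non-list `models` (or a malformed entry) falls back to the
    configured order instead of raising, mirroring the review suggestion.
    """
    if not isinstance(models, list):
        return [inst["name"] for inst in instances]

    ordered, matched = [], set()
    for pref in models:
        # accept the string shorthand or the {provider, model} object form
        if isinstance(pref, str):
            target_model, target_provider = pref, None
        elif isinstance(pref, dict) and isinstance(pref.get("model"), str):
            target_model, target_provider = pref["model"], pref.get("provider")
        else:
            continue  # silently ignore entries that are neither form
        for inst in instances:
            if inst["name"] in matched:
                continue
            if inst.get("model") == target_model and (
                    target_provider is None
                    or inst.get("provider") == target_provider):
                ordered.append(inst["name"])
                matched.add(inst["name"])
    # unmatched instances keep their configured order at the end
    for inst in instances:
        if inst["name"] not in matched:
            ordered.append(inst["name"])
    return ordered
```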

Comment on lines +499 to +507
local name, ai_instance, err = pick_preferred_instance(ctx, conf)
if err then
    return 503, err
end
ctx.picked_ai_instance_name = name
ctx.picked_ai_instance = ai_instance
ctx.balancer_ip = name
ctx.bypass_nginx_upstream = true
return

Copilot AI Mar 11, 2026


When client preference is used, instance selection bypasses pick_target() and therefore bypasses the existing active health-check filtering done via fetch_health_instances()/healthcheck_manager. This means a client can force the plugin to try an instance that health checks currently mark as down, changing behavior compared to the server-driven picker.

Suggestion: apply the same health-check availability filtering to the preferred list (e.g. build an allowlist of healthy instance names before iterating, or reuse the existing picker/checker status logic) so client-driven ordering doesn't ignore configured health checks.

Suggested change:

local name, ai_instance, err = pick_preferred_instance(ctx, conf)
if err then
    return 503, err
end
ctx.picked_ai_instance_name = name
ctx.picked_ai_instance = ai_instance
ctx.balancer_ip = name
ctx.bypass_nginx_upstream = true
return

-- apply health-check availability filtering to the preferred list
local healthy_instances, health_err = fetch_health_instances(conf, ctx)
local use_client_preference = true
if not healthy_instances then
    core.log.warn("failed to fetch healthy instances for client model preference: ",
                  health_err, ", falling back to balancer selection")
    use_client_preference = false
else
    local healthy_set = {}
    for _, inst in ipairs(healthy_instances) do
        if inst.name then
            healthy_set[inst.name] = true
        end
    end
    local filtered_preference = {}
    for _, pref in ipairs(ctx.client_model_preference) do
        if pref.instance_name and healthy_set[pref.instance_name] then
            filtered_preference[#filtered_preference + 1] = pref
        end
    end
    if #filtered_preference == 0 then
        core.log.warn("no healthy instances match client model preference; ",
                      "falling back to balancer selection")
        use_client_preference = false
    else
        ctx.client_model_preference = filtered_preference
    end
end
if use_client_preference then
    local name, ai_instance, err = pick_preferred_instance(ctx, conf)
    if err then
        return 503, err
    end
    ctx.picked_ai_instance_name = name
    ctx.picked_ai_instance = ai_instance
    ctx.balancer_ip = name
    ctx.bypass_nginx_upstream = true
    return
end

Comment on lines +335 to +340
for _, instance in ipairs(instances) do
    if not matched[instance.name] then
        core.table.insert(ordered_names, instance.name)
    end
end


Copilot AI Mar 11, 2026


The fallback order appended for instances not listed by the client is based on the raw conf.instances array order. However, the server-driven picker uses priority-based ordering (see priority_balancer sorting by instance.priority). If conf.instances is not already sorted, this can cause lower-priority instances to be tried before higher-priority ones when the client omits them.

Suggestion: when appending unmatched instances, order them consistently with the server-side priority behavior (e.g. sort unmatched by instance.priority descending, and keep stable order within the same priority).

Suggested change:

for _, instance in ipairs(instances) do
    if not matched[instance.name] then
        core.table.insert(ordered_names, instance.name)
    end
end

local unmatched = {}
for idx, instance in ipairs(instances) do
    if not matched[instance.name] then
        core.table.insert(unmatched, {
            name = instance.name,
            priority = instance.priority or 0,
            index = idx,
        })
    end
end
table.sort(unmatched, function(a, b)
    if a.priority == b.priority then
        return a.index < b.index
    end
    return a.priority > b.priority
end)
for _, item in ipairs(unmatched) do
    core.table.insert(ordered_names, item.name)
end

Comment on lines 530 to +548
local function retry_on_error(ctx, conf, code)
    if ctx.client_model_preference then
        ctx.client_model_tried = ctx.client_model_tried or {}
        ctx.client_model_tried[ctx.picked_ai_instance_name] = true
        if (code == 429 and fallback_strategy_has(conf.fallback_strategy, "http_429")) or
           (code >= 500 and code < 600 and
            fallback_strategy_has(conf.fallback_strategy, "http_5xx")) then
            local name, ai_instance, err = pick_preferred_instance(ctx, conf)
            if err then
                core.log.error("all preferred instances failed: ", err)
                return 502
            end
            ctx.balancer_ip = name
            ctx.picked_ai_instance_name = name
            ctx.picked_ai_instance = ai_instance
            return
        end
        return code
    end

Copilot AI Mar 11, 2026


The new client-preference retry path in retry_on_error() (falling back through ctx.client_model_preference on 429/5xx) isn’t covered by the added tests. Current test cases validate selection/reordering and stripping, but not that a 429/5xx from the preferred instance causes the plugin to retry the next preferred instance (and that it stops retrying on non-matching status codes).

Suggestion: add a test that makes the first preferred instance return 429/5xx and asserts the response comes from the next preferred instance (similar to existing fallback tests in ai-proxy-multi.balancer.t).
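The fallback behavior such a test would exercise amounts to a sequential pick over the client-preferred order, skipping instances already tried or otherwise unavailable (a Python sketch with illustrative names, not the plugin's Lua):

```python
def pick_preferred_instance(preference, tried, unavailable=frozenset()):
    """Return the first preferred instance name not yet tried and not
    excluded (e.g. rate-limited), or None when the list is exhausted."""
    for name in preference:
        if name not in tried and name not in unavailable:
            return name
    return None  # caller maps this to a 502, per retry_on_error above
```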

Comment on lines +483 to +496
if conf.allow_client_model_preference then
    local body, err = core.request.get_body()
    if body then
        local request_body, decode_err = core.json.decode(body)
        if request_body and not decode_err and request_body.models then
            ctx.client_model_preference = match_client_models(
                conf.instances, request_body.models)
            core.log.info("client model preference: ",
                          core.json.delay_encode(ctx.client_model_preference))
            request_body.models = nil
            ngx.req.set_body_data(core.json.encode(request_body))
        end
    end
end

Copilot AI Mar 11, 2026


The models field is only stripped when allow_client_model_preference is enabled. This contradicts the PR description/docs (and the new tests) which state that models is always stripped before forwarding upstream. As-is, requests sent with models while the feature is disabled will forward an extra models field to upstream providers.

Suggestion: always remove models from the JSON request body when present, but only apply client-driven reordering when allow_client_model_preference is true (i.e., strip unconditionally; reorder conditionally).
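The suggested control flow — strip always, reorder only when enabled — can be sketched like this (a Python model with a hypothetical helper name, not the plugin's actual Lua):

```python
import json

def preprocess_body(raw_body, allow_client_model_preference):
    """Return (forwarded_body, client_preference_or_None).

    `models` is removed from the body unconditionally; it only influences
    instance ordering when the feature flag is enabled.
    """
    try:
        body = json.loads(raw_body)
    except (ValueError, TypeError):
        return raw_body, None  # non-JSON bodies pass through untouched
    if not isinstance(body, dict) or "models" not in body:
        return raw_body, None
    models = body.pop("models")  # always stripped before forwarding
    preference = models if allow_client_model_preference else None
    return json.dumps(body), preference
```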

@Baoyuantop (Contributor) commented:

Hi @kjprice, please fix failed CI

@Baoyuantop Baoyuantop added the wait for update wait for the author's response in this issue/PR label Mar 11, 2026



Development

Successfully merging this pull request may close these issues.

feat: support client-driven model selection in ai-proxy-multi via models request body field

3 participants