feat(ai-proxy-multi): support client-driven model selection via `models` request body field #13084
kjprice wants to merge 1 commit into apache:master
Conversation
…s request body field

When `allow_client_model_preference` is enabled, clients can include a `models` array in the request body to specify preferred model ordering. Each element can be a model name string or an object with `provider` and `model` fields. The plugin matches entries against configured instances and reorders instance selection accordingly.

Resolves apache#13083
Pull request overview
This PR adds an opt-in feature to `ai-proxy-multi` that allows clients to influence AI instance selection by providing a `models` array in the JSON request body, enabling per-client model/provider preference ordering while keeping instance configuration and auth server-controlled.
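For illustration, a client request using both accepted forms of the `models` field might look like this (model and provider names are examples only):

```json
{
  "messages": [
    {"role": "user", "content": "Hello"}
  ],
  "models": [
    "gpt-4",
    {"provider": "deepseek", "model": "deepseek-chat"}
  ]
}
```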
Changes:
- Adds `allow_client_model_preference` (default `false`) to the `ai-proxy-multi` plugin schema.
- Implements client-driven instance reordering and preference-aware retry/fallback behavior in `ai-proxy-multi`.
- Documents the new configuration and request format, and adds a dedicated test suite for the feature.
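For context, a sketch of a route snippet enabling the feature; the instance names, fields, and values here are illustrative placeholders, not taken from this PR:

```json
{
  "plugins": {
    "ai-proxy-multi": {
      "allow_client_model_preference": true,
      "instances": [
        {"name": "openai-gpt4", "provider": "openai", "weight": 1,
         "options": {"model": "gpt-4"}},
        {"name": "deepseek", "provider": "deepseek", "weight": 1,
         "options": {"model": "deepseek-chat"}}
      ]
    }
  }
}
```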
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| `apisix/plugins/ai-proxy/schema.lua` | Adds the `allow_client_model_preference` schema field to opt into the feature. |
| `apisix/plugins/ai-proxy-multi.lua` | Implements request-body parsing for `models`, preferred-instance picking, and retry handling. |
| `docs/en/latest/plugins/ai-proxy-multi.md` | Documents the new attribute and request `models` field with examples. |
| `t/plugin/ai-proxy-multi.client-model-preference.t` | Adds test coverage for schema validation, ordering, ignore behavior, and stripping of `models`. |
    local function match_client_models(instances, models)
        local ordered_names = {}
        local matched = {}

        for _, model_pref in ipairs(models) do
            local target_model, target_provider
            if type(model_pref) == "string" then
`match_client_models()` assumes `models` is an array and iterates it with `ipairs(models)`. If a client sends a non-array JSON value (e.g. `"models": "gpt-4"`), `ipairs` will raise an error and the request will 500.
Suggestion: validate that `request_body.models` is a table/array before calling `match_client_models` (and/or harden `match_client_models` to return the default ordering when `type(models) ~= "table"`). Also consider skipping elements that are not a string or an object.
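A minimal caller-side sketch of that guard, assuming the surrounding `access()` code from this diff:

```lua
-- Sketch: only treat models as a client preference when it decodes to
-- a table (JSON array); any other type keeps the default server-driven
-- ordering. Skipping non-string/non-object elements would live inside
-- match_client_models itself.
if request_body and not decode_err
   and type(request_body.models) == "table" then
    ctx.client_model_preference = match_client_models(
        conf.instances, request_body.models)
end
```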
    local name, ai_instance, err = pick_preferred_instance(ctx, conf)
    if err then
        return 503, err
    end
    ctx.picked_ai_instance_name = name
    ctx.picked_ai_instance = ai_instance
    ctx.balancer_ip = name
    ctx.bypass_nginx_upstream = true
    return
When client preference is used, instance selection bypasses `pick_target()` and therefore bypasses the existing active health-check filtering done via `fetch_health_instances()`/`healthcheck_manager`. This means a client can force the plugin to try an instance that health checks currently mark as down, changing behavior compared to the server-driven picker.
Suggestion: apply the same health-check availability filtering to the preferred list (e.g. build an allowlist of healthy instance names before iterating, or reuse the existing picker/checker status logic) so client-driven ordering doesn't ignore configured health checks.
Suggested change (replaces the block above):

    -- apply health-check availability filtering to the preferred list
    local healthy_instances, health_err = fetch_health_instances(conf, ctx)
    local use_client_preference = true
    if not healthy_instances then
        core.log.warn("failed to fetch healthy instances for client model preference: ",
                      health_err, ", falling back to balancer selection")
        use_client_preference = false
    else
        local healthy_set = {}
        for _, inst in ipairs(healthy_instances) do
            if inst.name then
                healthy_set[inst.name] = true
            end
        end
        local filtered_preference = {}
        for _, pref in ipairs(ctx.client_model_preference) do
            if pref.instance_name and healthy_set[pref.instance_name] then
                filtered_preference[#filtered_preference + 1] = pref
            end
        end
        if #filtered_preference == 0 then
            core.log.warn("no healthy instances match client model preference; ",
                          "falling back to balancer selection")
            use_client_preference = false
        else
            ctx.client_model_preference = filtered_preference
        end
    end
    if use_client_preference then
        local name, ai_instance, err = pick_preferred_instance(ctx, conf)
        if err then
            return 503, err
        end
        ctx.picked_ai_instance_name = name
        ctx.picked_ai_instance = ai_instance
        ctx.balancer_ip = name
        ctx.bypass_nginx_upstream = true
        return
    end
    for _, instance in ipairs(instances) do
        if not matched[instance.name] then
            core.table.insert(ordered_names, instance.name)
        end
    end
The fallback order appended for instances not listed by the client is based on the raw `conf.instances` array order. However, the server-driven picker uses priority-based ordering (see `priority_balancer` sorting by `instance.priority`). If `conf.instances` is not already sorted, this can cause lower-priority instances to be tried before higher-priority ones when the client omits them.
Suggestion: when appending unmatched instances, order them consistently with the server-side priority behavior (e.g. sort unmatched instances by `instance.priority` descending, keeping a stable order within the same priority).
Suggested change (replaces the loop above):

    local unmatched = {}
    for idx, instance in ipairs(instances) do
        if not matched[instance.name] then
            core.table.insert(unmatched, {
                name = instance.name,
                priority = instance.priority or 0,
                index = idx,
            })
        end
    end
    table.sort(unmatched, function(a, b)
        if a.priority == b.priority then
            return a.index < b.index
        end
        return a.priority > b.priority
    end)
    for _, item in ipairs(unmatched) do
        core.table.insert(ordered_names, item.name)
    end
    local function retry_on_error(ctx, conf, code)
        if ctx.client_model_preference then
            ctx.client_model_tried = ctx.client_model_tried or {}
            ctx.client_model_tried[ctx.picked_ai_instance_name] = true
            if (code == 429 and fallback_strategy_has(conf.fallback_strategy, "http_429")) or
               (code >= 500 and code < 600 and
                fallback_strategy_has(conf.fallback_strategy, "http_5xx")) then
                local name, ai_instance, err = pick_preferred_instance(ctx, conf)
                if err then
                    core.log.error("all preferred instances failed: ", err)
                    return 502
                end
                ctx.balancer_ip = name
                ctx.picked_ai_instance_name = name
                ctx.picked_ai_instance = ai_instance
                return
            end
            return code
        end
The new client-preference retry path in `retry_on_error()` (falling back through `ctx.client_model_preference` on 429/5xx) isn't covered by the added tests. The current test cases validate selection/reordering and stripping, but not that a 429/5xx from the preferred instance causes the plugin to retry the next preferred instance (and that it stops retrying on non-matching status codes).
Suggestion: add a test that makes the first preferred instance return 429/5xx and asserts that the response comes from the next preferred instance (similar to the existing fallback tests in `ai-proxy-multi.balancer.t`).
    if conf.allow_client_model_preference then
        local body, err = core.request.get_body()
        if body then
            local request_body, decode_err = core.json.decode(body)
            if request_body and not decode_err and request_body.models then
                ctx.client_model_preference = match_client_models(
                    conf.instances, request_body.models)
                core.log.info("client model preference: ",
                              core.json.delay_encode(ctx.client_model_preference))
                request_body.models = nil
                ngx.req.set_body_data(core.json.encode(request_body))
            end
        end
    end
The `models` field is only stripped when `allow_client_model_preference` is enabled. This contradicts the PR description/docs (and the new tests), which state that `models` is always stripped before forwarding upstream. As-is, requests sent with `models` while the feature is disabled will forward an extra `models` field to upstream providers.
Suggestion: always remove `models` from the JSON request body when present, but only apply client-driven reordering when `allow_client_model_preference` is true (i.e., strip unconditionally; reorder conditionally).
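One way to express that split, sketched against the diff above (not the final implementation):

```lua
-- Sketch: strip models from the JSON body whenever it is present,
-- but only use it for reordering when the feature is enabled.
local body, err = core.request.get_body()
if body then
    local request_body, decode_err = core.json.decode(body)
    if request_body and not decode_err and request_body.models ~= nil then
        if conf.allow_client_model_preference then
            ctx.client_model_preference = match_client_models(
                conf.instances, request_body.models)
        end
        request_body.models = nil
        ngx.req.set_body_data(core.json.encode(request_body))
    end
end
```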
Hi @kjprice, please fix the failed CI.
Description

When `allow_client_model_preference` is enabled on the plugin config, clients can include a `models` array in the request body to specify their preferred model/instance ordering. This enables multiple teams sharing a single gateway to express different model preferences without requiring separate routes.

Changes

Schema (`apisix/plugins/ai-proxy/schema.lua`):
- adds an `allow_client_model_preference` boolean field (default: `false`) to `ai_proxy_multi_schema`

Plugin logic (`apisix/plugins/ai-proxy-multi.lua`):
- `match_client_models()` — matches client `models` entries against configured instances by model name and, optionally, provider
- `pick_preferred_instance()` — sequential picker that respects client ordering with rate-limiting awareness
- `access()` — reads the request body, extracts `models`, reorders instances, and strips `models` before forwarding
- `retry_on_error()` — falls back through the client-preferred order on HTTP 429/5xx

Request body

The `models` field supports:
- model name strings: `["gpt-4", "deepseek-chat"]`
- provider/model objects: `[{"provider": "openai", "model": "gpt-4"}]`

Behavior:
- the `models` field is always stripped before forwarding upstream
- when the feature is disabled, the `models` field is ignored — fully backward compatible

Docs (`docs/en/latest/plugins/ai-proxy-multi.md`):
- adds `allow_client_model_preference` to the attributes table
- adds `models` to the request format table

Tests (`t/plugin/ai-proxy-multi.client-model-preference.t`):
- schema validation of the `models` field
- `models` field stripped from the forwarded request
- `models` ignored when `allow_client_model_preference` is false

Resolves #13083