Problem
OpenAI deprecated `max_tokens` in favour of `max_completion_tokens` for newer models (o-series, gpt-5). Some OpenAI-compatible providers have adopted this change, some support both, and some still only accept `max_tokens`.
PR #865 addressed this for `OpenaiProvider` only, but the same issue affects other providers that inherit from `BaseOpenAIProvider`.
Current state
`BaseOpenAIProvider._convert_completion_params` passes through whichever field the user provides without remapping. This means:
- Users passing `max_tokens` get 400 errors on OpenAI o-series/gpt-5 models
- PR fix(openai): map max_tokens to max_completion_tokens #865 fixed this for `OpenaiProvider`, but not for other providers like Azure OpenAI, which also rejects `max_tokens` for o-series models
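A hypothetical reduction of the current behaviour (function name and shapes are assumed for illustration; this is not the library's actual code) shows why the 400s happen: the base converter forwards whatever the caller supplied, so the deprecated field reaches the endpoint verbatim.

```python
def convert_completion_params_passthrough(params: dict) -> dict:
    """Forward completion params unchanged, as the base provider does today."""
    return dict(params)

# A caller using the deprecated field sends it verbatim; o-series/gpt-5
# endpoints reject a body containing max_tokens with a 400.
body = convert_completion_params_passthrough({"model": "o3-mini", "max_tokens": 256})
print(body)  # {'model': 'o3-mini', 'max_tokens': 256}
```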
Provider landscape
The OpenAI spec now uses `max_completion_tokens`. Providers that claim OpenAI compatibility are at various stages of adoption:
- vLLM deprecated `max_tokens` in favor of `max_completion_tokens` (vllm-project/vllm#9837)
- Azure OpenAI supports both, but o-series models require `max_completion_tokens` (docs)
- llama.cpp added `max_completion_tokens` support recently (ggml-org/llama.cpp#19831)
- DeepSeek does not support `max_completion_tokens` (docs)
- Mistral does not support `max_completion_tokens` (docs)
- Other providers would need their API docs checked individually
Proposed approach
The base provider should follow the current OpenAI spec: remap `max_tokens` → `max_completion_tokens` in `BaseOpenAIProvider._convert_completion_params`. Providers that deviate from the spec should override the method to handle their own requirements (e.g. remapping back to `max_tokens`, stripping unsupported fields like `user` or `reasoning_effort`).
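The proposed base remap could look roughly like this (a sketch only; the helper name is assumed, the field names follow the OpenAI spec, and the conflict-warning behaviour mirrors what PR #864 proposed):

```python
import warnings

def convert_completion_params(params: dict) -> dict:
    """Remap the deprecated max_tokens field to max_completion_tokens."""
    out = dict(params)
    if "max_tokens" in out:
        if "max_completion_tokens" in out:
            # Both supplied: keep the explicit spec field, drop the legacy one.
            warnings.warn("max_tokens ignored in favour of max_completion_tokens")
            out.pop("max_tokens")
        else:
            out["max_completion_tokens"] = out.pop("max_tokens")
    return out
```

Callers keep passing `max_tokens` as before; the request body that leaves the provider carries only the spec-compliant field.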
This is the approach taken in the Go implementation: mozilla-ai/any-llm-go#57. The base compatible provider sends max_completion_tokens (per spec), and non-compatible providers supply a request transform hook that can adjust any fields before the request is sent.
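The override pattern for deviating providers might be sketched as follows (class and method names are illustrative, not the library's real API; the base follows the spec and a legacy-only subclass remaps back and strips unsupported fields):

```python
class BaseOpenAICompatible:
    def _convert_completion_params(self, params: dict) -> dict:
        # Base provider follows the current OpenAI spec.
        out = dict(params)
        if "max_tokens" in out and "max_completion_tokens" not in out:
            out["max_completion_tokens"] = out.pop("max_tokens")
        return out

class LegacyOnlyProvider(BaseOpenAICompatible):
    """Illustrative provider (DeepSeek/Mistral-style) that only accepts max_tokens."""
    UNSUPPORTED = {"user", "reasoning_effort"}

    def _convert_completion_params(self, params: dict) -> dict:
        out = super()._convert_completion_params(params)
        # Remap back to the legacy field this provider still requires...
        if "max_completion_tokens" in out:
            out["max_tokens"] = out.pop("max_completion_tokens")
        # ...and strip fields the provider rejects outright.
        for key in self.UNSUPPORTED:
            out.pop(key, None)
        return out
```

This mirrors the Go design: the base stays spec-compliant, and each non-compliant provider owns a single hook where all of its request adjustments live.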
Related
- OpenAI provider should map max_tokens to max_completion_tokens for newer models #862 — original issue
- fix(openai): remap max_tokens from kwargs and warn on conflict #864 — proposed base provider fix (closed)
- fix(openai): map max_tokens to max_completion_tokens #865 — OpenAI-only fix (merged)
- mozilla-ai/any-llm-go#57 — Go implementation with per-provider request transform hook