Problem
OpenAI deprecated `max_tokens` in favour of `max_completion_tokens` for newer models (o-series, gpt-5). Some OpenAI-compatible providers have adopted this change, some support both, and some still only accept `max_tokens`.
PR #865 addressed this for `OpenaiProvider` only, but the same issue affects other providers that inherit from `BaseOpenAIProvider`.
Current state
`BaseOpenAIProvider._convert_completion_params` passes through whichever field the user provides without remapping. This means:
- Users passing `max_tokens` get 400 errors on OpenAI o-series/gpt-5 models
- PR fix(openai): map max_tokens to max_completion_tokens #865 fixed this for `OpenaiProvider`, but not for other providers like Azure OpenAI, which also rejects `max_tokens` for o-series models
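A hypothetical reduction of the current behaviour (function name and shapes are assumed for illustration; this is not the library's actual code) shows why the 400s happen: the base converter forwards whatever the caller supplied, so the deprecated field reaches the endpoint verbatim.

```python
def convert_completion_params_passthrough(params: dict) -> dict:
    """Forward completion params unchanged, as the base provider does today."""
    return dict(params)

# A caller using the deprecated field sends it verbatim; o-series/gpt-5
# endpoints reject a body containing max_tokens with a 400.
body = convert_completion_params_passthrough({"model": "o3-mini", "max_tokens": 256})
print(body)  # {'model': 'o3-mini', 'max_tokens': 256}
```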
Provider landscape
The OpenAI spec now uses `max_completion_tokens`. Providers that claim OpenAI compatibility are at various stages of adoption:
- vLLM deprecated `max_tokens` in favor of `max_completion_tokens` (vllm-project/vllm#9837)
- Azure OpenAI supports both, but o-series models require `max_completion_tokens` (docs)
- llama.cpp added `max_completion_tokens` support recently (ggml-org/llama.cpp#19831)
- DeepSeek does not support `max_completion_tokens` (docs)
- Mistral does not support `max_completion_tokens` (docs)
- Other providers would need their API docs checked individually
Proposed approach
The base provider should follow the current OpenAI spec: remap `max_tokens` → `max_completion_tokens` in `BaseOpenAIProvider._convert_completion_params`. Providers that deviate from the spec should override the method to handle their own requirements (e.g. remapping back to `max_tokens`, stripping unsupported fields like `user` or `reasoning_effort`).
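The proposed base remap could look roughly like this (a sketch only; the helper name is assumed, the field names follow the OpenAI spec, and the conflict-warning behaviour mirrors what PR #864 proposed):

```python
import warnings

def convert_completion_params(params: dict) -> dict:
    """Remap the deprecated max_tokens field to max_completion_tokens."""
    out = dict(params)
    if "max_tokens" in out:
        if "max_completion_tokens" in out:
            # Both supplied: keep the explicit spec field, drop the legacy one.
            warnings.warn("max_tokens ignored in favour of max_completion_tokens")
            out.pop("max_tokens")
        else:
            out["max_completion_tokens"] = out.pop("max_tokens")
    return out
```

Callers keep passing `max_tokens` as before; the request body that leaves the provider carries only the spec-compliant field.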
This is the approach taken in the Go implementation: mozilla-ai/any-llm-go#57. The base compatible provider sends max_completion_tokens (per spec), and non-compatible providers supply a request transform hook that can adjust any fields before the request is sent.
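The override pattern for deviating providers might be sketched as follows (class and method names are illustrative, not the library's real API; the base follows the spec and a legacy-only subclass remaps back and strips unsupported fields):

```python
class BaseOpenAICompatible:
    def _convert_completion_params(self, params: dict) -> dict:
        # Base provider follows the current OpenAI spec.
        out = dict(params)
        if "max_tokens" in out and "max_completion_tokens" not in out:
            out["max_completion_tokens"] = out.pop("max_tokens")
        return out

class LegacyOnlyProvider(BaseOpenAICompatible):
    """Illustrative provider (DeepSeek/Mistral-style) that only accepts max_tokens."""
    UNSUPPORTED = {"user", "reasoning_effort"}

    def _convert_completion_params(self, params: dict) -> dict:
        out = super()._convert_completion_params(params)
        # Remap back to the legacy field this provider still requires...
        if "max_completion_tokens" in out:
            out["max_tokens"] = out.pop("max_completion_tokens")
        # ...and strip fields the provider rejects outright.
        for key in self.UNSUPPORTED:
            out.pop(key, None)
        return out
```

This mirrors the Go design: the base stays spec-compliant, and each non-compliant provider owns a single hook where all of its request adjustments live.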
Related
- OpenAI provider should map max_tokens to max_completion_tokens for newer models #862 — original issue
- fix(openai): remap max_tokens from kwargs and warn on conflict #864 — proposed base provider fix (closed)
- fix(openai): map max_tokens to max_completion_tokens #865 — OpenAI-only fix (merged)
- mozilla-ai/any-llm-go#57 — Go implementation with per-provider request transform hook