fix: use max_completion_tokens for GPT-5 models in LiteLLM provider #6980
base: main
@@ -107,16 +107,26 @@ export class LiteLLMHandler extends RouterProvider implements SingleCompletionHa
         // Required by some providers; others default to max tokens allowed
         let maxTokens: number | undefined = info.maxTokens ?? undefined

+        // Check if this is a GPT-5 model that requires max_completion_tokens instead of max_tokens
+        const isGPT5Model = modelId.toLowerCase().includes("gpt-5") || modelId.toLowerCase().includes("gpt5")
Consider extracting the GPT-5 model detection logic into a shared helper function. This logic (`modelId.toLowerCase().includes("gpt-5") || modelId.toLowerCase().includes("gpt5")`) appears in both `createMessage` and `completePrompt`, and centralizing it would improve maintainability.

The model detection logic could also be more precise. Currently, the `includes()` check matches any model ID that merely contains "gpt-5" or "gpt5" as a substring, not only IDs that actually name a GPT-5 model.
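A minimal sketch of what a shared, stricter check might look like, combining both suggestions. The helper name `isGPT5ModelId`, its placement, and the handling of LiteLLM-style `provider/model` prefixes are assumptions for illustration, not code from this PR:

```typescript
// Hypothetical shared helper (name and prefix-matching behaviour are assumptions).
// Prefix matching avoids false positives from model IDs that merely contain
// "gpt-5" somewhere in the middle of the name.
export function isGPT5ModelId(modelId: string): boolean {
	const normalized = modelId.toLowerCase()
	// LiteLLM model IDs can carry a provider prefix such as "openai/gpt-5-mini",
	// so check the last path segment rather than the full string.
	const lastSegment = normalized.split("/").pop() ?? normalized
	return lastSegment.startsWith("gpt-5") || lastSegment.startsWith("gpt5")
}
```

Both call sites could then replace the inline checks with `const isGPT5Model = isGPT5ModelId(modelId)`.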

         const requestOptions: OpenAI.Chat.Completions.ChatCompletionCreateParamsStreaming = {
             model: modelId,
-            max_tokens: maxTokens,
             messages: [systemMessage, ...enhancedMessages],
             stream: true,
             stream_options: {
                 include_usage: true,
             },
         }

+        // GPT-5 models require max_completion_tokens instead of the deprecated max_tokens parameter
Consider adding a comment explaining why GPT-5 models require this special handling, perhaps with a link to relevant Azure/OpenAI documentation. This would help future maintainers (including future me) understand the context behind this workaround.
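One possible wording for such a comment, sketched here for illustration; the stated reason reflects OpenAI's general guidance for reasoning-model parameters and should be verified (and linked) against current OpenAI/Azure documentation before adopting it:

```typescript
// Illustrative comment only - verify the reasoning and add a doc link before use.
// GPT-5, like OpenAI's other reasoning models, rejects the legacy `max_tokens`
// parameter on chat completions; `max_completion_tokens` replaces it and also
// accounts for reasoning tokens in the output budget.
```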
+        if (isGPT5Model && maxTokens) {
+            // @ts-ignore - max_completion_tokens is not in the OpenAI types yet but is supported
Is there a way to avoid using `@ts-ignore` here? Perhaps a dedicated request type could make the intent explicit, for example:

    interface GPT5RequestOptions extends Omit<OpenAI.Chat.Completions.ChatCompletionCreateParamsStreaming, "max_tokens"> {
        max_completion_tokens?: number
        max_tokens?: never
    }
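A self-contained sketch of how that typed approach might be applied when building the request; the `withMaxTokens` helper, the inline prefix check, and the exact field types are assumptions layered on the reviewer's suggestion rather than code from this PR:

```typescript
import OpenAI from "openai"

// GPT5RequestOptions mirrors the interface from the review comment above.
type StreamingParams = OpenAI.Chat.Completions.ChatCompletionCreateParamsStreaming

interface GPT5RequestOptions extends Omit<StreamingParams, "max_tokens"> {
	max_completion_tokens?: number
	max_tokens?: never
}

// Hypothetical helper: picks the right token-limit field without needing @ts-ignore.
function withMaxTokens(
	base: Omit<StreamingParams, "max_tokens">,
	modelId: string,
	maxTokens: number | undefined,
): StreamingParams | GPT5RequestOptions {
	if (maxTokens === undefined) {
		return base
	}
	// Inline detection shown here to keep the sketch self-contained.
	const segment = modelId.toLowerCase().split("/").pop() ?? ""
	const isGPT5 = segment.startsWith("gpt-5") || segment.startsWith("gpt5")
	return isGPT5
		? { ...base, max_completion_tokens: maxTokens }
		: { ...base, max_tokens: maxTokens }
}
```

Depending on the installed `openai` SDK version, the resulting object may still need a cast when handed to `client.chat.completions.create`, since older type definitions do not know about `max_completion_tokens`.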
+            requestOptions.max_completion_tokens = maxTokens
+        } else if (maxTokens) {
+            requestOptions.max_tokens = maxTokens
+        }

         if (this.supportsTemperature(modelId)) {
             requestOptions.temperature = this.options.modelTemperature ?? 0
         }
@@ -179,6 +189,9 @@ export class LiteLLMHandler extends RouterProvider implements SingleCompletionHa
     async completePrompt(prompt: string): Promise<string> {
         const { id: modelId, info } = await this.fetchModel()

+        // Check if this is a GPT-5 model that requires max_completion_tokens instead of max_tokens
+        const isGPT5Model = modelId.toLowerCase().includes("gpt-5") || modelId.toLowerCase().includes("gpt5")
This detection logic is duplicated from line 111. Would it be cleaner to extract this into a helper method to maintain DRY principles? Something like:

    private isGPT5Model(modelId: string): boolean {
        const modelLower = modelId.toLowerCase()
        return modelLower.startsWith("gpt-5") || modelLower.startsWith("gpt5") || modelLower === "gpt5"
    }

         try {
             const requestOptions: OpenAI.Chat.Completions.ChatCompletionCreateParamsNonStreaming = {
                 model: modelId,
@@ -189,7 +202,13 @@ export class LiteLLMHandler extends RouterProvider implements SingleCompletionHa
                 requestOptions.temperature = this.options.modelTemperature ?? 0
             }

-            requestOptions.max_tokens = info.maxTokens
+            // GPT-5 models require max_completion_tokens instead of the deprecated max_tokens parameter
+            if (isGPT5Model && info.maxTokens) {
+                // @ts-ignore - max_completion_tokens is not in the OpenAI types yet but is supported
+                requestOptions.max_completion_tokens = info.maxTokens
+            } else if (info.maxTokens) {
+                requestOptions.max_tokens = info.maxTokens
+            }

             const response = await this.client.chat.completions.create(requestOptions)
             return response.choices[0]?.message.content || ""
Great test coverage! Consider adding edge cases like mixed case variations ("GpT-5", "gPt5") or models with additional suffixes ("gpt-5-32k", "gpt-5-vision") to ensure the detection works correctly for all possible GPT-5 model names.
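A sketch of what those additional cases could look like, assuming the detection is extracted into an `isGPT5Model` helper as suggested above and that the project uses a Vitest-style runner (both assumptions, not part of this PR):

```typescript
import { describe, it, expect } from "vitest"

// Local copy of the helper suggested in review, kept here as a free function so
// the sketch is self-contained; in the real code it would live on the handler.
function isGPT5Model(modelId: string): boolean {
	const modelLower = modelId.toLowerCase()
	return modelLower.startsWith("gpt-5") || modelLower.startsWith("gpt5")
}

describe("isGPT5Model", () => {
	it("matches mixed-case GPT-5 model IDs", () => {
		expect(isGPT5Model("GpT-5")).toBe(true)
		expect(isGPT5Model("gPt5")).toBe(true)
	})

	it("matches GPT-5 variants with suffixes", () => {
		expect(isGPT5Model("gpt-5-32k")).toBe(true)
		expect(isGPT5Model("gpt-5-vision")).toBe(true)
	})

	it("does not match non-GPT-5 models", () => {
		expect(isGPT5Model("gpt-4o")).toBe(false)
	})
})
```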