Clamp GPT-5 max output tokens to 20% of context window #8495
This PR standardizes GPT-5 output token limits with all other models:

- Remove the GPT-5 exception in getModelMaxOutputTokens() so max output = min(model.maxTokens, ceil(0.2 * contextWindow)).
- Update tests accordingly: src/shared/tests/api.spec.ts.
- Verified locally: all tests passing (298 files, 3906 tests).

Example: 400k context → 80k max output tokens.

Rationale: Aligns GPT-5 behavior (including OpenRouter) with other models and avoids oversized completions when providers report very high max_completion_tokens.
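The clamp described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the `ModelLimits` type, the `clampMaxOutputTokens` name, and the fallback used when `maxTokens` is missing are all assumptions made for this example, and the real getModelMaxOutputTokens() likely takes more inputs.

```typescript
// Hypothetical, simplified model description. The real model info type
// in the repository has more fields than the two used here.
interface ModelLimits {
	contextWindow: number
	maxTokens?: number
}

// Fraction of the context window allowed for output (20%, per the PR).
const OUTPUT_FRACTION = 0.2

// Illustrative helper (name is an assumption): returns
// min(model.maxTokens, ceil(0.2 * contextWindow)).
function clampMaxOutputTokens(model: ModelLimits): number {
	const cap = Math.ceil(model.contextWindow * OUTPUT_FRACTION)
	// Assumption: if the provider reports no maxTokens, use the cap itself.
	return model.maxTokens === undefined ? cap : Math.min(model.maxTokens, cap)
}
```

Under these assumptions, a model with a 400,000-token context window and a hypothetical reported limit of 128,000 output tokens is clamped to 80,000, while a model that already reports a limit below the 20% cap keeps its own limit.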
Important
Standardizes GPT-5 output token limits by applying a 20% cap, aligning with other models, and updates tests accordingly.
- getModelMaxOutputTokens() in api.ts: apply the 20% cap on output tokens, aligning with other models.
- api.spec.ts: reflect the new 20% cap behavior for GPT-5 models.
- GPT-5 max output tokens: 20% of the context window or model.maxTokens, whichever is smaller.

This description was generated automatically for 8acee59.