fix: limit GPT-5 models max output tokens to 10k #6959
Conversation
…verflow

- Added special handling for GPT-5 models in getModelMaxOutputTokens()
- Limits max output to 10k tokens as recommended in cline/cline#5474 (comment)
- Prevents context window overflow when input approaches the 272k token limit
- Added comprehensive tests for GPT-5 token limiting behavior

Fixes #6856
Reviewing my own code is like debugging in a mirror - everything looks backward but the bugs are still mine.
src/shared/api.ts (Outdated)
```ts
// Allow user override via settings, but cap at 10k
const userMaxTokens = settings?.modelMaxTokens
if (userMaxTokens) {
	return Math.min(userMaxTokens, 10000)
```
Consider extracting this magic number to a named constant like GPT5_MAX_OUTPUT_TOKENS = 10000 for better maintainability. Even I'm having to count zeros here.
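A minimal sketch of that refactor, assuming the settings shape shown in the diff above (the standalone helper name is illustrative, not the PR's actual function):

```ts
// Cap recommended in cline/cline#5474; defined once so it is easy to find and change.
const GPT5_MAX_OUTPUT_TOKENS = 10_000

// Hypothetical standalone helper mirroring the capping branch from the diff above.
function capGpt5OutputTokens(settings?: { modelMaxTokens?: number }): number {
	// Allow a user override via settings, but never exceed the GPT-5 cap.
	const userMaxTokens = settings?.modelMaxTokens
	if (userMaxTokens) {
		return Math.min(userMaxTokens, GPT5_MAX_OUTPUT_TOKENS)
	}
	return GPT5_MAX_OUTPUT_TOKENS
}
```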
src/shared/api.ts (Outdated)
```ts
}

// Special handling for GPT-5 models to prevent context window overflow
// Limit max output to 10k tokens as per https://github.com/cline/cline/issues/5474#issuecomment-3172109387
```
The comment could be more descriptive about WHY we're limiting to 10k. Consider: `// Limit max output to 10k tokens to prevent total context window overflow (400k) when input approaches the 272k limit`
src/shared/__tests__/api.spec.ts (Outdated)
```ts
	format: "openai",
})

expect(result).toBe(10_000)
```
Inconsistent number formatting - the implementation uses 10000 but tests mix 10_000 and 10000. Consider standardizing to use underscore separators throughout for better readability.
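For what it's worth, TypeScript numeric separators are purely cosmetic, so standardizing on them is a zero-risk change:

```ts
// The underscore separator does not change the value; both literals are identical.
console.log(10_000 === 10000) // true
```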
src/shared/__tests__/api.spec.ts (Outdated)
```ts
	expect(result).toBe(10_000)
})

test("should respect user override for GPT-5 models but cap at 10k", () => {
```
Consider adding a test case for when settings.modelMaxTokens is undefined or 0 to ensure the default 10k is applied correctly in those edge cases.
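A sketch of what those edge-case tests might look like, following the assertion style visible in the diff above (the exact getModelMaxOutputTokens call signature here is an assumption):

```ts
test("should apply the 10k default when modelMaxTokens is undefined", () => {
	const result = getModelMaxOutputTokens({ modelId: "gpt-5", settings: {} })
	expect(result).toBe(10_000)
})

test("should apply the 10k default when modelMaxTokens is 0", () => {
	// 0 is falsy, so the user-override branch should be skipped entirely.
	const result = getModelMaxOutputTokens({ modelId: "gpt-5", settings: { modelMaxTokens: 0 } })
	expect(result).toBe(10_000)
})
```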
@roomote-agent use switch_tool to switch to pr-fixer and fix this PR

I see your request to switch to pr-fixer mode. Let me analyze and fix this PR for you.
- Extract GPT5_MAX_OUTPUT_TOKENS as a named constant (10,000)
- Improve model detection with more specific regex pattern
- Add comprehensive documentation explaining the context window overflow issue
- Add tests for date-suffixed models and invalid model name patterns
- Update all test assertions to use the named constant
✅ PR has been successfully improved!

Summary of Changes

I have enhanced the GPT-5 token limit implementation with the following improvements:

🔧 Code Quality Improvements

✅ Test Coverage

📊 CI Status

All checks are passing. The PR is now ready for review with improved code quality and maintainability while preserving the original functionality.
- Removed complex regex pattern for detecting GPT-5 models
- Now using a simple `startsWith('gpt-5')` check as requested (see the sketch below)
- Updated tests to reflect the new behavior where any model starting with `gpt-5` is treated as a GPT-5 model
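A minimal sketch of the simplified check (the helper name isGpt5Model is illustrative):

```ts
// Treats any model id beginning with "gpt-5" as a GPT-5 model,
// e.g. "gpt-5", "gpt-5-mini", "gpt-5-nano", or date-suffixed variants.
function isGpt5Model(modelId: string): boolean {
	return modelId.startsWith("gpt-5")
}
```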
This PR fixes the GPT-5 token limit issue by capping the max output tokens to 10k, preventing context window overflow when input approaches the 272k limit.
Problem
GPT-5 models have a max output of 128k tokens. When the input gets close to the 272k input limit, the combined input and output can exceed the total 400k context window, causing API errors.
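To make the arithmetic concrete, here is a small sketch using only the limits quoted above:

```ts
const TOTAL_CONTEXT = 400_000  // total GPT-5 context window
const MAX_INPUT = 272_000      // input token limit
const FULL_OUTPUT = 128_000    // default max output reservation
const CAPPED_OUTPUT = 10_000   // cap introduced by this PR

// Reserving the full output leaves zero headroom once input nears its limit;
// the 10k cap restores a large margin inside the 400k window.
console.log(TOTAL_CONTEXT - (MAX_INPUT + FULL_OUTPUT))   // 0
console.log(TOTAL_CONTEXT - (MAX_INPUT + CAPPED_OUTPUT)) // 118000
```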
Solution
As suggested in cline/cline#5474 (comment), this PR limits the max output tokens to 10k for all GPT-5 models (gpt-5, gpt-5-mini, gpt-5-nano).
Changes
- Updated getModelMaxOutputTokens() to detect GPT-5 models and cap their output at 10k tokens

Testing
Fixes #6856
Important
Caps GPT-5 models' max output tokens to 10k in getModelMaxOutputTokens() to prevent context window overflow, with comprehensive testing added.

- Caps max output tokens to 10k in getModelMaxOutputTokens() to prevent context window overflow.
- Adds tests in api.spec.ts for GPT-5 token limiting behavior.

This description was created by for 054ad59. You can customize this summary. It will automatically update as commits are pushed.