Feat/issue 5784 custom max tokens #5788
Conversation
daniel-lxs
left a comment
LGTM
The issue was that OpenAI-compatible providers (Chutes, Groq) were directly using model.info.maxTokens instead of calling getModelMaxOutputTokens(). This meant that the user's custom modelMaxTokens setting was being ignored. Fixed by:
- Updating BaseOpenAiCompatibleProvider to use getModelMaxOutputTokens()
- Updating ChutesHandler's getCompletionParams to use getModelMaxOutputTokens()

This ensures that when users set a custom max output tokens value in the settings, it is properly applied to API requests for all OpenAI-compatible providers.
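The before/after pattern described above can be sketched as follows. This is a minimal illustration, assuming simplified shapes for the model info and provider settings; getModelMaxOutputTokens mirrors the helper named in the comment, but its real signature in the codebase may differ.

```typescript
interface ModelInfo { maxTokens: number }
interface ProviderSettings { modelMaxTokens?: number }

// The shared helper: prefer the user's custom setting, capped to the model's max.
function getModelMaxOutputTokens(info: ModelInfo, settings: ProviderSettings): number {
  return Math.min(settings.modelMaxTokens ?? info.maxTokens, info.maxTokens)
}

// Before the fix: the provider read the model's limit directly,
// silently ignoring settings.modelMaxTokens.
const maxTokensBefore = (info: ModelInfo) => info.maxTokens

// After the fix: the provider routes through the shared helper,
// so a custom setting reaches the API request.
const maxTokensAfter = (info: ModelInfo, settings: ProviderSettings) =>
  getModelMaxOutputTokens(info, settings)
```

Routing every provider through one helper keeps the capping rule in a single place instead of duplicating it per provider.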
The providers I hadn't tested were getting the max output from the model directly; I need to change them all. The commit above fixed it for Chutes, but I still need to update the other occurrences for the remaining providers. Marked this as draft, will get it done ASAP, but I don't have access to all of the providers, so I'll test the ones I can.
… all providers
- Updated BaseOpenAiCompatibleProvider to use getModelMaxOutputTokens()
- Fixed ChutesHandler to respect the user's custom max tokens
- Fixed LiteLLM createMessage and completePrompt methods
- Fixed Glama createMessage and completePrompt methods
- Fixed Unbound createMessage and completePrompt methods
- Fixed Mistral getModel method to use getModelMaxOutputTokens()
- Fixed XAI to use getModelMaxOutputTokens()
- Fixed OpenAI addMaxTokensIfNeeded to use getModelMaxOutputTokens()
- Fixed Gemini to use maxTokens from getModel(), which already applies user settings

This ensures that when users set a custom max output tokens value in their provider settings, it will be respected across all providers (capped to the model's actual maximum).
- Fixed test expectation to properly cap the user's modelMaxTokens to the model's actual capability
- Added a new test case for when the user sets a lower max tokens value than the model supports
- Removed debug logging
Fixed the failing test by updating the test expectation to match the correct behavior. The test expected that a user could set modelMaxTokens higher than what the model actually supports (32000 > 4096). However, the correct behavior is to cap the user's request to the model's actual capability, so we never request more tokens than the model can provide. The implementation uses Math.min(userSetting, modelMax) to handle this. Still have to test the providers.
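The capping rule described in this comment can be illustrated with the values it mentions (a 32000-token request against a model that supports 4096). The function name here is an assumption for illustration, not code from the PR:

```typescript
// Hypothetical helper showing the Math.min(userSetting, modelMax) rule.
function capMaxOutputTokens(userSetting: number, modelMax: number): number {
  // Never request more tokens than the model can actually provide.
  return Math.min(userSetting, modelMax)
}

// User asks for more than the model supports: capped to the model max.
capMaxOutputTokens(32000, 4096) // → 4096
// User asks for less than the model supports: the lower setting is respected.
capMaxOutputTokens(2048, 4096) // → 2048
```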
…or OpenAI compatible providers
- Reverted BaseOpenAiCompatibleProvider to use maxTokens directly from model info
- OpenAI-compatible providers have their own server-side max output configuration
- Hid the generic MaxTokensSlider for the OpenAI-compatible provider in the UI

This ensures OpenAI-compatible providers use their own max tokens configuration.
@mrubens I've tested all of them except for Unbound and LiteLLM; they all seem to be working fine. Previously they were getting the max tokens directly from the model; now the custom setting is used.
What happens if the user's max tokens setting is decreased in the middle of a conversation and the current context already exceeds the new setting?


PR Title: feat: Add Advanced Setting for Custom Max Tokens per Provider Profile (#5784)
Related GitHub Issue
Closes: #5784
Roo Code Task Context (Optional)
Description
This PR adds a new "Max Output Tokens" field in the Advanced Settings section of the provider configuration UI, allowing users to customize the max tokens per provider profile. Previously, Roo Code had a hard-coded limit of 8192 tokens for all API providers, which prevented users from fully utilizing models that support higher token limits.
Key implementation details:
- Added a MaxTokensControl component with numeric input validation
- Integrated the control into ApiOptions
- Updated getModelMaxOutputTokens() to respect the user-configured modelMaxTokens for all models

Design choices:
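The numeric input validation mentioned above might be sketched as a plain helper. This is an illustrative assumption about what such validation could check; MaxTokensControl is the component named in the PR, but the code below is not taken from it:

```typescript
// Hypothetical validator for a "Max Output Tokens" text field.
// Returns the parsed value, or undefined when the input is not usable.
function parseMaxTokensInput(raw: string): number | undefined {
  const n = Number(raw.trim())
  // Accept only positive integers; reject "", "abc", "1.5", "-1", etc.
  if (!Number.isInteger(n) || n <= 0) return undefined
  return n
}
```

Returning undefined (rather than throwing) lets the UI fall back to the model's default limit when the field is cleared or invalid.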
Test Procedure
Automated Tests:
- src/shared/__tests__/api.spec.ts
- webview-ui/src/components/settings/__tests__/MaxTokensControl.spec.tsx

Manual Testing Steps:
Pre-Submission Checklist
Screenshots / Videos
[Screenshots should be added showing the new Max Output Tokens field in the Advanced Settings section]
Documentation Updates
Additional Notes
This implementation leverages the existing modelMaxTokens field in ProviderSettings, which was previously only used for reasoning models. The field is now extended to work with all models, providing a consistent experience across different provider types.

Get in Touch
@MuriloFP
Important
Introduces customizable max output tokens per provider profile in the UI, with backend support and comprehensive testing.
- Adds a MaxTokensControl component with numeric input validation.
- Integrates it into ApiOptions in ApiOptions.tsx.
- Updates getModelMaxOutputTokens() to use the user-configured modelMaxTokens.
- Applies the change in chutes.ts, gemini.ts, and glama.ts.
- Tests in api.spec.ts and MaxTokensControl.spec.tsx.

This description was created by for 8ade97f.