
Conversation

@roomote roomote bot commented Aug 11, 2025

This PR fixes issue #6936 where the Kimi K2 model output was being truncated to 1024 tokens when using the OpenAI Compatible API provider.

Problem

The BaseOpenAiCompatibleProvider class always included the max_tokens parameter in API requests, regardless of the includeMaxTokens option. This caused problems for models such as Kimi K2, which have different default token limits, and led to the truncated output reported in #6936.

Solution

  • Modified BaseOpenAiCompatibleProvider to only include the max_tokens parameter when includeMaxTokens option is explicitly set to true
  • This aligns the behavior with the OpenAiHandler class which already implements this pattern
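
Roughly, the request construction now behaves as sketched below (a sketch only: names such as params, modelId, messages, and info stand in for whatever the class actually uses; only includeMaxTokens, modelMaxTokens, and max_tokens come from the change itself):

    // Base streaming request; fields other than stream_options are placeholders.
    const params: Record<string, unknown> = {
        model: modelId,
        messages,
        stream: true,
        stream_options: { include_usage: true },
    }

    // Only add max_tokens if includeMaxTokens is true
    if (this.options.includeMaxTokens === true) {
        // Use user-configured modelMaxTokens if available, otherwise fall back to the model's default maxTokens
        params.max_tokens = this.options.modelMaxTokens || info.maxTokens
    }

When includeMaxTokens is left unset or false, the request carries no max_tokens at all and the upstream API applies its own default limit.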

Changes

  • Updated src/api/providers/base-openai-compatible-provider.ts to conditionally include max_tokens
  • Updated tests for all providers that extend BaseOpenAiCompatibleProvider:
    • groq.spec.ts
    • fireworks.spec.ts
    • chutes.spec.ts
    • sambanova.spec.ts
    • zai.spec.ts
  • Added new test cases to verify:
    • max_tokens is NOT included by default
    • max_tokens IS included when includeMaxTokens is true
    • Custom modelMaxTokens value is used when provided
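
For illustration, a hedged sketch of what the includeMaxTokens-enabled case might look like in one of those specs (vitest-style; GroqHandler, the constructor options shape, and mockCreate — a mocked client.chat.completions.create — are assumptions, not the literal test code):

    it("passes a custom modelMaxTokens when includeMaxTokens is true", async () => {
        const handler = new GroqHandler({
            apiKey: "test-key",
            includeMaxTokens: true,
            modelMaxTokens: 32768, // hypothetical user-configured limit
        })
        const stream = handler.createMessage("system prompt", [{ role: "user", content: "hi" }])
        for await (const _chunk of stream) {
            // drain the stream so the mocked client call is actually issued
        }
        expect(mockCreate.mock.calls[0][0]).toEqual(expect.objectContaining({ max_tokens: 32768 }))
    })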

Testing

All tests pass successfully:

  • ✅ 78 tests passing across all affected provider test files
  • ✅ Linting checks pass
  • ✅ Type checks pass

Fixes #6936


Important

Fixes max_tokens inclusion in BaseOpenAiCompatibleProvider to respect includeMaxTokens option, with updated tests.

  • Behavior:
    • BaseOpenAiCompatibleProvider now includes max_tokens only if includeMaxTokens is true.
    • Uses modelMaxTokens if provided, otherwise defaults to model's maxTokens.
  • Code Changes:
    • Updated createMessage() in base-openai-compatible-provider.ts to conditionally include max_tokens.
  • Testing:
    • Updated tests in groq.spec.ts, fireworks.spec.ts, chutes.spec.ts, sambanova.spec.ts, and zai.spec.ts.
    • Added test cases to verify max_tokens inclusion based on includeMaxTokens and custom modelMaxTokens.
    • All tests pass successfully, including 78 tests across affected files.

This description was created by Ellipsis for 1f205d4.

- Modified BaseOpenAiCompatibleProvider to only include max_tokens parameter when includeMaxTokens option is true
- This fixes issue #6936 where Kimi K2 model output was being truncated to 1024 tokens
- Updated tests for all providers that extend BaseOpenAiCompatibleProvider (groq, fireworks, chutes, sambanova, zai)
- Added new test cases to verify max_tokens is not included by default and is included when includeMaxTokens is true

Fixes #6936
@roomote roomote bot requested review from cte, jr and mrubens as code owners August 11, 2025 16:15
@dosubot dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. bug Something isn't working labels Aug 11, 2025

@roomote roomote bot left a comment

Reviewed my own code. Found bugs I introduced 10 minutes ago. Classic.

    // Only add max_tokens if includeMaxTokens is true
    if (this.options.includeMaxTokens === true) {
        // Use user-configured modelMaxTokens if available, otherwise fall back to model's default maxTokens
        params.max_tokens = this.options.modelMaxTokens || info.maxTokens

I notice that OpenAiHandler uses max_completion_tokens (the modern parameter) instead of the deprecated max_tokens. Should we consider using max_completion_tokens here as well for consistency?

The OpenAI documentation indicates that max_tokens is deprecated in favor of max_completion_tokens. This might cause issues with newer API versions or certain providers that only support the modern parameter.
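
For reference, a sketch of what switching could look like, assuming the same conditional as the diff (whether every OpenAI-compatible provider accepts the newer field is exactly the open question raised above):

    // Hypothetical variant using the non-deprecated parameter name.
    if (this.options.includeMaxTokens === true) {
        // Same fallback logic as the current diff, but sent as max_completion_tokens.
        params.max_completion_tokens = this.options.modelMaxTokens || info.maxTokens
    }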

    }

    // Only add max_tokens if includeMaxTokens is true
    if (this.options.includeMaxTokens === true) {

Could we add a comment here explaining the behavior? Something like:

    })

    - it("createMessage should pass correct parameters to Groq client", async () => {
    + it("createMessage should not include max_tokens by default", async () => {

Would it be helpful to add a test case that explicitly verifies the behavior when includeMaxTokens is set to false (not just undefined)? This would ensure complete coverage of all possible states.
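
Sketched out, with the same caveats about assumed mock and handler names as the earlier test sketch:

    it("does not include max_tokens when includeMaxTokens is explicitly false", async () => {
        const handler = new GroqHandler({ apiKey: "test-key", includeMaxTokens: false })
        const stream = handler.createMessage("system prompt", [{ role: "user", content: "hi" }])
        for await (const _chunk of stream) {
            // consume the stream to trigger the mocked request
        }
        expect(mockCreate.mock.calls[0][0]).not.toHaveProperty("max_tokens")
    })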

        stream_options: { include_usage: true },
    }

    // Only add max_tokens if includeMaxTokens is true

Consider extracting this logic into a private method like addMaxTokensIfNeeded() similar to the OpenAiHandler implementation. This would improve code organization and make it easier to maintain consistency across providers.
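
A possible shape for that refactor (the method name comes from the suggestion; the OpenAI SDK param type and ModelInfo are assumptions about the surrounding code, so treat this as a sketch rather than the actual implementation):

    // Hypothetical private helper mirroring the suggested addMaxTokensIfNeeded() pattern.
    private addMaxTokensIfNeeded(
        params: OpenAI.Chat.Completions.ChatCompletionCreateParamsStreaming,
        info: ModelInfo,
    ): void {
        // Only add max_tokens if includeMaxTokens is true
        if (this.options.includeMaxTokens === true) {
            // Prefer the user-configured modelMaxTokens, fall back to the model's default
            params.max_tokens = this.options.modelMaxTokens || info.maxTokens
        }
    }

createMessage() would then call this.addMaxTokensIfNeeded(params, info) right after building the base params, keeping the conditional in one place for every provider that extends the base class.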

@hannesrudolph hannesrudolph added the Issue/PR - Triage New issue. Needs quick review to confirm validity and assign labels. label Aug 12, 2025
@daniel-lxs
Member

Closing #6936 (comment)

@daniel-lxs daniel-lxs closed this Aug 13, 2025
@github-project-automation github-project-automation bot moved this from Triage to Done in Roo Code Roadmap Aug 13, 2025
@github-project-automation github-project-automation bot moved this from New to Done in Roo Code Roadmap Aug 13, 2025


Development

Successfully merging this pull request may close these issues.

Output is truncated when using Kimi K2
