
Conversation


@daniel-lxs daniel-lxs commented Aug 22, 2025

Description

This PR ports the prompt caching support for Kimi K2 on Groq from the upstream Cline repository.

Ported from: cline/cline#5697

Changes

  • Added a `GroqUsage` interface to handle Groq's cached token fields in the response
  • Implemented cost calculation with cache read discounts using the existing cost function
  • Enabled prompt caching for the Kimi K2 model with a 50% discount on cached input tokens
  • Updated tests to verify the caching functionality works correctly

Implementation Details

Groq Handler

  • Added a custom `GroqUsage` interface that extends OpenAI's `CompletionUsage` to include Groq's cached token fields
  • Overrode `createMessage()` to report usage through a custom `yieldUsage()` method
  • The `yieldUsage()` method:
    • Extracts cached token information from Groq's response
    • Calculates costs with the cache read discount applied
    • Reports non-cached input tokens separately from cached tokens
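The flow above can be sketched as a standalone TypeScript fragment. The shapes and names here (`ModelInfo`, `calculateCost`, the price fields, and the illustrative prices) are assumptions for illustration, not the actual Roo Code source:

```typescript
// Assumed shape: Groq reports cached tokens under
// prompt_tokens_details.cached_tokens, mirroring OpenAI's CompletionUsage.
interface GroqUsage {
	prompt_tokens: number
	completion_tokens: number
	prompt_tokens_details?: {
		cached_tokens?: number
	}
}

// Hypothetical pricing shape; field names are not the real schema.
interface ModelInfo {
	inputPrice: number // $ per 1M non-cached input tokens
	outputPrice: number // $ per 1M output tokens
	cacheReadsPrice: number // $ per 1M cached input tokens (50% of inputPrice)
}

// Apply the cache-read discount to cached tokens only.
function calculateCost(info: ModelInfo, usage: GroqUsage): number {
	const cacheReadTokens = usage.prompt_tokens_details?.cached_tokens ?? 0
	// Groq does not track cache writes, so only reads are discounted,
	// and non-cached input is clamped at zero.
	const nonCachedInputTokens = Math.max(0, usage.prompt_tokens - cacheReadTokens)
	return (
		(nonCachedInputTokens * info.inputPrice +
			cacheReadTokens * info.cacheReadsPrice +
			usage.completion_tokens * info.outputPrice) /
		1_000_000
	)
}
```

For example, with an input price of $1/1M, a cache read price of $0.5/1M, and an output price of $3/1M, a response with 100 prompt tokens (30 of them cached) and 50 completion tokens costs (70 × 1 + 30 × 0.5 + 50 × 3) / 1,000,000 dollars.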

Model Configuration

  • Enabled prompt caching for the Kimi K2 model (`moonshotai/kimi-k2-instruct`)
  • Configured a 50% discount on cached input tokens
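A minimal sketch of what the model entry might look like; the field names (`supportsPromptCache`, `cacheReadsPrice`) and the prices are assumptions for illustration, not the actual `groq.ts` schema or real Groq pricing:

```typescript
// Hypothetical model entry: caching enabled, cached reads at half the
// regular input price. Prices are illustrative placeholders.
const kimiK2 = {
	id: "moonshotai/kimi-k2-instruct",
	supportsPromptCache: true, // assumed flag gating the caching path in GroqHandler
	inputPrice: 1.0, // illustrative $/1M non-cached input tokens
	cacheReadsPrice: 0.5, // 50% of inputPrice for cached input tokens
}
```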

Tests

  • Updated existing test to expect the new usage format with cache fields
  • Added new test case for cached token handling

Testing

✅ All tests passing (12/12)
✅ TypeScript compilation successful
✅ ESLint checks pass

Credits

This implementation is based on the original work from the Cline repository PR #5697.


Important

Adds prompt caching support for Kimi K2 on Groq with cost calculation and test updates.

  • Behavior:
    • Enables prompt caching for moonshotai/kimi-k2-instruct model with a 50% discount on cached input tokens in groq.ts.
    • Implements cost calculation with cache read discounts in GroqHandler.
  • Implementation:
    • Adds GroqUsage interface in groq.ts to handle cached token fields.
    • Overrides createMessage() in GroqHandler to yield usage data with cache details.
    • Introduces yieldUsage() in GroqHandler to calculate and yield usage costs.
  • Tests:
    • Updates tests in groq.spec.ts to verify caching functionality and cost calculations.
    • Adds test case for handling cached tokens in usage data.

This description was created by Ellipsis for 8fa6f00.

@daniel-lxs daniel-lxs requested review from cte, jr and mrubens as code owners August 22, 2025 16:15
@dosubot dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. enhancement New feature or request labels Aug 22, 2025
@hannesrudolph hannesrudolph added the Issue/PR - Triage New issue. Needs quick review to confirm validity and assign labels. label Aug 22, 2025
@daniel-lxs daniel-lxs moved this from Triage to PR [Needs Review] in Roo Code Roadmap Aug 22, 2025
@roomote roomote bot left a comment


Thank you for your contribution! I've reviewed the changes and found some issues that need attention before merging.

// Calculate non-cached input tokens for proper reporting
const nonCachedInputTokens = Math.max(0, inputTokens - cacheReadTokens - cacheWriteTokens)

console.log("usage", {

Debug logging should be removed from production code. Could we remove this console.log statement?

import type { ApiHandlerOptions } from "../../shared/api"
import type { ApiHandlerCreateMessageMetadata } from "../index"
import { ApiStream } from "../transform/stream"
import { convertToOpenAiMessages } from "../transform/openai-format"

Is this import still needed? It appears to be unused since the createMessage method is overridden and doesn't call convertToOpenAiMessages.

}

if (chunk.usage) {
yield* this.yieldUsage(chunk.usage as GroqUsage)

Could we add type validation here to ensure chunk.usage conforms to GroqUsage structure? The type assertion without validation could potentially cause runtime errors if the API response structure changes.


const cacheReadTokens = usage?.prompt_tokens_details?.cached_tokens || 0

// Groq does not track cache writes

Could we expand this comment to provide more context? For example: 'Groq does not track cache writes - only cache reads are reported in the API response. This is a limitation of the Groq API as of [date].'

cacheReadTokens: 30,
})
expect(typeof firstChunk.value.totalCost).toBe("number")
})

Consider adding edge case tests:

  • When prompt_tokens_details is present but cached_tokens is undefined
  • When cached tokens exceed total prompt tokens (error case)
  • Verify actual cost calculation values instead of just checking the type
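The suggested edge cases could be sketched like this, using a standalone cost helper with illustrative prices (`costFor` and its pricing are hypothetical, not the actual `groq.spec.ts` tests):

```typescript
// Standalone helper mirroring the assumed discount logic:
// non-cached input clamped at zero, cached reads at half price.
function costFor(promptTokens: number, cachedTokens: number | undefined, completionTokens: number): number {
	const inputPrice = 1.0 // illustrative $/1M tokens
	const cacheReadsPrice = 0.5
	const outputPrice = 3.0
	const cacheRead = cachedTokens ?? 0
	const nonCached = Math.max(0, promptTokens - cacheRead)
	return (nonCached * inputPrice + cacheRead * cacheReadsPrice + completionTokens * outputPrice) / 1_000_000
}

// 1. prompt_tokens_details present but cached_tokens undefined → treated as 0
console.assert(costFor(100, undefined, 0) === 100 / 1_000_000)

// 2. cached tokens exceed total prompt tokens → non-cached input clamps to 0
console.assert(costFor(100, 150, 0) === (150 * 0.5) / 1_000_000)

// 3. verify an actual cost value instead of just checking the type
console.assert(Math.abs(costFor(100, 30, 50) - 0.000235) < 1e-12)
```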

@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label Aug 22, 2025
@hannesrudolph hannesrudolph added PR - Needs Review and removed Issue/PR - Triage New issue. Needs quick review to confirm validity and assign labels. labels Aug 22, 2025
@mrubens mrubens merged commit faab314 into main Aug 22, 2025
35 of 36 checks passed
@mrubens mrubens deleted the feat/groq-kimi-k2-prompt-caching branch August 22, 2025 16:40
@github-project-automation github-project-automation bot moved this from New to Done in Roo Code Roadmap Aug 22, 2025
@github-project-automation github-project-automation bot moved this from PR [Needs Review] to Done in Roo Code Roadmap Aug 22, 2025