
Conversation

Contributor

@mosleyit commented May 7, 2025

Related GitHub Issue

Closes: #1173

Description

This PR implements token counting for all Anthropic direct API models to prevent context window limit errors. The implementation:

  1. Uses Anthropic's token counting API to accurately count tokens before sending requests
  2. Proactively checks if the token count approaches the context window limit for each model
  3. Implements adaptive truncation based on how far over the limit we are
  4. Adds verification after truncation to ensure we stay under the limit
  5. Uses a safety buffer (1k tokens) to prevent hitting exact limits

Key implementation details:

  • Added a new countMessageTokens method to count tokens for entire message requests (see the sketch after this list)
  • Modified the sliding window implementation to handle all Anthropic models
  • Implemented model-specific context window handling
  • Added comprehensive tests for multiple Anthropic models
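
For illustration, a minimal sketch of the proactive flow described above, assuming the Anthropic SDK's messages.countTokens endpoint; the helper name, truncation strategy, and fraction formula here are placeholders rather than the PR's exact code:

import Anthropic from "@anthropic-ai/sdk"

const client = new Anthropic()
const SAFETY_BUFFER = 1_000 // stay 1k tokens below the context window

async function ensureWithinContextWindow(
  model: string,
  contextWindow: number,
  system: string,
  messages: Anthropic.MessageParam[],
): Promise<Anthropic.MessageParam[]> {
  const safeLimit = contextWindow - SAFETY_BUFFER
  const count = async (msgs: Anthropic.MessageParam[]) =>
    (await client.messages.countTokens({ model, system, messages: msgs })).input_tokens

  let tokens = await count(messages)
  while (tokens > safeLimit && messages.length > 2) {
    // Truncate a larger fraction the further we are over the limit, dropping
    // messages in pairs to preserve user/assistant alternation.
    const fraction = Math.min(0.9, 0.5 + (tokens - safeLimit) / tokens)
    const drop = Math.max(2, 2 * Math.floor((messages.length * fraction) / 2))
    messages = messages.slice(drop)
    // Re-count to verify we actually landed under the limit.
    tokens = await count(messages)
  }
  return messages
}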

Test Procedure

  1. Added unit tests in src/api/providers/__tests__/anthropic-token-counting.test.ts (a sketch of one such test follows this list) that verify:

    • Token counting for content blocks
    • Token counting for complete messages
    • Conversation truncation when token limits are exceeded
    • Behavior across multiple Anthropic models (Claude 3.7 Sonnet, Claude 3 Opus, Claude 3 Haiku)
  2. Manual testing steps:

    • Create a conversation with Claude 3.7 Sonnet that approaches the token limit
    • Verify that the conversation is truncated appropriately
    • Check console logs for token count warnings and truncation information
  3. Run tests with: npx jest src/api/providers/__tests__/anthropic-token-counting.test.ts
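
A unit test along the lines of step 1 might look roughly like this; the import path, constructor options, and method names (countMessageTokens, a handler-level truncateConversationIfNeeded) are assumptions for the sketch, not the PR's actual test code:

import { AnthropicHandler } from "../anthropic"

describe("AnthropicHandler token counting", () => {
  it("truncates the conversation when the safe limit is exceeded", async () => {
    const handler = new AnthropicHandler({ apiKey: "test", apiModelId: "claude-3-7-sonnet-20250219" })

    // First count is over the 200k window; the re-count after truncation is under it.
    jest
      .spyOn(handler, "countMessageTokens")
      .mockResolvedValueOnce(250_000)
      .mockResolvedValue(50_000)

    const messages = Array.from({ length: 100 }, (_, i) => ({
      role: i % 2 === 0 ? ("user" as const) : ("assistant" as const),
      content: `message ${i}`,
    }))

    const truncated = await handler.truncateConversationIfNeeded("system prompt", messages)
    expect(truncated.length).toBeLessThan(messages.length)
  })
})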

Type of Change

  • 🐛 Bug Fix: Non-breaking change that fixes an issue.
  • ✨ New Feature: Non-breaking change that adds functionality.
  • 💥 Breaking Change: Fix or feature that would cause existing functionality to not work as expected.
  • ♻️ Refactor: Code change that neither fixes a bug nor adds a feature.
  • 💅 Style: Changes that do not affect the meaning of the code (white-space, formatting, etc.).
  • 📚 Documentation: Updates to documentation files.
  • ⚙️ Build/CI: Changes to the build process or CI configuration.
  • 🧹 Chore: Other changes that don't modify src or test files.

Pre-Submission Checklist

  • Issue Linked: This PR is linked to an approved GitHub Issue (see "Related GitHub Issue" above).
  • Scope: My changes are focused on the linked issue (one major feature/fix per PR).
  • Self-Review: I have performed a thorough self-review of my code.
  • Code Quality:
    • My code adheres to the project's style guidelines.
    • There are no new linting errors or warnings (npm run lint).
    • All debug code (e.g., console.log) has been removed.
  • Testing:
    • New and/or updated tests have been added to cover my changes.
    • All tests pass locally (npm test).
    • The application builds successfully with my changes.
  • Branch Hygiene: My branch is up-to-date (rebased) with the main branch.
  • Documentation Impact: I have considered if my changes require documentation updates (see "Documentation Updates" section below).
  • Changeset: A changeset has been created using npm run changeset if this PR includes user-facing changes or dependency updates.
  • Contribution Guidelines: I have read and agree to the Contributor Guidelines.

Screenshots / Videos

N/A - This change doesn't affect the UI.

Documentation Updates

  • No documentation updates are required.

Additional Notes

This implementation addresses the issue where Claude 3.7 Sonnet was exceeding its 200k context window limit. The solution now works for all Anthropic models by using their token counting API and implementing adaptive truncation based on each model's specific context window size.


Important

Implements token counting and adaptive truncation for Anthropic models to prevent exceeding context window limits, with comprehensive tests added.

  • Behavior:
    • Implements token counting for Anthropic models using countMessageTokens in anthropic.ts.
    • Adds adaptive truncation logic in createMessage() and completePrompt() to prevent exceeding context window limits.
    • Introduces a safety buffer of 1k tokens below the context window limit.
  • Tests:
    • Adds anthropic-token-counting.test.ts to test token counting and truncation for multiple models.
    • Tests include scenarios for token counting, message truncation, and handling of different models.
  • Constants:
    • Defines CLAUDE_MAX_SAFE_TOKEN_LIMIT in constants.ts and sliding-window/index.ts to avoid circular dependencies.
  • Misc:
    • Updates truncateConversationIfNeeded() in sliding-window/index.ts to handle Anthropic models specifically.

This description was created by Ellipsis for c9a8c27. You can customize this summary. It will automatically update as commits are pushed.

changeset-bot bot commented May 7, 2025

⚠️ No Changeset found

Latest commit: c9a8c27

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

* @param model The model ID
* @returns A promise resolving to the token count
*/
async countMessageTokens(
Collaborator

I think we already have this implemented above in the countTokens method

Contributor Author

Thanks for the review! I see the confusion, but there's an important distinction between the existing countTokens method and my implementation:

  1. The existing countTokens method only counts tokens for individual content blocks, not the entire message request. It takes an array of ContentBlockParam as input and wraps it in a single user message for counting.

  2. My new countMessageTokens method counts tokens for the complete message request including the system prompt and all conversation messages. It takes the system prompt and an array of messages as input, providing a more accurate token count for the entire request.

This distinction is crucial because:

  • The context window limit applies to the entire request, not just individual content blocks
  • Issue #1173 (Claude Sonnet 3.7 exceeds 200k context window) occurs when the complete message (system prompt + all messages) exceeds the 200k token limit
  • My implementation adds proactive token counting and adaptive truncation before sending the request

While both methods use the Anthropic API, my implementation provides a more comprehensive solution that specifically addresses the issue where Claude 3.7 Sonnet was exceeding its context window limit.

The existing countTokens method is still used as a fallback in my implementation if the API call fails, ensuring robustness.
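
To make the distinction concrete, here is a sketch of the two call shapes against the SDK; systemPrompt and conversation are stand-ins for the real request:

import Anthropic from "@anthropic-ai/sdk"

const client = new Anthropic()
const model = "claude-3-7-sonnet-20250219"

declare const systemPrompt: string
declare const conversation: Anthropic.MessageParam[]

async function compareCounts() {
  // Existing countTokens: one array of content blocks wrapped in a single
  // user message (no system prompt, no conversation history).
  const blockCount = await client.messages.countTokens({
    model,
    messages: [{ role: "user", content: [{ type: "text", text: "Hello" }] }],
  })

  // New countMessageTokens: the complete request, i.e. the system prompt
  // plus every message in the conversation history.
  const requestCount = await client.messages.countTokens({
    model,
    system: systemPrompt,
    messages: conversation,
  })

  return { blocks: blockCount.input_tokens, request: requestCount.input_tokens }
}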

@mosleyit marked this pull request as ready for review May 7, 2025 14:47
@mosleyit requested a review from cte as a code owner May 7, 2025 14:47
@dosubot added the size:L (This PR changes 100-499 lines, ignoring generated files) and bug (Something isn't working) labels May 7, 2025
}
})

describe("AnthropicHandler Token Counting", () => {
Contributor

Consider adding tests that simulate failures in the token counting API (e.g. when countTokens rejects) to verify that the fallback logic in countMessageTokens is correctly used.

This comment was generated because it violated the following rules: mrule_oAUXVfj5l9XxF01R and mrule_OR1S8PRRHcvbdFib.
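
Such a test might look roughly like this; how the handler exposes its Anthropic client, and the method names, are assumptions for the sketch:

it("falls back to per-block counting when the counting API rejects", async () => {
  const handler = new AnthropicHandler({ apiKey: "test", apiModelId: "claude-3-7-sonnet-20250219" })

  // Simulate the token counting endpoint failing.
  jest
    .spyOn((handler as any).client.messages, "countTokens")
    .mockRejectedValue(new Error("503 Service Unavailable"))

  // Spy on the existing per-block path so we can assert the fallback ran.
  const fallback = jest.spyOn(handler, "countTokens").mockResolvedValue(42)

  const count = await handler.countMessageTokens("system", [{ role: "user", content: "hi" }])

  expect(fallback).toHaveBeenCalled()
  expect(count).toBeGreaterThan(0)
})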

const safeTokenLimit = Math.min(contextWindow - 1000, CLAUDE_MAX_SAFE_TOKEN_LIMIT)

// If token count exceeds the safe limit, truncate the conversation
if (tokenCount > safeTokenLimit) {
Contributor

Consider using a structured logging mechanism (with proper log levels) rather than using console.log and console.warn directly, to improve production log clarity.

This comment was generated because it violated a code review rule: mrule_OR1S8PRRHcvbdFib.
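
For example, a minimal leveled wrapper (purely illustrative; a real change would likely reuse the project's existing logging utility instead):

// Reusing tokenCount and safeTokenLimit from the snippet above.
declare const tokenCount: number
declare const safeTokenLimit: number

const logger = {
  debug: (msg: string, meta?: Record<string, unknown>) => console.debug(`[sliding-window] ${msg}`, meta),
  warn: (msg: string, meta?: Record<string, unknown>) => console.warn(`[sliding-window] ${msg}`, meta),
}

if (tokenCount > safeTokenLimit) {
  logger.warn("token count exceeds safe limit; truncating conversation", { tokenCount, safeTokenLimit })
}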

@hannesrudolph moved this from New to PR [Pre Approval Review] in Roo Code Roadmap May 7, 2025
bgilbert6 pushed a commit to bgilbert6/Roo-Code that referenced this pull request May 14, 2025
@hannesrudolph moved this from New to PR [Pre Approval Review] in Roo Code Roadmap May 20, 2025
@hannesrudolph moved this from PR [Needs Review] to TEMP in Roo Code Roadmap May 26, 2025
@daniel-lxs moved this from TEMP to PR [Needs Review] in Roo Code Roadmap May 27, 2025
Member

@daniel-lxs left a comment

Hey @mosleyit, Thank you for the contribution. Sorry for taking so long to review your PR.

I just had a couple of questions about some specific values in your implementation; nothing pops up for me as wrong.

Let me know if you want to discuss this further.

/**
* Maximum safe token limit for Claude 3.7 Sonnet (200k - 1k safety buffer)
* This duplicates the value in constants.ts to avoid circular dependencies
*/
Member

I see CLAUDE_MAX_SAFE_TOKEN_LIMIT is duplicated here and in constants.ts. What's the circular dependency that prevents importing from constants.ts? Any alternatives to avoid the duplication?


// Determine truncation fraction based on excess tokens
// Start with 0.5 (50%) and increase if needed
let truncationFraction = 0.5
Member

Just curious, how did you arrive at these specific values? Would it make sense to extract these as named constants with comments explaining the rationale?
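
One illustrative shape for that refactor (the names and the 0.9 ceiling are placeholders):

/** Initial fraction of messages to drop when over the limit (start by halving). */
const INITIAL_TRUNCATION_FRACTION = 0.5

/** Ceiling so a single truncation pass never removes the whole conversation. */
const MAX_TRUNCATION_FRACTION = 0.9

let truncationFraction = INITIAL_TRUNCATION_FRACTION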

return response.input_tokens
} catch (error) {
// Log error but fallback to estimating tokens by counting each part separately
console.warn("Anthropic message token counting failed, using fallback", error)
Member

When the Anthropic token counting API fails, the fallback adds a fixed overhead of 5 tokens per message. Is this estimate based on any specific data?
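
For context, the kind of fallback being discussed looks roughly like this, with countBlocks standing in for the existing per-block countTokens path; the 5-token overhead is a heuristic, not a documented Anthropic value:

import Anthropic from "@anthropic-ai/sdk"

// Stand-in for the existing per-block countTokens path.
declare function countBlocks(blocks: Anthropic.ContentBlockParam[]): Promise<number>

// Heuristic only: a few extra tokens per message for role markers and formatting.
const PER_MESSAGE_OVERHEAD = 5

async function estimateRequestTokens(
  system: string,
  messages: Anthropic.MessageParam[],
): Promise<number> {
  let total = await countBlocks([{ type: "text", text: system }])
  for (const message of messages) {
    const blocks: Anthropic.ContentBlockParam[] =
      typeof message.content === "string"
        ? [{ type: "text", text: message.content }]
        : message.content
    total += await countBlocks(blocks)
    total += PER_MESSAGE_OVERHEAD
  }
  return total
}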

@daniel-lxs moved this from PR [Needs Prelim Review] to PR [Changes Requested] in Roo Code Roadmap May 29, 2025
@daniel-lxs
Member

I'll be closing this PR as stale to clean up our backlog.

If someone else wants to work on the linked issue please leave a comment on #1173 to have it assigned to you.

@daniel-lxs closed this Jun 9, 2025
@github-project-automation bot moved this from PR [Pre Approval Review] to Done in Roo Code Roadmap Jun 9, 2025
@github-project-automation bot moved this from PR [Changes Requested] to Done in Roo Code Roadmap Jun 9, 2025