Claude Sonnet 3.7 exceeds 200k context window #1173 #3268
Conversation
```typescript
 * @param model The model ID
 * @returns A promise resolving to the token count
 */
async countMessageTokens(
```
I think we already have this implemented above in the `countTokens` method.
Thanks for the review! I see the confusion, but there's an important distinction between the existing `countTokens` method and my implementation:

- The existing `countTokens` method only counts tokens for individual content blocks, not the entire message request. It takes an array of `ContentBlockParam` as input and wraps it in a single user message for counting.
- My new `countMessageTokens` method counts tokens for the complete message request, including the system prompt and all conversation messages. It takes the system prompt and an array of messages as input, providing a more accurate token count for the entire request.
This distinction is crucial because:
- The context window limit applies to the entire request, not just individual content blocks
- The issue reported in "Claude Sonnet 3.7 exceeds 200k context window" (#1173) occurs when the complete message (system prompt + all messages) exceeds the 200k token limit
- My implementation adds proactive token counting and adaptive truncation before sending the request
While both methods use the Anthropic API, my implementation provides a more comprehensive solution that specifically addresses the issue where Claude 3.7 Sonnet was exceeding its context window limit.
The existing `countTokens` method is still used as a fallback in my implementation if the API call fails, ensuring robustness.
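The distinction described in this thread can be sketched as follows. This is a minimal illustration, not the PR's actual code: the `TokenCounter` interface stands in for the Anthropic client, and all signatures are assumptions.

```typescript
// Minimal sketch of the two counting paths (assumed signatures, not the PR's code).
type ContentBlock = { type: "text"; text: string };
type Message = { role: "user" | "assistant"; content: ContentBlock[] };

// Stand-in for the Anthropic token-counting client.
interface TokenCounter {
	countTokens(req: { system?: string; messages: Message[] }): Promise<number>;
}

// Existing countTokens: counts one array of content blocks by wrapping it
// in a single user message, ignoring the rest of the conversation.
async function countTokens(client: TokenCounter, blocks: ContentBlock[]): Promise<number> {
	return client.countTokens({ messages: [{ role: "user", content: blocks }] });
}

// New countMessageTokens: counts the complete request, including the
// system prompt and every conversation message.
async function countMessageTokens(
	client: TokenCounter,
	systemPrompt: string,
	messages: Message[],
): Promise<number> {
	return client.countTokens({ system: systemPrompt, messages });
}
```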
```typescript
	}
})

describe("AnthropicHandler Token Counting", () => {
```
Consider adding tests that simulate failures in the token counting API (e.g. when `countTokens` rejects) to verify that the fallback logic in `countMessageTokens` is correctly used.
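One way to structure such a failure-path test, sketched outside of jest for clarity. The method name and the character-based fallback heuristic are assumptions about the PR's code, not its actual implementation.

```typescript
// Hypothetical shape of the failure-path behavior under test.
type Msg = { role: "user" | "assistant"; content: string };

interface Counter {
	countTokens(messages: Msg[]): Promise<number>;
}

// Assumed fallback: a rough chars/4 estimate when the API call rejects.
async function countMessageTokens(client: Counter, messages: Msg[]): Promise<number> {
	try {
		return await client.countTokens(messages);
	} catch {
		return messages.reduce((n, m) => n + Math.ceil(m.content.length / 4), 0);
	}
}

// A client that always rejects, simulating an Anthropic API outage.
const failingClient: Counter = {
	countTokens: () => Promise.reject(new Error("API unavailable")),
};
```

In the repository's jest suite this would typically be written by mocking the client method with `mockRejectedValue(...)` and asserting that the estimated count is returned rather than the error propagating.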
```typescript
const safeTokenLimit = Math.min(contextWindow - 1000, CLAUDE_MAX_SAFE_TOKEN_LIMIT)

// If token count exceeds the safe limit, truncate the conversation
if (tokenCount > safeTokenLimit) {
```
Consider using a structured logging mechanism (with proper log levels) rather than using `console.log` and `console.warn` directly, to improve production log clarity.
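For illustration, a minimal leveled logger along those lines. This is a sketch only; the project may already have a logging utility that should be preferred, and the JSON line format here is an assumption.

```typescript
// Minimal structured logger with levels and an injectable sink (sketch only).
type Level = "debug" | "info" | "warn" | "error";
const LEVEL_ORDER: Record<Level, number> = { debug: 0, info: 1, warn: 2, error: 3 };

function createLogger(minLevel: Level, sink: (line: string) => void = console.log) {
	const emit = (level: Level, msg: string, meta?: unknown) => {
		if (LEVEL_ORDER[level] < LEVEL_ORDER[minLevel]) return; // filter below threshold
		sink(JSON.stringify({ ts: new Date().toISOString(), level, msg, meta }));
	};
	return {
		debug: (msg: string, meta?: unknown) => emit("debug", msg, meta),
		info: (msg: string, meta?: unknown) => emit("info", msg, meta),
		warn: (msg: string, meta?: unknown) => emit("warn", msg, meta),
		error: (msg: string, meta?: unknown) => emit("error", msg, meta),
	};
}
```

The injectable sink also makes the logging itself testable, unlike direct `console.warn` calls.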
Hey @mosleyit, Thank you for the contribution. Sorry for taking so long to review your PR.
I just had a couple of questions about some specific values in your implementation; nothing pops up for me as wrong.
Let me know if you want to discuss this further.
```typescript
/**
 * Maximum safe token limit for Claude 3.7 Sonnet (200k - 1k safety buffer)
 * This is imported from constants.ts but redefined here to avoid circular dependencies
 */
```
I see `CLAUDE_MAX_SAFE_TOKEN_LIMIT` is duplicated here and in `constants.ts`. What's the circular dependency that prevents importing from `constants.ts`? Any alternatives to avoid the duplication?
```typescript
// Determine truncation fraction based on excess tokens
// Start with 0.5 (50%) and increase if needed
let truncationFraction = 0.5
```
Just curious, how did you arrive at these specific values? Would it make sense to extract these as named constants with comments explaining the rationale?
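For example, the values could be pulled out as named constants. In this sketch only the 0.5 starting fraction appears in the excerpt above; the escalation step and cap are illustrative assumptions.

```typescript
// Named truncation constants (0.5 comes from the excerpt; step and cap are assumed).
/** Fraction of older messages dropped on the first truncation attempt. */
const INITIAL_TRUNCATION_FRACTION = 0.5;
/** Increment applied when a truncated request is still over the limit. */
const TRUNCATION_FRACTION_STEP = 0.25;
/** Upper bound so the conversation is never dropped entirely. */
const MAX_TRUNCATION_FRACTION = 0.9;

function nextTruncationFraction(current: number): number {
	return Math.min(current + TRUNCATION_FRACTION_STEP, MAX_TRUNCATION_FRACTION);
}
```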
```typescript
	return response.input_tokens
} catch (error) {
	// Log error but fallback to estimating tokens by counting each part separately
	console.warn("Anthropic message token counting failed, using fallback", error)
```
When the Anthropic token counting API fails, the fallback adds a fixed overhead of 5 tokens per message. Is this estimate based on any specific data?
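For reference, the fallback being discussed has roughly this shape. This is a sketch: the chars/4 ratio and the 5-token overhead are the heuristic values in question, not measured constants, and the function name is assumed.

```typescript
// Assumed shape of the character-based fallback estimator.
const CHARS_PER_TOKEN = 4; // common rough heuristic for English text
const PER_MESSAGE_OVERHEAD_TOKENS = 5; // fixed allowance for role/formatting tokens

function estimateTokensFallback(messages: { content: string }[]): number {
	return messages.reduce(
		(total, m) =>
			total + Math.ceil(m.content.length / CHARS_PER_TOKEN) + PER_MESSAGE_OVERHEAD_TOKENS,
		0,
	);
}
```

Whether 5 tokens per message is a good allowance is exactly the question raised here; since the fallback only runs when the API is unavailable, a slightly pessimistic overhead errs on the safe side of the context limit.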
I'll be closing this PR as stale to clean up our backlog. If someone else wants to work on the linked issue, please leave a comment on #1173 to have it assigned to you.
Related GitHub Issue
Closes: #1173
Description
This PR implements token counting for all Anthropic direct API models to prevent context window limit errors.

Key implementation details:

- A new `countMessageTokens` method to count tokens for entire message requests

Test Procedure
Added unit tests in `src/api/providers/__tests__/anthropic-token-counting.test.ts`.

Manual testing steps:
Run tests with:
```shell
npx jest src/api/providers/__tests__/anthropic-token-counting.test.ts
```

Type of Change

- Changes are limited to `src` or test files.

Pre-Submission Checklist

- Lint passes (`npm run lint`).
- Debug code (e.g. `console.log`) has been removed.
- All tests pass (`npm test`).
- Branch is up to date with `main`.
- Ran `npm run changeset`, as this PR includes user-facing changes or dependency updates.

Screenshots / Videos
N/A - This change doesn't affect the UI.
Documentation Updates
Additional Notes
This implementation addresses the issue where Claude 3.7 Sonnet was exceeding its 200k context window limit. The solution now works for all Anthropic models by using their token counting API and implementing adaptive truncation based on each model's specific context window size.
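The overall flow described above can be summarized in a sketch. The 1000-token buffer mirrors the `contextWindow - 1000` expression quoted earlier in the thread; the function names and the oldest-first truncation strategy are assumptions for illustration only.

```typescript
// Sketch of the proactive count-then-truncate flow (assumed details).
interface SimpleMessage { role: string; content: string }

const SAFETY_BUFFER_TOKENS = 1000; // mirrors contextWindow - 1000 in the excerpt

function truncateIfNeeded(
	messages: SimpleMessage[],
	countFn: (msgs: SimpleMessage[]) => number,
	contextWindow: number,
): SimpleMessage[] {
	const safeLimit = contextWindow - SAFETY_BUFFER_TOKENS;
	let result = messages;
	// Drop the oldest messages until the request fits or one message remains.
	while (result.length > 1 && countFn(result) > safeLimit) {
		result = result.slice(1);
	}
	return result;
}
```

In the real handler the count would include the system prompt and would be re-checked after truncation; the sketch keeps only the control flow.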
Important
Implements token counting and adaptive truncation for Anthropic models to prevent exceeding context window limits, with comprehensive tests added.
- Implements `countMessageTokens` in `anthropic.ts`.
- Integrates token counting into `createMessage()` and `completePrompt()` to prevent exceeding context window limits.
- Adds `anthropic-token-counting.test.ts` to test token counting and truncation for multiple models.
- Defines `CLAUDE_MAX_SAFE_TOKEN_LIMIT` in both `constants.ts` and `sliding-window/index.ts` to avoid circular dependencies.
- Modifies `truncateConversationIfNeeded()` in `sliding-window/index.ts` to handle Anthropic models specifically.