fix(sliding-window): prevent premature context condensing #6918
Conversation
…okens to prevent premature condensing
Reviewing my own code is like debugging in a mirror - everything looks backward but the bugs are still mine.
```ts
/**
 * Safety buffer applied when approaching the hard context limit.
 * Keep very small so we don't trigger premature condensing on large windows.
 */
const SAFETY_BUFFER_PERCENTAGE = 0.01 // 1% of total context window
```
Are these constants intentional? They're defined but never used in the implementation. Were they meant to be incorporated into the threshold calculation, or are they placeholders for future enhancements?
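For what it's worth, a minimal sketch of how they could be wired in, assuming MIN_SAFETY_BUFFER_TOKENS is meant as an absolute floor for small windows (both the wiring and the floor semantics are assumptions, not code from this PR):

```ts
// Hypothetical wiring only; not code from this PR.
// Assumes MIN_SAFETY_BUFFER_TOKENS is an absolute floor for small context windows.
const safetyBuffer = Math.max(contextWindow * SAFETY_BUFFER_PERCENTAGE, MIN_SAFETY_BUFFER_TOKENS)
const allowedTokens = contextWindow - reservedTokens - safetyBuffer
```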
```ts
// Calculate effective capacity available for conversation history after reserving output tokens
const allowedTokens = contextWindow * (1 - TOKEN_BUFFER_PERCENTAGE) - reservedTokens

// Note: allowedTokens already includes a dynamic buffer and reserved output tokens
```
This comment mentions keeping 'a single source of truth' but doesn't clarify what that source is or how it differs from the previous approach. Could we make this more explicit? Perhaps something like: 'allowedTokens is now the single threshold for both percentage and absolute checks'?
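Concretely, a sketch of what that could read like, with allowedTokens driving both triggers (prevContextTokens and condenseThreshold are assumed names for illustration, not the PR's exact code):

```ts
// Sketch: one threshold feeds both checks (assumed wiring)
const allowedTokens = contextWindow * (1 - TOKEN_BUFFER_PERCENTAGE) - reservedTokens
const contextPercent = (100 * prevContextTokens) / contextWindow
const shouldCondense =
	contextPercent >= condenseThreshold || // percentage trigger
	prevContextTokens > allowedTokens // absolute trigger, respects reserved output tokens
```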
```ts
// If no specific threshold is found for the profile, fall back to global setting

if (autoCondenseContext) {
	// Base percent on effective capacity so visualization and behavior are aligned
```
What does 'visualization' refer to here? Is this the UI progress bar, debug output, or something else? A bit more specificity would help future maintainers understand the alignment you're achieving.
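If it is the UI indicator, a sketch of the alignment being described might look like this (assumed names, not the PR's code):

```ts
// Sketch: the displayed usage percentage and the condense trigger share the same
// effective-capacity denominator, so the indicator and the behavior cannot drift apart
const effectiveCapacity = contextWindow * (1 - TOKEN_BUFFER_PERCENTAGE) - reservedTokens
const usagePercent = Math.min(100, (100 * prevContextTokens) / effectiveCapacity)
```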
```ts
// Calculate available tokens for conversation history
// Truncate if we're within TOKEN_BUFFER_PERCENTAGE of the context window
// Calculate effective capacity available for conversation history after reserving output tokens
const allowedTokens = contextWindow * (1 - TOKEN_BUFFER_PERCENTAGE) - reservedTokens
```
Consider extracting this calculation into a named function like calculateEffectiveTokenCapacity() for better readability and testability. This appears to be the core fix, so giving it a name would make the intent clearer:
```diff
-const allowedTokens = contextWindow * (1 - TOKEN_BUFFER_PERCENTAGE) - reservedTokens
+const allowedTokens = calculateEffectiveTokenCapacity(contextWindow, reservedTokens)
```
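For illustration, the extracted helper could be as small as this (only the name comes from the suggestion above; the body mirrors the existing expression):

```ts
/**
 * Effective token capacity available for conversation history:
 * the context window minus a fractional buffer and the tokens
 * reserved for model output.
 */
function calculateEffectiveTokenCapacity(contextWindow: number, reservedTokens: number): number {
	return contextWindow * (1 - TOKEN_BUFFER_PERCENTAGE) - reservedTokens
}
```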
Align condense/truncation thresholds with reserved output tokens to avoid early condensing on large-context models. Changes:

- Use a single absolute threshold: allowedTokens = contextWindow * (1 - TOKEN_BUFFER_PERCENTAGE) - reservedTokens
- Keep the percent trigger logic, but ensure the absolute check respects reserved output tokens
- Fixes cases where a 400k-context model condensed around ~188k due to an overly aggressive absolute threshold
- Sliding-window unit tests pass locally (28/28)
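Worked example with illustrative numbers (the buffer and reserve values below are assumptions, not values from this PR):

```ts
// Illustrative values only
const contextWindow = 400_000
const TOKEN_BUFFER_PERCENTAGE = 0.1 // assumed
const reservedTokens = 30_000 // assumed
const allowedTokens = contextWindow * (1 - TOKEN_BUFFER_PERCENTAGE) - reservedTokens
// => 330_000 tokens, well above the ~188k point where condensing previously fired
```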
Important

Adjusts truncateConversationIfNeeded() in index.ts to prevent premature condensing in large-context models by aligning thresholds with reserved output tokens.

- Updates truncateConversationIfNeeded() in index.ts to use a single absolute threshold for allowed tokens, considering reserved output tokens.
- Adds SAFETY_BUFFER_PERCENTAGE and MIN_SAFETY_BUFFER_TOKENS in index.ts to prevent premature condensing.

This description was created by for 598920b.