-
Notifications
You must be signed in to change notification settings - Fork 2.6k
fix:Feature/enhanced token counting #3396
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix:Feature/enhanced token counting #3396
Conversation
| } | ||
|
|
||
| // Select appropriate encoder based on provider | ||
| let encoderSource: any |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider defining an explicit type for the encoder source instead of using any for encoderSource. This would improve type safety and clarity.
This comment was generated because it violated a code review rule: mrule_QkEwsCio7v34DaCF.
| /** Number of output/completion tokens */ | ||
| outputTokens: number | ||
| /** Number of tokens read from cache (if applicable) */ | ||
| cachedTokens?: number |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The properties cachedTokens and cacheReadTokens both include the comment 'Number of tokens read from cache (if applicable)'. It might be confusing for future developers to distinguish between them. Please clarify the distinction in the comments or consider renaming one of them if they serve different purposes.
This comment was generated because it violated a code review rule: mrule_aQsEnH8jWdOfHq2Z.
|
🚀 I do think this needs to be marked as a feature rather than a chore because of the templates rules "chore: Other changes that don't modify src or test files" |
…#3396) * Increasing file sizes for files that can be read by cline * Update src/integrations/misc/extract-text.ts Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com> * Increasing file sizes for files that can be read by cline --------- Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
Related GitHub Issue
Closes: #1908
Description
This PR introduces core enhancements to token counting mechanisms across different providers. The primary goal is to improve accuracy and provide more granular, provider-specific token handling.
Key implementation details:
tiktoken.tsto usecl100k_baseencoder for broader OpenAI model compatibility and introduced provider-specific fudge factors and content-type handling (text, images).tokenDisplay.tswith utilities for user-friendly token information formatting, detailed usage metrics via tooltips, and readable token count conversion.base-provider.tswith a more provider-agnostic token counting interface, a standardlastTokenUsageproperty, and default counting methods overridable by specific providers.openrouter.ts(API-based counting, cached/reasoning token handling, fixedTokenUsageInfoimport) andrequesty.ts(fixed type compatibility, provider-specific API counting, cached/other token type handling).content-format.tsfor content format conversion utilities and specialized content-type handling.tiktoken.test.tsto cover new behaviors, fudge factors, and image token counting.Reviewers should pay attention to the provider-specific implementations in
openrouter.tsandrequesty.tsto ensure the new counting logic aligns with each provider's API and tokenization nuances. Also, the updates totiktoken.test.tsshould be reviewed for comprehensive coverage of the new functionalities.Test Procedure
Testing was performed through a combination of:
tiktoken.test.tswas significantly enhanced to cover the new token counting logic, including:cl100k_baseencoder output.tokenDisplay.ts.lastTokenUsageproperty inbase-provider.tswas correctly updated after each operation.Reviewers can reproduce tests by:
npm testto execute the automated unit tests.Type of Change
srcor test files. (e.g. enhancing test suite is a dev chore)Pre-Submission Checklist
npm run lint).console.log) has been removed.npm test).mainbranch.npm run changesetif this PR includes user-facing changes or dependency updates.Screenshots / Videos
N/A (Changes are primarily backend logic and type definitions related to token counting, with minor utility functions for display that don't constitute a significant UI change needing screenshots.)
Documentation Updates
Additional Notes
The "Chore" type of change includes the significant enhancements to the test suite, which, while involving test files, is a development chore aimed at improving code quality and reliability rather than fixing a bug in the application's
srccode or adding a new feature directly. The refactoring aspect is tied to the improved code organization and making the token counting system more modular and provider-agnostic.Important
Enhances token counting with provider-specific logic, utilities for display, and comprehensive test updates.
tiktoken.tsto usecl100k_baseencoder and added provider-specific fudge factors and content-type handling.tokenDisplay.tsfor token information formatting and usage metrics.base-provider.tsfor provider-agnostic token counting and addedlastTokenUsageproperty.openrouter.tsandrequesty.tsfor API-based counting and token handling.content-format.tsfor content format conversion utilities.tiktoken.test.tsfor new behaviors and image token counting.sliding-window.test.tsfor new token counting logic.Task.test.tsfor image block handling based on model capabilities.This description was created by
for eb62fa1. You can customize this summary. It will automatically update as commits are pushed.