
Conversation

@SannidhyaSah SannidhyaSah commented May 9, 2025

Related GitHub Issue

Closes: #1908

Description

This PR introduces core enhancements to token counting mechanisms across different providers. The primary goal is to improve accuracy and provide more granular, provider-specific token handling.

Key implementation details:

  • Core Token Counting Enhancements:
    • Updated tiktoken.ts to use the cl100k_base encoder for broader OpenAI model compatibility, and introduced provider-specific fudge factors and content-type handling for text and images (a sketch of this approach follows this list).
    • Created tokenDisplay.ts with utilities for user-friendly token information formatting, detailed usage metrics via tooltips, and readable token count conversion.
    • Refactored base-provider.ts with a more provider-agnostic token counting interface, a standard lastTokenUsage property, and default counting methods overridable by specific providers.
    • Implemented provider-specific logic in openrouter.ts (API-based counting, cached/reasoning token handling, fixed TokenUsageInfo import) and requesty.ts (fixed type compatibility, provider-specific API counting, cached/other token type handling).
    • Added content-format.ts for content format conversion utilities and specialized content-type handling.
    • Enhanced the test suite in tiktoken.test.ts to cover new behaviors, fudge factors, and image token counting.
  • Technical Improvements:
    • Type Definitions: Ensured consistent typing and a more flexible token usage interface.
    • Code Organization: Separated token counting from display logic for better modularity and maintainability.
    • Performance Optimizations: Prioritized API-based counting where available and included fallbacks.
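
As a rough illustration of the counting approach described in the bullets above (cl100k_base encoding plus a provider-specific fudge factor), here is a minimal sketch. It assumes the js-tiktoken bindings; the factor values, provider keys, and function name are illustrative rather than the actual contents of tiktoken.ts.

```typescript
import { Tiktoken } from "js-tiktoken/lite"
import cl100k_base from "js-tiktoken/ranks/cl100k_base"

// Illustrative per-provider fudge factors; the real values live in tiktoken.ts.
const FUDGE_FACTORS: Record<string, number> = {
	default: 1.5,
	openrouter: 1.05,
	requesty: 1.05,
}

const encoder = new Tiktoken(cl100k_base)

// Estimate text tokens for a provider by scaling the raw encoder count.
function countTextTokens(text: string, provider = "default"): number {
	const raw = encoder.encode(text).length
	const fudge = FUDGE_FACTORS[provider] ?? FUDGE_FACTORS.default
	// Round up so the estimate errs on the conservative side.
	return Math.ceil(raw * fudge)
}
```

On the display side, tokenDisplay.ts would then turn raw counts into the compact form users see (the linked issue contrasts a UI value of 70.4k with an API-reported 257661). A hypothetical helper in that spirit; the real thresholds and rounding rules may differ:

```typescript
// Hypothetical formatting helper, not the actual tokenDisplay.ts implementation.
function formatTokenCount(count: number): string {
	if (count < 1000) return String(count)
	return `${(count / 1000).toFixed(1)}k`
}

formatTokenCount(257661) // "257.7k"
```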

Reviewers should pay attention to the provider-specific implementations in openrouter.ts and requesty.ts to ensure the new counting logic aligns with each provider's API and tokenization nuances. Also, the updates to tiktoken.test.ts should be reviewed for comprehensive coverage of the new functionalities.
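
For orientation, the following is a minimal sketch of the shape of that counting surface: a default estimate in the base class, a lastTokenUsage property, and a provider override that prefers API-reported counts with a local fallback. Names such as countViaApi and estimateWithTiktoken are hypothetical; the actual signatures in base-provider.ts and openrouter.ts may differ.

```typescript
// Minimal sketch, not the actual implementation: a provider-agnostic counting
// surface with a default local estimate and a provider override that prefers
// usage numbers reported by the provider's API.
interface TokenUsageInfo {
	inputTokens: number
	outputTokens: number
	cachedTokens?: number
	reasoningTokens?: number
}

// Stand-in for the tiktoken-based estimate described above (hypothetical helper).
function estimateWithTiktoken(text: string): number {
	return Math.ceil(text.length / 4)
}

abstract class BaseProvider {
	// Updated after each request so callers can display the most recent usage.
	lastTokenUsage?: TokenUsageInfo

	// Default estimate; providers with better information override this.
	async countTokens(text: string): Promise<number> {
		return estimateWithTiktoken(text)
	}
}

class OpenRouterProvider extends BaseProvider {
	override async countTokens(text: string): Promise<number> {
		try {
			// Prefer counts reported by the provider's API when available...
			return await this.countViaApi(text)
		} catch {
			// ...and fall back to the local estimate otherwise.
			return super.countTokens(text)
		}
	}

	// Hypothetical placeholder; the real provider call lives in openrouter.ts.
	private async countViaApi(text: string): Promise<number> {
		throw new Error("illustrative placeholder only")
	}
}
```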

Test Procedure

Testing was performed through a combination of:

  1. Unit Tests: The existing test suite in tiktoken.test.ts was significantly enhanced to cover the new token counting logic, including:
    • Verification of cl100k_base encoder output.
    • Tests for provider-specific fudge factors, ensuring they are applied correctly (an illustrative sketch follows this list).
    • Specific tests for image token counting to simulate realistic scenarios.
    • Tests for the new token display utilities in tokenDisplay.ts.
  2. Manual Testing:
    • Integrated the changes into a local Roo Code instance.
    • Manually invoked operations that trigger token counting with various providers (OpenRouter, Requesty, and default OpenAI).
    • Compared the displayed token counts with expected values based on provider documentation and manual calculations for different text and image inputs.
    • Verified the lastTokenUsage property in base-provider.ts was correctly updated after each operation.
    • Checked tooltip displays for detailed token usage metrics.
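
For reference, a fudge-factor test of the kind described in item 1 might look roughly like the following. This is an illustrative Jest-style sketch built on the hypothetical countTextTokens/encoder helpers from the earlier sketch, not the actual contents of tiktoken.test.ts.

```typescript
describe("provider-specific fudge factors", () => {
	it("never reports fewer tokens than the raw cl100k_base count", () => {
		const text = "Hello, world!"
		const raw = encoder.encode(text).length
		expect(countTextTokens(text, "openrouter")).toBeGreaterThanOrEqual(raw)
	})

	it("falls back to the default factor for unknown providers", () => {
		const text = "Hello, world!"
		expect(countTextTokens(text, "unknown-provider")).toBe(countTextTokens(text, "default"))
	})
})
```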

Reviewers can reproduce tests by:

  • Running npm test to execute the automated unit tests.
  • Manually testing with various prompts and models from different providers in a development environment to observe token count accuracy and display.

Type of Change

  • 🐛 Bug Fix: Non-breaking change that fixes an issue. (e.g. unused import, type compatibility)
  • ✨ New Feature: Non-breaking change that adds functionality.
  • 💥 Breaking Change: Fix or feature that would cause existing functionality to not work as expected.
  • ♻️ Refactor: Code change that neither fixes a bug nor adds a feature. (e.g. code organization, provider-agnostic interface)
  • 💅 Style: Changes that do not affect the meaning of the code (white-space, formatting, etc.).
  • 📚 Documentation: Updates to documentation files.
  • ⚙️ Build/CI: Changes to the build process or CI configuration.
  • 🧹 Chore: Other changes that don't modify src or test files. (e.g. enhancing test suite is a dev chore)

Pre-Submission Checklist

  • Issue Linked: This PR is linked to an approved GitHub Issue (see "Related GitHub Issue" above).
  • Scope: My changes are focused on the linked issue (one major feature/fix per PR).
  • Self-Review: I have performed a thorough self-review of my code.
  • Code Quality:
    • My code adheres to the project's style guidelines.
    • There are no new linting errors or warnings (npm run lint).
    • All debug code (e.g., console.log) has been removed.
  • Testing:
    • New and/or updated tests have been added to cover my changes.
    • All tests pass locally (npm test).
    • The application builds successfully with my changes.
  • Branch Hygiene: My branch is up-to-date (rebased) with the main branch.
  • Documentation Impact: I have considered if my changes require documentation updates (see "Documentation Updates" section below).
  • Changeset: A changeset has been created using npm run changeset if this PR includes user-facing changes or dependency updates.
  • Contribution Guidelines: I have read and agree to the Contributor Guidelines.

Screenshots / Videos

N/A (Changes are primarily backend logic and type definitions related to token counting, with minor utility functions for display that don't constitute a significant UI change needing screenshots.)

Documentation Updates

Additional Notes

The "Chore" type of change includes the significant enhancements to the test suite, which, while involving test files, is a development chore aimed at improving code quality and reliability rather than fixing a bug in the application's src code or adding a new feature directly. The refactoring aspect is tied to the improved code organization and making the token counting system more modular and provider-agnostic.


Important

Enhances token counting with provider-specific logic, utilities for display, and comprehensive test updates.

  • Behavior:
    • Updated tiktoken.ts to use cl100k_base encoder and added provider-specific fudge factors and content-type handling.
    • Added tokenDisplay.ts for token information formatting and usage metrics.
    • Refactored base-provider.ts for provider-agnostic token counting and added lastTokenUsage property.
    • Implemented provider-specific logic in openrouter.ts and requesty.ts for API-based counting and token handling.
    • Added content-format.ts for content format conversion utilities.
    • Enhanced tests in tiktoken.test.ts for new behaviors and image token counting.
  • Technical Improvements:
    • Consistent typing and flexible token usage interface.
    • Separated token counting from display logic for modularity.
    • Prioritized API-based counting with fallbacks.
  • Misc:
    • Adjusted sliding-window.test.ts for new token counting logic.
    • Updated Task.test.ts for image block handling based on model capabilities.

This description was created by Ellipsis for eb62fa1. You can customize this summary. It will automatically update as commits are pushed.

@SannidhyaSah SannidhyaSah requested review from cte and mrubens as code owners May 9, 2025 16:21
@dosubot dosubot bot added size:XL This PR changes 500-999 lines, ignoring generated files. enhancement New feature or request labels May 9, 2025
@SannidhyaSah SannidhyaSah changed the title Feature/enhanced token counting fix:Feature/enhanced token counting May 9, 2025
}

// Select appropriate encoder based on provider
let encoderSource: any

Consider defining an explicit type for the encoder source instead of using any for encoderSource. This would improve type safety and clarity.

This comment was generated because it violated a code review rule: mrule_QkEwsCio7v34DaCF.
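
One way to act on this suggestion might be an explicit type alias along these lines (a hypothetical sketch assuming the js-tiktoken Tiktoken class; the actual encoder types used in the codebase may differ):

```typescript
import { Tiktoken } from "js-tiktoken/lite"

// Hypothetical: constrain the encoder source to the known encoder type
// (or null before initialization) instead of using `any`.
type EncoderSource = Tiktoken | null

let encoderSource: EncoderSource = null
```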

/** Number of output/completion tokens */
outputTokens: number
/** Number of tokens read from cache (if applicable) */
cachedTokens?: number

The properties cachedTokens and cacheReadTokens both include the comment 'Number of tokens read from cache (if applicable)'. It might be confusing for future developers to distinguish between them. Please clarify the distinction in the comments or consider renaming one of them if they serve different purposes.

This comment was generated because it violated a code review rule: mrule_aQsEnH8jWdOfHq2Z.
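
If the two fields do serve different purposes, one way to address this might be along the following lines. The field names mirror the diff snippet above and the TokenUsageInfo interface mentioned in the description, but the exact semantics are assumptions that should be confirmed against the PR.

```typescript
interface TokenUsageInfo {
	/** Number of input/prompt tokens */
	inputTokens: number
	/** Number of output/completion tokens */
	outputTokens: number
	/** Total tokens served from the provider's prompt cache for this request (assumed meaning) */
	cachedTokens?: number
	/** Cache-read tokens as broken out separately in the provider's usage payload (assumed meaning) */
	cacheReadTokens?: number
}
```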

adamhill commented May 9, 2025

🚀 I do think this needs to be marked as a feature rather than a chore because of the template's rule: "chore: Other changes that don't modify src or test files"

@hannesrudolph hannesrudolph moved this from New to PR [Pre Approval Review] in Roo Code Roadmap May 10, 2025
bgilbert6 pushed a commit to bgilbert6/Roo-Code that referenced this pull request May 14, 2025
…#3396)

* Increasing file sizes for files that can be read by cline

* Update src/integrations/misc/extract-text.ts

Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>

* Increasing file sizes for files that can be read by cline

---------

Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
@hannesrudolph hannesrudolph moved this from New to PR [Pre Approval Review] in Roo Code Roadmap May 20, 2025
@SannidhyaSah SannidhyaSah deleted the feature/enhanced-token-counting branch May 22, 2025 04:17
@github-project-automation github-project-automation bot moved this from PR [Pre Approval Review] to Done in Roo Code Roadmap May 22, 2025

Labels

enhancement New feature or request size:XL This PR changes 500-999 lines, ignoring generated files.

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

Token count discrepancy: UI shows 70.4k tokens while API reports 257661 tokens
