@roomote (bot, Contributor) commented Jul 22, 2025

This PR resolves #3666 by making the token counting API requests asynchronous, so they no longer block the main inference request.

Changes

  • Modified countTokens() in GeminiHandler to return the tiktoken estimate immediately while the API call runs in the background
  • Modified countTokens() in AnthropicHandler to return the tiktoken estimate immediately while the API call runs in the background
  • Added a private countTokensAsync() method to both providers to handle the actual API call
  • Added tests to verify the asynchronous behavior and immediate returns
  • The Vertex provider automatically inherits the fix from GeminiHandler
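A minimal TypeScript sketch of the pattern these changes describe. The names `estimateTokens`, `RemoteCounter`, and `SketchHandler` are hypothetical, and `estimateTokens` is a rough character-based stand-in for tiktoken. This is not the actual GeminiHandler or AnthropicHandler code. In this sketch the caller always receives the local estimate, and the remote result is awaited only inside the background call.

```typescript
// Hypothetical stand-in for the tiktoken-based estimate: roughly 4 characters per token.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4)
}

// Hypothetical signature for a provider's remote token-count API call.
type RemoteCounter = (text: string) => Promise<number>

class SketchHandler {
  private readonly remoteCount: RemoteCounter

  constructor(remoteCount: RemoteCounter) {
    this.remoteCount = remoteCount
  }

  // Resolves with the local estimate immediately; the remote call is started
  // but not awaited, so its latency never delays the caller.
  async countTokens(text: string): Promise<number> {
    void this.countTokensAsync(text)
    return estimateTokens(text)
  }

  // Runs the remote call in the background and logs failures instead of rethrowing,
  // so the un-awaited promise can never become an unhandled rejection.
  private async countTokensAsync(text: string): Promise<number | undefined> {
    try {
      return await this.remoteCount(text)
    } catch (error) {
      console.error("Background token count failed:", error)
      return undefined
    }
  }
}
```

Catching inside `countTokensAsync()` is what makes the fire-and-forget call safe. Without it, a failed API request would surface as an unhandled promise rejection.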

Benefits

  • Main inference requests start immediately without waiting for token counting
  • Preserves accurate token counts from the API (they just arrive asynchronously)
  • Fallback to tiktoken ensures we always have a working estimate
  • No breaking changes to the API
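The description does not say where the late API counts are kept once they arrive. As an assumption, one way to make them usable is a per-text cache: the first call returns the estimate, and later calls on the same text return the API count if it has landed. `CachingTokenCounter` and `estimateTokens` are hypothetical names used only for illustration.

```typescript
// Hypothetical stand-in for the tiktoken estimate: roughly 4 characters per token.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4)
}

// Hypothetical counter that keeps late API results so later calls on the same text can use them.
class CachingTokenCounter {
  private readonly cache = new Map<string, number>()
  private readonly remoteCount: (text: string) => Promise<number>

  constructor(remoteCount: (text: string) => Promise<number>) {
    this.remoteCount = remoteCount
  }

  async countTokens(text: string): Promise<number> {
    // Prefer an API count that arrived from an earlier call.
    const cached = this.cache.get(text)
    if (cached !== undefined) return cached

    // Otherwise start the API call in the background and store its result when it lands.
    this.remoteCount(text)
      .then((count) => {
        this.cache.set(text, count)
      })
      .catch((error: unknown) => console.error("Remote token count failed:", error))

    // Return the local estimate without waiting.
    return estimateTokens(text)
  }
}
```

Keying by the full text keeps the sketch simple. A production cache would likely need a bounded size or a hashed key.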

Testing

  • All existing tests pass
  • Added new tests specifically for the asynchronous behavior
  • Verified that token counting returns immediately (< 100ms) even when the API call takes longer
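The immediate-return check can be sketched in plain TypeScript without a test framework. `slowRemoteCount` is a hypothetical helper that simulates a 500 ms API call, and `countTokens` here is a standalone stand-in. The actual anthropic.spec.ts and gemini.spec.ts tests are not reproduced.

```typescript
// Hypothetical simulation of a remote token-count API that takes delayMs to respond.
function slowRemoteCount(text: string, delayMs: number): Promise<number> {
  return new Promise<number>((resolve) => setTimeout(() => resolve(text.length), delayMs))
}

// Hypothetical fire-and-forget countTokens following the pattern described in this PR.
async function countTokens(text: string): Promise<number> {
  // Start the slow remote call but do not await it; log any failure.
  slowRemoteCount(text, 500).catch((error: unknown) => console.error(error))
  // Stand-in for the tiktoken estimate: roughly 4 characters per token.
  return Math.ceil(text.length / 4)
}

// Returns how many milliseconds countTokens takes to resolve.
async function measureCountTokensMs(text: string): Promise<number> {
  const start = Date.now()
  await countTokens(text)
  return Date.now() - start
}

measureCountTokensMs("hello world").then((elapsed) => {
  console.log(`countTokens resolved in ${elapsed} ms`)
})
```

Measuring wall-clock time keeps the sketch dependency-free. A framework test would more likely use fake timers to avoid flakiness.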

Fixes #3666


Important

Make token counting asynchronous in GeminiHandler and AnthropicHandler, improving performance by not blocking main requests.

  • Behavior:
    • countTokens() in GeminiHandler and AnthropicHandler now returns a tiktoken estimate immediately and runs the API call in the background.
    • Introduces countTokensAsync() in both handlers for asynchronous API calls.
    • Vertex provider inherits changes from GeminiHandler.
  • Testing:
    • Adds tests in anthropic.spec.ts and gemini.spec.ts to verify asynchronous behavior and immediate returns.
    • Ensures token counting returns immediately (< 100ms) even if the API call is delayed.
  • Misc:
    • No breaking changes to the API.
    • Logs errors to console if async API call fails.

This description was created by Ellipsis for 0f08bce.

@roomote (bot) requested review from cte, jr, and mrubens as code owners on July 22, 2025 13:50
@dosubot (bot) added the size:L and bug labels on Jul 22, 2025
@hannesrudolph added the Issue/PR - Triage label on Jul 22, 2025
@daniel-lxs (Member) commented Jul 22, 2025

Incorrect approach; the issue isn't scoped with enough detail.

@daniel-lxs closed this on Jul 22, 2025
@github-project-automation (bot) moved this from New to Done in Roo Code Roadmap on Jul 22, 2025
@github-project-automation (bot) moved this from Triage to Done in Roo Code Roadmap on Jul 22, 2025

Labels

  • bug: Something isn't working
  • Issue/PR - Triage: New issue. Needs quick review to confirm validity and assign labels.
  • size:L: This PR changes 100-499 lines, ignoring generated files.

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

Gemini/Anthropic: Stop using remote token count APIs

4 participants