Gemini/Anthropic: Stop using remote token count APIs #3666

@Shakahs

Description

What problem does this proposed feature solve?

For the Gemini/Vertex/Anthropic providers, Roo delays every API request so it can first make a separate API request to get the prompt token count. Making two requests significantly increases latency, for no benefit: every inference response already contains the prompt token count. All the other providers just use the tiktoken library to get a local estimate, which is good enough.
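
For illustration, the current pattern amounts to something like the following with the Anthropic TypeScript SDK (a sketch, not Roo's actual call site; the model name and prompt are placeholders):

```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment
const messages = [{ role: "user" as const, content: "Hello" }];

// First round trip: a network call whose only purpose is counting tokens.
const { input_tokens } = await client.messages.countTokens({
  model: "claude-3-5-sonnet-latest",
  messages,
});

// Second round trip: the actual inference request, whose response
// reports the prompt token count anyway (response.usage.input_tokens).
const response = await client.messages.create({
  model: "claude-3-5-sonnet-latest",
  max_tokens: 1024,
  messages,
});
```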

Describe the proposed solution in detail

Do not make an API request just to count prompt tokens; use tiktoken locally, like the other providers.
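
A minimal sketch of the local alternative, using the js-tiktoken package (assumed here; Roo may use a different tiktoken binding, and cl100k_base is only an approximation of the Gemini/Anthropic tokenizers, which is fine for an estimate):

```typescript
import { getEncoding } from "js-tiktoken";

// Pure in-process estimate: no extra network round trip per request.
const encoder = getEncoding("cl100k_base");

function estimateTokens(text: string): number {
  return encoder.encode(text).length;
}

// e.g. estimateTokens(systemPrompt + userMessage) for context-window checks
```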

Technical considerations or implementation details (optional)

No response

Describe alternatives considered (if any)

If a more accurate count is needed, use the token counts already provided in API responses.
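
Both providers return exact usage on every response, so Roo could cache the count from the previous turn instead of asking up front. A sketch against the Anthropic Messages API (Gemini exposes the same information as usageMetadata.promptTokenCount on each response):

```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

const response = await client.messages.create({
  model: "claude-3-5-sonnet-latest",
  max_tokens: 1024,
  messages: [{ role: "user", content: "Hello" }],
});

// Exact counts, delivered for free with the inference response.
const promptTokens = response.usage.input_tokens;
const completionTokens = response.usage.output_tokens;
```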

Additional Context & Mockups

No response

Proposal Checklist

  • I have searched existing Issues and Discussions to ensure this proposal is not a duplicate.
  • This proposal is for a specific, actionable change intended for implementation (not a general idea).
  • I understand that this proposal requires review and approval before any development work begins.

Are you interested in implementing this feature if approved?

  • Yes, I would like to contribute to implementing this feature.

Metadata

    Labels

    Issue - In Progress: Someone is actively working on this. Should link to a PR soon.
    enhancement: New feature or request
    feature request: Feature request, not a bug

    Projects

    Status: Done
