Conversation

@hannesrudolph
Collaborator

@hannesrudolph hannesrudolph commented Jun 10, 2025

Related GitHub Issue

Closes: #4487

Description

This PR fixes the 3+ minute lag issue when using google/gemini-2.5-pro-preview through OpenRouter by removing explicit cache_control flags for this specific model.

Key implementation details:

  • Excluded google/gemini-2.5-pro-preview from the OPEN_ROUTER_PROMPT_CACHING_MODELS set in packages/types/src/providers/openrouter.ts
  • Added clear comment explaining the exclusion with issue reference
  • Updated test logic to handle the intentional exclusion of this specific model
  • OpenRouter still provides automatic implicit ephemeral caching for this model, so caching benefits are preserved
  • The fix specifically targets the lag caused by explicit "cache_control": { "type": "ephemeral" } flags being added to requests
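The exclusion described above can be sketched as follows. This is a minimal illustration, not the actual contents of packages/types/src/providers/openrouter.ts: the set shown is a small illustrative subset, the Anthropic model ID is an assumed example, and shouldSendExplicitCacheControl is a hypothetical helper name.

```typescript
// Illustrative subset only; the real set lists more models.
// "anthropic/claude-3.5-sonnet" is an assumed example entry.
const OPEN_ROUTER_PROMPT_CACHING_MODELS = new Set<string>([
  "anthropic/claude-3.5-sonnet",
  "google/gemini-2.0-flash-001",
  "google/gemini-2.5-flash-preview",
  // "google/gemini-2.5-pro-preview" is intentionally excluded: explicit
  // cache_control flags caused a 3+ minute lag for this model (issue #4487).
  // OpenRouter still applies implicit caching to it.
])

// Hypothetical helper: explicit cache_control is attached only for models in the set.
function shouldSendExplicitCacheControl(modelId: string): boolean {
  return OPEN_ROUTER_PROMPT_CACHING_MODELS.has(modelId)
}
```

With this shape, removing one entry is enough to stop sending explicit flags for that model, and every other entry keeps its current behavior.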

Reviewers should pay attention to:

  • Only the google/gemini-2.5-pro-preview model is affected - all other models continue to work as before
  • The surgical approach preserves explicit caching for all other models that work properly

Test Procedure

Unit Tests:

# Run OpenRouter-specific tests
npx vitest run api/providers/fetchers/__tests__/openrouter.spec.ts

# Run all provider tests
npx vitest run api/providers/__tests__/

Manual Testing:

  1. Use google/gemini-2.5-pro-preview model through OpenRouter
  2. Verify response time is significantly reduced (from 3+ minutes to normal response time)
  3. Verify caching still works (OpenRouter provides automatic implicit caching)
  4. Verify other Google models continue to work with explicit cache control
  5. Verify non-Google models continue to work with explicit cache control

Testing Environment:

  • All tests pass locally
  • Linting passes
  • Type checking passes
  • No breaking changes to existing functionality

Type of Change

  • 🐛 Bug Fix: Non-breaking change that fixes an issue.
  • New Feature: Non-breaking change that adds functionality.
  • 💥 Breaking Change: Fix or feature that would cause existing functionality to not work as expected.
  • ♻️ Refactor: Code change that neither fixes a bug nor adds a feature.
  • 💅 Style: Changes that do not affect the meaning of the code (white-space, formatting, etc.).
  • 📚 Documentation: Updates to documentation files.
  • ⚙️ Build/CI: Changes to the build process or CI configuration.
  • 🧹 Chore: Other changes that don't modify src or test files.

Pre-Submission Checklist

  • Issue Linked: This PR is linked to an approved GitHub Issue (see "Related GitHub Issue" above).
  • Scope: My changes are focused on the linked issue (one major feature/fix per PR).
  • Self-Review: I have performed a thorough self-review of my code.
  • Code Quality:
    • My code adheres to the project's style guidelines.
    • There are no new linting errors or warnings (npm run lint).
    • All debug code (e.g., console.log) has been removed.
  • Testing:
    • New and/or updated tests have been added to cover my changes.
    • All tests pass locally (npm test - relevant provider tests passing).
    • The application builds successfully with my changes.
  • Branch Hygiene: My branch is up-to-date (rebased) with the main branch.
  • Documentation Impact: I have considered if my changes require documentation updates (see "Documentation Updates" section below).
  • Changeset: A changeset has been created using npm run changeset if this PR includes user-facing changes or dependency updates.
  • Contribution Guidelines: I have read and agree to the Contributor Guidelines.

Screenshots / Videos

Not applicable - this is a performance fix with no UI changes.

Documentation Updates

  • No documentation updates are required.
  • Yes, documentation updates are required.

This change is internal to the caching implementation and doesn't affect user-facing behavior beyond improved performance.

Additional Notes

Model affected:

  • google/gemini-2.5-pro-preview - No longer uses explicit cache_control (prevents 3+ minute lag)

Models NOT affected (continue to use explicit caching as before):

  • All other Google models (google/gemini-2.5-flash-preview, google/gemini-2.0-flash-001, etc.)
  • All Anthropic models
  • All other provider models
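To make the difference between the affected and unaffected models concrete, here is a rough sketch of a system message with and without the explicit flag. The type names and buildSystemMessage are hypothetical and do not come from the codebase; only the "cache_control": { "type": "ephemeral" } shape is taken from the description above.

```typescript
// Hypothetical types for illustration; not the project's actual message types.
type CacheControl = { type: "ephemeral" }
type TextPart = { type: "text"; text: string; cache_control?: CacheControl }
type SystemMessage = { role: "system"; content: TextPart[] }

// Builds a system message, attaching the explicit cache flag only when requested.
function buildSystemMessage(systemPrompt: string, explicitCache: boolean): SystemMessage {
  const part: TextPart = { type: "text", text: systemPrompt }
  if (explicitCache) {
    part.cache_control = { type: "ephemeral" }
  }
  return { role: "system", content: [part] }
}

// Unaffected models: the flag is still sent.
const withFlag = buildSystemMessage("You are a helpful assistant.", true)
// google/gemini-2.5-pro-preview after this PR: no flag, implicit caching only.
const withoutFlag = buildSystemMessage("You are a helpful assistant.", false)
```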

Impact:

  • ✅ Eliminates 3+ minute lag for google/gemini-2.5-pro-preview
  • ✅ Preserves caching benefits through OpenRouter's automatic system
  • ✅ No breaking changes - other models continue to work as before
  • ✅ Surgical fix - minimal scope, maximum effectiveness

Get in Touch

I'm available through GitHub for any questions about this PR.


Important

Remove explicit cache control for Google models in openrouter.ts to fix lag issue, updating tests accordingly.

  • Behavior:
    • Removed Google models from OPEN_ROUTER_PROMPT_CACHING_MODELS in openrouter.ts to fix lag issue.
    • OpenRouter still provides implicit ephemeral caching for these models.
  • Tests:
    • Updated openrouter.spec.ts to exclude Google models from caching tests.
    • Ensured test logic matches exclusion list for caching models.
  • Impact:
    • Eliminates 3+ minute lag for Google Gemini models.
    • No other models affected; non-Google models continue with explicit cache control.

This description was created by Ellipsis for a904d67. You can customize this summary. It will automatically update as commits are pushed.

…4487)

- Remove all Google models from OPEN_ROUTER_PROMPT_CACHING_MODELS set
- This resolves 3+ minute lag when using google/gemini-2.5-pro-preview
- OpenRouter still provides automatic implicit ephemeral caching for these models
- Updated tests to handle intentional exclusion of Google models from explicit caching

Fixes #4487
@hannesrudolph hannesrudolph requested review from cte, jr and mrubens as code owners June 10, 2025 03:44
@dosubot dosubot bot added size:M This PR changes 30-99 lines, ignoring generated files. bug Something isn't working labels Jun 10, 2025
- Replace hardcoded exclusion list with simple Google model filter
- Keep original validation logic but make it more maintainable
- Still ensures all our caching models are supported by OpenRouter
- Still verifies we exclude all Google models from explicit caching
- Variable was defined but never used
- Keeps the test logic clean and focused
@dosubot dosubot bot added size:S This PR changes 10-29 lines, ignoring generated files. and removed size:M This PR changes 30-99 lines, ignoring generated files. labels Jun 10, 2025
Member

@daniel-lxs daniel-lxs left a comment

LGTM, now we just wait for the tests

@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label Jun 10, 2025
@daniel-lxs daniel-lxs moved this from Triage to PR [Needs Review] in Roo Code Roadmap Jun 10, 2025
@mrubens
Collaborator

mrubens commented Jun 10, 2025

@cte can you take a look as well? Not really sure what the implications are of removing these from cached models. Do we still need the code here?

if (modelId.startsWith("google")) {
    addGeminiCacheBreakpoints(systemPrompt, openAiMessages)
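One way to reason about this question is sketched below. It assumes, without confirmation from this thread, that the google branch only runs for models already in OPEN_ROUTER_PROMPT_CACHING_MODELS. The set contents are an illustrative subset and pickBreakpointStrategy is a hypothetical name.

```typescript
// Illustrative subset; the Anthropic ID is an assumed example entry.
const OPEN_ROUTER_PROMPT_CACHING_MODELS = new Set<string>([
  "anthropic/claude-3.5-sonnet",
  "google/gemini-2.0-flash-001",
  "google/gemini-2.5-flash-preview",
])

type Strategy = "gemini" | "other" | "none"

// Hypothetical dispatch: no breakpoints unless the model is in the set,
// then the Gemini-specific path for IDs starting with "google".
function pickBreakpointStrategy(modelId: string): Strategy {
  if (!OPEN_ROUTER_PROMPT_CACHING_MODELS.has(modelId)) {
    return "none"
  }
  return modelId.startsWith("google") ? "gemini" : "other"
}
```

Under that assumption, the google branch is still needed as long as any Google model remains in the set, and it becomes dead code only if every Google model is removed.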

@daniel-lxs
Member

daniel-lxs commented Jun 10, 2025

It seems that these models are now under implicit caching; @hannesrudolph confirmed that caching is still enabled for them.

I'm not sure there's any benefit to explicit caching at this point.

Edit: here's the documentation from OpenRouter: https://openrouter.ai/docs/features/prompt-caching
The caching headers are not necessary for Gemini 2.5 Pro and 2.5 Flash.

So Gemini 1.5 might still benefit from the headers.

- More surgical approach - only exclude the specific problematic model
- Keep other Google models in caching (they work fine)
- Add comment explaining the exclusion with issue reference
- Update test to only exclude the specific model

This targets just the model causing 3+ minute lag while preserving
caching benefits for other Google models that work properly.
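The updated test logic might look roughly like the sketch below: take the models OpenRouter reports as supporting caching, drop the one intentionally excluded model, and compare the result with our set. All names here are hypothetical; the actual test lives in openrouter.spec.ts and may be structured differently.

```typescript
// The single model intentionally excluded from explicit caching (issue #4487).
const EXCLUDED_FROM_EXPLICIT_CACHING = new Set<string>(["google/gemini-2.5-pro-preview"])

// Models we expect in our set: everything OpenRouter reports, minus the exclusion.
function expectedCachingModels(reported: string[]): string[] {
  return reported.filter((id) => !EXCLUDED_FROM_EXPLICIT_CACHING.has(id)).sort()
}

// True when our set matches the expected list exactly.
function matchesExpected(ours: Set<string>, reported: string[]): boolean {
  const expected = expectedCachingModels(reported)
  const actual = Array.from(ours).sort()
  return expected.length === actual.length && expected.every((id, i) => id === actual[i])
}
```

Keeping the exclusion in one named set means a later fix only has to delete one entry, in contrast to the earlier commit that filtered out every Google model.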
@hannesrudolph
Collaborator Author

[attached image, not preserved]

@hannesrudolph
Collaborator Author

I have rolled the PR back to only the changes related to google/gemini-2.5-pro-preview so we can get it running ASAP. Quick fix!

@mrubens mrubens merged commit bf35dcd into main Jun 10, 2025
10 checks passed
@mrubens mrubens deleted the openrouter-cache-fix branch June 10, 2025 04:48
@github-project-automation github-project-automation bot moved this from PR [Needs Review] to Done in Roo Code Roadmap Jun 10, 2025
@github-project-automation github-project-automation bot moved this from New to Done in Roo Code Roadmap Jun 10, 2025
chrarnoldus added a commit to Kilo-Org/kilocode that referenced this pull request Jun 10, 2025
hassoncs pushed a commit to Kilo-Org/kilocode that referenced this pull request Jun 10, 2025

Labels

bug Something isn't working lgtm This PR has been approved by a maintainer size:S This PR changes 10-29 lines, ignoring generated files.

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

OpenRouter: google/gemini-2.5-pro-preview has 2+ minute delay before response, while gemini-2.5-pro-preview-05-06 responds in 4-5 seconds

4 participants