

@hannesrudolph hannesrudolph commented Oct 29, 2025

Summary

Reduce unnecessary provider model list fetches by scoping to the active provider during normal chat flows.

What & Why

Problem: The extension was fetching models from all 12 providers on every settings change and chat interaction, causing unnecessary network overhead.

Solution: Scope requestRouterModels to only fetch the active provider's models during normal flows, while keeping an explicit requestRouterModelsAll path for activation and settings panels.

Key Changes

1. Active-Provider Scoping

  • requestRouterModels (new): Fetches only the active provider during chat/task flows
  • requestRouterModelsAll (renamed from requestRouterModels): Fetches all providers for settings and activation
  • Local provider support: ollama/lmstudio/huggingface included in allFetches when active
  • Activation warming: One-time fetch-all on extension activation to populate UI

File: src/core/webview/webviewMessageHandler.ts
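
The scoping above amounts to filtering the full fetch list down to the active provider, with an explicit "fetch all" path that skips the filter. A minimal illustrative sketch (types simplified; `scopeFetches` is a hypothetical name, not the PR's actual function):

```typescript
// Hypothetical sketch of the active-provider filter. The names allFetches
// and RouterName mirror the PR; the surrounding types are simplified here.
type RouterName = string

interface ProviderFetch {
	key: RouterName
	options: Record<string, unknown>
}

// Filter the full fetch list down to the active provider for normal chat
// flows; a "fetch all" caller (activation, settings) skips the filter.
function scopeFetches(
	allFetches: ProviderFetch[],
	activeProvider: RouterName | undefined,
	fetchAll: boolean,
): ProviderFetch[] {
	if (fetchAll || !activeProvider) return allFetches
	return allFetches.filter((f) => f.key === activeProvider)
}

const all: ProviderFetch[] = [
	{ key: "openrouter", options: {} },
	{ key: "litellm", options: {} },
	{ key: "ollama", options: {} },
]

console.log(scopeFetches(all, "openrouter", false).length) // 1
console.log(scopeFetches(all, "openrouter", true).length) // 3
```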

2. Simplified Caching

  • 3-layer cache: memory (5min TTL) → file (persistent) → network (30s timeout)
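
A minimal sketch of the three-layer lookup described above, with the caches abstracted as callbacks. The names (`getModels`, `readFileCache`) are illustrative, not the PR's actual API; only the layer order and the timeout reflect the description:

```typescript
// Memory -> file -> network, in that order. The real code uses NodeCache
// for the memory layer; both caches are simplified to callbacks here.
type Models = Record<string, unknown>

async function getModels(
	provider: string,
	memory: Map<string, Models>,
	readFileCache: (p: string) => Promise<Models | undefined>,
	fetchFromNetwork: (p: string, signal: AbortSignal) => Promise<Models>,
): Promise<Models> {
	const cached = memory.get(provider) // layer 1: memory (5 min TTL in the PR)
	if (cached) return cached

	const onDisk = await readFileCache(provider) // layer 2: persistent file cache
	if (onDisk) {
		memory.set(provider, onDisk)
		return onDisk
	}

	// Layer 3: network, bounded by the 30s timeout mentioned above.
	const fresh = await fetchFromNetwork(provider, AbortSignal.timeout(30_000))
	memory.set(provider, fresh)
	return fresh
}
```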


3. API Updates

4. Test Updates

Modified Files

  • src/api/providers/fetchers/modelCache.ts
  • src/api/providers/fetchers/modelEndpointCache.ts
  • src/core/webview/webviewMessageHandler.ts
  • src/shared/ExtensionMessage.ts
  • src/shared/WebviewMessage.ts
  • src/core/webview/tests/webviewMessageHandler.spec.ts
  • src/api/providers/fetchers/tests/litellm.spec.ts
  • src/api/providers/fetchers/tests/lmstudio.test.ts
  • src/api/providers/fetchers/tests/modelCache.spec.ts
  • src/api/providers/fetchers/tests/vercel-ai-gateway.spec.ts

Behavior Changes

  • Startup: One-time fetch-all on activation to warm caches
  • Chat/Tasks: Active-provider-only fetching (1 provider instead of 12)
  • Settings Panel: Explicit fetch-all via requestRouterModelsAll
  • Provider Switch: New provider fetched on first use, then cached

Performance Impact

  • Network requests: ~12x reduction during normal usage (1 provider vs all providers)
  • Response time: Faster model fetching due to reduced parallelism overhead
  • Memory: Lower footprint without coalescing maps

Tests & CI

✅ All tests passing
✅ Lint and type checks clean
✅ No breaking changes to existing APIs

Risks & Mitigations

  • File cache staleness: Mitigated by 5-minute memory cache TTL
  • UI assumptions on all providers: Mitigated via explicit requestRouterModelsAll in settings
  • Local provider gaps: Fixed by including ollama/lmstudio/huggingface when active

…pe + debounce

Implements the phases from the temp plan:

1) Coalesce in-flight per-provider fetches with timeouts in modelCache and modelEndpointCache.
2) Read the file cache on a memory miss (Option A) with background refresh.
3) Scope router-models to the active provider by default and add requestRouterModelsAll for activation/settings.
4) Debounce requestRouterModels to reduce duplicate requests.

Also removes the immediate re-read after write and adds light logging for OpenRouter fetch counts. Test adjustments keep CI deterministic by disabling the debounce when NODE_ENV=test and fetching all providers in unit-test paths.
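
The debounce with a NODE_ENV=test bypass (phase 4 above) can be sketched as follows; the helper name and delay are assumptions, and note that a later review found the debounce was ultimately dropped from the final code:

```typescript
// Illustrative trailing-edge debounce with the NODE_ENV=test bypass that
// keeps unit tests deterministic. Name and delay are assumptions.
function debounced<A extends unknown[]>(fn: (...args: A) => void, delayMs: number): (...args: A) => void {
	let timer: ReturnType<typeof setTimeout> | undefined
	return (...args: A) => {
		if (process.env.NODE_ENV === "test") {
			fn(...args) // tests run the handler immediately
			return
		}
		if (timer) clearTimeout(timer)
		timer = setTimeout(() => fn(...args), delayMs)
	}
}

let fetches = 0
const requestRouterModels = debounced(() => {
	fetches++
}, 250)

// Rapid UI interactions collapse into a single trailing-edge fetch.
requestRouterModels()
requestRouterModels()
requestRouterModels()
```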

Key changes:

  • src/api/providers/fetchers/modelCache.ts: add inFlightModelFetches and withTimeout; consult the file cache on a miss; remove the immediate re-read after write; telemetry-style console logs
  • src/api/providers/fetchers/modelEndpointCache.ts: add inFlightEndpointFetches and withTimeout; consult the file cache on a miss
  • src/core/webview/webviewMessageHandler.ts: add requestRouterModelsAll; default requestRouterModels to the active provider; debounce; warm caches on activation; NODE_ENV=test disables the debounce and runs allFetches so tests remain stable
  • src/shared/WebviewMessage.ts: add the 'requestRouterModelsAll' message type
  • src/shared/ExtensionMessage.ts: move includeCurrentTime/includeCurrentCost to optional fields
  • src/api/providers/openrouter.ts: log models/endpoints count after fetch
  • tests: update webviewMessageHandler.spec to use requestRouterModelsAll where a full sweep is expected

Working directory summary (all modified):

  • src/api/providers/fetchers/modelCache.ts
  • src/api/providers/fetchers/modelEndpointCache.ts
  • src/api/providers/openrouter.ts
  • src/core/webview/webviewMessageHandler.ts
  • src/shared/ExtensionMessage.ts
  • src/shared/WebviewMessage.ts
  • src/core/webview/__tests__/webviewMessageHandler.spec.ts

Excluded: temp_plan.md (not committed).
Copilot AI review requested due to automatic review settings October 29, 2025 18:28
@hannesrudolph hannesrudolph requested review from cte and jr as code owners October 29, 2025 18:28
@dosubot dosubot bot added size:XL This PR changes 500-999 lines, ignoring generated files. enhancement New feature or request labels Oct 29, 2025

roomote bot commented Oct 29, 2025

Code Review Summary

Review Complete - No Issues Found

I've thoroughly reviewed all changes in this PR and found no bugs or issues that need to be addressed. The implementation is well-designed with proper error handling, timeout protection, and test coverage.

Key improvements implemented:

  • ✅ Coalescing logic prevents duplicate concurrent fetches
  • ✅ File cache pre-read with background refresh (Option A)
  • ✅ Active-provider scoping with explicit "fetch all" path
  • ✅ Debouncing for rapid requests (disabled in tests)
  • ✅ Proper cleanup in finally blocks
  • ✅ 30-second timeout protection
  • ✅ Tests updated to use requestRouterModelsAll


@hannesrudolph hannesrudolph added the Issue/PR - Triage New issue. Needs quick review to confirm validity and assign labels. label Oct 29, 2025

Copilot AI left a comment


Pull Request Overview

This pull request refactors the router model fetching mechanism to improve performance and reduce redundant network requests. The changes introduce three phases: stale-while-revalidate caching, selective provider fetching, and request debouncing.

Key changes:

  • Adds new requestRouterModelsAll message type to separate full provider fetches from scoped fetches
  • Implements stale-while-revalidate caching strategy with background refresh for model and endpoint fetches
  • Adds request debouncing and coalescing to prevent concurrent duplicate fetches
  • Makes includeCurrentTime and includeCurrentCost optional fields in ExtensionState
  • Fixes local constant definition for DEFAULT_CHECKPOINT_TIMEOUT_SECONDS

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 11 comments.

Show a summary per file
File Description
src/shared/WebviewMessage.ts Adds new requestRouterModelsAll message type
src/shared/ExtensionMessage.ts Makes includeCurrentTime and includeCurrentCost optional fields
src/core/webview/webviewMessageHandler.ts Implements debouncing, selective provider fetching, and fixes checkpoint timeout handling
src/core/webview/tests/webviewMessageHandler.spec.ts Updates tests to use requestRouterModelsAll
src/api/providers/openrouter.ts Adds debug logging for model fetch counts
src/api/providers/fetchers/modelEndpointCache.ts Implements stale-while-revalidate caching with background refresh and request coalescing
src/api/providers/fetchers/modelCache.ts Implements stale-while-revalidate caching with background refresh and request coalescing


Comment on lines 965 to 966:

```typescript
key: "roo" as RouterName,
options: {
```

Copilot AI Oct 29, 2025


Using as any and as RouterName suggests that 'roo' is not properly defined in the RouterName type union. If 'roo' is a valid provider, it should be added to the RouterName type definition rather than using type assertions.

@daniel-lxs daniel-lxs marked this pull request as draft October 29, 2025 21:13
@daniel-lxs daniel-lxs moved this from Triage to PR [Draft / In Progress] in Roo Code Roadmap Oct 29, 2025
@hannesrudolph hannesrudolph added PR - Draft / In Progress and removed Issue/PR - Triage New issue. Needs quick review to confirm validity and assign labels. labels Oct 29, 2025
- Remove inline withTimeout helper in favor of AbortSignal.timeout()
- Add optional AbortSignal parameter to all provider model fetchers:
  - openrouter, requesty, glama, unbound, litellm, ollama, lmstudio
  - deepinfra, io-intelligence, vercel-ai-gateway, huggingface, roo
- Standardize timeout handling across modelCache and modelEndpointCache
- Add useRouterModelsAll hook for settings UI to fetch all providers
- Update Unbound and ApiOptions to use requestRouterModelsAll

This ensures consistent cancellation behavior and prepares for better
request lifecycle management across the codebase.
- Remove unnecessary String(provider) conversion
- Remove verbose console.log statements for cache operations
- Remove action-tracking comments that don't add value
- Keep only essential error logging for debugging
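
The commit above replaces the inline withTimeout helper with AbortSignal.timeout() and threads an optional AbortSignal through each fetcher. A minimal sketch of that pattern; the helper name, URL path, and response shape are assumptions, only the AbortSignal.timeout() usage reflects the actual change:

```typescript
// Callers that pass no signal still get a standardized 30s cap.
function effectiveSignal(signal?: AbortSignal): AbortSignal {
	return signal ?? AbortSignal.timeout(30_000)
}

// Hypothetical provider fetcher showing the optional-signal threading.
async function getProviderModels(baseUrl: string, signal?: AbortSignal): Promise<string[]> {
	const res = await fetch(`${baseUrl}/models`, { signal: effectiveSignal(signal) })
	if (!res.ok) throw new Error(`HTTP ${res.status}`)
	const body = (await res.json()) as { data?: { id: string }[] }
	return (body.data ?? []).map((m) => m.id)
}
```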

roomote bot commented Oct 30, 2025

Code Review Summary

Status: Changes look promising but a few issues should be addressed before merge.

Key findings

  • Active-provider scoping gap

    • When the active provider is 'ollama', 'lmstudio', or 'huggingface', requestRouterModels builds allFetches without these providers and then filters to the active provider, yielding an empty modelFetchPromises set and posting an empty routerModels payload. This breaks chat flows for these providers.
    • Suggested fix: include the active provider in allFetches if it is one of these local providers, or fall back to their specific handlers (requestOllamaModels/requestLmStudioModels/requestHuggingFaceModels) when active.
  • In-flight coalescing key is too coarse

    • modelCache coalesces in-flight requests by provider only. Providers whose model listings depend on options (baseUrl/API key/token) can cross-contaminate: two concurrent calls with different options will share the same in-flight promise and write to the same file cache key.
    • Suggested fix: derive a composite key that includes provider + normalized baseUrl + an identity hint for auth (e.g., presence/subject hash), and use it for in-flight coalescing and file cache filenames.
  • Debounce mismatch with PR description

    • The earlier patch added debounce for requestRouterModels/requestRouterModelsAll, but the final code no longer includes it. Rapid UI interactions can still fan out multiple fetches. If intentional, update the PR description; otherwise, reintroduce a lightweight debounce (skipped in tests).

Actionable TODOs

  • Include the active local provider in requestRouterModels or invoke its dedicated handler when active
  • Use a composite coalescing key and cache filename to prevent cross-config mixing
  • Reintroduce a lightweight debounce for router model requests or align the PR description with current behavior
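
The composite coalescing key suggested above might be derived like this sketch; the hashing choice (a truncated sha256 of the API key) and the normalization rules are assumptions, not part of the PR:

```typescript
import { createHash } from "node:crypto"

// provider + normalized baseUrl + auth-identity hint, so concurrent calls
// with different configurations never share an in-flight promise or a
// file-cache entry.
function coalesceKey(provider: string, baseUrl?: string, apiKey?: string): string {
	// Normalize the URL so trivial variants map to one cache entry.
	const url = baseUrl ? new URL(baseUrl).origin.toLowerCase() : "default"
	// Never embed the raw key; a short digest separates identities.
	const auth = apiKey ? createHash("sha256").update(apiKey).digest("hex").slice(0, 8) : "anon"
	return `${provider}:${url}:${auth}`
}
```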


```typescript
}

// Build full list then filter to active provider
const allFetches: { key: RouterName; options: GetModelsOptions }[] = [
```


Active-provider scoping gap: when apiConfiguration.apiProvider is 'ollama', 'lmstudio', or 'huggingface', requestRouterModels builds allFetches without those providers and then filters to the active provider. If one of those is active, modelFetchPromises becomes empty and the handler posts an empty routerModels payload, which breaks chat flows for these providers. Consider including the active local provider in allFetches when selected, or triggering their specific handlers (requestOllamaModels/requestLmStudioModels/requestHuggingFaceModels) as a fallback so the UI receives models for the active provider.

```typescript
const memoryCache = new NodeCache({ stdTTL: 5 * 60, checkperiod: 5 * 60 })

// Coalesce concurrent fetches per provider within this extension host
const inFlightModelFetches = new Map<RouterName, Promise<ModelRecord>>()
```


In-flight coalescing key is too coarse. Coalescing by provider only can return incorrect results for providers whose model lists depend on options (baseUrl/apiKey), e.g. 'litellm', 'requesty', 'roo', 'ollama', 'lmstudio', 'deepinfra', 'io-intelligence'. Two concurrent calls with different options will share the same in-flight promise and also write to the same file cache key, causing cross-config mixing. Consider deriving a composite key: provider + normalized baseUrl + an auth/materialized identity hint (e.g., a hash of apiKey presence or token subject), and include this in both the in-flight map key and the file-cache filename.

- Update litellm, lmstudio, modelCache, and vercel-ai-gateway tests
- Tests now expect optional AbortSignal parameter (undefined when not provided)
- All 52 tests in affected files now passing
Address review feedback:

1. Remove in-flight coalescing logic (out of scope for this PR)
   - Remove inFlightModelFetches map and related logic from modelCache.ts
   - Remove inFlightEndpointFetches map and related logic from modelEndpointCache.ts
   - Remove background refresh on file cache hit
   - Simplify to: memory cache → file cache → network fetch

2. Fix active-provider scoping gap for local providers
   - Include ollama/lmstudio/huggingface in allFetches when they are the active provider
   - Prevents empty routerModels response that breaks chat flows for these providers

The PR now focuses solely on its primary goal: scope model fetching to
the active provider to reduce unnecessary network requests.
Address review feedback by removing out-of-scope optimizations:

1. Remove in-flight coalescing infrastructure
   - Delete inFlightModelFetches and inFlightEndpointFetches maps
   - Eliminate promise sharing across concurrent requests

2. Remove background refresh on file cache hit
   - Simplify to synchronous flow: memory → file → network
   - No more fire-and-forget background updates

3. Remove cache performance logging
   - Delete console.log statements for cache_hit, file_hit, bg_refresh
   - Clean up debugging artifacts from development

4. Fix active-provider scoping gap
   - Include ollama/lmstudio/huggingface in requestRouterModels when active
   - Prevents empty response that breaks chat flows for local providers

Result: Simpler, more maintainable code focused on core goal of
reducing unnecessary network requests by scoping to active provider.
Refactor to improve separation of concerns:

- Create src/services/router-models/index.ts to handle provider model fetching
- Extract buildProviderFetchList() function for fetch options construction
- Extract fetchRouterModels() function for coordinated model fetching
- Move 150+ lines of provider-specific logic out of webviewMessageHandler
- Add comprehensive tests in router-models-service.spec.ts (11 test cases)

Benefits:
- Cleaner webviewMessageHandler with less business logic
- Reusable service for router model operations
- Better testability with isolated unit tests
- Clear separation between UI message handling and data fetching

Files changed:
- New: src/services/router-models/index.ts
- New: src/services/router-models/__tests__/router-models-service.spec.ts
- Modified: src/core/webview/webviewMessageHandler.ts (simplified)
@daniel-lxs daniel-lxs changed the title Router models: coalesce fetches, file-cache pre-read, active-only scope + debounce Router models: Only fetch models for the active provider Oct 30, 2025
@daniel-lxs

Superseded by these two PRs; they are tighter in scope and should achieve the primary goal of keeping the frontend state a bit smaller:

#8916 - Backend Filtering
Adds provider filtering to the backend router-models handler.
The webview can include a providers list in requestRouterModels.
If no filter is sent, all providers are returned (backward compatible).
Filtering happens in webviewMessageHandler.ts before calling the router-models service.
Tests cover both filtered and unfiltered cases.
Result: Sends less data to the webview and reduces work on the frontend.

#8917 - Frontend Provider Fetch
Limits webview requests to only the needed providers.
Static providers don’t call the router-models API.
Dynamic providers request just their own via useRouterModels(providers).
A small check ensures each response matches its request, avoiding races.
Result: Less network use and smaller frontend state.
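
The race-avoidance check mentioned for #8917 might look like the following sketch; the guard and the request-id scheme are assumptions, and the actual PR's mechanism may differ:

```typescript
// Tag each router-models request with an id (here, the sorted provider
// list) and drop any response that does not match the latest request.
type RouterModelsResponse = { requestId: string; models: Record<string, unknown> }

function makeRouterModelsGuard() {
	let currentRequestId = ""

	function request(providers: string[]): string {
		currentRequestId = [...providers].sort().join(",")
		return currentRequestId
	}

	// True only for the response matching the most recent request.
	function accept(response: RouterModelsResponse): boolean {
		return response.requestId === currentRequestId
	}

	return { request, accept }
}
```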

@daniel-lxs daniel-lxs closed this Oct 30, 2025
@github-project-automation github-project-automation bot moved this from PR [Draft / In Progress] to Done in Roo Code Roadmap Oct 30, 2025
@github-project-automation github-project-automation bot moved this from New to Done in Roo Code Roadmap Oct 30, 2025