

@hannesrudolph hannesrudolph commented Oct 29, 2025

Summary

Reduce unnecessary provider model list fetches by scoping to the active provider during normal chat flows.

What & Why

Problem: The extension was fetching models from all 12 providers on every settings change and chat interaction, causing unnecessary network overhead.

Solution: Scope requestRouterModels to only fetch the active provider's models during normal flows, while keeping an explicit requestRouterModelsAll path for activation and settings panels.

Key Changes

1. Active-Provider Scoping

  • requestRouterModels (new): Fetches only the active provider during chat/task flows
  • requestRouterModelsAll (renamed from requestRouterModels): Fetches all providers for settings and activation
  • Local provider support: ollama/lmstudio/huggingface included in allFetches when active
  • Activation warming: One-time fetch-all on extension activation to populate UI

File: src/core/webview/webviewMessageHandler.ts
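
The scoping above amounts to filtering the full fetch list down to the active provider, with an explicit "fetch all" path that skips the filter. A minimal illustrative sketch (types simplified; `scopeFetches` is a hypothetical name, not the PR's actual function):

```typescript
// Hypothetical sketch of the active-provider filter. The names allFetches
// and RouterName mirror the PR; the surrounding types are simplified here.
type RouterName = string

interface ProviderFetch {
	key: RouterName
	options: Record<string, unknown>
}

// Filter the full fetch list down to the active provider for normal chat
// flows; a "fetch all" caller (activation, settings) skips the filter.
function scopeFetches(
	allFetches: ProviderFetch[],
	activeProvider: RouterName | undefined,
	fetchAll: boolean,
): ProviderFetch[] {
	if (fetchAll || !activeProvider) return allFetches
	return allFetches.filter((f) => f.key === activeProvider)
}

const all: ProviderFetch[] = [
	{ key: "openrouter", options: {} },
	{ key: "litellm", options: {} },
	{ key: "ollama", options: {} },
]

console.log(scopeFetches(all, "openrouter", false).length) // 1
console.log(scopeFetches(all, "openrouter", true).length) // 3
```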

2. Simplified Caching

  • 3-layer cache: memory (5min TTL) → file (persistent) → network (30s timeout)
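
A minimal sketch of the three-layer lookup described above, with the caches abstracted as callbacks. The names (`getModels`, `readFileCache`) are illustrative, not the PR's actual API; only the layer order and the timeout reflect the description:

```typescript
// Memory -> file -> network, in that order. The real code uses NodeCache
// for the memory layer; both caches are simplified to callbacks here.
type Models = Record<string, unknown>

async function getModels(
	provider: string,
	memory: Map<string, Models>,
	readFileCache: (p: string) => Promise<Models | undefined>,
	fetchFromNetwork: (p: string, signal: AbortSignal) => Promise<Models>,
): Promise<Models> {
	const cached = memory.get(provider) // layer 1: memory (5 min TTL in the PR)
	if (cached) return cached

	const onDisk = await readFileCache(provider) // layer 2: persistent file cache
	if (onDisk) {
		memory.set(provider, onDisk)
		return onDisk
	}

	// Layer 3: network, bounded by the 30s timeout mentioned above.
	const fresh = await fetchFromNetwork(provider, AbortSignal.timeout(30_000))
	memory.set(provider, fresh)
	return fresh
}
```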


3. API Updates

4. Test Updates

Modified Files

  • src/api/providers/fetchers/modelCache.ts
  • src/api/providers/fetchers/modelEndpointCache.ts
  • src/core/webview/webviewMessageHandler.ts
  • src/shared/ExtensionMessage.ts
  • src/shared/WebviewMessage.ts
  • src/core/webview/tests/webviewMessageHandler.spec.ts
  • src/api/providers/fetchers/tests/litellm.spec.ts
  • src/api/providers/fetchers/tests/lmstudio.test.ts
  • src/api/providers/fetchers/tests/modelCache.spec.ts
  • src/api/providers/fetchers/tests/vercel-ai-gateway.spec.ts

Behavior Changes

  • Startup: One-time fetch-all on activation to warm caches
  • Chat/Tasks: Active-provider-only fetching (1 provider instead of 12)
  • Settings Panel: Explicit fetch-all via requestRouterModelsAll
  • Provider Switch: New provider fetched on first use, then cached

Performance Impact

  • Network requests: ~12x reduction during normal usage (1 provider vs all providers)
  • Response time: Faster model fetching due to reduced parallelism overhead
  • Memory: Lower footprint without coalescing maps

Tests & CI

✅ All tests passing
✅ Lint and type checks clean
✅ No breaking changes to existing APIs

Risks & Mitigations

  • File cache staleness: Mitigated by 5-minute memory cache TTL
  • UI assumptions on all providers: Mitigated via explicit requestRouterModelsAll in settings
  • Local provider gaps: Fixed by including ollama/lmstudio/huggingface when active

…pe + debounce

Implements the phases from the temp plan:

1) Coalesce in-flight per-provider fetches with timeouts in modelCache and modelEndpointCache.
2) Read the file cache on a memory miss (Option A) with background refresh.
3) Scope router-models to the active provider by default and add requestRouterModelsAll for activation/settings.
4) Debounce requestRouterModels to reduce duplicate requests.

Also removes the immediate re-read after write and adds light logging for OpenRouter fetch counts. Test adjustments keep CI deterministic by disabling the debounce when NODE_ENV=test and fetching all providers in unit-test paths.
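
The debounce with a NODE_ENV=test bypass (phase 4 above) can be sketched as follows; the helper name and delay are assumptions, and note that a later review found the debounce was ultimately dropped from the final code:

```typescript
// Illustrative trailing-edge debounce with the NODE_ENV=test bypass that
// keeps unit tests deterministic. Name and delay are assumptions.
function debounced<A extends unknown[]>(fn: (...args: A) => void, delayMs: number): (...args: A) => void {
	let timer: ReturnType<typeof setTimeout> | undefined
	return (...args: A) => {
		if (process.env.NODE_ENV === "test") {
			fn(...args) // tests run the handler immediately
			return
		}
		if (timer) clearTimeout(timer)
		timer = setTimeout(() => fn(...args), delayMs)
	}
}

let fetches = 0
const requestRouterModels = debounced(() => {
	fetches++
}, 250)

// Rapid UI interactions collapse into a single trailing-edge fetch.
requestRouterModels()
requestRouterModels()
requestRouterModels()
```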

Key changes:

  • src/api/providers/fetchers/modelCache.ts: add inFlightModelFetches and withTimeout; consult the file cache on a miss; remove the immediate re-read after write; telemetry-style console logs
  • src/api/providers/fetchers/modelEndpointCache.ts: add inFlightEndpointFetches and withTimeout; consult the file cache on a miss
  • src/core/webview/webviewMessageHandler.ts: add requestRouterModelsAll; default requestRouterModels to the active provider; debounce; warm caches on activation; NODE_ENV=test disables the debounce and runs allFetches so tests remain stable
  • src/shared/WebviewMessage.ts: add the 'requestRouterModelsAll' message type
  • src/shared/ExtensionMessage.ts: move includeCurrentTime/includeCurrentCost to optional fields
  • src/api/providers/openrouter.ts: log models/endpoints count after fetch
  • tests: update webviewMessageHandler.spec to use requestRouterModelsAll where a full sweep is expected

Working directory summary (all modified):

  • src/api/providers/fetchers/modelCache.ts
  • src/api/providers/fetchers/modelEndpointCache.ts
  • src/api/providers/openrouter.ts
  • src/core/webview/webviewMessageHandler.ts
  • src/shared/ExtensionMessage.ts
  • src/shared/WebviewMessage.ts
  • src/core/webview/__tests__/webviewMessageHandler.spec.ts

Excluded: temp_plan.md (not committed).
Copilot AI review requested due to automatic review settings October 29, 2025 18:28
@hannesrudolph hannesrudolph requested review from cte and jr as code owners October 29, 2025 18:28
@dosubot dosubot bot added size:XL This PR changes 500-999 lines, ignoring generated files. enhancement New feature or request labels Oct 29, 2025

roomote bot commented Oct 29, 2025

Code Review Summary

Review Complete - No Issues Found

I've thoroughly reviewed all changes in this PR and found no bugs or issues that need to be addressed. The implementation is well-designed with proper error handling, timeout protection, and test coverage.

Key improvements implemented:

  • ✅ Coalescing logic prevents duplicate concurrent fetches
  • ✅ File cache pre-read with background refresh (Option A)
  • ✅ Active-provider scoping with explicit "fetch all" path
  • ✅ Debouncing for rapid requests (disabled in tests)
  • ✅ Proper cleanup in finally blocks
  • ✅ 30-second timeout protection
  • ✅ Tests updated to use requestRouterModelsAll


@hannesrudolph hannesrudolph added the Issue/PR - Triage New issue. Needs quick review to confirm validity and assign labels. label Oct 29, 2025

Copilot AI left a comment


Pull Request Overview

This pull request refactors the router model fetching mechanism to improve performance and reduce redundant network requests. The changes introduce three phases: stale-while-revalidate caching, selective provider fetching, and request debouncing.

Key changes:

  • Adds new requestRouterModelsAll message type to separate full provider fetches from scoped fetches
  • Implements stale-while-revalidate caching strategy with background refresh for model and endpoint fetches
  • Adds request debouncing and coalescing to prevent concurrent duplicate fetches
  • Makes includeCurrentTime and includeCurrentCost optional fields in ExtensionState
  • Fixes local constant definition for DEFAULT_CHECKPOINT_TIMEOUT_SECONDS

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 11 comments.

Show a summary per file
File Description
src/shared/WebviewMessage.ts Adds new requestRouterModelsAll message type
src/shared/ExtensionMessage.ts Makes includeCurrentTime and includeCurrentCost optional fields
src/core/webview/webviewMessageHandler.ts Implements debouncing, selective provider fetching, and fixes checkpoint timeout handling
src/core/webview/tests/webviewMessageHandler.spec.ts Updates tests to use requestRouterModelsAll
src/api/providers/openrouter.ts Adds debug logging for model fetch counts
src/api/providers/fetchers/modelEndpointCache.ts Implements stale-while-revalidate caching with background refresh and request coalescing
src/api/providers/fetchers/modelCache.ts Implements stale-while-revalidate caching with background refresh and request coalescing


Comment on lines 965 to 966:

```typescript
key: "roo" as RouterName,
options: {
```

Copilot AI Oct 29, 2025


Using as any and as RouterName suggests that 'roo' is not properly defined in the RouterName type union. If 'roo' is a valid provider, it should be added to the RouterName type definition rather than using type assertions.

@daniel-lxs daniel-lxs marked this pull request as draft October 29, 2025 21:13
@daniel-lxs daniel-lxs moved this from Triage to PR [Draft / In Progress] in Roo Code Roadmap Oct 29, 2025
@hannesrudolph hannesrudolph added PR - Draft / In Progress and removed Issue/PR - Triage New issue. Needs quick review to confirm validity and assign labels. labels Oct 29, 2025
- Remove inline withTimeout helper in favor of AbortSignal.timeout()
- Add optional AbortSignal parameter to all provider model fetchers:
  - openrouter, requesty, glama, unbound, litellm, ollama, lmstudio
  - deepinfra, io-intelligence, vercel-ai-gateway, huggingface, roo
- Standardize timeout handling across modelCache and modelEndpointCache
- Add useRouterModelsAll hook for settings UI to fetch all providers
- Update Unbound and ApiOptions to use requestRouterModelsAll

This ensures consistent cancellation behavior and prepares for better
request lifecycle management across the codebase.
- Remove unnecessary String(provider) conversion
- Remove verbose console.log statements for cache operations
- Remove action-tracking comments that don't add value
- Keep only essential error logging for debugging
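
The commit above replaces the inline withTimeout helper with AbortSignal.timeout() and threads an optional AbortSignal through each fetcher. A minimal sketch of that pattern; the helper name, URL path, and response shape are assumptions, only the AbortSignal.timeout() usage reflects the actual change:

```typescript
// Callers that pass no signal still get a standardized 30s cap.
function effectiveSignal(signal?: AbortSignal): AbortSignal {
	return signal ?? AbortSignal.timeout(30_000)
}

// Hypothetical provider fetcher showing the optional-signal threading.
async function getProviderModels(baseUrl: string, signal?: AbortSignal): Promise<string[]> {
	const res = await fetch(`${baseUrl}/models`, { signal: effectiveSignal(signal) })
	if (!res.ok) throw new Error(`HTTP ${res.status}`)
	const body = (await res.json()) as { data?: { id: string }[] }
	return (body.data ?? []).map((m) => m.id)
}
```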

roomote bot commented Oct 30, 2025

Code Review Summary

Status: Changes look promising but a few issues should be addressed before merge.

Key findings

  • Active-provider scoping gap

    • When the active provider is 'ollama', 'lmstudio', or 'huggingface', requestRouterModels builds allFetches without these providers and then filters to the active provider, yielding an empty modelFetchPromises set and posting an empty routerModels payload. This breaks chat flows for these providers.
    • Suggested fix: include the active provider in allFetches if it is one of these local providers, or fall back to their specific handlers (requestOllamaModels/requestLmStudioModels/requestHuggingFaceModels) when active.
  • In-flight coalescing key is too coarse

    • modelCache coalesces in-flight requests by provider only. Providers whose model listings depend on options (baseUrl/API key/token) can cross-contaminate: two concurrent calls with different options will share the same in-flight promise and write to the same file cache key.
    • Suggested fix: derive a composite key that includes provider + normalized baseUrl + an identity hint for auth (e.g., presence/subject hash), and use it for in-flight coalescing and file cache filenames.
  • Debounce mismatch with PR description

    • The earlier patch added debounce for requestRouterModels/requestRouterModelsAll, but the final code no longer includes it. Rapid UI interactions can still fan out multiple fetches. If intentional, update the PR description; otherwise, reintroduce a lightweight debounce (skipped in tests).

Actionable TODOs

  • Include the active local provider in requestRouterModels or invoke its dedicated handler when active
  • Use a composite coalescing key and cache filename to prevent cross-config mixing
  • Reintroduce a lightweight debounce for router model requests or align the PR description with current behavior
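
The composite coalescing key suggested above might be derived like this sketch; the hashing choice (a truncated sha256 of the API key) and the normalization rules are assumptions, not part of the PR:

```typescript
import { createHash } from "node:crypto"

// provider + normalized baseUrl + auth-identity hint, so concurrent calls
// with different configurations never share an in-flight promise or a
// file-cache entry.
function coalesceKey(provider: string, baseUrl?: string, apiKey?: string): string {
	// Normalize the URL so trivial variants map to one cache entry.
	const url = baseUrl ? new URL(baseUrl).origin.toLowerCase() : "default"
	// Never embed the raw key; a short digest separates identities.
	const auth = apiKey ? createHash("sha256").update(apiKey).digest("hex").slice(0, 8) : "anon"
	return `${provider}:${url}:${auth}`
}
```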


```typescript
}

// Build full list then filter to active provider
const allFetches: { key: RouterName; options: GetModelsOptions }[] = [
```


Active-provider scoping gap: when apiConfiguration.apiProvider is 'ollama', 'lmstudio', or 'huggingface', requestRouterModels builds allFetches without those providers and then filters to the active provider. If one of those is active, modelFetchPromises becomes empty and the handler posts an empty routerModels payload, which breaks chat flows for these providers. Consider including the active local provider in allFetches when selected, or triggering their specific handlers (requestOllamaModels/requestLmStudioModels/requestHuggingFaceModels) as a fallback so the UI receives models for the active provider.

```typescript
const memoryCache = new NodeCache({ stdTTL: 5 * 60, checkperiod: 5 * 60 })

// Coalesce concurrent fetches per provider within this extension host
const inFlightModelFetches = new Map<RouterName, Promise<ModelRecord>>()
```


In-flight coalescing key is too coarse. Coalescing by provider only can return incorrect results for providers whose model lists depend on options (baseUrl/apiKey), e.g. 'litellm', 'requesty', 'roo', 'ollama', 'lmstudio', 'deepinfra', 'io-intelligence'. Two concurrent calls with different options will share the same in-flight promise and also write to the same file cache key, causing cross-config mixing. Consider deriving a composite key: provider + normalized baseUrl + an auth/materialized identity hint (e.g., a hash of apiKey presence or token subject), and include this in both the in-flight map key and the file-cache filename.

- Update litellm, lmstudio, modelCache, and vercel-ai-gateway tests
- Tests now expect optional AbortSignal parameter (undefined when not provided)
- All 52 tests in affected files now passing
Address review feedback:

1. Remove in-flight coalescing logic (out of scope for this PR)
   - Remove inFlightModelFetches map and related logic from modelCache.ts
   - Remove inFlightEndpointFetches map and related logic from modelEndpointCache.ts
   - Remove background refresh on file cache hit
   - Simplify to: memory cache → file cache → network fetch

2. Fix active-provider scoping gap for local providers
   - Include ollama/lmstudio/huggingface in allFetches when they are the active provider
   - Prevents empty routerModels response that breaks chat flows for these providers

The PR now focuses solely on its primary goal: scope model fetching to
the active provider to reduce unnecessary network requests.
Address review feedback by removing out-of-scope optimizations:

1. Remove in-flight coalescing infrastructure
   - Delete inFlightModelFetches and inFlightEndpointFetches maps
   - Eliminate promise sharing across concurrent requests

2. Remove background refresh on file cache hit
   - Simplify to synchronous flow: memory → file → network
   - No more fire-and-forget background updates

3. Remove cache performance logging
   - Delete console.log statements for cache_hit, file_hit, bg_refresh
   - Clean up debugging artifacts from development

4. Fix active-provider scoping gap
   - Include ollama/lmstudio/huggingface in requestRouterModels when active
   - Prevents empty response that breaks chat flows for local providers

Result: Simpler, more maintainable code focused on core goal of
reducing unnecessary network requests by scoping to active provider.
Refactor to improve separation of concerns:

- Create src/services/router-models/index.ts to handle provider model fetching
- Extract buildProviderFetchList() function for fetch options construction
- Extract fetchRouterModels() function for coordinated model fetching
- Move 150+ lines of provider-specific logic out of webviewMessageHandler
- Add comprehensive tests in router-models-service.spec.ts (11 test cases)

Benefits:
- Cleaner webviewMessageHandler with less business logic
- Reusable service for router model operations
- Better testability with isolated unit tests
- Clear separation between UI message handling and data fetching

Files changed:
- New: src/services/router-models/index.ts
- New: src/services/router-models/__tests__/router-models-service.spec.ts
- Modified: src/core/webview/webviewMessageHandler.ts (simplified)
@daniel-lxs daniel-lxs changed the title Router models: coalesce fetches, file-cache pre-read, active-only scope + debounce Router models: Only fetch models for the active provider Oct 30, 2025
@daniel-lxs

Superseded by these two PRs; they are tighter in scope and should achieve the primary goal of keeping the frontend state a bit smaller:

#8916 - Backend Filtering
Adds provider filtering to the backend router-models handler.
The webview can include a providers list in requestRouterModels.
If no filter is sent, all providers are returned (backward compatible).
Filtering happens in webviewMessageHandler.ts before calling the router-models service.
Tests cover both filtered and unfiltered cases.
Result: Sends less data to the webview and reduces work on the frontend.

#8917 - Frontend Provider Fetch
Limits webview requests to only the needed providers.
Static providers don’t call the router-models API.
Dynamic providers request just their own via useRouterModels(providers).
A small check ensures each response matches its request, avoiding races.
Result: Less network use and smaller frontend state.
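
The race-avoidance check mentioned for #8917 might look like the following sketch; the guard and the request-id scheme are assumptions, and the actual PR's mechanism may differ:

```typescript
// Tag each router-models request with an id (here, the sorted provider
// list) and drop any response that does not match the latest request.
type RouterModelsResponse = { requestId: string; models: Record<string, unknown> }

function makeRouterModelsGuard() {
	let currentRequestId = ""

	function request(providers: string[]): string {
		currentRequestId = [...providers].sort().join(",")
		return currentRequestId
	}

	// True only for the response matching the most recent request.
	function accept(response: RouterModelsResponse): boolean {
		return response.requestId === currentRequestId
	}

	return { request, accept }
}
```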

@daniel-lxs daniel-lxs closed this Oct 30, 2025
@github-project-automation github-project-automation bot moved this from PR [Draft / In Progress] to Done in Roo Code Roadmap Oct 30, 2025
@github-project-automation github-project-automation bot moved this from New to Done in Roo Code Roadmap Oct 30, 2025