Router models: Only fetch models for the active provider #8912
Conversation
…pe + debounce

Implements Phase 1/2/3 from temp plan:

1. Coalesce in-flight per-provider fetches with timeouts in modelCache and modelEndpointCache
2. Read file cache on memory miss (Option A) with background refresh
3. Scope router-models to the active provider by default and add requestRouterModelsAll for activation/settings
4. Debounce requestRouterModels to reduce duplicates

Also removes the immediate re-read after write and adds light logging for OpenRouter fetch counts. Test adjustments ensure deterministic behavior in CI by disabling the debounce under NODE_ENV=test and fetching all providers in unit test paths.

Key changes:

- src/api/providers/fetchers/modelCache.ts: add inFlightModelFetches and withTimeout; consult file cache on miss; remove immediate re-read after write; telemetry-style console logs
- src/api/providers/fetchers/modelEndpointCache.ts: add inFlightEndpointFetches and withTimeout; consult file cache on miss
- src/core/webview/webviewMessageHandler.ts: add requestRouterModelsAll; default requestRouterModels to the active provider; debounce; warm caches on activation; NODE_ENV=test disables debounce and runs allFetches so tests remain stable
- src/shared/WebviewMessage.ts: add 'requestRouterModelsAll' message type
- src/shared/ExtensionMessage.ts: make includeCurrentTime/includeCurrentCost optional fields
- src/api/providers/openrouter.ts: log model/endpoint counts after fetch
- tests: update webviewMessageHandler.spec to use requestRouterModelsAll where a full sweep is expected

Working directory summary: M src/api/providers/fetchers/modelCache.ts, M src/api/providers/fetchers/modelEndpointCache.ts, M src/api/providers/openrouter.ts, M src/core/webview/webviewMessageHandler.ts, M src/shared/ExtensionMessage.ts, M src/shared/WebviewMessage.ts, M src/core/webview/__tests__/webviewMessageHandler.spec.ts. Excluded: temp_plan.md (not committed).
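The per-provider debounce this commit describes could be sketched as follows — a minimal illustration only, assuming a trailing-edge debounce keyed by provider and bypassed under NODE_ENV=test as the commit message states; all names here are hypothetical, not the extension's actual API.

```typescript
type RouterName = string

// Illustrative debounce: collapse rapid repeated requests for the same
// provider into a single trailing call. Bypassed under NODE_ENV=test so
// unit tests stay deterministic, mirroring the approach described above.
function debouncePerKey<A extends unknown[]>(
	fn: (key: RouterName, ...args: A) => void,
	waitMs: number,
) {
	const timers = new Map<RouterName, ReturnType<typeof setTimeout>>()
	return (key: RouterName, ...args: A) => {
		if (process.env.NODE_ENV === "test") {
			fn(key, ...args) // run immediately in tests
			return
		}
		const existing = timers.get(key)
		if (existing) clearTimeout(existing)
		timers.set(
			key,
			setTimeout(() => {
				timers.delete(key)
				fn(key, ...args)
			}, waitMs),
		)
	}
}
```

Keying the timer map by provider means a burst of requests for one provider cannot starve a request for a different provider.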
Code Review Summary

✅ Review Complete - No Issues Found

I've thoroughly reviewed all changes in this PR and found no bugs or issues that need to be addressed. The implementation is well designed, with proper error handling, timeout protection, and test coverage. Key improvements implemented:
Pull Request Overview
This pull request refactors the router model fetching mechanism to improve performance and reduce redundant network requests. The changes introduce three phases: stale-while-revalidate caching, selective provider fetching, and request debouncing.
Key changes:
- Adds new `requestRouterModelsAll` message type to separate full provider fetches from scoped fetches
- Implements stale-while-revalidate caching strategy with background refresh for model and endpoint fetches
- Adds request debouncing and coalescing to prevent concurrent duplicate fetches
- Makes `includeCurrentTime` and `includeCurrentCost` optional fields in `ExtensionState`
- Fixes local constant definition for `DEFAULT_CHECKPOINT_TIMEOUT_SECONDS`
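The request-coalescing idea from this list (which later review feedback removed as out of scope) can be sketched in a few lines — an illustrative sketch assuming a per-provider map of in-flight promises, with hypothetical names:

```typescript
// Illustrative coalescing: concurrent callers for the same provider share
// one in-flight promise instead of issuing duplicate network fetches.
// The entry is cleared once the fetch settles, so the next call refetches.
const inFlight = new Map<string, Promise<string[]>>()

async function getModelsCoalesced(
	provider: string,
	fetchModels: (provider: string) => Promise<string[]>,
): Promise<string[]> {
	const existing = inFlight.get(provider)
	if (existing) return existing
	const p = fetchModels(provider).finally(() => inFlight.delete(provider))
	inFlight.set(provider, p)
	return p
}
```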
Reviewed Changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| src/shared/WebviewMessage.ts | Adds new requestRouterModelsAll message type |
| src/shared/ExtensionMessage.ts | Makes includeCurrentTime and includeCurrentCost optional fields |
| src/core/webview/webviewMessageHandler.ts | Implements debouncing, selective provider fetching, and fixes checkpoint timeout handling |
| src/core/webview/`__tests__`/webviewMessageHandler.spec.ts | Updates tests to use requestRouterModelsAll |
| src/api/providers/openrouter.ts | Adds debug logging for model fetch counts |
| src/api/providers/fetchers/modelEndpointCache.ts | Implements stale-while-revalidate caching with background refresh and request coalescing |
| src/api/providers/fetchers/modelCache.ts | Implements stale-while-revalidate caching with background refresh and request coalescing |
```ts
key: "roo" as RouterName,
options: {
```

Copilot AI · Oct 29, 2025
Using `as any` and `as RouterName` suggests that `'roo'` is not properly defined in the `RouterName` type union. If `'roo'` is a valid provider, it should be added to the `RouterName` type definition rather than using type assertions.
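The reviewer's suggestion could look roughly like this — a sketch only; the union members listed are illustrative, and the real `RouterName` type lives in the project's shared types module:

```typescript
// Sketch: add "roo" to the RouterName union so no `as RouterName`
// assertion is needed at the call site. Member names are illustrative.
type RouterName = "openrouter" | "requesty" | "glama" | "unbound" | "litellm" | "roo"

interface FetchEntry {
	key: RouterName
	options: { baseUrl?: string; apiKey?: string }
}

// "roo" now type-checks without a cast:
const entry: FetchEntry = { key: "roo", options: {} }
```

With the union extended, the compiler also catches typos like `"ro"` that an `as RouterName` cast would have silenced.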
- Remove inline withTimeout helper in favor of AbortSignal.timeout()
- Add optional AbortSignal parameter to all provider model fetchers: openrouter, requesty, glama, unbound, litellm, ollama, lmstudio, deepinfra, io-intelligence, vercel-ai-gateway, huggingface, roo
- Standardize timeout handling across modelCache and modelEndpointCache
- Add useRouterModelsAll hook for settings UI to fetch all providers
- Update Unbound and ApiOptions to use requestRouterModelsAll

This ensures consistent cancellation behavior and prepares for better request lifecycle management across the codebase.
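The AbortSignal.timeout() pattern this commit adopts could be sketched as follows — the fetcher name, URL shape, and timeout value are illustrative assumptions, not the actual provider code:

```typescript
// Sketch: thread an optional AbortSignal through a model fetcher and let
// the call site supply AbortSignal.timeout(), replacing a hand-rolled
// withTimeout wrapper. The fetcher below is a stand-in for a real provider.
async function getProviderModels(
	baseUrl: string,
	signal?: AbortSignal,
): Promise<string[]> {
	const response = await fetch(`${baseUrl}/models`, { signal })
	if (!response.ok) throw new Error(`HTTP ${response.status}`)
	return (await response.json()) as string[]
}

// Caller supplies a standard timeout signal (Node 17.3+ / modern browsers):
// getProviderModels("https://example.invalid", AbortSignal.timeout(10_000))
```

Because the signal is optional, tests can call the fetcher without one, which matches the commit's note that tests expect `undefined` when no signal is provided.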
- Remove unnecessary String(provider) conversion
- Remove verbose console.log statements for cache operations
- Remove action-tracking comments that don't add value
- Keep only essential error logging for debugging
Code Review Summary

Status: Changes look promising, but a few issues should be addressed before merge.

Key findings

Actionable TODOs
```ts
}

// Build full list then filter to active provider
const allFetches: { key: RouterName; options: GetModelsOptions }[] = [
```
Active-provider scoping gap: when `apiConfiguration.apiProvider` is 'ollama', 'lmstudio', or 'huggingface', `requestRouterModels` builds `allFetches` without those providers and then filters to the active provider. If one of those is active, `modelFetchPromises` becomes empty and the handler posts an empty `routerModels` payload, which breaks chat flows for these providers. Consider including the active local provider in `allFetches` when selected, or triggering their specific handlers (`requestOllamaModels`/`requestLmStudioModels`/`requestHuggingFaceModels`) as a fallback so the UI receives models for the active provider.
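The fallback the reviewer suggests could be sketched like this — all function and variable names are hypothetical, and the fetch-spec shape is a simplification of the real `GetModelsOptions`:

```typescript
// Sketch: when the active provider is a local one (ollama/lmstudio/
// huggingface) that the router-model sweep normally omits, include it
// explicitly so the filtered fetch list is never empty.
const LOCAL_PROVIDERS = ["ollama", "lmstudio", "huggingface"] as const

type FetchSpec = { key: string; options: Record<string, unknown> }

function scopeToActiveProvider(
	allFetches: FetchSpec[],
	activeProvider: string,
): FetchSpec[] {
	const scoped = allFetches.filter((f) => f.key === activeProvider)
	if (
		scoped.length === 0 &&
		(LOCAL_PROVIDERS as readonly string[]).includes(activeProvider)
	) {
		// Local provider was absent from the sweep list: add it so the
		// webview still receives models for the active provider.
		return [{ key: activeProvider, options: {} }]
	}
	return scoped
}
```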
```ts
const memoryCache = new NodeCache({ stdTTL: 5 * 60, checkperiod: 5 * 60 })

// Coalesce concurrent fetches per provider within this extension host
const inFlightModelFetches = new Map<RouterName, Promise<ModelRecord>>()
```
In-flight coalescing key is too coarse. Coalescing by provider only can return incorrect results for providers whose model lists depend on options (baseUrl/apiKey), e.g. 'litellm', 'requesty', 'roo', 'ollama', 'lmstudio', 'deepinfra', 'io-intelligence'. Two concurrent calls with different options will share the same in-flight promise and also write to the same file cache key, causing cross-config mixing. Consider deriving a composite key: provider + normalized baseUrl + an auth/materialized identity hint (e.g., a hash of apiKey presence or token subject), and include this in both the in-flight map key and the file-cache filename.
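A composite key along the lines the reviewer describes might look like this — the normalization and hashing scheme is an assumption for illustration, not the project's actual implementation:

```typescript
import { createHash } from "node:crypto"

// Sketch: derive a cache/coalescing key from provider + normalized baseUrl
// + a short hash of the API key, so two configurations of the same provider
// never share an in-flight promise or a file-cache entry.
function cacheKeyFor(
	provider: string,
	options: { baseUrl?: string; apiKey?: string },
): string {
	// Normalize trailing slashes and case so equivalent URLs collide.
	const baseUrl = (options.baseUrl ?? "").replace(/\/+$/, "").toLowerCase()
	// Hash the key rather than embedding it, so secrets never reach filenames.
	const auth = options.apiKey
		? createHash("sha256").update(options.apiKey).digest("hex").slice(0, 8)
		: "anon"
	return `${provider}:${baseUrl}:${auth}`
}
```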
- Update litellm, lmstudio, modelCache, and vercel-ai-gateway tests
- Tests now expect optional AbortSignal parameter (undefined when not provided)
- All 52 tests in affected files now passing
Address review feedback:

1. Remove in-flight coalescing logic (out of scope for this PR)
   - Remove inFlightModelFetches map and related logic from modelCache.ts
   - Remove inFlightEndpointFetches map and related logic from modelEndpointCache.ts
   - Remove background refresh on file cache hit
   - Simplify to: memory cache → file cache → network fetch
2. Fix active-provider scoping gap for local providers
   - Include ollama/lmstudio/huggingface in allFetches when they are the active provider
   - Prevents empty routerModels response that breaks chat flows for these providers

The PR now focuses solely on its primary goal: scope model fetching to the active provider to reduce unnecessary network requests.
Address review feedback by removing out-of-scope optimizations:

1. Remove in-flight coalescing infrastructure
   - Delete inFlightModelFetches and inFlightEndpointFetches maps
   - Eliminate promise sharing across concurrent requests
2. Remove background refresh on file cache hit
   - Simplify to synchronous flow: memory → file → network
   - No more fire-and-forget background updates
3. Remove cache performance logging
   - Delete console.log statements for cache_hit, file_hit, bg_refresh
   - Clean up debugging artifacts from development
4. Fix active-provider scoping gap
   - Include ollama/lmstudio/huggingface in requestRouterModels when active
   - Prevents empty response that breaks chat flows for local providers

Result: Simpler, more maintainable code focused on the core goal of reducing unnecessary network requests by scoping to the active provider.
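The simplified memory → file → network lookup this commit settles on could be sketched as follows — the `CacheLike` interface and function names are illustrative, standing in for NodeCache and the file-cache helpers:

```typescript
// Sketch of the simplified cache flow after removing coalescing and
// background refresh: check memory, then the file cache, then fetch.
interface CacheLike<T> {
	get(key: string): T | undefined
	set(key: string, value: T): void
}

async function getModels<T>(
	key: string,
	memory: CacheLike<T>,
	file: CacheLike<T>,
	fetchFresh: () => Promise<T>,
): Promise<T> {
	const inMemory = memory.get(key)
	if (inMemory !== undefined) return inMemory

	const onDisk = file.get(key)
	if (onDisk !== undefined) {
		memory.set(key, onDisk) // promote to memory for the next lookup
		return onDisk
	}

	const fresh = await fetchFresh() // network fetch as last resort
	memory.set(key, fresh)
	file.set(key, fresh)
	return fresh
}
```

The promotion step means the file cache is consulted at most once per memory-cache TTL window, with no fire-and-forget background updates.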
Refactor to improve separation of concerns:

- Create src/services/router-models/index.ts to handle provider model fetching
- Extract buildProviderFetchList() function for fetch options construction
- Extract fetchRouterModels() function for coordinated model fetching
- Move 150+ lines of provider-specific logic out of webviewMessageHandler
- Add comprehensive tests in router-models-service.spec.ts (11 test cases)

Benefits:

- Cleaner webviewMessageHandler with less business logic
- Reusable service for router model operations
- Better testability with isolated unit tests
- Clear separation between UI message handling and data fetching

Files changed:

- New: src/services/router-models/index.ts
- New: src/services/router-models/__tests__/router-models-service.spec.ts
- Modified: src/core/webview/webviewMessageHandler.ts (simplified)
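The shape of the extracted service could look roughly like this — these signatures are assumptions based on the function names in the commit message, not the actual exports of src/services/router-models/index.ts:

```typescript
// Sketch of the extracted service surface: one function builds the list of
// provider fetch specs, another runs the fetches and keys results by provider.
type FetchSpec = { key: string; options: Record<string, unknown> }

function buildProviderFetchList(activeProvider: string): FetchSpec[] {
	// Real code would construct per-provider options; a single-entry
	// list is enough to illustrate the interface.
	return [{ key: activeProvider, options: {} }]
}

async function fetchRouterModels(
	specs: FetchSpec[],
	fetchOne: (spec: FetchSpec) => Promise<string[]>,
): Promise<Record<string, string[]>> {
	const results = await Promise.all(
		specs.map(async (s) => [s.key, await fetchOne(s)] as const),
	)
	return Object.fromEntries(results)
}
```

Injecting `fetchOne` is what makes the service unit-testable in isolation from the webview message handler.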
Superseded by these 2 PRs; they seem tighter in scope and should achieve the primary goal of keeping the frontend state a bit smaller:

- #8916 - Backend Filtering
- #8917 - Frontend Provider Fetch
Summary
Reduce unnecessary provider model list fetches by scoping to the active provider during normal chat flows.
What & Why
Problem: Extension was fetching models from all 12 providers on every settings change and chat interaction, causing unnecessary network overhead.
Solution: Scope `requestRouterModels` to only fetch the active provider's models during normal flows, while keeping an explicit `requestRouterModelsAll` path for activation and settings panels.

Key Changes
1. Active-Provider Scoping
File: src/core/webview/webviewMessageHandler.ts
2. Simplified Caching
Files:
3. API Updates
- New `requestRouterModelsAll` message type in src/shared/WebviewMessage.ts
- `includeCurrentTime`/`includeCurrentCost` made optional in src/shared/ExtensionMessage.ts

4. Test Updates
- Tests updated to use `requestRouterModelsAll` where a full sweep is expected

Modified Files
Behavior Changes
- Settings and activation still fetch all providers via `requestRouterModelsAll`

Performance Impact
Tests & CI
✅ All tests passing
✅ Lint and type checks clean
✅ No breaking changes to existing APIs
Risks & Mitigations
- Mitigation: `requestRouterModelsAll` in settings