feat: rewarm cache from planner cache on schema change #2445

SkArchon · 2026-01-09T11:57:21Z

Summary by CodeRabbit

New Features
- In-memory switchover caching to retain and reuse prepared execution plans across restarts.
Configuration
- New cache-warmup flag to enable/disable in-memory switchover fallback (defaults to true).
Behavioral
- Cache persistence across config-change restarts and automatic cleanup of unused feature-flag caches on startup.
API
- Registration now reports whether the cache warmer is enabled.
Tests
- Expanded coverage for switchover caching, reloads, cleanup, and cross-restart persistence.
Chores
- Updated cache dependency version.

_{✏️ Tip: You can customize this high-level summary in your review settings.}

Description

This PR introduces a new mechanism to warm the cache on restarts due to schema changes (or whenever the router execution config changes). Config to enable this

cache_warmup:
  enabled: true
  source:
    in_memory_switchover:
      enabled: true

If in_memory_switchover is not present or false, we default to the current cdn cache warmer (with cosmo), this preserves backwards compatibility.

This is achieved by first adding a new switchoverConfig which can keep references across restarts, we use this to store a reference to the planner cache. When the router restarts due to an execution config change, we then loop through the elements in the ristretto planner cache.

Right now ristretto does not support iterating, so we have opened a PR here dgraph-io/ristretto#475. For this PR (which we need to discuss before merging) we have pointed ristretto to the fork from where the PR was opened.

We create a new provider for the cache warmer, for in_memory_switcher which extracts the queries from the planner cache upon startup. Note that with this method there is no cache warming upon startup, only on subsequent restarts due to execution config changes.

We also support persisting across restarts on hot config reloading. However we ran into a problem of shutdown occuring first before the new graph mux was created (which is the opposite for schema change updates), which means that the current planner cache gets emptied before we can extract the queries from it. As a solution we preemptively extract the queries from the planner cache before its closed whenever a hot config reloading occurs.

This PR also changes how cache warmer is enabled when used with the cosmo CDN, previously it requested the manifest from the CDN if the cache warmer was enabled in the router config.yaml, this meant that ex-cache warmer customers could still access old manifest files fron the cdn (as long as it was present), however now its only enabled if the feature is enabled in cosmo (upon startup we get if cache warmer is enabled from cosmo).

Checklist

I have discussed my proposed changes in an issue and have received approval to proceed.
I have followed the coding standards of the project.
Tests or benchmarks have been added or updated.
Documentation has been updated on https://github.com/wundergraph/cosmo-docs.
I have read the Contributors Guide.

coderabbitai · 2026-01-09T11:57:34Z

Walkthrough

Adds an in-memory switchover cache and wiring to persist per-feature plan data across config-change restarts; extends planner cache interfaces and plan sources; threads SwitchoverConfig and cache-warmer flags through supervisor/startup/graph-server; updates config/schema/proto; and adds unit and integration tests.

Changes

Cohort / File(s)	Summary
Integration tests `router-tests/cache_warmup_test.go`, `router-tests/go.mod`	Add `TestInMemorySwitchoverCaching` integration suite for cache-warmup/switchover flows; update test module deps. (Duplicate test function occurrence present in diff.)
Unit tests `router/core/restart_switchover_config_test.go`	Add comprehensive unit tests for `InMemorySwitchOverCache` / `SwitchoverConfig` (enable/disable, get/set, cleanup, restart conversion, IsEnabled).
Switchover subsystem `router/core/restart_switchover_config.go`	New `SwitchoverConfig` and `InMemorySwitchOverCache` with constructors and methods to manage per-feature plan state, config-driven updates, cleanup, and extraction of operations for restart migration.
Planner & plan source `router/core/operation_planner.go`, `router/core/cache_warmup_plans.go`	Add plan content storage (`planWithMetaData.content`), thread `storeContent` flag through `OperationPlanner` and `preparePlan`; extend `ExecutionPlanCache` with `IterValues`; add `PlanSource` implementing `CacheWarmupSource`.
Graph server integration `router/core/graph_server.go`	Thread `SwitchoverConfig` and `CosmoCacheWarmerEnabled` through mux/handler construction; pass switchover enablement into planner initialization and per-feature muxes.
Router wiring & lifecycle `router/core/router.go`, `router/core/supervisor.go`, `router/core/supervisor_instance.go`	Add `switchoverConfig` field and `WithSwitchoverConfig` option; propagate through supervisor/resources/options; invoke restart-processing and feature cleanup during startup/graph-swap.
Startup CLI `router/cmd/main.go`	Initialize `SwitchoverConfig` in `RouterSupervisorOpts` via `core.NewSwitchoverConfig(...)`.
Config & schema `router/pkg/config/config.go`, `router/pkg/config/config.schema.json`, `router/pkg/config/testdata/*`, `router/pkg/config/router_config.go`	Add `InMemorySwitchoverFallback` bool to cache_warmup config, update schema and testdata, expose flag in `Usage()`.
Proto / registration `proto/wg/cosmo/node/v1/node.proto`, `connect/src/wg/cosmo/node/v1/node_pb.ts`, `controlplane/src/core/bufservices/NodeService.ts`	Add `is_cache_warmer_enabled` / `isCacheWarmerEnabled` boolean to `RegistrationInfo` and populate it from node features in `NodeService`.
Module deps `router/go.mod`, `router-tests/go.mod`	Bump `github.com/dgraph-io/ristretto/v2` (v2.1.0 -> v2.4.0).
Minor / formatting `router/core/cache_warmup_cdn.go`	Import reorder only (no behavioral change).

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

feat: circuit breaker implementation #1929: Modifies graph mux/handler construction and BuildGraphMuxOptions signature—closely related to threading additional runtime options (switchover/cache-warmer) through the graph server.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 22.73% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'feat: rewarm cache from planner cache on schema change' accurately and concisely describes the main feature being added: a mechanism to reuse cached data from the planner when the router restarts due to schema changes.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing touches

📝 Generate docstrings

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-actions · 2026-01-09T12:01:06Z

Router image scan passed

✅ No security vulnerabilities found in image:

ghcr.io/wundergraph/cosmo/router:sha-f1d790522817170bcb0550d1c5df16cc32691d4c

codecov · 2026-01-09T12:06:26Z

Codecov Report

❌ Patch coverage is 80.25478% with 31 lines in your changes missing coverage. Please review.
✅ Project coverage is 36.06%. Comparing base (c5071f8) to head (fd68357).
⚠️ Report is 1 commits behind head on main.

Files with missing lines	Patch %	Lines
router/core/graph_server.go	68.88%	13 Missing and 1 partial ⚠️
router/core/restart_switchover_config.go	91.13%	4 Missing and 3 partials ⚠️
router/core/router.go	62.50%	3 Missing ⚠️
router/core/supervisor_instance.go	0.00%	3 Missing ⚠️
router/cmd/main.go	0.00%	2 Missing ⚠️
controlplane/src/core/bufservices/NodeService.ts	0.00%	1 Missing ⚠️
router/core/cache_warmup_plans.go	85.71%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##            main    #2445       +/-   ##
==========================================
+ Coverage   1.49%   36.06%   +34.57%     
==========================================
  Files        292      946      +654     
  Lines      46960   123527    +76567     
  Branches     431     4944     +4513     
==========================================
+ Hits         703    44553    +43850     
- Misses     45974    77409    +31435     
- Partials     283     1565     +1282

Files with missing lines	Coverage Δ
router/core/cache_warmup_cdn.go	`51.42% <ø> (ø)`
router/core/operation_planner.go	`68.00% <100.00%> (ø)`
router/core/router_config.go	`93.71% <100.00%> (ø)`
router/core/supervisor.go	`60.97% <100.00%> (ø)`
router/gen/proto/wg/cosmo/node/v1/node.pb.go	`20.74% <ø> (ø)`
router/pkg/config/config.go	`80.51% <ø> (ø)`
controlplane/src/core/bufservices/NodeService.ts	`20.68% <0.00%> (ø)`
router/core/cache_warmup_plans.go	`85.71% <85.71%> (ø)`
router/cmd/main.go	`0.00% <0.00%> (ø)`
router/core/router.go	`71.06% <62.50%> (ø)`
... and 3 more

... and 641 files with indirect coverage changes

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

…-existing-caches

coderabbitai

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

router/go.mod (2)

175-199: Fix inconsistent ristretto/v2 fork across multi-module repo

The router module pins a fork (github.com/wundergraph/ristretto/v2 v2.0.0-20260113151003-f7231509ad4d), but router-tests, demo, and graphqlmetrics modules all require the upstream version (v2.1.0). This creates mixed implementations at build time. The replace directive in router/go.mod only affects that module's dependencies, not the workspace. Either apply the same replace directive to other go.mod files that depend on ristretto, or unify the version across all modules.

59-88: Add explanatory comment about the Ristretto fork in go.mod

The fork github.com/wundergraph/ristretto/v2 at commit f7231509ad4d (branch milinda/iterating_elements) adds iterator support for cache elements, which is not available in the upstream v2.0.0 release. Add a comment in the replace directive explaining this purpose. Since no other modules in the repo depend on ristretto v2.1.x features (the require block specifies only v2.0.0), there are no compatibility concerns with using the fork.

🤖 Fix all issues with AI agents

In `@router-tests/cache_warmup_test.go`:
- Around line 1100-1106: The test uses require.EventuallyWithT with an
assert.CollectT callback but calls require.Equal inside the retry closure (in
the block passed to require.EventuallyWithT) which will abort immediately;
replace the failing assertions inside that callback with assert.Equal so the
checks are retried: locate the closure passed to require.EventuallyWithT (where
res = xEnv.MakeGraphQLRequestOK(...)) and change require.Equal(t, "updated",
res.Response.Header.Get("X-Router-Config-Version")) and require.Equal(t, "HIT",
res.Response.Header.Get("x-wg-execution-plan-cache")) to assert.Equal(t,
"updated", ...) and assert.Equal(t, "HIT", ...).

In `@router/core/http_server.go`:
- Line 39: Remove the dead field from the httpServerOptions struct:
`switchoverConfig` is never set when creating `httpServerOptions` and is not
used by `newServer()`, since `SwitchoverConfig` is handled at the `Router` level
and passed into `newGraphServer`; delete the `switchoverConfig
*SwitchoverConfig` field from the `httpServerOptions` definition and update any
references (none expected) to rely on the Router/newGraphServer flow instead.

In `@router/core/restart_switchover_config.go`:
- Around line 24-38: UpdateSwitchoverConfig writes
s.inMemorySwitchOverCache.enabled without synchronization which races with reads
in getPlanCacheForFF, setPlanCacheForFF, and cleanupUnusedFeatureFlags; fix by
acquiring the SwitchoverConfig write lock (e.g., s.mu or the existing cache
lock) around the block that sets s.inMemorySwitchOverCache.enabled and mutates
queriesForFeatureFlag/testValue, and change the reader methods
getPlanCacheForFF, setPlanCacheForFF, and cleanupUnusedFeatureFlags to check
enabled while holding the same lock (or alternatively convert enabled to an
atomic.Bool and use atomic load/store for reads/writes) so readers and writers
are consistent.

In `@router/core/router.go`:
- Around line 1177-1181: Fix the typo in the inline comment near the switchover
config initialization: change "sueprvisor" to "supervisor" in the comment that
precedes the r.switchoverConfig = NewSwitchoverConfig() block so the comment
reads correctly while leaving the surrounding logic (r.switchoverConfig,
NewSwitchoverConfig(), UpdateSwitchoverConfig(&r.Config)) unchanged.

🧹 Nitpick comments (4)

router/pkg/config/config.go (1)
974-986: Consider making CacheWarmupSource.InMemorySwitchover a pointer to preserve “unset vs disabled”
At Line 974-977, InMemorySwitchover is a non-pointer struct with omitempty. Depending on the marshaler, this may still serialize as present (and it definitely can’t represent “not provided” distinctly from “provided but disabled”), which tends to leak into golden testdata and example configs (as seen in config_defaults.json).

If you want cache_warmup.source to be truly optional/empty unless explicitly configured, consider changing it to a pointer like *CacheWarmupInMemorySwitchover.
Proposed change
 type CacheWarmupSource struct {
 	Filesystem         *CacheWarmupFileSystemSource  `yaml:"filesystem,omitempty"`
-	InMemorySwitchover CacheWarmupInMemorySwitchover `yaml:"in_memory_switchover,omitempty"`
+	InMemorySwitchover *CacheWarmupInMemorySwitchover `yaml:"in_memory_switchover,omitempty"`
 }
router/pkg/config/config.schema.json (1)

2387-2423: Schema polish: add default: false + tighten wording for in_memory_switchover.enabled
For the new block at Line 2404-2413:

enabled has no default in the schema (even though the Go config uses envDefault:"false"). Consider adding "default": false for consistency with other booleans.

Minor: description “Is in memory switchover enabled” could be normalized (and maybe mention it reuses prior process’ cached plans, if that’s the intent).

router/core/cache_warmup_plans.go (1)

16-30: Consider preserving additional operation metadata for complete cache warmup.

The current implementation only extracts the Query content from cached plans. For accurate cache warmup, operations may also need variables, operation name, and headers that were part of the original request. If planWithMetaData stores additional context, consider extracting it here.

Also, based on the past discussion about limiting cache warmer operations to ~100, this implementation iterates unboundedly. Consider adding a limit if the plan cache can grow large.

router/core/restart_switchover_config.go (1)

44-49: Consider documenting or removing the testValue field.

The testValue field appears to be for testing purposes but is not documented. If it's only used in tests, consider adding a comment explaining its purpose, or moving test-specific logic elsewhere.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 08bcfb5 and d576e03.

⛔ Files ignored due to path filters (1)

router/go.sum is excluded by !**/*.sum

📒 Files selected for processing (16)

router-tests/cache_warmup_test.go
router/cmd/main.go
router/core/cache_warmup_plans.go
router/core/graph_server.go
router/core/http_server.go
router/core/operation_planner.go
router/core/restart_switchover_config.go
router/core/router.go
router/core/router_config.go
router/core/supervisor.go
router/core/supervisor_instance.go
router/go.mod
router/pkg/config/config.go
router/pkg/config/config.schema.json
router/pkg/config/testdata/config_defaults.json
router/pkg/config/testdata/config_full.json

🧰 Additional context used

🧠 Learnings (10)

📓 Common learnings

Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2172
File: router/core/graph_server.go:0-0
Timestamp: 2025-09-17T20:55:39.456Z
Learning: The Initialize method in router/internal/retrytransport/manager.go has been updated to properly handle feature-flag-only subgraphs by collecting subgraphs from both routerConfig.GetSubgraphs() and routerConfig.FeatureFlagConfigs.ConfigByFeatureFlagName, ensuring all subgraphs receive retry configuration.

📚 Learning: 2025-09-17T20:55:39.456Z

Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2172
File: router/core/graph_server.go:0-0
Timestamp: 2025-09-17T20:55:39.456Z
Learning: The Initialize method in router/internal/retrytransport/manager.go has been updated to properly handle feature-flag-only subgraphs by collecting subgraphs from both routerConfig.GetSubgraphs() and routerConfig.FeatureFlagConfigs.ConfigByFeatureFlagName, ensuring all subgraphs receive retry configuration.

Applied to files:

router/core/supervisor.go
router/cmd/main.go
router/core/router.go
router-tests/cache_warmup_test.go
router/core/restart_switchover_config.go
router/core/supervisor_instance.go
router/core/graph_server.go

📚 Learning: 2026-01-06T12:37:21.521Z

Learnt from: asoorm
Repo: wundergraph/cosmo PR: 2379
File: router/pkg/connectrpc/operation_registry_test.go:381-399
Timestamp: 2026-01-06T12:37:21.521Z
Learning: In Go code (Go 1.25+), prefer using sync.WaitGroup.Go(func()) to run a function in a new goroutine, letting the WaitGroup manage Add/Done automatically. Avoid manual wg.Add(1) followed by go func() { defer wg.Done(); ... }() patterns. Apply this guidance across all Go files in the wundergraph/cosmo repository where concurrency is used.

Applied to files:

router/core/supervisor.go
router/core/http_server.go
router/core/router_config.go
router/core/cache_warmup_plans.go
router/pkg/config/config.go
router/cmd/main.go
router/core/router.go
router-tests/cache_warmup_test.go
router/core/restart_switchover_config.go
router/core/supervisor_instance.go
router/core/operation_planner.go
router/core/graph_server.go

📚 Learning: 2025-08-12T13:50:45.964Z

Learnt from: Noroth
Repo: wundergraph/cosmo PR: 2132
File: router-plugin/plugin.go:139-146
Timestamp: 2025-08-12T13:50:45.964Z
Learning: In the Cosmo router plugin system, the plugin framework creates its own logger independently. When creating a logger in NewRouterPlugin, it's only needed for the gRPC server setup (passed via setup.GrpcServerInitOpts.Logger) and doesn't need to be assigned back to serveConfig.Logger.

Applied to files:

router/cmd/main.go

📚 Learning: 2025-10-01T20:39:16.113Z

Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2252
File: router-tests/telemetry/telemetry_test.go:9684-9693
Timestamp: 2025-10-01T20:39:16.113Z
Learning: Repo preference: In router-tests/telemetry/telemetry_test.go, keep strict > 0 assertions for request.operation.*Time (parsingTime, normalizationTime, validationTime, planningTime) in telemetry-related tests; do not relax to >= 0 unless CI flakiness is observed.

Applied to files:

router-tests/cache_warmup_test.go

📚 Learning: 2025-08-20T22:13:25.222Z

Learnt from: StarpTech
Repo: wundergraph/cosmo PR: 2157
File: router-tests/go.mod:16-16
Timestamp: 2025-08-20T22:13:25.222Z
Learning: github.com/mark3labs/mcp-go v0.38.0 has regressions and should not be used in the wundergraph/cosmo project. v0.36.0 is the stable version that should be used across router-tests and other modules.

Applied to files:

router-tests/cache_warmup_test.go
router/go.mod

📚 Learning: 2025-08-26T17:19:28.178Z

Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2167
File: router/internal/expr/retry_context.go:3-10
Timestamp: 2025-08-26T17:19:28.178Z
Learning: In the Cosmo router codebase, prefer using netErr.Timeout() checks over explicit context.DeadlineExceeded checks for timeout detection in retry logic, as Go's networking stack typically wraps context deadline exceeded errors in proper net.Error implementations.

Applied to files:

router-tests/cache_warmup_test.go

📚 Learning: 2025-08-28T09:18:10.121Z

Learnt from: endigma
Repo: wundergraph/cosmo PR: 2141
File: router-tests/http_subscriptions_test.go:100-108
Timestamp: 2025-08-28T09:18:10.121Z
Learning: In router-tests/http_subscriptions_test.go heartbeat tests, the message ordering should remain strict with data messages followed by heartbeat messages, as the timing is deterministic and known by design in the Cosmo router implementation.

Applied to files:

router-tests/cache_warmup_test.go

📚 Learning: 2025-08-20T10:08:17.857Z

Learnt from: endigma
Repo: wundergraph/cosmo PR: 2155
File: router/core/router.go:1857-1866
Timestamp: 2025-08-20T10:08:17.857Z
Learning: router/pkg/config/config.schema.json forbids null values for traffic_shaping.subgraphs: additionalProperties references $defs.traffic_shaping_subgraph_request_rule with type "object". Therefore, in core.NewSubgraphTransportOptions, dereferencing each subgraph rule pointer is safe under schema-validated configs, and a nil-check is unnecessary.

Applied to files:

router/core/graph_server.go

📚 Learning: 2025-11-12T18:38:46.619Z

Learnt from: StarpTech
Repo: wundergraph/cosmo PR: 2327
File: graphqlmetrics/core/metrics_service.go:435-442
Timestamp: 2025-11-12T18:38:46.619Z
Learning: In graphqlmetrics/core/metrics_service.go, the opGuardCache TTL (30 days) is intentionally shorter than the gql_metrics_operations database retention period (90 days). This design reduces memory usage while relying on database constraints to prevent duplicate inserts after cache expiration.

Applied to files:

router/core/graph_server.go

🧬 Code graph analysis (6)

router/core/supervisor.go (2)

router/core/restart_switchover_config.go (1)

SwitchoverConfig (14-16)

router/core/router.go (1)

Router (84-94)

router/core/http_server.go (1)

router/core/restart_switchover_config.go (1)

SwitchoverConfig (14-16)

router/cmd/main.go (1)

router/core/restart_switchover_config.go (2)

SwitchoverConfig (14-16)

NewSwitchoverConfig (18-22)

router/core/router.go (1)

router/core/restart_switchover_config.go (2)

SwitchoverConfig (14-16)

NewSwitchoverConfig (18-22)

router/core/supervisor_instance.go (3)

router/core/restart_switchover_config.go (1)

SwitchoverConfig (14-16)

router/core/router_config.go (1)

Config (49-144)

router/core/router.go (2)

Option (174-174)

WithSwitchoverConfig (2057-2061)

router/core/graph_server.go (4)

router/core/restart_switchover_config.go (1)

SwitchoverConfig (14-16)

router/core/router.go (1)

Router (84-94)

router/core/operation_planner.go (1)

NewOperationPlanner (50-57)

router/core/cache_warmup_plans.go (1)

NewPlanSource (16-30)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)

GitHub Check: build-router
GitHub Check: integration_test (./telemetry)
GitHub Check: build_push_image (nonroot)
GitHub Check: build_push_image
GitHub Check: integration_test (./events)
GitHub Check: build_test
GitHub Check: image_scan (nonroot)
GitHub Check: image_scan
GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
GitHub Check: build_test
GitHub Check: Analyze (go)
GitHub Check: Analyze (javascript-typescript)

🔇 Additional comments (36)

router/cmd/main.go (1)

152-172: SwitchoverConfig wiring looks correct; ensure lifecycle matches intended cache persistence scope
Line 152-155 passes a single core.NewSwitchoverConfig() into the supervisor—so caches/state will live for the process lifetime across reloads (likely desired for “reprocess existing plans”). Just ensure this is the intended scope (vs “per reload”) and that the underlying cache is concurrency-safe.

router/pkg/config/testdata/config_defaults.json (1)

311-322: config_defaults.json is a golden test output file, not a configuration subject to schema validation—no changes needed

The concern about schema mismatch is based on a misunderstanding of the file's purpose. config_defaults.json is purely a golden test artifact in TestDefaults() used to verify the JSON serialization of the Config struct; it is never loaded from disk or validated against config.schema.json.

The schema validation in LoadConfig() applies only to YAML input files. When the Config struct is marshaled to JSON for the golden test, Go's default JSON marshaling uses PascalCase field names (Filesystem, InMemorySwitchover) and includes null values where structs have omitempty YAML tags. This serialization format is independent of the schema, which constrains YAML input with snake_case keys. The presence of "Filesystem": null in the output is expected and correct.

router/pkg/config/testdata/config_full.json (1)

660-671: LGTM!

The InMemorySwitchover configuration block is correctly added under CacheWarmup.Source, following the existing pattern with Filesystem. The default Enabled: false is appropriate for this comprehensive config test file.

router/core/supervisor.go (3)

32-36: LGTM!

The SwitchoverConfig field is correctly added to RouterResources, enabling the switchover configuration to be passed through to downstream components like the Router.

38-44: LGTM!

The SwitchoverConfig field in RouterSupervisorOpts provides the input path for the switchover configuration.

52-55: LGTM!

The initialization correctly propagates SwitchoverConfig from options to resources, completing the wiring through the supervisor layer.

router-tests/cache_warmup_test.go (5)

5-6: LGTM!

The new imports are appropriate for the added test functionality: assert for EventuallyWithT callbacks and configpoller for the mock configuration poller.

919-971: LGTM - well-structured test for in-memory switchover caching.

The test correctly validates that:

Initial request results in a cache MISS

After config update (simulating switchover), the same query results in a cache HIT

This confirms the in-memory plan cache is preserved across config reloads.

973-1014: LGTM!

Correctly tests that cache is not preserved when cache warmup is entirely disabled, resulting in MISS on both requests.

1016-1062: LGTM!

Correctly tests that the default cache warmer (with InMemorySwitchover.Enabled = false) does not preserve plans across config reloads.

1064-1108: Both writeTestConfig (defined in router-tests/config_hot_reload_test.go:579) and ConfigPollerMock (defined in router-tests/config_hot_reload_test.go:32) are properly defined in the test package and are accessible to this test file.

router/core/router_config.go (1)

314-321: LGTM - source classification logic is correct and type-safe.

The addition correctly classifies the cache warmup source. The control flow properly prioritizes Filesystem over InMemorySwitchover. Since InMemorySwitchover is a value type (not a pointer), accessing its Enabled field is safe—no nil check is needed.

router/core/cache_warmup_plans.go (1)

32-34: LGTM!

Simple and efficient - queries are extracted once during construction, allowing the source cache to be garbage collected.

router/core/supervisor_instance.go (3)

53-53: LGTM!

Correctly threads SwitchoverConfig through the router option construction.

184-184: LGTM!

Function signature updated to accept switchoverConfig parameter for propagation.

275-275: LGTM!

WithSwitchoverConfig option correctly added to the options list.

router/core/router.go (4)

93-93: LGTM!

Field addition to hold switchover configuration state.

600-600: LGTM!

Correctly passes switchoverConfig to newGraphServer for plan cache management.

608-610: LGTM!

Feature flag cleanup after server swap ensures unused caches are pruned.

2057-2061: LGTM!

Standard option pattern for setting switchover configuration.

router/core/restart_switchover_config.go (2)

71-78: Clarify behavior when FeatureFlagConfigs is nil.

When routerCfg.FeatureFlagConfigs is nil, the method returns early without cleaning up any cached feature flags. This means stale feature flag caches could persist. Is this intentional?

If feature flags can be completely removed from a config, consider clearing all non-base caches when FeatureFlagConfigs is nil.

83-96: LGTM!

The cleanup logic correctly preserves the base configuration (empty string key) while removing caches for feature flags that no longer exist in the router config.

router/core/operation_planner.go (8)

24-24: LGTM!

New content field enables storing the original query for cache warmup scenarios.

32-37: LGTM!

The storeContent flag and operationPlannerOpts struct provide clean control over content storage behavior.

50-56: LGTM!

Constructor correctly accepts and stores the storeContent flag.

59-59: LGTM!

Signature update to accept operationPlannerOpts for controlling plan preparation behavior.

93-95: LGTM!

Content is conditionally stored based on the storeContent option, avoiding unnecessary memory usage when content storage is not needed.

122-122: LGTM!

When skipping cache (for tracing or query plan inclusion), content storage is disabled since these plans are not cached.

150-150: LGTM!

Normal cache path correctly uses the planner's storeContent setting.

44-46: The Iter method is already implemented by ristretto.Cache, which is the only ExecutionPlanCache implementation used in the codebase. It's already being called in router/core/cache_warmup_plans.go:21 without issues. Adding Iter to the interface is not a breaking change.

Likely an incorrect or invalid review comment.

router/core/graph_server.go (6)

108-116: LGTM!

The addition of SwitchoverConfig to BuildGraphMuxOptions follows the existing pattern for optional configuration parameters.

122-123: LGTM!

The signature extension to accept switchoverConfig is correctly propagated through the initialization path.

434-458: LGTM!

The switchoverConfig is correctly threaded through to feature flag mux construction, ensuring consistent behavior across base and feature flag graphs.

526-532: LGTM!

Field alignment changes are cosmetic and improve readability.

1303-1304: No nil pointer dereference risk. SwitchoverConfig and inMemorySwitchOverCache are guaranteed non-nil by initialization.

The opts.SwitchoverConfig parameter passed to newGraphServer is guaranteed to be non-nil with a valid inMemorySwitchOverCache because:

Before newServer() calls newGraphServer() at line 600, the Router's Start() method initializes r.switchoverConfig with a nil check at lines 1177-1179.

NewSwitchoverConfig() always returns a SwitchoverConfig with an initialized inMemorySwitchOverCache field (never nil).

Accessing opts.SwitchoverConfig.inMemorySwitchOverCache.enabled is safe; no defensive nil checks are needed.

Likely an incorrect or invalid review comment.

1358-1362: Thread safety is already properly implemented; remove this concern.

The getPlanCacheForFF and setPlanCacheForFF methods properly use sync.RWMutex (RLock/RUnlock for reads, Lock/Unlock for writes with defers) to synchronize access to the shared cache map. The nil safety concern about opts.SwitchoverConfig is not an issue—line 1304 in the same function already accesses opts.SwitchoverConfig.inMemorySwitchOverCache.enabled without guards, meaning the parameter is guaranteed to be non-nil by design.

Likely an incorrect or invalid review comment.

_{✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.}

router-tests/cache_warmup_test.go

router/core/http_server.go

router/core/restart_switchover_config.go

router/core/router.go

coderabbitai

Actionable comments posted: 2

🤖 Fix all issues with AI agents

In `@router-tests/go.mod`:
- Line 207: The go.mod contains a replace directive pointing to the non-public
fork "github.com/wundergraph/ristretto/v2" which breaks module resolution;
remove that replace and switch to the upstream module
"github.com/dgraph-io/ristretto/v2" (or make the fork public if you must keep
it). Concretely, delete the replace entry that maps
"github.com/wundergraph/ristretto/v2 => github.com/wundergraph/ristretto/v2
v2.0.0-..." and ensure the project requires "github.com/dgraph-io/ristretto/v2"
(update to v2.1.0 if needed), then run "go mod tidy" to refresh the dependency
graph and verify builds/CI pass.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d576e03 and 78fc67d.

⛔ Files ignored due to path filters (1)

router-tests/go.sum is excluded by !**/*.sum

📒 Files selected for processing (5)

router-tests/cache_warmup_test.go
router-tests/go.mod
router/core/restart_switchover_config.go
router/core/restart_switchover_config_test.go
router/core/router.go

🧰 Additional context used

🧠 Learnings (10)

📚 Learning: 2025-08-20T22:13:25.222Z

Learnt from: StarpTech
Repo: wundergraph/cosmo PR: 2157
File: router-tests/go.mod:16-16
Timestamp: 2025-08-20T22:13:25.222Z
Learning: github.com/mark3labs/mcp-go v0.38.0 has regressions and should not be used in the wundergraph/cosmo project. v0.36.0 is the stable version that should be used across router-tests and other modules.

Applied to files:

router-tests/go.mod
router-tests/cache_warmup_test.go

📚 Learning: 2025-09-24T12:54:00.765Z

Learnt from: endigma
Repo: wundergraph/cosmo PR: 2222
File: router-tests/websocket_test.go:2238-2302
Timestamp: 2025-09-24T12:54:00.765Z
Learning: The wundergraph/cosmo project uses Go 1.25 (Go 1.23+ minimum), so fmt.Appendf and other newer Go standard library functions are available and can be used without compatibility concerns.

Applied to files:

router-tests/go.mod

📚 Learning: 2025-09-24T12:54:00.765Z

Learnt from: endigma
Repo: wundergraph/cosmo PR: 2222
File: router-tests/websocket_test.go:2238-2302
Timestamp: 2025-09-24T12:54:00.765Z
Learning: The wundergraph/cosmo project uses Go 1.25 (Go 1.25 minimum), so fmt.Appendf and other newer Go standard library functions are available and can be used without compatibility concerns.

Applied to files:

router-tests/go.mod

📚 Learning: 2025-09-17T20:55:39.456Z

Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2172
File: router/core/graph_server.go:0-0
Timestamp: 2025-09-17T20:55:39.456Z
Learning: The Initialize method in router/internal/retrytransport/manager.go has been updated to properly handle feature-flag-only subgraphs by collecting subgraphs from both routerConfig.GetSubgraphs() and routerConfig.FeatureFlagConfigs.ConfigByFeatureFlagName, ensuring all subgraphs receive retry configuration.

Applied to files:

router/core/restart_switchover_config_test.go
router/core/restart_switchover_config.go
router-tests/cache_warmup_test.go
router/core/router.go

📚 Learning: 2025-10-01T20:39:16.113Z

Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2252
File: router-tests/telemetry/telemetry_test.go:9684-9693
Timestamp: 2025-10-01T20:39:16.113Z
Learning: Repo preference: In router-tests/telemetry/telemetry_test.go, keep strict > 0 assertions for request.operation.*Time (parsingTime, normalizationTime, validationTime, planningTime) in telemetry-related tests; do not relax to >= 0 unless CI flakiness is observed.

Applied to files:

router/core/restart_switchover_config_test.go
router-tests/cache_warmup_test.go

📚 Learning: 2026-01-06T12:37:21.521Z

Learnt from: asoorm
Repo: wundergraph/cosmo PR: 2379
File: router/pkg/connectrpc/operation_registry_test.go:381-399
Timestamp: 2026-01-06T12:37:21.521Z
Learning: In Go code (Go 1.25+), prefer using sync.WaitGroup.Go(func()) to run a function in a new goroutine, letting the WaitGroup manage Add/Done automatically. Avoid manual wg.Add(1) followed by go func() { defer wg.Done(); ... }() patterns. Apply this guidance across all Go files in the wundergraph/cosmo repository where concurrency is used.

Applied to files:

router/core/restart_switchover_config_test.go
router/core/restart_switchover_config.go
router-tests/cache_warmup_test.go
router/core/router.go

📚 Learning: 2025-07-29T08:19:55.720Z

Learnt from: Noroth
Repo: wundergraph/cosmo PR: 2088
File: demo/pkg/subgraphs/projects/src/main_test.go:0-0
Timestamp: 2025-07-29T08:19:55.720Z
Learning: In Go testing, t.Fatal, t.FailNow, t.Skip* and similar methods cannot be called from goroutines other than the main test goroutine, as they will cause a runtime panic. Use channels, t.Error*, or other synchronization mechanisms to communicate errors from goroutines back to the main test thread.

Applied to files:

router-tests/cache_warmup_test.go

📚 Learning: 2025-08-28T09:18:10.121Z

Learnt from: endigma
Repo: wundergraph/cosmo PR: 2141
File: router-tests/http_subscriptions_test.go:100-108
Timestamp: 2025-08-28T09:18:10.121Z
Learning: In router-tests/http_subscriptions_test.go heartbeat tests, the message ordering should remain strict with data messages followed by heartbeat messages, as the timing is deterministic and known by design in the Cosmo router implementation.

Applied to files:

router-tests/cache_warmup_test.go

📚 Learning: 2025-08-26T17:19:28.178Z

Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2167
File: router/internal/expr/retry_context.go:3-10
Timestamp: 2025-08-26T17:19:28.178Z
Learning: In the Cosmo router codebase, prefer using netErr.Timeout() checks over explicit context.DeadlineExceeded checks for timeout detection in retry logic, as Go's networking stack typically wraps context deadline exceeded errors in proper net.Error implementations.

Applied to files:

router-tests/cache_warmup_test.go

📚 Learning: 2025-09-29T10:28:07.122Z

Learnt from: dkorittki
Repo: wundergraph/cosmo PR: 2192
File: router/pkg/config/config.go:1028-1029
Timestamp: 2025-09-29T10:28:07.122Z
Learning: The deprecation strategy for IntrospectionEnabled field in router/pkg/config/config.go is to first mark it as deprecated, then migrate all call sites to use IntrospectionConfig.Enabled, and finally remove the deprecated field. The envDefault tag should be kept during the deprecation period for backward compatibility.

Applied to files:

router/core/router.go

🧬 Code graph analysis (3)

router/core/restart_switchover_config_test.go (2)

router/core/restart_switchover_config.go (2)

InMemorySwitchOverCache (32-36)

NewSwitchoverConfig (18-22)

router/pkg/config/config.go (2)

CacheWarmupConfiguration (989-995)

CacheWarmupInMemorySwitchover (983-985)

router-tests/cache_warmup_test.go (5)

router-tests/testenv/testenv.go (4)

Run (105-122)

Config (292-349)

RouterConfig (246-249)

Environment (1735-1771)

router/pkg/config/config.go (5)

Config (1037-1112)

CacheWarmupConfiguration (989-995)

CacheWarmupSource (974-977)

CacheWarmupInMemorySwitchover (983-985)

ExecutionConfig (865-869)

router/core/router.go (3)

Option (174-174)

WithCacheWarmupConfig (2114-2118)

ExecutionConfig (159-163)

router/core/cache_warmup.go (1)

CacheWarmupSource (26-28)

router/pkg/controlplane/configpoller/config_poller.go (1)

ConfigPoller (20-29)

router/core/router.go (1)

router/core/restart_switchover_config.go (2)

SwitchoverConfig (14-16)

NewSwitchoverConfig (18-22)

🔇 Additional comments (9)

router/core/restart_switchover_config_test.go (1)

1-337: Test coverage looks comprehensive.

The test file provides thorough coverage of the SwitchoverConfig and InMemorySwitchOverCache functionality, including:

Enable/disable state transitions

Edge cases (nil config, already enabled/disabled)

Feature flag cache operations (get/set)

Cleanup of unused feature flags

Delegation from SwitchoverConfig to InMemorySwitchOverCache

router/core/restart_switchover_config.go (1)

11-22: LGTM for structure and initialization.

The type definitions and constructor are well-structured. The planCache type alias improves readability, and NewSwitchoverConfig correctly initializes with a disabled cache state.

router/core/router.go (4)

93-93: LGTM for field addition.

The switchoverConfig field is properly added to the Router struct to maintain state across config changes.

599-611: Clean integration of switchover config in server creation.

The switchover config is correctly passed to newGraphServer and CleanupFeatureFlags is appropriately called after the server swap to prune stale feature flag caches.

1177-1181: LGTM for initialization logic.

The lazy initialization of switchoverConfig for test scenarios is appropriate, and the comment explaining this is helpful (and the typo from the previous review has been fixed).

2057-2061: LGTM for new option function.

The WithSwitchoverConfig option follows the established pattern for router configuration options.

router-tests/cache_warmup_test.go (3)

1100-1106: Correct usage of assert inside EventuallyWithT callback.

Using assert.Equal instead of require.Equal inside the EventuallyWithT callback is correct—this allows the retry mechanism to work properly by collecting failures rather than immediately aborting.

1064-1108: No action needed. The writeTestConfig helper function is defined in router-tests/config_hot_reload_test.go at line 579 and is accessible from the cache_warmup_test.go file since both are in the same package. The function is properly defined and available for use.

919-971: Code is correct as-is—no undefined types.

ConfigPollerMock and writeTestConfig are defined in config_hot_reload_test.go in the same package (integration), making them accessible throughout cache_warmup_test.go. The test implementation is valid.

Likely an incorrect or invalid review comment.

_{✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.}

router-tests/go.mod

router/core/restart_switchover_config.go

coderabbitai

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

router/pkg/config/config.schema.json (1)
2387-2422: Allow empty cache_warmup.source or update the description.

The description says an empty source falls back to CDN, but the oneOf currently rejects {}. This makes configs with source: {} invalid despite the documented behavior.
🔧 Suggested schema tweak
         "oneOf": [
           {
             "required": ["filesystem"]
           },
+          {
+            "required": ["in_memory_switchover"]
+          },
+          {
+            "maxProperties": 0
+          }
-          {
-            "required": ["in_memory_switchover"]
-          }
         ]

🤖 Fix all issues with AI agents

In `@router/core/restart_switchover_config.go`:
- Around line 113-126: In InMemorySwitchOverCache.processOnConfigChangeRestart,
avoid the pre-lock enabled check (data race) and the unchecked assertion
data.(planCache) which panics on repeated calls: acquire c.mu.Lock() first,
check c.enabled while holding the lock, then iterate c.queriesForFeatureFlag and
use a type switch (or type check) to handle both planCache and
[]*nodev1.Operation values—call convertToNodeOperation only when the value is
planCache, leave already-converted []*nodev1.Operation as-is (or copy them) and
assign into the new switchoverMap; this prevents panics on repeated invocations
and removes the race on enabled.

In `@router/core/router_config.go`:
- Around line 315-320: The code currently sets usage["cache_warmup_source"] to
"cdn" when InMemorySwitchoverFallback is false even if an explicit in-memory
source block exists; update the branching in the cache warmup usage logic to
inspect the configured source block directly (e.g., check
c.cacheWarmup.Source.InMemorySwitchover != nil or the appropriate
Source.InMemorySwitchover field) before falling back to CDN, ensuring you detect
and set "in_memory_switchover" when the in-memory source is present, still
preferring "filesystem" when c.cacheWarmup.Source.Filesystem != nil and using
InMemorySwitchoverFallback only as the fallback indicator.

♻️ Duplicate comments (3)

router/core/restart_switchover_config.go (3)
70-73: Data race on enabled flag — check should be moved inside the lock.

The enabled check at line 71 occurs before acquiring the read lock at line 75. Since enabled is written under the lock in updateStateFromConfig, this creates a TOCTOU race if the config is updated concurrently.
Proposed fix
 func (c *InMemorySwitchOverCache) getPlanCacheForFF(featureFlagKey string) []*nodev1.Operation {
-	if !c.enabled {
-		return nil
-	}
-
 	c.mu.RLock()
 	defer c.mu.RUnlock()
+	if !c.enabled {
+		return nil
+	}
 
 	switch cache := c.queriesForFeatureFlag[featureFlagKey].(type) {
93-111: Same data race pattern in setPlanCacheForFF and deletePlanCacheForFF.

Both methods check enabled before acquiring the lock. Apply the same fix as for getPlanCacheForFF.
Proposed fix
 func (c *InMemorySwitchOverCache) setPlanCacheForFF(featureFlagKey string, cache planCache) {
-	if !c.enabled || cache == nil {
+	if cache == nil {
 		return
 	}
 
 	c.mu.Lock()
 	defer c.mu.Unlock()
+	if !c.enabled {
+		return
+	}
 	c.queriesForFeatureFlag[featureFlagKey] = cache
 }
 
 func (c *InMemorySwitchOverCache) deletePlanCacheForFF(featureFlagKey string) {
-	if !c.enabled {
-		return
-	}
-
 	c.mu.Lock()
 	defer c.mu.Unlock()
+	if !c.enabled {
+		return
+	}
 	delete(c.queriesForFeatureFlag, featureFlagKey)
 }
128-154: Same data race pattern in cleanupUnusedFeatureFlags.

The enabled check at line 129 occurs before acquiring the lock at line 137.
Proposed fix
 func (c *InMemorySwitchOverCache) cleanupUnusedFeatureFlags(routerCfg *nodev1.RouterConfig) {
-	if !c.enabled {
-		return
-	}
-
 	if routerCfg.FeatureFlagConfigs == nil {
 		return
 	}
 
 	c.mu.Lock()
 	defer c.mu.Unlock()
+	if !c.enabled {
+		return
+	}
 
 	ffNames := make(map[string]struct{})

🧹 Nitpick comments (3)

router-tests/cache_warmup_test.go (1)
956-958: Avoid indefinite hangs waiting on ConfigPollerMock.ready.

Blocking on <-pm.ready without a timeout can hang CI if the poller never signals. A small helper with a timeout keeps tests fail-fast and debuggable.
✅ Safer wait helper
 func TestInMemorySwitchoverCaching(t *testing.T) {
 	t.Parallel()
+
+	waitForPollerReady := func(t *testing.T, ch <-chan struct{}) {
+		t.Helper()
+		select {
+		case <-ch:
+		case <-time.After(2 * time.Second):
+			t.Fatal("config poller did not become ready in time")
+		}
+	}
 
 	t.Run("Verify the plan is cached on config restart when in memory switchover is enabled", func(t *testing.T) {
 		t.Parallel()
@@
-			<-pm.ready
+			waitForPollerReady(t, pm.ready)
@@
-			<-pm.ready
+			waitForPollerReady(t, pm.ready)
@@
-			<-pm.ready
+			waitForPollerReady(t, pm.ready)
Also applies to: 1001-1003, 1045-1047
router/core/restart_switchover_config.go (1)
156-165: Consider nil-check for safety.

convertToNodeOperation doesn't check if data is nil before calling IterValues. While callers should ensure non-nil values, a defensive nil-check would prevent panics from unexpected states.
Suggested improvement
 func convertToNodeOperation(data planCache) []*nodev1.Operation {
+	if data == nil {
+		return make([]*nodev1.Operation, 0)
+	}
 	items := make([]*nodev1.Operation, 0)
 	data.IterValues(func(v *planWithMetaData) (stop bool) {
router/core/graph_server.go (1)
1310-1311: Potential unsynchronized read of enabled field.

This directly accesses opts.SwitchoverConfig.inMemorySwitchOverCache.enabled without holding the lock. While this likely occurs during initialization when no concurrent updates are expected, it's inconsistent with the mutex-protected access pattern used elsewhere.

Consider exposing a synchronized getter method on SwitchoverConfig or InMemorySwitchOverCache to ensure thread-safe access.
Suggested approach

Add a synchronized getter to InMemorySwitchOverCache:
func (c *InMemorySwitchOverCache) isEnabled() bool {
    c.mu.RLock()
    defer c.mu.RUnlock()
    return c.enabled
}
Then update the call:
-operationPlanner := NewOperationPlanner(executor, gm.planCache, opts.SwitchoverConfig.inMemorySwitchOverCache.enabled)
+operationPlanner := NewOperationPlanner(executor, gm.planCache, opts.SwitchoverConfig.inMemorySwitchOverCache.isEnabled())

router/core/restart_switchover_config.go

router/core/router_config.go

…ement-cache-warmer-from-existing-caches

coderabbitai

Actionable comments posted: 1

🤖 Fix all issues with AI agents

In `@router/core/restart_switchover_config.go`:
- Around line 54-72: The write to InMemorySwitchOverCache.enabled is racy;
change enabled to an atomic flag (e.g., sync/atomic or atomic.Bool) and replace
all reads/writes to use atomic Load/Store semantics: update the assignment in
updateStateFromConfig to use c.enabled.Store(newVal) and update the reader
methods getPlanCacheForFF, setPlanCacheForFF, processOnConfigChangeRestart, and
cleanupUnusedFeatureFlags to use c.enabled.Load() instead of direct field access
(or alternatively move both the write and any pre-check reads inside the
existing mu lock); this ensures no TOCTOU race when checking or updating
enabled.

♻️ Duplicate comments (1)

router/core/restart_switchover_config.go (1)
74-95: Data race: enabled read before lock acquisition.

The check at line 75 reads c.enabled before the RLock is acquired at line 79. If updateStateFromConfig runs concurrently and modifies enabled, this method may see an inconsistent state. The same pattern exists in setPlanCacheForFF (line 98), processOnConfigChangeRestart (line 108), and cleanupUnusedFeatureFlags (line 125).
Proposed fix: move enabled check inside the lock
 func (c *InMemorySwitchOverCache) getPlanCacheForFF(featureFlagKey string) []*nodev1.Operation {
-	if !c.enabled {
-		return nil
-	}
-
 	c.mu.RLock()
 	defer c.mu.RUnlock()
+	if !c.enabled {
+		return nil
+	}
 
 	switch cache := c.queriesForFeatureFlag[featureFlagKey].(type) {
Apply the same pattern to setPlanCacheForFF, processOnConfigChangeRestart, and cleanupUnusedFeatureFlags.

🧹 Nitpick comments (1)

router/core/restart_switchover_config_test.go (1)

373-457: Consider adding a concurrency test.

The tests cover functional behavior well, but given the data race concerns in the implementation, consider adding a test that exercises concurrent access to validate thread safety once the locking issues are addressed.

Example concurrency test

func TestInMemorySwitchOverCache_ConcurrentAccess(t *testing.T) {
    t.Parallel()
    cache := &InMemorySwitchOverCache{
        enabled:               true,
        queriesForFeatureFlag: make(map[string]any),
        logger:                zap.NewNop(),
    }

    var wg sync.WaitGroup
    
    // Concurrent reads
    for i := 0; i < 10; i++ {
        wg.Go(func() {
            _ = cache.getPlanCacheForFF("test-ff")
        })
    }
    
    // Concurrent writes
    for i := 0; i < 10; i++ {
        wg.Go(func() {
            cache.updateStateFromConfig(&Config{
                cacheWarmup: &config.CacheWarmupConfiguration{
                    Enabled:                    true,
                    InMemorySwitchoverFallback: true,
                },
            }, false)
        })
    }
    
    wg.Wait()
}

router/core/restart_switchover_config.go

coderabbitai

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

router/core/graph_server.go (1)

295-461: Propagate CosmoCacheWarmerEnabled to feature-flag muxes.

Line 456 builds feature-flag muxes without setting CosmoCacheWarmerEnabled, so the warmup switch at Line 1367 can incorrectly choose in-memory fallback even when CDN warmup is enabled. This changes behavior for feature-flag caches compared to the base graph.

✅ Suggested fix

-	multiGraphHandler, err := s.buildMultiGraphHandler(ctx, gm.mux, featureFlagConfigMap, switchoverConfig)
+	multiGraphHandler, err := s.buildMultiGraphHandler(ctx, gm.mux, featureFlagConfigMap, switchoverConfig, cosmoCacheWarmerEnabled)

-func (s *graphServer) buildMultiGraphHandler(
+func (s *graphServer) buildMultiGraphHandler(
 	ctx context.Context,
 	baseMux *chi.Mux,
 	featureFlagConfigs map[string]*nodev1.FeatureFlagRouterExecutionConfig,
-	switchoverConfig *SwitchoverConfig,
+	switchoverConfig *SwitchoverConfig,
+	cosmoCacheWarmerEnabled bool,
 ) (http.HandlerFunc, error) {

 		gm, err := s.buildGraphMux(ctx, BuildGraphMuxOptions{
 			FeatureFlagName:     featureFlagName,
 			RouterConfigVersion: executionConfig.GetVersion(),
 			EngineConfig:        executionConfig.GetEngineConfig(),
 			ConfigSubgraphs:     executionConfig.Subgraphs,
 			SwitchoverConfig:    switchoverConfig,
+			CosmoCacheWarmerEnabled: cosmoCacheWarmerEnabled,
 		})

♻️ Duplicate comments (1)

router/core/restart_switchover_config.go (1)
110-124: Preserve already-converted entries on repeated restart processing.

Line 119 only copies entries that are still planCache. If processOnConfigChangeRestart is called again after a prior conversion, existing []*nodev1.Operation entries are silently dropped, resulting in an empty warmup set.
✅ Suggested fix
-	for k, v := range c.queriesForFeatureFlag {
-		if cache, ok := v.(planCache); ok {
-			switchoverMap[k] = convertToNodeOperation(cache)
-		}
-	}
+	for k, v := range c.queriesForFeatureFlag {
+		switch cache := v.(type) {
+		case planCache:
+			switchoverMap[k] = convertToNodeOperation(cache)
+		case []*nodev1.Operation:
+			// Already converted
+			switchoverMap[k] = cache
+		default:
+			c.logger.Error("unexpected type")
+		}
+	}

coderabbitai

Actionable comments posted: 2

🤖 Fix all issues with AI agents

In `@router/core/graph_server.go`:
- Around line 1313-1314: The check for switchover enabled reads shared state
without synchronization causing a data race; add a helper method on
InMemorySwitchOverCache (e.g., isEnabled()) that takes an RLock, nil-guards
queriesForFeatureFlag and returns whether it is non-nil, then replace the direct
access in graph_server.go (the line setting storeQueryContent using
opts.SwitchoverConfig.inMemorySwitchOverCache) with a call to that helper (also
nil-guard SwitchoverConfig itself before calling), so updateStateFromConfig’s
mutations and this read are synchronized via the cache’s mutex.

In `@router/core/restart_switchover_config.go`:
- Around line 148-154: convertToNodeOperation may miss buffered writes because
IterValues is non-atomic; call the cache's Wait() to flush pending asynchronous
sets before iterating. In convertToNodeOperation (which receives data planCache)
invoke the appropriate Wait() method on the planCache (or its underlying
Ristretto cache field) immediately before data.IterValues(...) so pending writes
are flushed and iteration sees a consistent snapshot. Ensure you use the same
Wait() approach as the tests (or implement an explicit snapshot method) to avoid
missing entries.

♻️ Duplicate comments (1)

router/core/restart_switchover_config.go (1)
115-121: Preserve already-converted switchover entries on repeated restarts.

On a second config-change restart, queriesForFeatureFlag will already contain []*nodev1.Operation. The current loop only keeps planCache values, silently dropping converted entries and losing warmup candidates. This defeats the in-memory warmup on successive restarts.
🔧 Proposed fix (keep already-converted entries)
for k, v := range c.queriesForFeatureFlag {
-	if cache, ok := v.(planCache); ok {
-		switchoverMap[k] = convertToNodeOperation(cache)
-	}
+	switch cache := v.(type) {
+	case planCache:
+		switchoverMap[k] = convertToNodeOperation(cache)
+	case []*nodev1.Operation:
+		switchoverMap[k] = cache
+	default:
+		c.logger.Error("unexpected type")
+	}
}

router/core/graph_server.go

router/core/restart_switchover_config.go

router-tests/cache_warmup_test.go

router/go.mod

router/core/cache_warmup_plans.go

router/core/graph_server.go

router/core/restart_switchover_config.go

router/core/graph_server.go

router/core/restart_switchover_config.go

router/pkg/config/config.go

coderabbitai

Actionable comments posted: 1

🤖 Fix all issues with AI agents

In `@router/core/restart_switchover_config.go`:
- Around line 113-128: The extractQueriesAndOverridePlanCache method currently
only preserves entries of type planCache and drops already-converted operation
slices; update the loop in
InMemorySwitchOverCache.extractQueriesAndOverridePlanCache to handle both types
by checking the value's type (e.g., planCache and []*nodev1.Operation or []
*nodev1.Operation) and copying existing operation slices into switchoverMap
unchanged while converting planCache entries via convertToNodeOperation(cache);
ensure you preserve other non-matching entries as-is or ignore explicitly
depending on intended semantics so repeated calls do not lose
previously-converted data.

♻️ Duplicate comments (1)

router/core/restart_switchover_config.go (1)

18-24: Add documentation for exported types and functions.

The exported SwitchoverConfig type has a brief comment, but the exported functions (NewSwitchoverConfig, UpdateSwitchoverConfig, CleanupFeatureFlags, ProcessOnConfigChangeRouterInstanceRestart) lack godoc comments describing their purpose and usage.

🧹 Nitpick comments (2)

router/core/supervisor.go (1)
160-163: Consider adding a nil check for defensive coding.

If switchoverConfig is ever nil (e.g., in tests or alternative router factory implementations), this will panic. While the current wiring always provides a SwitchoverConfig, a nil guard would make this more robust.
🛡️ Suggested defensive check
 if !shutdown {
+	if rs.router.switchoverConfig != nil {
 		rs.router.switchoverConfig.ProcessOnConfigChangeRouterInstanceRestart()
+	}
 }
router/core/restart_switchover_config_test.go (1)
340-413: Add test case for calling extractQueriesAndOverridePlanCache twice.

The current tests only verify the first conversion. A test that calls the method twice would verify that already-converted []*nodev1.Operation entries are preserved (which currently they're not—see related issue in production code).
🧪 Suggested additional test case
t.Run("preserves already-converted operation slices on second call", func(t *testing.T) {
	t.Parallel()
	mockCache, err := ristretto.NewCache(&ristretto.Config[uint64, *planWithMetaData]{
		MaxCost:            10000,
		NumCounters:        10000000,
		IgnoreInternalCost: true,
		BufferItems:        64,
	})
	require.NoError(t, err)

	query1 := "query { test1 }"
	mockCache.Set(1, &planWithMetaData{content: query1}, 1)
	mockCache.Wait()

	// Pre-existing converted operations
	existingOps := []*nodev1.Operation{
		{Request: &nodev1.OperationRequest{Query: "query { existing }"}},
	}

	cache := &InMemorySwitchOverCache{
		queriesForFeatureFlag: make(map[string]any),
	}
	cache.queriesForFeatureFlag["ff1"] = mockCache
	cache.queriesForFeatureFlag["ff2"] = existingOps

	cache.extractQueriesAndOverridePlanCache()

	// Both should be preserved
	require.Len(t, cache.queriesForFeatureFlag, 2)
	require.IsType(t, []*nodev1.Operation{}, cache.queriesForFeatureFlag["ff1"])
	require.IsType(t, []*nodev1.Operation{}, cache.queriesForFeatureFlag["ff2"])

	ff2Ops := cache.queriesForFeatureFlag["ff2"].([]*nodev1.Operation)
	require.Equal(t, "query { existing }", ff2Ops[0].Request.Query)
})

router/core/restart_switchover_config.go

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

router/core/graph_server.go (1)
1373-1394: Potential nil pointer dereference on SwitchoverConfig in in-memory switchover fallback case.

Lines 1386-1387 access opts.SwitchoverConfig.inMemorySwitchOverCache without verifying that opts.SwitchoverConfig is non-nil. If the configuration enables InMemorySwitchoverFallback but SwitchoverConfig was not initialized (e.g., due to a configuration mismatch), this will cause a panic.
🐛 Proposed fix: Add nil check to the switch case condition
-	case s.cacheWarmup.InMemorySwitchoverFallback && (s.selfRegister == nil || !opts.CosmoCacheWarmerEnabled):
+	case s.cacheWarmup.InMemorySwitchoverFallback && opts.SwitchoverConfig != nil && (s.selfRegister == nil || !opts.CosmoCacheWarmerEnabled):

🤖 Fix all issues with AI agents

In `@router/core/graph_server.go`:
- Around line 1322-1323: The call to
opts.SwitchoverConfig.inMemorySwitchOverCache.IsEnabled() can panic if
SwitchoverConfig is nil; change the construction of operationPlanner
(NewOperationPlanner(executor, gm.planCache, ...)) to compute the boolean safely
by first checking if opts.SwitchoverConfig != nil and only then calling
inMemorySwitchOverCache.IsEnabled(), otherwise default to false, and pass that
computed value into NewOperationPlanner to avoid a nil pointer dereference.

router/core/graph_server.go

…-existing-caches

router/core/graph_server.go

router/core/restart_switchover_config.go

feat: cache plans

ef7e645

github-actions bot added the router label Jan 9, 2026

fix: cleanup

419cabf

SkArchon added 8 commits January 13, 2026 16:27

fix: updates

e65cdf2

fix: updates

2f89643

fix: updates

cf6e9df

fix: updates

0d2e2db

fix: updates

52aa474

fix: updates

8cd21fe

fix: config updates

822a4a4

fix: config reloading

3159713

SkArchon marked this pull request as ready for review January 14, 2026 16:15

SkArchon requested review from Noroth, StarpTech, devsergiy, endigma and jensneuse as code owners January 14, 2026 16:15

Merge branch 'main' into milinda/eng-8701-implement-cache-warmer-from…

d576e03

…-existing-caches

SkArchon marked this pull request as draft January 14, 2026 16:15

fix: update router-tests

2bdf426

coderabbitai bot reviewed Jan 14, 2026

View reviewed changes

router-tests/cache_warmup_test.go Show resolved Hide resolved

router/core/http_server.go Outdated Show resolved Hide resolved

router/core/restart_switchover_config.go Outdated Show resolved Hide resolved

router/core/router.go Outdated Show resolved Hide resolved

SkArchon added 2 commits January 14, 2026 22:09

fix: cleanup

0e92ca1

fix: use assert

78fc67d

SkArchon marked this pull request as ready for review January 14, 2026 16:49

SkArchon marked this pull request as draft January 14, 2026 16:49

coderabbitai bot reviewed Jan 14, 2026

View reviewed changes

router-tests/go.mod Outdated Show resolved Hide resolved

router/core/restart_switchover_config.go Outdated Show resolved Hide resolved

fix: review comments

f4850f6

coderabbitai bot reviewed Jan 19, 2026

View reviewed changes

router/core/restart_switchover_config.go Outdated Show resolved Hide resolved

router/core/router_config.go Show resolved Hide resolved

SkArchon added 4 commits January 19, 2026 21:05

fix: review comments

98608c7

fix: updates

a7adf3d

Merge remote-tracking branch 'origin/main' into milinda/eng-8701-impl…

05026d3

…ement-cache-warmer-from-existing-caches

fix: go mod

6d23b7e

coderabbitai bot reviewed Jan 20, 2026

View reviewed changes

router/core/restart_switchover_config.go Show resolved Hide resolved

SkArchon added 3 commits January 20, 2026 20:55

fix: update enabled

ceabb75

fix: update enabled

ae920ef

fix: review comments

3cfca20

coderabbitai bot reviewed Jan 20, 2026

View reviewed changes

SkArchon added 2 commits January 20, 2026 21:18

fix: review comments

d4ad399

fix: feature flags didnt pass in cosmo flag status

5d79a9e

coderabbitai bot reviewed Jan 20, 2026

View reviewed changes

router/core/graph_server.go Outdated Show resolved Hide resolved

router/core/restart_switchover_config.go Show resolved Hide resolved

SkArchon added 2 commits January 20, 2026 21:31

fix: nil pointer

7f099f5

fix: review comments

f382e46

SkArchon mentioned this pull request Jan 21, 2026

feat: cache warmer in memory switchover wundergraph/cosmo-docs#237

Open

fix: use the new ristretto version

9a34515

Noroth requested changes Jan 22, 2026

View reviewed changes

SkArchon added 2 commits January 22, 2026 17:26

fix: formatting

42c8c81

fix: review comments

ef33a6c

coderabbitai bot reviewed Jan 22, 2026

View reviewed changes

router/core/restart_switchover_config.go Show resolved Hide resolved

fix: review comments

e0734f5

coderabbitai bot reviewed Jan 22, 2026

View reviewed changes

router/core/graph_server.go Show resolved Hide resolved

SkArchon and others added 3 commits January 22, 2026 19:13

fix: review comments

11158f6

fix: tests

54f24c1

Merge branch 'main' into milinda/eng-8701-implement-cache-warmer-from…

40548ad

…-existing-caches

Noroth reviewed Jan 22, 2026

View reviewed changes

router/core/graph_server.go Outdated Show resolved Hide resolved

Noroth reviewed Jan 22, 2026

View reviewed changes

router/core/restart_switchover_config.go Outdated Show resolved Hide resolved

fix: review comments

fd68357

Noroth approved these changes Jan 23, 2026

View reviewed changes

feat: rewarm cache from planner cache on schema change #2445

Are you sure you want to change the base?

feat: rewarm cache from planner cache on schema change #2445

Uh oh!

Conversation

SkArchon commented Jan 9, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Description

Checklist

Uh oh!

coderabbitai bot commented Jan 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Uh oh!

github-actions bot commented Jan 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Router image scan passed

Uh oh!

codecov bot commented Jan 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

SkArchon commented Jan 9, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Jan 9, 2026 •

edited

Loading

github-actions bot commented Jan 9, 2026 •

edited

Loading

codecov bot commented Jan 9, 2026 •

edited

Loading