Skip to content

Conversation

@SkArchon
Copy link
Contributor

@SkArchon SkArchon commented Jan 9, 2026

Summary by CodeRabbit

  • New Features

    • In-memory switchover caching to retain and reuse prepared execution plans across restarts.
  • Configuration

    • New cache-warmup flag to enable/disable in-memory switchover fallback (defaults to true).
  • Behavioral

    • Cache persistence across config-change restarts and automatic cleanup of unused feature-flag caches on startup.
  • API

    • Registration now reports whether the cache warmer is enabled.
  • Tests

    • Expanded coverage for switchover caching, reloads, cleanup, and cross-restart persistence.
  • Chores

    • Updated cache dependency version.

✏️ Tip: You can customize this high-level summary in your review settings.

Description

This PR introduces a new mechanism to warm the cache on restarts due to schema changes (or whenever the router execution config changes). Config to enable this

cache_warmup:
  enabled: true
  source:
    in_memory_switchover:
      enabled: true

If in_memory_switchover is not present or false, we default to the current cdn cache warmer (with cosmo), this preserves backwards compatibility.

This is achieved by first adding a new switchoverConfig which can keep references across restarts, we use this to store a reference to the planner cache. When the router restarts due to an execution config change, we then loop through the elements in the ristretto planner cache.

Right now ristretto does not support iterating, so we have opened a PR here dgraph-io/ristretto#475. For this PR (which we need to discuss before merging) we have pointed ristretto to the fork from where the PR was opened.

We create a new provider for the cache warmer, for in_memory_switcher which extracts the queries from the planner cache upon startup. Note that with this method there is no cache warming upon startup, only on subsequent restarts due to execution config changes.

We also support persisting across restarts on hot config reloading. However we ran into a problem of shutdown occuring first before the new graph mux was created (which is the opposite for schema change updates), which means that the current planner cache gets emptied before we can extract the queries from it. As a solution we preemptively extract the queries from the planner cache before its closed whenever a hot config reloading occurs.

This PR also changes how cache warmer is enabled when used with the cosmo CDN, previously it requested the manifest from the CDN if the cache warmer was enabled in the router config.yaml, this meant that ex-cache warmer customers could still access old manifest files fron the cdn (as long as it was present), however now its only enabled if the feature is enabled in cosmo (upon startup we get if cache warmer is enabled from cosmo).

Checklist

  • I have discussed my proposed changes in an issue and have received approval to proceed.
  • I have followed the coding standards of the project.
  • Tests or benchmarks have been added or updated.
  • Documentation has been updated on https://github.com/wundergraph/cosmo-docs.
  • I have read the Contributors Guide.

@github-actions github-actions bot added the router label Jan 9, 2026
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jan 9, 2026

Walkthrough

Adds an in-memory switchover cache and wiring to persist per-feature plan data across config-change restarts; extends planner cache interfaces and plan sources; threads SwitchoverConfig and cache-warmer flags through supervisor/startup/graph-server; updates config/schema/proto; and adds unit and integration tests.

Changes

Cohort / File(s) Summary
Integration tests
router-tests/cache_warmup_test.go, router-tests/go.mod
Add TestInMemorySwitchoverCaching integration suite for cache-warmup/switchover flows; update test module deps. (Duplicate test function occurrence present in diff.)
Unit tests
router/core/restart_switchover_config_test.go
Add comprehensive unit tests for InMemorySwitchOverCache / SwitchoverConfig (enable/disable, get/set, cleanup, restart conversion, IsEnabled).
Switchover subsystem
router/core/restart_switchover_config.go
New SwitchoverConfig and InMemorySwitchOverCache with constructors and methods to manage per-feature plan state, config-driven updates, cleanup, and extraction of operations for restart migration.
Planner & plan source
router/core/operation_planner.go, router/core/cache_warmup_plans.go
Add plan content storage (planWithMetaData.content), thread storeContent flag through OperationPlanner and preparePlan; extend ExecutionPlanCache with IterValues; add PlanSource implementing CacheWarmupSource.
Graph server integration
router/core/graph_server.go
Thread SwitchoverConfig and CosmoCacheWarmerEnabled through mux/handler construction; pass switchover enablement into planner initialization and per-feature muxes.
Router wiring & lifecycle
router/core/router.go, router/core/supervisor.go, router/core/supervisor_instance.go
Add switchoverConfig field and WithSwitchoverConfig option; propagate through supervisor/resources/options; invoke restart-processing and feature cleanup during startup/graph-swap.
Startup CLI
router/cmd/main.go
Initialize SwitchoverConfig in RouterSupervisorOpts via core.NewSwitchoverConfig(...).
Config & schema
router/pkg/config/config.go, router/pkg/config/config.schema.json, router/pkg/config/testdata/*, router/pkg/config/router_config.go
Add InMemorySwitchoverFallback bool to cache_warmup config, update schema and testdata, expose flag in Usage().
Proto / registration
proto/wg/cosmo/node/v1/node.proto, connect/src/wg/cosmo/node/v1/node_pb.ts, controlplane/src/core/bufservices/NodeService.ts
Add is_cache_warmer_enabled / isCacheWarmerEnabled boolean to RegistrationInfo and populate it from node features in NodeService.
Module deps
router/go.mod, router-tests/go.mod
Bump github.com/dgraph-io/ristretto/v2 (v2.1.0 -> v2.4.0).
Minor / formatting
router/core/cache_warmup_cdn.go
Import reorder only (no behavioral change).

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • feat: circuit breaker implementation #1929: Modifies graph mux/handler construction and BuildGraphMuxOptions signature—closely related to threading additional runtime options (switchover/cache-warmer) through the graph server.
🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 22.73% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'feat: rewarm cache from planner cache on schema change' accurately and concisely describes the main feature being added: a mechanism to reuse cached data from the planner when the router restarts due to schema changes.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link

github-actions bot commented Jan 9, 2026

Router image scan passed

✅ No security vulnerabilities found in image:

ghcr.io/wundergraph/cosmo/router:sha-f1d790522817170bcb0550d1c5df16cc32691d4c

@codecov
Copy link

codecov bot commented Jan 9, 2026

Codecov Report

❌ Patch coverage is 80.25478% with 31 lines in your changes missing coverage. Please review.
✅ Project coverage is 36.06%. Comparing base (c5071f8) to head (fd68357).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
router/core/graph_server.go 68.88% 13 Missing and 1 partial ⚠️
router/core/restart_switchover_config.go 91.13% 4 Missing and 3 partials ⚠️
router/core/router.go 62.50% 3 Missing ⚠️
router/core/supervisor_instance.go 0.00% 3 Missing ⚠️
router/cmd/main.go 0.00% 2 Missing ⚠️
controlplane/src/core/bufservices/NodeService.ts 0.00% 1 Missing ⚠️
router/core/cache_warmup_plans.go 85.71% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##            main    #2445       +/-   ##
==========================================
+ Coverage   1.49%   36.06%   +34.57%     
==========================================
  Files        292      946      +654     
  Lines      46960   123527    +76567     
  Branches     431     4944     +4513     
==========================================
+ Hits         703    44553    +43850     
- Misses     45974    77409    +31435     
- Partials     283     1565     +1282     
Files with missing lines Coverage Δ
router/core/cache_warmup_cdn.go 51.42% <ø> (ø)
router/core/operation_planner.go 68.00% <100.00%> (ø)
router/core/router_config.go 93.71% <100.00%> (ø)
router/core/supervisor.go 60.97% <100.00%> (ø)
router/gen/proto/wg/cosmo/node/v1/node.pb.go 20.74% <ø> (ø)
router/pkg/config/config.go 80.51% <ø> (ø)
controlplane/src/core/bufservices/NodeService.ts 20.68% <0.00%> (ø)
router/core/cache_warmup_plans.go 85.71% <85.71%> (ø)
router/cmd/main.go 0.00% <0.00%> (ø)
router/core/router.go 71.06% <62.50%> (ø)
... and 3 more

... and 641 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@SkArchon SkArchon marked this pull request as ready for review January 14, 2026 16:15
@SkArchon SkArchon marked this pull request as draft January 14, 2026 16:15
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
router/go.mod (2)

175-199: Fix inconsistent ristretto/v2 fork across multi-module repo

The router module pins a fork (github.com/wundergraph/ristretto/v2 v2.0.0-20260113151003-f7231509ad4d), but router-tests, demo, and graphqlmetrics modules all require the upstream version (v2.1.0). This creates mixed implementations at build time. The replace directive in router/go.mod only affects that module's dependencies, not the workspace. Either apply the same replace directive to other go.mod files that depend on ristretto, or unify the version across all modules.


59-88: Add explanatory comment about the Ristretto fork in go.mod

The fork github.com/wundergraph/ristretto/v2 at commit f7231509ad4d (branch milinda/iterating_elements) adds iterator support for cache elements, which is not available in the upstream v2.0.0 release. Add a comment in the replace directive explaining this purpose. Since no other modules in the repo depend on ristretto v2.1.x features (the require block specifies only v2.0.0), there are no compatibility concerns with using the fork.

🤖 Fix all issues with AI agents
In `@router-tests/cache_warmup_test.go`:
- Around line 1100-1106: The test uses require.EventuallyWithT with an
assert.CollectT callback but calls require.Equal inside the retry closure (in
the block passed to require.EventuallyWithT) which will abort immediately;
replace the failing assertions inside that callback with assert.Equal so the
checks are retried: locate the closure passed to require.EventuallyWithT (where
res = xEnv.MakeGraphQLRequestOK(...)) and change require.Equal(t, "updated",
res.Response.Header.Get("X-Router-Config-Version")) and require.Equal(t, "HIT",
res.Response.Header.Get("x-wg-execution-plan-cache")) to assert.Equal(t,
"updated", ...) and assert.Equal(t, "HIT", ...).

In `@router/core/http_server.go`:
- Line 39: Remove the dead field from the httpServerOptions struct:
`switchoverConfig` is never set when creating `httpServerOptions` and is not
used by `newServer()`, since `SwitchoverConfig` is handled at the `Router` level
and passed into `newGraphServer`; delete the `switchoverConfig
*SwitchoverConfig` field from the `httpServerOptions` definition and update any
references (none expected) to rely on the Router/newGraphServer flow instead.

In `@router/core/restart_switchover_config.go`:
- Around line 24-38: UpdateSwitchoverConfig writes
s.inMemorySwitchOverCache.enabled without synchronization which races with reads
in getPlanCacheForFF, setPlanCacheForFF, and cleanupUnusedFeatureFlags; fix by
acquiring the SwitchoverConfig write lock (e.g., s.mu or the existing cache
lock) around the block that sets s.inMemorySwitchOverCache.enabled and mutates
queriesForFeatureFlag/testValue, and change the reader methods
getPlanCacheForFF, setPlanCacheForFF, and cleanupUnusedFeatureFlags to check
enabled while holding the same lock (or alternatively convert enabled to an
atomic.Bool and use atomic load/store for reads/writes) so readers and writers
are consistent.

In `@router/core/router.go`:
- Around line 1177-1181: Fix the typo in the inline comment near the switchover
config initialization: change "sueprvisor" to "supervisor" in the comment that
precedes the r.switchoverConfig = NewSwitchoverConfig() block so the comment
reads correctly while leaving the surrounding logic (r.switchoverConfig,
NewSwitchoverConfig(), UpdateSwitchoverConfig(&r.Config)) unchanged.
🧹 Nitpick comments (4)
router/pkg/config/config.go (1)

974-986: Consider making CacheWarmupSource.InMemorySwitchover a pointer to preserve “unset vs disabled”
At Line 974-977, InMemorySwitchover is a non-pointer struct with omitempty. Depending on the marshaler, this may still serialize as present (and it definitely can’t represent “not provided” distinctly from “provided but disabled”), which tends to leak into golden testdata and example configs (as seen in config_defaults.json).

If you want cache_warmup.source to be truly optional/empty unless explicitly configured, consider changing it to a pointer like *CacheWarmupInMemorySwitchover.

Proposed change
 type CacheWarmupSource struct {
 	Filesystem         *CacheWarmupFileSystemSource  `yaml:"filesystem,omitempty"`
-	InMemorySwitchover CacheWarmupInMemorySwitchover `yaml:"in_memory_switchover,omitempty"`
+	InMemorySwitchover *CacheWarmupInMemorySwitchover `yaml:"in_memory_switchover,omitempty"`
 }
router/pkg/config/config.schema.json (1)

2387-2423: Schema polish: add default: false + tighten wording for in_memory_switchover.enabled
For the new block at Line 2404-2413:

  • enabled has no default in the schema (even though the Go config uses envDefault:"false"). Consider adding "default": false for consistency with other booleans.
  • Minor: description “Is in memory switchover enabled” could be normalized (and maybe mention it reuses prior process’ cached plans, if that’s the intent).
router/core/cache_warmup_plans.go (1)

16-30: Consider preserving additional operation metadata for complete cache warmup.

The current implementation only extracts the Query content from cached plans. For accurate cache warmup, operations may also need variables, operation name, and headers that were part of the original request. If planWithMetaData stores additional context, consider extracting it here.

Also, based on the past discussion about limiting cache warmer operations to ~100, this implementation iterates unboundedly. Consider adding a limit if the plan cache can grow large.

router/core/restart_switchover_config.go (1)

44-49: Consider documenting or removing the testValue field.

The testValue field appears to be for testing purposes but is not documented. If it's only used in tests, consider adding a comment explaining its purpose, or moving test-specific logic elsewhere.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 08bcfb5 and d576e03.

⛔ Files ignored due to path filters (1)
  • router/go.sum is excluded by !**/*.sum
📒 Files selected for processing (16)
  • router-tests/cache_warmup_test.go
  • router/cmd/main.go
  • router/core/cache_warmup_plans.go
  • router/core/graph_server.go
  • router/core/http_server.go
  • router/core/operation_planner.go
  • router/core/restart_switchover_config.go
  • router/core/router.go
  • router/core/router_config.go
  • router/core/supervisor.go
  • router/core/supervisor_instance.go
  • router/go.mod
  • router/pkg/config/config.go
  • router/pkg/config/config.schema.json
  • router/pkg/config/testdata/config_defaults.json
  • router/pkg/config/testdata/config_full.json
🧰 Additional context used
🧠 Learnings (10)
📓 Common learnings
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2172
File: router/core/graph_server.go:0-0
Timestamp: 2025-09-17T20:55:39.456Z
Learning: The Initialize method in router/internal/retrytransport/manager.go has been updated to properly handle feature-flag-only subgraphs by collecting subgraphs from both routerConfig.GetSubgraphs() and routerConfig.FeatureFlagConfigs.ConfigByFeatureFlagName, ensuring all subgraphs receive retry configuration.
📚 Learning: 2025-09-17T20:55:39.456Z
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2172
File: router/core/graph_server.go:0-0
Timestamp: 2025-09-17T20:55:39.456Z
Learning: The Initialize method in router/internal/retrytransport/manager.go has been updated to properly handle feature-flag-only subgraphs by collecting subgraphs from both routerConfig.GetSubgraphs() and routerConfig.FeatureFlagConfigs.ConfigByFeatureFlagName, ensuring all subgraphs receive retry configuration.

Applied to files:

  • router/core/supervisor.go
  • router/cmd/main.go
  • router/core/router.go
  • router-tests/cache_warmup_test.go
  • router/core/restart_switchover_config.go
  • router/core/supervisor_instance.go
  • router/core/graph_server.go
📚 Learning: 2026-01-06T12:37:21.521Z
Learnt from: asoorm
Repo: wundergraph/cosmo PR: 2379
File: router/pkg/connectrpc/operation_registry_test.go:381-399
Timestamp: 2026-01-06T12:37:21.521Z
Learning: In Go code (Go 1.25+), prefer using sync.WaitGroup.Go(func()) to run a function in a new goroutine, letting the WaitGroup manage Add/Done automatically. Avoid manual wg.Add(1) followed by go func() { defer wg.Done(); ... }() patterns. Apply this guidance across all Go files in the wundergraph/cosmo repository where concurrency is used.

Applied to files:

  • router/core/supervisor.go
  • router/core/http_server.go
  • router/core/router_config.go
  • router/core/cache_warmup_plans.go
  • router/pkg/config/config.go
  • router/cmd/main.go
  • router/core/router.go
  • router-tests/cache_warmup_test.go
  • router/core/restart_switchover_config.go
  • router/core/supervisor_instance.go
  • router/core/operation_planner.go
  • router/core/graph_server.go
📚 Learning: 2025-08-12T13:50:45.964Z
Learnt from: Noroth
Repo: wundergraph/cosmo PR: 2132
File: router-plugin/plugin.go:139-146
Timestamp: 2025-08-12T13:50:45.964Z
Learning: In the Cosmo router plugin system, the plugin framework creates its own logger independently. When creating a logger in NewRouterPlugin, it's only needed for the gRPC server setup (passed via setup.GrpcServerInitOpts.Logger) and doesn't need to be assigned back to serveConfig.Logger.

Applied to files:

  • router/cmd/main.go
📚 Learning: 2025-10-01T20:39:16.113Z
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2252
File: router-tests/telemetry/telemetry_test.go:9684-9693
Timestamp: 2025-10-01T20:39:16.113Z
Learning: Repo preference: In router-tests/telemetry/telemetry_test.go, keep strict > 0 assertions for request.operation.*Time (parsingTime, normalizationTime, validationTime, planningTime) in telemetry-related tests; do not relax to >= 0 unless CI flakiness is observed.

Applied to files:

  • router-tests/cache_warmup_test.go
📚 Learning: 2025-08-20T22:13:25.222Z
Learnt from: StarpTech
Repo: wundergraph/cosmo PR: 2157
File: router-tests/go.mod:16-16
Timestamp: 2025-08-20T22:13:25.222Z
Learning: github.com/mark3labs/mcp-go v0.38.0 has regressions and should not be used in the wundergraph/cosmo project. v0.36.0 is the stable version that should be used across router-tests and other modules.

Applied to files:

  • router-tests/cache_warmup_test.go
  • router/go.mod
📚 Learning: 2025-08-26T17:19:28.178Z
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2167
File: router/internal/expr/retry_context.go:3-10
Timestamp: 2025-08-26T17:19:28.178Z
Learning: In the Cosmo router codebase, prefer using netErr.Timeout() checks over explicit context.DeadlineExceeded checks for timeout detection in retry logic, as Go's networking stack typically wraps context deadline exceeded errors in proper net.Error implementations.

Applied to files:

  • router-tests/cache_warmup_test.go
📚 Learning: 2025-08-28T09:18:10.121Z
Learnt from: endigma
Repo: wundergraph/cosmo PR: 2141
File: router-tests/http_subscriptions_test.go:100-108
Timestamp: 2025-08-28T09:18:10.121Z
Learning: In router-tests/http_subscriptions_test.go heartbeat tests, the message ordering should remain strict with data messages followed by heartbeat messages, as the timing is deterministic and known by design in the Cosmo router implementation.

Applied to files:

  • router-tests/cache_warmup_test.go
📚 Learning: 2025-08-20T10:08:17.857Z
Learnt from: endigma
Repo: wundergraph/cosmo PR: 2155
File: router/core/router.go:1857-1866
Timestamp: 2025-08-20T10:08:17.857Z
Learning: router/pkg/config/config.schema.json forbids null values for traffic_shaping.subgraphs: additionalProperties references $defs.traffic_shaping_subgraph_request_rule with type "object". Therefore, in core.NewSubgraphTransportOptions, dereferencing each subgraph rule pointer is safe under schema-validated configs, and a nil-check is unnecessary.

Applied to files:

  • router/core/graph_server.go
📚 Learning: 2025-11-12T18:38:46.619Z
Learnt from: StarpTech
Repo: wundergraph/cosmo PR: 2327
File: graphqlmetrics/core/metrics_service.go:435-442
Timestamp: 2025-11-12T18:38:46.619Z
Learning: In graphqlmetrics/core/metrics_service.go, the opGuardCache TTL (30 days) is intentionally shorter than the gql_metrics_operations database retention period (90 days). This design reduces memory usage while relying on database constraints to prevent duplicate inserts after cache expiration.

Applied to files:

  • router/core/graph_server.go
🧬 Code graph analysis (6)
router/core/supervisor.go (2)
router/core/restart_switchover_config.go (1)
  • SwitchoverConfig (14-16)
router/core/router.go (1)
  • Router (84-94)
router/core/http_server.go (1)
router/core/restart_switchover_config.go (1)
  • SwitchoverConfig (14-16)
router/cmd/main.go (1)
router/core/restart_switchover_config.go (2)
  • SwitchoverConfig (14-16)
  • NewSwitchoverConfig (18-22)
router/core/router.go (1)
router/core/restart_switchover_config.go (2)
  • SwitchoverConfig (14-16)
  • NewSwitchoverConfig (18-22)
router/core/supervisor_instance.go (3)
router/core/restart_switchover_config.go (1)
  • SwitchoverConfig (14-16)
router/core/router_config.go (1)
  • Config (49-144)
router/core/router.go (2)
  • Option (174-174)
  • WithSwitchoverConfig (2057-2061)
router/core/graph_server.go (4)
router/core/restart_switchover_config.go (1)
  • SwitchoverConfig (14-16)
router/core/router.go (1)
  • Router (84-94)
router/core/operation_planner.go (1)
  • NewOperationPlanner (50-57)
router/core/cache_warmup_plans.go (1)
  • NewPlanSource (16-30)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: build-router
  • GitHub Check: integration_test (./telemetry)
  • GitHub Check: build_push_image (nonroot)
  • GitHub Check: build_push_image
  • GitHub Check: integration_test (./events)
  • GitHub Check: build_test
  • GitHub Check: image_scan (nonroot)
  • GitHub Check: image_scan
  • GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
  • GitHub Check: build_test
  • GitHub Check: Analyze (go)
  • GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (36)
router/cmd/main.go (1)

152-172: SwitchoverConfig wiring looks correct; ensure lifecycle matches intended cache persistence scope
Line 152-155 passes a single core.NewSwitchoverConfig() into the supervisor—so caches/state will live for the process lifetime across reloads (likely desired for “reprocess existing plans”). Just ensure this is the intended scope (vs “per reload”) and that the underlying cache is concurrency-safe.

router/pkg/config/testdata/config_defaults.json (1)

311-322: config_defaults.json is a golden test output file, not a configuration subject to schema validation—no changes needed

The concern about schema mismatch is based on a misunderstanding of the file's purpose. config_defaults.json is purely a golden test artifact in TestDefaults() used to verify the JSON serialization of the Config struct; it is never loaded from disk or validated against config.schema.json.

The schema validation in LoadConfig() applies only to YAML input files. When the Config struct is marshaled to JSON for the golden test, Go's default JSON marshaling uses PascalCase field names (Filesystem, InMemorySwitchover) and includes null values where structs have omitempty YAML tags. This serialization format is independent of the schema, which constrains YAML input with snake_case keys. The presence of "Filesystem": null in the output is expected and correct.

router/pkg/config/testdata/config_full.json (1)

660-671: LGTM!

The InMemorySwitchover configuration block is correctly added under CacheWarmup.Source, following the existing pattern with Filesystem. The default Enabled: false is appropriate for this comprehensive config test file.

router/core/supervisor.go (3)

32-36: LGTM!

The SwitchoverConfig field is correctly added to RouterResources, enabling the switchover configuration to be passed through to downstream components like the Router.


38-44: LGTM!

The SwitchoverConfig field in RouterSupervisorOpts provides the input path for the switchover configuration.


52-55: LGTM!

The initialization correctly propagates SwitchoverConfig from options to resources, completing the wiring through the supervisor layer.

router-tests/cache_warmup_test.go (5)

5-6: LGTM!

The new imports are appropriate for the added test functionality: assert for EventuallyWithT callbacks and configpoller for the mock configuration poller.


919-971: LGTM - well-structured test for in-memory switchover caching.

The test correctly validates that:

  1. Initial request results in a cache MISS
  2. After config update (simulating switchover), the same query results in a cache HIT

This confirms the in-memory plan cache is preserved across config reloads.


973-1014: LGTM!

Correctly tests that cache is not preserved when cache warmup is entirely disabled, resulting in MISS on both requests.


1016-1062: LGTM!

Correctly tests that the default cache warmer (with InMemorySwitchover.Enabled = false) does not preserve plans across config reloads.


1064-1108: Both writeTestConfig (defined in router-tests/config_hot_reload_test.go:579) and ConfigPollerMock (defined in router-tests/config_hot_reload_test.go:32) are properly defined in the test package and are accessible to this test file.

router/core/router_config.go (1)

314-321: LGTM - source classification logic is correct and type-safe.

The addition correctly classifies the cache warmup source. The control flow properly prioritizes Filesystem over InMemorySwitchover. Since InMemorySwitchover is a value type (not a pointer), accessing its Enabled field is safe—no nil check is needed.

router/core/cache_warmup_plans.go (1)

32-34: LGTM!

Simple and efficient - queries are extracted once during construction, allowing the source cache to be garbage collected.

router/core/supervisor_instance.go (3)

53-53: LGTM!

Correctly threads SwitchoverConfig through the router option construction.


184-184: LGTM!

Function signature updated to accept switchoverConfig parameter for propagation.


275-275: LGTM!

WithSwitchoverConfig option correctly added to the options list.

router/core/router.go (4)

93-93: LGTM!

Field addition to hold switchover configuration state.


600-600: LGTM!

Correctly passes switchoverConfig to newGraphServer for plan cache management.


608-610: LGTM!

Feature flag cleanup after server swap ensures unused caches are pruned.


2057-2061: LGTM!

Standard option pattern for setting switchover configuration.

router/core/restart_switchover_config.go (2)

71-78: Clarify behavior when FeatureFlagConfigs is nil.

When routerCfg.FeatureFlagConfigs is nil, the method returns early without cleaning up any cached feature flags. This means stale feature flag caches could persist. Is this intentional?

If feature flags can be completely removed from a config, consider clearing all non-base caches when FeatureFlagConfigs is nil.


83-96: LGTM!

The cleanup logic correctly preserves the base configuration (empty string key) while removing caches for feature flags that no longer exist in the router config.

router/core/operation_planner.go (8)

24-24: LGTM!

New content field enables storing the original query for cache warmup scenarios.


32-37: LGTM!

The storeContent flag and operationPlannerOpts struct provide clean control over content storage behavior.


50-56: LGTM!

Constructor correctly accepts and stores the storeContent flag.


59-59: LGTM!

Signature update to accept operationPlannerOpts for controlling plan preparation behavior.


93-95: LGTM!

Content is conditionally stored based on the storeContent option, avoiding unnecessary memory usage when content storage is not needed.


122-122: LGTM!

When skipping cache (for tracing or query plan inclusion), content storage is disabled since these plans are not cached.


150-150: LGTM!

Normal cache path correctly uses the planner's storeContent setting.


44-46: The Iter method is already implemented by ristretto.Cache, which is the only ExecutionPlanCache implementation used in the codebase. It's already being called in router/core/cache_warmup_plans.go:21 without issues. Adding Iter to the interface is not a breaking change.

Likely an incorrect or invalid review comment.

router/core/graph_server.go (6)

108-116: LGTM!

The addition of SwitchoverConfig to BuildGraphMuxOptions follows the existing pattern for optional configuration parameters.


122-123: LGTM!

The signature extension to accept switchoverConfig is correctly propagated through the initialization path.


434-458: LGTM!

The switchoverConfig is correctly threaded through to feature flag mux construction, ensuring consistent behavior across base and feature flag graphs.


526-532: LGTM!

Field alignment changes are cosmetic and improve readability.


1303-1304: No nil pointer dereference risk. SwitchoverConfig and inMemorySwitchOverCache are guaranteed non-nil by initialization.

The opts.SwitchoverConfig parameter passed to newGraphServer is guaranteed to be non-nil with a valid inMemorySwitchOverCache because:

  1. Before newServer() calls newGraphServer() at line 600, the Router's Start() method initializes r.switchoverConfig with a nil check at lines 1177-1179.
  2. NewSwitchoverConfig() always returns a SwitchoverConfig with an initialized inMemorySwitchOverCache field (never nil).

Accessing opts.SwitchoverConfig.inMemorySwitchOverCache.enabled is safe; no defensive nil checks are needed.

Likely an incorrect or invalid review comment.


1358-1362: Thread safety is already properly implemented; remove this concern.

The getPlanCacheForFF and setPlanCacheForFF methods properly use sync.RWMutex (RLock/RUnlock for reads, Lock/Unlock for writes with defers) to synchronize access to the shared cache map. The nil safety concern about opts.SwitchoverConfig is not an issue—line 1304 in the same function already accesses opts.SwitchoverConfig.inMemorySwitchOverCache.enabled without guards, meaning the parameter is guaranteed to be non-nil by design.

Likely an incorrect or invalid review comment.

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.

@SkArchon SkArchon marked this pull request as ready for review January 14, 2026 16:49
@SkArchon SkArchon marked this pull request as draft January 14, 2026 16:49
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@router-tests/go.mod`:
- Line 207: The go.mod contains a replace directive pointing to the non-public
fork "github.com/wundergraph/ristretto/v2" which breaks module resolution;
remove that replace and switch to the upstream module
"github.com/dgraph-io/ristretto/v2" (or make the fork public if you must keep
it). Concretely, delete the replace entry that maps
"github.com/wundergraph/ristretto/v2 => github.com/wundergraph/ristretto/v2
v2.0.0-..." and ensure the project requires "github.com/dgraph-io/ristretto/v2"
(update to v2.1.0 if needed), then run "go mod tidy" to refresh the dependency
graph and verify builds/CI pass.
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d576e03 and 78fc67d.

⛔ Files ignored due to path filters (1)
  • router-tests/go.sum is excluded by !**/*.sum
📒 Files selected for processing (5)
  • router-tests/cache_warmup_test.go
  • router-tests/go.mod
  • router/core/restart_switchover_config.go
  • router/core/restart_switchover_config_test.go
  • router/core/router.go
🧰 Additional context used
🧠 Learnings (10)
📚 Learning: 2025-08-20T22:13:25.222Z
Learnt from: StarpTech
Repo: wundergraph/cosmo PR: 2157
File: router-tests/go.mod:16-16
Timestamp: 2025-08-20T22:13:25.222Z
Learning: github.com/mark3labs/mcp-go v0.38.0 has regressions and should not be used in the wundergraph/cosmo project. v0.36.0 is the stable version that should be used across router-tests and other modules.

Applied to files:

  • router-tests/go.mod
  • router-tests/cache_warmup_test.go
📚 Learning: 2025-09-24T12:54:00.765Z
Learnt from: endigma
Repo: wundergraph/cosmo PR: 2222
File: router-tests/websocket_test.go:2238-2302
Timestamp: 2025-09-24T12:54:00.765Z
Learning: The wundergraph/cosmo project uses Go 1.25 (Go 1.23+ minimum), so fmt.Appendf and other newer Go standard library functions are available and can be used without compatibility concerns.

Applied to files:

  • router-tests/go.mod
📚 Learning: 2025-09-24T12:54:00.765Z
Learnt from: endigma
Repo: wundergraph/cosmo PR: 2222
File: router-tests/websocket_test.go:2238-2302
Timestamp: 2025-09-24T12:54:00.765Z
Learning: The wundergraph/cosmo project uses Go 1.25 (Go 1.25 minimum), so fmt.Appendf and other newer Go standard library functions are available and can be used without compatibility concerns.

Applied to files:

  • router-tests/go.mod
📚 Learning: 2025-09-17T20:55:39.456Z
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2172
File: router/core/graph_server.go:0-0
Timestamp: 2025-09-17T20:55:39.456Z
Learning: The Initialize method in router/internal/retrytransport/manager.go has been updated to properly handle feature-flag-only subgraphs by collecting subgraphs from both routerConfig.GetSubgraphs() and routerConfig.FeatureFlagConfigs.ConfigByFeatureFlagName, ensuring all subgraphs receive retry configuration.

Applied to files:

  • router/core/restart_switchover_config_test.go
  • router/core/restart_switchover_config.go
  • router-tests/cache_warmup_test.go
  • router/core/router.go
📚 Learning: 2025-10-01T20:39:16.113Z
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2252
File: router-tests/telemetry/telemetry_test.go:9684-9693
Timestamp: 2025-10-01T20:39:16.113Z
Learning: Repo preference: In router-tests/telemetry/telemetry_test.go, keep strict > 0 assertions for request.operation.*Time (parsingTime, normalizationTime, validationTime, planningTime) in telemetry-related tests; do not relax to >= 0 unless CI flakiness is observed.

Applied to files:

  • router/core/restart_switchover_config_test.go
  • router-tests/cache_warmup_test.go
📚 Learning: 2026-01-06T12:37:21.521Z
Learnt from: asoorm
Repo: wundergraph/cosmo PR: 2379
File: router/pkg/connectrpc/operation_registry_test.go:381-399
Timestamp: 2026-01-06T12:37:21.521Z
Learning: In Go code (Go 1.25+), prefer using sync.WaitGroup.Go(func()) to run a function in a new goroutine, letting the WaitGroup manage Add/Done automatically. Avoid manual wg.Add(1) followed by go func() { defer wg.Done(); ... }() patterns. Apply this guidance across all Go files in the wundergraph/cosmo repository where concurrency is used.

Applied to files:

  • router/core/restart_switchover_config_test.go
  • router/core/restart_switchover_config.go
  • router-tests/cache_warmup_test.go
  • router/core/router.go
📚 Learning: 2025-07-29T08:19:55.720Z
Learnt from: Noroth
Repo: wundergraph/cosmo PR: 2088
File: demo/pkg/subgraphs/projects/src/main_test.go:0-0
Timestamp: 2025-07-29T08:19:55.720Z
Learning: In Go testing, t.Fatal, t.FailNow, t.Skip* and similar methods cannot be called from goroutines other than the main test goroutine, as they will cause a runtime panic. Use channels, t.Error*, or other synchronization mechanisms to communicate errors from goroutines back to the main test thread.

Applied to files:

  • router-tests/cache_warmup_test.go
📚 Learning: 2025-08-28T09:18:10.121Z
Learnt from: endigma
Repo: wundergraph/cosmo PR: 2141
File: router-tests/http_subscriptions_test.go:100-108
Timestamp: 2025-08-28T09:18:10.121Z
Learning: In router-tests/http_subscriptions_test.go heartbeat tests, the message ordering should remain strict with data messages followed by heartbeat messages, as the timing is deterministic and known by design in the Cosmo router implementation.

Applied to files:

  • router-tests/cache_warmup_test.go
📚 Learning: 2025-08-26T17:19:28.178Z
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2167
File: router/internal/expr/retry_context.go:3-10
Timestamp: 2025-08-26T17:19:28.178Z
Learning: In the Cosmo router codebase, prefer using netErr.Timeout() checks over explicit context.DeadlineExceeded checks for timeout detection in retry logic, as Go's networking stack typically wraps context deadline exceeded errors in proper net.Error implementations.

Applied to files:

  • router-tests/cache_warmup_test.go
📚 Learning: 2025-09-29T10:28:07.122Z
Learnt from: dkorittki
Repo: wundergraph/cosmo PR: 2192
File: router/pkg/config/config.go:1028-1029
Timestamp: 2025-09-29T10:28:07.122Z
Learning: The deprecation strategy for IntrospectionEnabled field in router/pkg/config/config.go is to first mark it as deprecated, then migrate all call sites to use IntrospectionConfig.Enabled, and finally remove the deprecated field. The envDefault tag should be kept during the deprecation period for backward compatibility.

Applied to files:

  • router/core/router.go
🧬 Code graph analysis (3)
router/core/restart_switchover_config_test.go (2)
router/core/restart_switchover_config.go (2)
  • InMemorySwitchOverCache (32-36)
  • NewSwitchoverConfig (18-22)
router/pkg/config/config.go (2)
  • CacheWarmupConfiguration (989-995)
  • CacheWarmupInMemorySwitchover (983-985)
router-tests/cache_warmup_test.go (5)
router-tests/testenv/testenv.go (4)
  • Run (105-122)
  • Config (292-349)
  • RouterConfig (246-249)
  • Environment (1735-1771)
router/pkg/config/config.go (5)
  • Config (1037-1112)
  • CacheWarmupConfiguration (989-995)
  • CacheWarmupSource (974-977)
  • CacheWarmupInMemorySwitchover (983-985)
  • ExecutionConfig (865-869)
router/core/router.go (3)
  • Option (174-174)
  • WithCacheWarmupConfig (2114-2118)
  • ExecutionConfig (159-163)
router/core/cache_warmup.go (1)
  • CacheWarmupSource (26-28)
router/pkg/controlplane/configpoller/config_poller.go (1)
  • ConfigPoller (20-29)
router/core/router.go (1)
router/core/restart_switchover_config.go (2)
  • SwitchoverConfig (14-16)
  • NewSwitchoverConfig (18-22)
🔇 Additional comments (9)
router/core/restart_switchover_config_test.go (1)

1-337: Test coverage looks comprehensive.

The test file provides thorough coverage of the SwitchoverConfig and InMemorySwitchOverCache functionality, including:

  • Enable/disable state transitions
  • Edge cases (nil config, already enabled/disabled)
  • Feature flag cache operations (get/set)
  • Cleanup of unused feature flags
  • Delegation from SwitchoverConfig to InMemorySwitchOverCache
router/core/restart_switchover_config.go (1)

11-22: LGTM for structure and initialization.

The type definitions and constructor are well-structured. The planCache type alias improves readability, and NewSwitchoverConfig correctly initializes with a disabled cache state.

router/core/router.go (4)

93-93: LGTM for field addition.

The switchoverConfig field is properly added to the Router struct to maintain state across config changes.


599-611: Clean integration of switchover config in server creation.

The switchover config is correctly passed to newGraphServer and CleanupFeatureFlags is appropriately called after the server swap to prune stale feature flag caches.


1177-1181: LGTM for initialization logic.

The lazy initialization of switchoverConfig for test scenarios is appropriate, and the comment explaining this is helpful (and the typo from the previous review has been fixed).


2057-2061: LGTM for new option function.

The WithSwitchoverConfig option follows the established pattern for router configuration options.

router-tests/cache_warmup_test.go (3)

1100-1106: Correct usage of assert inside EventuallyWithT callback.

Using assert.Equal instead of require.Equal inside the EventuallyWithT callback is correct—this allows the retry mechanism to work properly by collecting failures rather than immediately aborting.


1064-1108: No action needed. The writeTestConfig helper function is defined in router-tests/config_hot_reload_test.go at line 579 and is accessible from the cache_warmup_test.go file since both are in the same package. The function is properly defined and available for use.


919-971: Code is correct as-is—no undefined types.

ConfigPollerMock and writeTestConfig are defined in config_hot_reload_test.go in the same package (integration), making them accessible throughout cache_warmup_test.go. The test implementation is valid.

Likely an incorrect or invalid review comment.

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
router/pkg/config/config.schema.json (1)

2387-2422: Allow empty cache_warmup.source or update the description.

The description says an empty source falls back to CDN, but the oneOf currently rejects {}. This makes configs with source: {} invalid despite the documented behavior.

🔧 Suggested schema tweak
         "oneOf": [
           {
             "required": ["filesystem"]
           },
+          {
+            "required": ["in_memory_switchover"]
+          },
+          {
+            "maxProperties": 0
+          }
-          {
-            "required": ["in_memory_switchover"]
-          }
         ]
🤖 Fix all issues with AI agents
In `@router/core/restart_switchover_config.go`:
- Around line 113-126: In InMemorySwitchOverCache.processOnConfigChangeRestart,
avoid the pre-lock enabled check (data race) and the unchecked assertion
data.(planCache) which panics on repeated calls: acquire c.mu.Lock() first,
check c.enabled while holding the lock, then iterate c.queriesForFeatureFlag and
use a type switch (or type check) to handle both planCache and
[]*nodev1.Operation values—call convertToNodeOperation only when the value is
planCache, leave already-converted []*nodev1.Operation as-is (or copy them) and
assign into the new switchoverMap; this prevents panics on repeated invocations
and removes the race on enabled.

In `@router/core/router_config.go`:
- Around line 315-320: The code currently sets usage["cache_warmup_source"] to
"cdn" when InMemorySwitchoverFallback is false even if an explicit in-memory
source block exists; update the branching in the cache warmup usage logic to
inspect the configured source block directly (e.g., check
c.cacheWarmup.Source.InMemorySwitchover != nil or the appropriate
Source.InMemorySwitchover field) before falling back to CDN, ensuring you detect
and set "in_memory_switchover" when the in-memory source is present, still
preferring "filesystem" when c.cacheWarmup.Source.Filesystem != nil and using
InMemorySwitchoverFallback only as the fallback indicator.
♻️ Duplicate comments (3)
router/core/restart_switchover_config.go (3)

70-73: Data race on enabled flag — check should be moved inside the lock.

The enabled check at line 71 occurs before acquiring the read lock at line 75. Since enabled is written under the lock in updateStateFromConfig, this creates a TOCTOU race if the config is updated concurrently.

Proposed fix
 func (c *InMemorySwitchOverCache) getPlanCacheForFF(featureFlagKey string) []*nodev1.Operation {
-	if !c.enabled {
-		return nil
-	}
-
 	c.mu.RLock()
 	defer c.mu.RUnlock()
+	if !c.enabled {
+		return nil
+	}
 
 	switch cache := c.queriesForFeatureFlag[featureFlagKey].(type) {

93-111: Same data race pattern in setPlanCacheForFF and deletePlanCacheForFF.

Both methods check enabled before acquiring the lock. Apply the same fix as for getPlanCacheForFF.

Proposed fix
 func (c *InMemorySwitchOverCache) setPlanCacheForFF(featureFlagKey string, cache planCache) {
-	if !c.enabled || cache == nil {
+	if cache == nil {
 		return
 	}
 
 	c.mu.Lock()
 	defer c.mu.Unlock()
+	if !c.enabled {
+		return
+	}
 	c.queriesForFeatureFlag[featureFlagKey] = cache
 }
 
 func (c *InMemorySwitchOverCache) deletePlanCacheForFF(featureFlagKey string) {
-	if !c.enabled {
-		return
-	}
-
 	c.mu.Lock()
 	defer c.mu.Unlock()
+	if !c.enabled {
+		return
+	}
 	delete(c.queriesForFeatureFlag, featureFlagKey)
 }

128-154: Same data race pattern in cleanupUnusedFeatureFlags.

The enabled check at line 129 occurs before acquiring the lock at line 137.

Proposed fix
 func (c *InMemorySwitchOverCache) cleanupUnusedFeatureFlags(routerCfg *nodev1.RouterConfig) {
-	if !c.enabled {
-		return
-	}
-
 	if routerCfg.FeatureFlagConfigs == nil {
 		return
 	}
 
 	c.mu.Lock()
 	defer c.mu.Unlock()
+	if !c.enabled {
+		return
+	}
 
 	ffNames := make(map[string]struct{})
🧹 Nitpick comments (3)
router-tests/cache_warmup_test.go (1)

956-958: Avoid indefinite hangs waiting on ConfigPollerMock.ready.

Blocking on <-pm.ready without a timeout can hang CI if the poller never signals. A small helper with a timeout keeps tests fail-fast and debuggable.

✅ Safer wait helper
 func TestInMemorySwitchoverCaching(t *testing.T) {
 	t.Parallel()
+
+	waitForPollerReady := func(t *testing.T, ch <-chan struct{}) {
+		t.Helper()
+		select {
+		case <-ch:
+		case <-time.After(2 * time.Second):
+			t.Fatal("config poller did not become ready in time")
+		}
+	}
 
 	t.Run("Verify the plan is cached on config restart when in memory switchover is enabled", func(t *testing.T) {
 		t.Parallel()
@@
-			<-pm.ready
+			waitForPollerReady(t, pm.ready)
@@
-			<-pm.ready
+			waitForPollerReady(t, pm.ready)
@@
-			<-pm.ready
+			waitForPollerReady(t, pm.ready)

Also applies to: 1001-1003, 1045-1047

router/core/restart_switchover_config.go (1)

156-165: Consider nil-check for safety.

convertToNodeOperation doesn't check if data is nil before calling IterValues. While callers should ensure non-nil values, a defensive nil-check would prevent panics from unexpected states.

Suggested improvement
 func convertToNodeOperation(data planCache) []*nodev1.Operation {
+	if data == nil {
+		return make([]*nodev1.Operation, 0)
+	}
 	items := make([]*nodev1.Operation, 0)
 	data.IterValues(func(v *planWithMetaData) (stop bool) {
router/core/graph_server.go (1)

1310-1311: Potential unsynchronized read of enabled field.

This directly accesses opts.SwitchoverConfig.inMemorySwitchOverCache.enabled without holding the lock. While this likely occurs during initialization when no concurrent updates are expected, it's inconsistent with the mutex-protected access pattern used elsewhere.

Consider exposing a synchronized getter method on SwitchoverConfig or InMemorySwitchOverCache to ensure thread-safe access.

Suggested approach

Add a synchronized getter to InMemorySwitchOverCache:

func (c *InMemorySwitchOverCache) isEnabled() bool {
    c.mu.RLock()
    defer c.mu.RUnlock()
    return c.enabled
}

Then update the call:

-operationPlanner := NewOperationPlanner(executor, gm.planCache, opts.SwitchoverConfig.inMemorySwitchOverCache.enabled)
+operationPlanner := NewOperationPlanner(executor, gm.planCache, opts.SwitchoverConfig.inMemorySwitchOverCache.isEnabled())

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@router/core/restart_switchover_config.go`:
- Around line 54-72: The write to InMemorySwitchOverCache.enabled is racy;
change enabled to an atomic flag (e.g., sync/atomic or atomic.Bool) and replace
all reads/writes to use atomic Load/Store semantics: update the assignment in
updateStateFromConfig to use c.enabled.Store(newVal) and update the reader
methods getPlanCacheForFF, setPlanCacheForFF, processOnConfigChangeRestart, and
cleanupUnusedFeatureFlags to use c.enabled.Load() instead of direct field access
(or alternatively move both the write and any pre-check reads inside the
existing mu lock); this ensures no TOCTOU race when checking or updating
enabled.
♻️ Duplicate comments (1)
router/core/restart_switchover_config.go (1)

74-95: Data race: enabled read before lock acquisition.

The check at line 75 reads c.enabled before the RLock is acquired at line 79. If updateStateFromConfig runs concurrently and modifies enabled, this method may see an inconsistent state. The same pattern exists in setPlanCacheForFF (line 98), processOnConfigChangeRestart (line 108), and cleanupUnusedFeatureFlags (line 125).

Proposed fix: move enabled check inside the lock
 func (c *InMemorySwitchOverCache) getPlanCacheForFF(featureFlagKey string) []*nodev1.Operation {
-	if !c.enabled {
-		return nil
-	}
-
 	c.mu.RLock()
 	defer c.mu.RUnlock()
+	if !c.enabled {
+		return nil
+	}
 
 	switch cache := c.queriesForFeatureFlag[featureFlagKey].(type) {

Apply the same pattern to setPlanCacheForFF, processOnConfigChangeRestart, and cleanupUnusedFeatureFlags.

🧹 Nitpick comments (1)
router/core/restart_switchover_config_test.go (1)

373-457: Consider adding a concurrency test.

The tests cover functional behavior well, but given the data race concerns in the implementation, consider adding a test that exercises concurrent access to validate thread safety once the locking issues are addressed.

Example concurrency test
func TestInMemorySwitchOverCache_ConcurrentAccess(t *testing.T) {
    t.Parallel()
    cache := &InMemorySwitchOverCache{
        enabled:               true,
        queriesForFeatureFlag: make(map[string]any),
        logger:                zap.NewNop(),
    }

    var wg sync.WaitGroup
    
    // Concurrent reads
    for i := 0; i < 10; i++ {
        wg.Go(func() {
            _ = cache.getPlanCacheForFF("test-ff")
        })
    }
    
    // Concurrent writes
    for i := 0; i < 10; i++ {
        wg.Go(func() {
            cache.updateStateFromConfig(&Config{
                cacheWarmup: &config.CacheWarmupConfiguration{
                    Enabled:                    true,
                    InMemorySwitchoverFallback: true,
                },
            }, false)
        })
    }
    
    wg.Wait()
}

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
router/core/graph_server.go (1)

295-461: Propagate CosmoCacheWarmerEnabled to feature-flag muxes.

Line 456 builds feature-flag muxes without setting CosmoCacheWarmerEnabled, so the warmup switch at Line 1367 can incorrectly choose in-memory fallback even when CDN warmup is enabled. This changes behavior for feature-flag caches compared to the base graph.

✅ Suggested fix
-	multiGraphHandler, err := s.buildMultiGraphHandler(ctx, gm.mux, featureFlagConfigMap, switchoverConfig)
+	multiGraphHandler, err := s.buildMultiGraphHandler(ctx, gm.mux, featureFlagConfigMap, switchoverConfig, cosmoCacheWarmerEnabled)
-func (s *graphServer) buildMultiGraphHandler(
+func (s *graphServer) buildMultiGraphHandler(
 	ctx context.Context,
 	baseMux *chi.Mux,
 	featureFlagConfigs map[string]*nodev1.FeatureFlagRouterExecutionConfig,
-	switchoverConfig *SwitchoverConfig,
+	switchoverConfig *SwitchoverConfig,
+	cosmoCacheWarmerEnabled bool,
 ) (http.HandlerFunc, error) {
 		gm, err := s.buildGraphMux(ctx, BuildGraphMuxOptions{
 			FeatureFlagName:     featureFlagName,
 			RouterConfigVersion: executionConfig.GetVersion(),
 			EngineConfig:        executionConfig.GetEngineConfig(),
 			ConfigSubgraphs:     executionConfig.Subgraphs,
 			SwitchoverConfig:    switchoverConfig,
+			CosmoCacheWarmerEnabled: cosmoCacheWarmerEnabled,
 		})
♻️ Duplicate comments (1)
router/core/restart_switchover_config.go (1)

110-124: Preserve already-converted entries on repeated restart processing.

Line 119 only copies entries that are still planCache. If processOnConfigChangeRestart is called again after a prior conversion, existing []*nodev1.Operation entries are silently dropped, resulting in an empty warmup set.

✅ Suggested fix
-	for k, v := range c.queriesForFeatureFlag {
-		if cache, ok := v.(planCache); ok {
-			switchoverMap[k] = convertToNodeOperation(cache)
-		}
-	}
+	for k, v := range c.queriesForFeatureFlag {
+		switch cache := v.(type) {
+		case planCache:
+			switchoverMap[k] = convertToNodeOperation(cache)
+		case []*nodev1.Operation:
+			// Already converted
+			switchoverMap[k] = cache
+		default:
+			c.logger.Error("unexpected type")
+		}
+	}

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@router/core/graph_server.go`:
- Around line 1313-1314: The check for switchover enabled reads shared state
without synchronization causing a data race; add a helper method on
InMemorySwitchOverCache (e.g., isEnabled()) that takes an RLock, nil-guards
queriesForFeatureFlag and returns whether it is non-nil, then replace the direct
access in graph_server.go (the line setting storeQueryContent using
opts.SwitchoverConfig.inMemorySwitchOverCache) with a call to that helper (also
nil-guard SwitchoverConfig itself before calling), so updateStateFromConfig’s
mutations and this read are synchronized via the cache’s mutex.

In `@router/core/restart_switchover_config.go`:
- Around line 148-154: convertToNodeOperation may miss buffered writes because
IterValues is non-atomic; call the cache's Wait() to flush pending asynchronous
sets before iterating. In convertToNodeOperation (which receives data planCache)
invoke the appropriate Wait() method on the planCache (or its underlying
Ristretto cache field) immediately before data.IterValues(...) so pending writes
are flushed and iteration sees a consistent snapshot. Ensure you use the same
Wait() approach as the tests (or implement an explicit snapshot method) to avoid
missing entries.
♻️ Duplicate comments (1)
router/core/restart_switchover_config.go (1)

115-121: Preserve already-converted switchover entries on repeated restarts.

On a second config-change restart, queriesForFeatureFlag will already contain []*nodev1.Operation. The current loop only keeps planCache values, silently dropping converted entries and losing warmup candidates. This defeats the in-memory warmup on successive restarts.

🔧 Proposed fix (keep already-converted entries)
for k, v := range c.queriesForFeatureFlag {
-	if cache, ok := v.(planCache); ok {
-		switchoverMap[k] = convertToNodeOperation(cache)
-	}
+	switch cache := v.(type) {
+	case planCache:
+		switchoverMap[k] = convertToNodeOperation(cache)
+	case []*nodev1.Operation:
+		switchoverMap[k] = cache
+	default:
+		c.logger.Error("unexpected type")
+	}
}

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@router/core/restart_switchover_config.go`:
- Around line 113-128: The extractQueriesAndOverridePlanCache method currently
only preserves entries of type planCache and drops already-converted operation
slices; update the loop in
InMemorySwitchOverCache.extractQueriesAndOverridePlanCache to handle both types
by checking the value's type (e.g., planCache and []*nodev1.Operation or []
*nodev1.Operation) and copying existing operation slices into switchoverMap
unchanged while converting planCache entries via convertToNodeOperation(cache);
ensure you preserve other non-matching entries as-is or ignore explicitly
depending on intended semantics so repeated calls do not lose
previously-converted data.
♻️ Duplicate comments (1)
router/core/restart_switchover_config.go (1)

18-24: Add documentation for exported types and functions.

The exported SwitchoverConfig type has a brief comment, but the exported functions (NewSwitchoverConfig, UpdateSwitchoverConfig, CleanupFeatureFlags, ProcessOnConfigChangeRouterInstanceRestart) lack godoc comments describing their purpose and usage.

🧹 Nitpick comments (2)
router/core/supervisor.go (1)

160-163: Consider adding a nil check for defensive coding.

If switchoverConfig is ever nil (e.g., in tests or alternative router factory implementations), this will panic. While the current wiring always provides a SwitchoverConfig, a nil guard would make this more robust.

🛡️ Suggested defensive check
 if !shutdown {
+	if rs.router.switchoverConfig != nil {
 		rs.router.switchoverConfig.ProcessOnConfigChangeRouterInstanceRestart()
+	}
 }
router/core/restart_switchover_config_test.go (1)

340-413: Add test case for calling extractQueriesAndOverridePlanCache twice.

The current tests only verify the first conversion. A test that calls the method twice would verify that already-converted []*nodev1.Operation entries are preserved (which currently they're not—see related issue in production code).

🧪 Suggested additional test case
t.Run("preserves already-converted operation slices on second call", func(t *testing.T) {
	t.Parallel()
	mockCache, err := ristretto.NewCache(&ristretto.Config[uint64, *planWithMetaData]{
		MaxCost:            10000,
		NumCounters:        10000000,
		IgnoreInternalCost: true,
		BufferItems:        64,
	})
	require.NoError(t, err)

	query1 := "query { test1 }"
	mockCache.Set(1, &planWithMetaData{content: query1}, 1)
	mockCache.Wait()

	// Pre-existing converted operations
	existingOps := []*nodev1.Operation{
		{Request: &nodev1.OperationRequest{Query: "query { existing }"}},
	}

	cache := &InMemorySwitchOverCache{
		queriesForFeatureFlag: make(map[string]any),
	}
	cache.queriesForFeatureFlag["ff1"] = mockCache
	cache.queriesForFeatureFlag["ff2"] = existingOps

	cache.extractQueriesAndOverridePlanCache()

	// Both should be preserved
	require.Len(t, cache.queriesForFeatureFlag, 2)
	require.IsType(t, []*nodev1.Operation{}, cache.queriesForFeatureFlag["ff1"])
	require.IsType(t, []*nodev1.Operation{}, cache.queriesForFeatureFlag["ff2"])

	ff2Ops := cache.queriesForFeatureFlag["ff2"].([]*nodev1.Operation)
	require.Equal(t, "query { existing }", ff2Ops[0].Request.Query)
})

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
router/core/graph_server.go (1)

1373-1394: Potential nil pointer dereference on SwitchoverConfig in in-memory switchover fallback case.

Lines 1386-1387 access opts.SwitchoverConfig.inMemorySwitchOverCache without verifying that opts.SwitchoverConfig is non-nil. If the configuration enables InMemorySwitchoverFallback but SwitchoverConfig was not initialized (e.g., due to a configuration mismatch), this will cause a panic.

🐛 Proposed fix: Add nil check to the switch case condition
-	case s.cacheWarmup.InMemorySwitchoverFallback && (s.selfRegister == nil || !opts.CosmoCacheWarmerEnabled):
+	case s.cacheWarmup.InMemorySwitchoverFallback && opts.SwitchoverConfig != nil && (s.selfRegister == nil || !opts.CosmoCacheWarmerEnabled):
🤖 Fix all issues with AI agents
In `@router/core/graph_server.go`:
- Around line 1322-1323: The call to
opts.SwitchoverConfig.inMemorySwitchOverCache.IsEnabled() can panic if
SwitchoverConfig is nil; change the construction of operationPlanner
(NewOperationPlanner(executor, gm.planCache, ...)) to compute the boolean safely
by first checking if opts.SwitchoverConfig != nil and only then calling
inMemorySwitchOverCache.IsEnabled(), otherwise default to false, and pass that
computed value into NewOperationPlanner to avoid a nil pointer dereference.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants