-
Notifications
You must be signed in to change notification settings - Fork 207
feat: rewarm cache from planner cache on schema change #2445
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
feat: rewarm cache from planner cache on schema change #2445
Conversation
WalkthroughAdds an in-memory switchover cache and wiring to persist per-feature plan data across config-change restarts; extends planner cache interfaces and plan sources; threads SwitchoverConfig and cache-warmer flags through supervisor/startup/graph-server; updates config/schema/proto; and adds unit and integration tests. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Comment |
Router image scan passed✅ No security vulnerabilities found in image: |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #2445 +/- ##
==========================================
+ Coverage 1.49% 36.06% +34.57%
==========================================
Files 292 946 +654
Lines 46960 123527 +76567
Branches 431 4944 +4513
==========================================
+ Hits 703 44553 +43850
- Misses 45974 77409 +31435
- Partials 283 1565 +1282
🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
router/go.mod (2)
175-199: Fix inconsistent ristretto/v2 fork across multi-module repoThe router module pins a fork (
github.com/wundergraph/ristretto/v2 v2.0.0-20260113151003-f7231509ad4d), but router-tests, demo, and graphqlmetrics modules all require the upstream version (v2.1.0). This creates mixed implementations at build time. The replace directive in router/go.mod only affects that module's dependencies, not the workspace. Either apply the same replace directive to other go.mod files that depend on ristretto, or unify the version across all modules.
59-88: Add explanatory comment about the Ristretto fork in go.modThe fork
github.com/wundergraph/ristretto/v2at commit f7231509ad4d (branchmilinda/iterating_elements) adds iterator support for cache elements, which is not available in the upstream v2.0.0 release. Add a comment in the replace directive explaining this purpose. Since no other modules in the repo depend on ristretto v2.1.x features (the require block specifies only v2.0.0), there are no compatibility concerns with using the fork.
🤖 Fix all issues with AI agents
In `@router-tests/cache_warmup_test.go`:
- Around line 1100-1106: The test uses require.EventuallyWithT with an
assert.CollectT callback but calls require.Equal inside the retry closure (in
the block passed to require.EventuallyWithT) which will abort immediately;
replace the failing assertions inside that callback with assert.Equal so the
checks are retried: locate the closure passed to require.EventuallyWithT (where
res = xEnv.MakeGraphQLRequestOK(...)) and change require.Equal(t, "updated",
res.Response.Header.Get("X-Router-Config-Version")) and require.Equal(t, "HIT",
res.Response.Header.Get("x-wg-execution-plan-cache")) to assert.Equal(t,
"updated", ...) and assert.Equal(t, "HIT", ...).
In `@router/core/http_server.go`:
- Line 39: Remove the dead field from the httpServerOptions struct:
`switchoverConfig` is never set when creating `httpServerOptions` and is not
used by `newServer()`, since `SwitchoverConfig` is handled at the `Router` level
and passed into `newGraphServer`; delete the `switchoverConfig
*SwitchoverConfig` field from the `httpServerOptions` definition and update any
references (none expected) to rely on the Router/newGraphServer flow instead.
In `@router/core/restart_switchover_config.go`:
- Around line 24-38: UpdateSwitchoverConfig writes
s.inMemorySwitchOverCache.enabled without synchronization which races with reads
in getPlanCacheForFF, setPlanCacheForFF, and cleanupUnusedFeatureFlags; fix by
acquiring the SwitchoverConfig write lock (e.g., s.mu or the existing cache
lock) around the block that sets s.inMemorySwitchOverCache.enabled and mutates
queriesForFeatureFlag/testValue, and change the reader methods
getPlanCacheForFF, setPlanCacheForFF, and cleanupUnusedFeatureFlags to check
enabled while holding the same lock (or alternatively convert enabled to an
atomic.Bool and use atomic load/store for reads/writes) so readers and writers
are consistent.
In `@router/core/router.go`:
- Around line 1177-1181: Fix the typo in the inline comment near the switchover
config initialization: change "sueprvisor" to "supervisor" in the comment that
precedes the r.switchoverConfig = NewSwitchoverConfig() block so the comment
reads correctly while leaving the surrounding logic (r.switchoverConfig,
NewSwitchoverConfig(), UpdateSwitchoverConfig(&r.Config)) unchanged.
🧹 Nitpick comments (4)
router/pkg/config/config.go (1)
974-986: Consider makingCacheWarmupSource.InMemorySwitchovera pointer to preserve “unset vs disabled”
At Line 974-977,InMemorySwitchoveris a non-pointer struct withomitempty. Depending on the marshaler, this may still serialize as present (and it definitely can’t represent “not provided” distinctly from “provided but disabled”), which tends to leak into golden testdata and example configs (as seen inconfig_defaults.json).If you want
cache_warmup.sourceto be truly optional/empty unless explicitly configured, consider changing it to a pointer like*CacheWarmupInMemorySwitchover.Proposed change
type CacheWarmupSource struct { Filesystem *CacheWarmupFileSystemSource `yaml:"filesystem,omitempty"` - InMemorySwitchover CacheWarmupInMemorySwitchover `yaml:"in_memory_switchover,omitempty"` + InMemorySwitchover *CacheWarmupInMemorySwitchover `yaml:"in_memory_switchover,omitempty"` }router/pkg/config/config.schema.json (1)
2387-2423: Schema polish: adddefault: false+ tighten wording forin_memory_switchover.enabled
For the new block at Line 2404-2413:
enabledhas no default in the schema (even though the Go config usesenvDefault:"false"). Consider adding"default": falsefor consistency with other booleans.- Minor: description “Is in memory switchover enabled” could be normalized (and maybe mention it reuses prior process’ cached plans, if that’s the intent).
router/core/cache_warmup_plans.go (1)
16-30: Consider preserving additional operation metadata for complete cache warmup.The current implementation only extracts the
Querycontent from cached plans. For accurate cache warmup, operations may also need variables, operation name, and headers that were part of the original request. IfplanWithMetaDatastores additional context, consider extracting it here.Also, based on the past discussion about limiting cache warmer operations to ~100, this implementation iterates unboundedly. Consider adding a limit if the plan cache can grow large.
router/core/restart_switchover_config.go (1)
44-49: Consider documenting or removing thetestValuefield.The
testValuefield appears to be for testing purposes but is not documented. If it's only used in tests, consider adding a comment explaining its purpose, or moving test-specific logic elsewhere.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
router/go.sumis excluded by!**/*.sum
📒 Files selected for processing (16)
router-tests/cache_warmup_test.gorouter/cmd/main.gorouter/core/cache_warmup_plans.gorouter/core/graph_server.gorouter/core/http_server.gorouter/core/operation_planner.gorouter/core/restart_switchover_config.gorouter/core/router.gorouter/core/router_config.gorouter/core/supervisor.gorouter/core/supervisor_instance.gorouter/go.modrouter/pkg/config/config.gorouter/pkg/config/config.schema.jsonrouter/pkg/config/testdata/config_defaults.jsonrouter/pkg/config/testdata/config_full.json
🧰 Additional context used
🧠 Learnings (10)
📓 Common learnings
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2172
File: router/core/graph_server.go:0-0
Timestamp: 2025-09-17T20:55:39.456Z
Learning: The Initialize method in router/internal/retrytransport/manager.go has been updated to properly handle feature-flag-only subgraphs by collecting subgraphs from both routerConfig.GetSubgraphs() and routerConfig.FeatureFlagConfigs.ConfigByFeatureFlagName, ensuring all subgraphs receive retry configuration.
📚 Learning: 2025-09-17T20:55:39.456Z
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2172
File: router/core/graph_server.go:0-0
Timestamp: 2025-09-17T20:55:39.456Z
Learning: The Initialize method in router/internal/retrytransport/manager.go has been updated to properly handle feature-flag-only subgraphs by collecting subgraphs from both routerConfig.GetSubgraphs() and routerConfig.FeatureFlagConfigs.ConfigByFeatureFlagName, ensuring all subgraphs receive retry configuration.
Applied to files:
router/core/supervisor.gorouter/cmd/main.gorouter/core/router.gorouter-tests/cache_warmup_test.gorouter/core/restart_switchover_config.gorouter/core/supervisor_instance.gorouter/core/graph_server.go
📚 Learning: 2026-01-06T12:37:21.521Z
Learnt from: asoorm
Repo: wundergraph/cosmo PR: 2379
File: router/pkg/connectrpc/operation_registry_test.go:381-399
Timestamp: 2026-01-06T12:37:21.521Z
Learning: In Go code (Go 1.25+), prefer using sync.WaitGroup.Go(func()) to run a function in a new goroutine, letting the WaitGroup manage Add/Done automatically. Avoid manual wg.Add(1) followed by go func() { defer wg.Done(); ... }() patterns. Apply this guidance across all Go files in the wundergraph/cosmo repository where concurrency is used.
Applied to files:
router/core/supervisor.gorouter/core/http_server.gorouter/core/router_config.gorouter/core/cache_warmup_plans.gorouter/pkg/config/config.gorouter/cmd/main.gorouter/core/router.gorouter-tests/cache_warmup_test.gorouter/core/restart_switchover_config.gorouter/core/supervisor_instance.gorouter/core/operation_planner.gorouter/core/graph_server.go
📚 Learning: 2025-08-12T13:50:45.964Z
Learnt from: Noroth
Repo: wundergraph/cosmo PR: 2132
File: router-plugin/plugin.go:139-146
Timestamp: 2025-08-12T13:50:45.964Z
Learning: In the Cosmo router plugin system, the plugin framework creates its own logger independently. When creating a logger in NewRouterPlugin, it's only needed for the gRPC server setup (passed via setup.GrpcServerInitOpts.Logger) and doesn't need to be assigned back to serveConfig.Logger.
Applied to files:
router/cmd/main.go
📚 Learning: 2025-10-01T20:39:16.113Z
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2252
File: router-tests/telemetry/telemetry_test.go:9684-9693
Timestamp: 2025-10-01T20:39:16.113Z
Learning: Repo preference: In router-tests/telemetry/telemetry_test.go, keep strict > 0 assertions for request.operation.*Time (parsingTime, normalizationTime, validationTime, planningTime) in telemetry-related tests; do not relax to >= 0 unless CI flakiness is observed.
Applied to files:
router-tests/cache_warmup_test.go
📚 Learning: 2025-08-20T22:13:25.222Z
Learnt from: StarpTech
Repo: wundergraph/cosmo PR: 2157
File: router-tests/go.mod:16-16
Timestamp: 2025-08-20T22:13:25.222Z
Learning: github.com/mark3labs/mcp-go v0.38.0 has regressions and should not be used in the wundergraph/cosmo project. v0.36.0 is the stable version that should be used across router-tests and other modules.
Applied to files:
router-tests/cache_warmup_test.gorouter/go.mod
📚 Learning: 2025-08-26T17:19:28.178Z
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2167
File: router/internal/expr/retry_context.go:3-10
Timestamp: 2025-08-26T17:19:28.178Z
Learning: In the Cosmo router codebase, prefer using netErr.Timeout() checks over explicit context.DeadlineExceeded checks for timeout detection in retry logic, as Go's networking stack typically wraps context deadline exceeded errors in proper net.Error implementations.
Applied to files:
router-tests/cache_warmup_test.go
📚 Learning: 2025-08-28T09:18:10.121Z
Learnt from: endigma
Repo: wundergraph/cosmo PR: 2141
File: router-tests/http_subscriptions_test.go:100-108
Timestamp: 2025-08-28T09:18:10.121Z
Learning: In router-tests/http_subscriptions_test.go heartbeat tests, the message ordering should remain strict with data messages followed by heartbeat messages, as the timing is deterministic and known by design in the Cosmo router implementation.
Applied to files:
router-tests/cache_warmup_test.go
📚 Learning: 2025-08-20T10:08:17.857Z
Learnt from: endigma
Repo: wundergraph/cosmo PR: 2155
File: router/core/router.go:1857-1866
Timestamp: 2025-08-20T10:08:17.857Z
Learning: router/pkg/config/config.schema.json forbids null values for traffic_shaping.subgraphs: additionalProperties references $defs.traffic_shaping_subgraph_request_rule with type "object". Therefore, in core.NewSubgraphTransportOptions, dereferencing each subgraph rule pointer is safe under schema-validated configs, and a nil-check is unnecessary.
Applied to files:
router/core/graph_server.go
📚 Learning: 2025-11-12T18:38:46.619Z
Learnt from: StarpTech
Repo: wundergraph/cosmo PR: 2327
File: graphqlmetrics/core/metrics_service.go:435-442
Timestamp: 2025-11-12T18:38:46.619Z
Learning: In graphqlmetrics/core/metrics_service.go, the opGuardCache TTL (30 days) is intentionally shorter than the gql_metrics_operations database retention period (90 days). This design reduces memory usage while relying on database constraints to prevent duplicate inserts after cache expiration.
Applied to files:
router/core/graph_server.go
🧬 Code graph analysis (6)
router/core/supervisor.go (2)
router/core/restart_switchover_config.go (1)
SwitchoverConfig(14-16)router/core/router.go (1)
Router(84-94)
router/core/http_server.go (1)
router/core/restart_switchover_config.go (1)
SwitchoverConfig(14-16)
router/cmd/main.go (1)
router/core/restart_switchover_config.go (2)
SwitchoverConfig(14-16)NewSwitchoverConfig(18-22)
router/core/router.go (1)
router/core/restart_switchover_config.go (2)
SwitchoverConfig(14-16)NewSwitchoverConfig(18-22)
router/core/supervisor_instance.go (3)
router/core/restart_switchover_config.go (1)
SwitchoverConfig(14-16)router/core/router_config.go (1)
Config(49-144)router/core/router.go (2)
Option(174-174)WithSwitchoverConfig(2057-2061)
router/core/graph_server.go (4)
router/core/restart_switchover_config.go (1)
SwitchoverConfig(14-16)router/core/router.go (1)
Router(84-94)router/core/operation_planner.go (1)
NewOperationPlanner(50-57)router/core/cache_warmup_plans.go (1)
NewPlanSource(16-30)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: build-router
- GitHub Check: integration_test (./telemetry)
- GitHub Check: build_push_image (nonroot)
- GitHub Check: build_push_image
- GitHub Check: integration_test (./events)
- GitHub Check: build_test
- GitHub Check: image_scan (nonroot)
- GitHub Check: image_scan
- GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
- GitHub Check: build_test
- GitHub Check: Analyze (go)
- GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (36)
router/cmd/main.go (1)
152-172: SwitchoverConfig wiring looks correct; ensure lifecycle matches intended cache persistence scope
Line 152-155 passes a singlecore.NewSwitchoverConfig()into the supervisor—so caches/state will live for the process lifetime across reloads (likely desired for “reprocess existing plans”). Just ensure this is the intended scope (vs “per reload”) and that the underlying cache is concurrency-safe.router/pkg/config/testdata/config_defaults.json (1)
311-322: config_defaults.json is a golden test output file, not a configuration subject to schema validation—no changes neededThe concern about schema mismatch is based on a misunderstanding of the file's purpose.
config_defaults.jsonis purely a golden test artifact inTestDefaults()used to verify the JSON serialization of the Config struct; it is never loaded from disk or validated againstconfig.schema.json.The schema validation in
LoadConfig()applies only to YAML input files. When the Config struct is marshaled to JSON for the golden test, Go's default JSON marshaling uses PascalCase field names (Filesystem,InMemorySwitchover) and includes null values where structs haveomitemptyYAML tags. This serialization format is independent of the schema, which constrains YAML input with snake_case keys. The presence of"Filesystem": nullin the output is expected and correct.router/pkg/config/testdata/config_full.json (1)
660-671: LGTM!The
InMemorySwitchoverconfiguration block is correctly added underCacheWarmup.Source, following the existing pattern withFilesystem. The defaultEnabled: falseis appropriate for this comprehensive config test file.router/core/supervisor.go (3)
32-36: LGTM!The
SwitchoverConfigfield is correctly added toRouterResources, enabling the switchover configuration to be passed through to downstream components like the Router.
38-44: LGTM!The
SwitchoverConfigfield inRouterSupervisorOptsprovides the input path for the switchover configuration.
52-55: LGTM!The initialization correctly propagates
SwitchoverConfigfrom options to resources, completing the wiring through the supervisor layer.router-tests/cache_warmup_test.go (5)
5-6: LGTM!The new imports are appropriate for the added test functionality:
assertforEventuallyWithTcallbacks andconfigpollerfor the mock configuration poller.
919-971: LGTM - well-structured test for in-memory switchover caching.The test correctly validates that:
- Initial request results in a cache MISS
- After config update (simulating switchover), the same query results in a cache HIT
This confirms the in-memory plan cache is preserved across config reloads.
973-1014: LGTM!Correctly tests that cache is not preserved when cache warmup is entirely disabled, resulting in MISS on both requests.
1016-1062: LGTM!Correctly tests that the default cache warmer (with
InMemorySwitchover.Enabled = false) does not preserve plans across config reloads.
1064-1108: BothwriteTestConfig(defined in router-tests/config_hot_reload_test.go:579) andConfigPollerMock(defined in router-tests/config_hot_reload_test.go:32) are properly defined in the test package and are accessible to this test file.router/core/router_config.go (1)
314-321: LGTM - source classification logic is correct and type-safe.The addition correctly classifies the cache warmup source. The control flow properly prioritizes
FilesystemoverInMemorySwitchover. SinceInMemorySwitchoveris a value type (not a pointer), accessing itsEnabledfield is safe—no nil check is needed.router/core/cache_warmup_plans.go (1)
32-34: LGTM!Simple and efficient - queries are extracted once during construction, allowing the source cache to be garbage collected.
router/core/supervisor_instance.go (3)
53-53: LGTM!Correctly threads
SwitchoverConfigthrough the router option construction.
184-184: LGTM!Function signature updated to accept
switchoverConfigparameter for propagation.
275-275: LGTM!
WithSwitchoverConfigoption correctly added to the options list.router/core/router.go (4)
93-93: LGTM!Field addition to hold switchover configuration state.
600-600: LGTM!Correctly passes
switchoverConfigtonewGraphServerfor plan cache management.
608-610: LGTM!Feature flag cleanup after server swap ensures unused caches are pruned.
2057-2061: LGTM!Standard option pattern for setting switchover configuration.
router/core/restart_switchover_config.go (2)
71-78: Clarify behavior whenFeatureFlagConfigsis nil.When
routerCfg.FeatureFlagConfigsis nil, the method returns early without cleaning up any cached feature flags. This means stale feature flag caches could persist. Is this intentional?If feature flags can be completely removed from a config, consider clearing all non-base caches when
FeatureFlagConfigsis nil.
83-96: LGTM!The cleanup logic correctly preserves the base configuration (empty string key) while removing caches for feature flags that no longer exist in the router config.
router/core/operation_planner.go (8)
24-24: LGTM!New
contentfield enables storing the original query for cache warmup scenarios.
32-37: LGTM!The
storeContentflag andoperationPlannerOptsstruct provide clean control over content storage behavior.
50-56: LGTM!Constructor correctly accepts and stores the
storeContentflag.
59-59: LGTM!Signature update to accept
operationPlannerOptsfor controlling plan preparation behavior.
93-95: LGTM!Content is conditionally stored based on the
storeContentoption, avoiding unnecessary memory usage when content storage is not needed.
122-122: LGTM!When skipping cache (for tracing or query plan inclusion), content storage is disabled since these plans are not cached.
150-150: LGTM!Normal cache path correctly uses the planner's
storeContentsetting.
44-46: TheItermethod is already implemented byristretto.Cache, which is the onlyExecutionPlanCacheimplementation used in the codebase. It's already being called inrouter/core/cache_warmup_plans.go:21without issues. AddingIterto the interface is not a breaking change.Likely an incorrect or invalid review comment.
router/core/graph_server.go (6)
108-116: LGTM!The addition of
SwitchoverConfigtoBuildGraphMuxOptionsfollows the existing pattern for optional configuration parameters.
122-123: LGTM!The signature extension to accept
switchoverConfigis correctly propagated through the initialization path.
434-458: LGTM!The
switchoverConfigis correctly threaded through to feature flag mux construction, ensuring consistent behavior across base and feature flag graphs.
526-532: LGTM!Field alignment changes are cosmetic and improve readability.
1303-1304: No nil pointer dereference risk. SwitchoverConfig and inMemorySwitchOverCache are guaranteed non-nil by initialization.The
opts.SwitchoverConfigparameter passed tonewGraphServeris guaranteed to be non-nil with a validinMemorySwitchOverCachebecause:
- Before
newServer()callsnewGraphServer()at line 600, the Router'sStart()method initializesr.switchoverConfigwith a nil check at lines 1177-1179.NewSwitchoverConfig()always returns a SwitchoverConfig with an initializedinMemorySwitchOverCachefield (never nil).Accessing
opts.SwitchoverConfig.inMemorySwitchOverCache.enabledis safe; no defensive nil checks are needed.Likely an incorrect or invalid review comment.
1358-1362: Thread safety is already properly implemented; remove this concern.The
getPlanCacheForFFandsetPlanCacheForFFmethods properly usesync.RWMutex(RLock/RUnlock for reads, Lock/Unlock for writes with defers) to synchronize access to the shared cache map. The nil safety concern aboutopts.SwitchoverConfigis not an issue—line 1304 in the same function already accessesopts.SwitchoverConfig.inMemorySwitchOverCache.enabledwithout guards, meaning the parameter is guaranteed to be non-nil by design.Likely an incorrect or invalid review comment.
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@router-tests/go.mod`:
- Line 207: The go.mod contains a replace directive pointing to the non-public
fork "github.com/wundergraph/ristretto/v2" which breaks module resolution;
remove that replace and switch to the upstream module
"github.com/dgraph-io/ristretto/v2" (or make the fork public if you must keep
it). Concretely, delete the replace entry that maps
"github.com/wundergraph/ristretto/v2 => github.com/wundergraph/ristretto/v2
v2.0.0-..." and ensure the project requires "github.com/dgraph-io/ristretto/v2"
(update to v2.1.0 if needed), then run "go mod tidy" to refresh the dependency
graph and verify builds/CI pass.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
router-tests/go.sumis excluded by!**/*.sum
📒 Files selected for processing (5)
router-tests/cache_warmup_test.gorouter-tests/go.modrouter/core/restart_switchover_config.gorouter/core/restart_switchover_config_test.gorouter/core/router.go
🧰 Additional context used
🧠 Learnings (10)
📚 Learning: 2025-08-20T22:13:25.222Z
Learnt from: StarpTech
Repo: wundergraph/cosmo PR: 2157
File: router-tests/go.mod:16-16
Timestamp: 2025-08-20T22:13:25.222Z
Learning: github.com/mark3labs/mcp-go v0.38.0 has regressions and should not be used in the wundergraph/cosmo project. v0.36.0 is the stable version that should be used across router-tests and other modules.
Applied to files:
router-tests/go.modrouter-tests/cache_warmup_test.go
📚 Learning: 2025-09-24T12:54:00.765Z
Learnt from: endigma
Repo: wundergraph/cosmo PR: 2222
File: router-tests/websocket_test.go:2238-2302
Timestamp: 2025-09-24T12:54:00.765Z
Learning: The wundergraph/cosmo project uses Go 1.25 (Go 1.23+ minimum), so fmt.Appendf and other newer Go standard library functions are available and can be used without compatibility concerns.
Applied to files:
router-tests/go.mod
📚 Learning: 2025-09-24T12:54:00.765Z
Learnt from: endigma
Repo: wundergraph/cosmo PR: 2222
File: router-tests/websocket_test.go:2238-2302
Timestamp: 2025-09-24T12:54:00.765Z
Learning: The wundergraph/cosmo project uses Go 1.25 (Go 1.25 minimum), so fmt.Appendf and other newer Go standard library functions are available and can be used without compatibility concerns.
Applied to files:
router-tests/go.mod
📚 Learning: 2025-09-17T20:55:39.456Z
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2172
File: router/core/graph_server.go:0-0
Timestamp: 2025-09-17T20:55:39.456Z
Learning: The Initialize method in router/internal/retrytransport/manager.go has been updated to properly handle feature-flag-only subgraphs by collecting subgraphs from both routerConfig.GetSubgraphs() and routerConfig.FeatureFlagConfigs.ConfigByFeatureFlagName, ensuring all subgraphs receive retry configuration.
Applied to files:
router/core/restart_switchover_config_test.gorouter/core/restart_switchover_config.gorouter-tests/cache_warmup_test.gorouter/core/router.go
📚 Learning: 2025-10-01T20:39:16.113Z
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2252
File: router-tests/telemetry/telemetry_test.go:9684-9693
Timestamp: 2025-10-01T20:39:16.113Z
Learning: Repo preference: In router-tests/telemetry/telemetry_test.go, keep strict > 0 assertions for request.operation.*Time (parsingTime, normalizationTime, validationTime, planningTime) in telemetry-related tests; do not relax to >= 0 unless CI flakiness is observed.
Applied to files:
router/core/restart_switchover_config_test.gorouter-tests/cache_warmup_test.go
📚 Learning: 2026-01-06T12:37:21.521Z
Learnt from: asoorm
Repo: wundergraph/cosmo PR: 2379
File: router/pkg/connectrpc/operation_registry_test.go:381-399
Timestamp: 2026-01-06T12:37:21.521Z
Learning: In Go code (Go 1.25+), prefer using sync.WaitGroup.Go(func()) to run a function in a new goroutine, letting the WaitGroup manage Add/Done automatically. Avoid manual wg.Add(1) followed by go func() { defer wg.Done(); ... }() patterns. Apply this guidance across all Go files in the wundergraph/cosmo repository where concurrency is used.
Applied to files:
router/core/restart_switchover_config_test.gorouter/core/restart_switchover_config.gorouter-tests/cache_warmup_test.gorouter/core/router.go
📚 Learning: 2025-07-29T08:19:55.720Z
Learnt from: Noroth
Repo: wundergraph/cosmo PR: 2088
File: demo/pkg/subgraphs/projects/src/main_test.go:0-0
Timestamp: 2025-07-29T08:19:55.720Z
Learning: In Go testing, t.Fatal, t.FailNow, t.Skip* and similar methods cannot be called from goroutines other than the main test goroutine, as they will cause a runtime panic. Use channels, t.Error*, or other synchronization mechanisms to communicate errors from goroutines back to the main test thread.
Applied to files:
router-tests/cache_warmup_test.go
📚 Learning: 2025-08-28T09:18:10.121Z
Learnt from: endigma
Repo: wundergraph/cosmo PR: 2141
File: router-tests/http_subscriptions_test.go:100-108
Timestamp: 2025-08-28T09:18:10.121Z
Learning: In router-tests/http_subscriptions_test.go heartbeat tests, the message ordering should remain strict with data messages followed by heartbeat messages, as the timing is deterministic and known by design in the Cosmo router implementation.
Applied to files:
router-tests/cache_warmup_test.go
📚 Learning: 2025-08-26T17:19:28.178Z
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2167
File: router/internal/expr/retry_context.go:3-10
Timestamp: 2025-08-26T17:19:28.178Z
Learning: In the Cosmo router codebase, prefer using netErr.Timeout() checks over explicit context.DeadlineExceeded checks for timeout detection in retry logic, as Go's networking stack typically wraps context deadline exceeded errors in proper net.Error implementations.
Applied to files:
router-tests/cache_warmup_test.go
📚 Learning: 2025-09-29T10:28:07.122Z
Learnt from: dkorittki
Repo: wundergraph/cosmo PR: 2192
File: router/pkg/config/config.go:1028-1029
Timestamp: 2025-09-29T10:28:07.122Z
Learning: The deprecation strategy for IntrospectionEnabled field in router/pkg/config/config.go is to first mark it as deprecated, then migrate all call sites to use IntrospectionConfig.Enabled, and finally remove the deprecated field. The envDefault tag should be kept during the deprecation period for backward compatibility.
Applied to files:
router/core/router.go
🧬 Code graph analysis (3)
router/core/restart_switchover_config_test.go (2)
router/core/restart_switchover_config.go (2)
InMemorySwitchOverCache(32-36)NewSwitchoverConfig(18-22)router/pkg/config/config.go (2)
CacheWarmupConfiguration(989-995)CacheWarmupInMemorySwitchover(983-985)
router-tests/cache_warmup_test.go (5)
router-tests/testenv/testenv.go (4)
Run(105-122)Config(292-349)RouterConfig(246-249)Environment(1735-1771)router/pkg/config/config.go (5)
Config(1037-1112)CacheWarmupConfiguration(989-995)CacheWarmupSource(974-977)CacheWarmupInMemorySwitchover(983-985)ExecutionConfig(865-869)router/core/router.go (3)
Option(174-174)WithCacheWarmupConfig(2114-2118)ExecutionConfig(159-163)router/core/cache_warmup.go (1)
CacheWarmupSource(26-28)router/pkg/controlplane/configpoller/config_poller.go (1)
ConfigPoller(20-29)
router/core/router.go (1)
router/core/restart_switchover_config.go (2)
SwitchoverConfig(14-16)NewSwitchoverConfig(18-22)
🔇 Additional comments (9)
router/core/restart_switchover_config_test.go (1)
1-337: Test coverage looks comprehensive.The test file provides thorough coverage of the
SwitchoverConfigandInMemorySwitchOverCachefunctionality, including:
- Enable/disable state transitions
- Edge cases (nil config, already enabled/disabled)
- Feature flag cache operations (get/set)
- Cleanup of unused feature flags
- Delegation from
SwitchoverConfigtoInMemorySwitchOverCacherouter/core/restart_switchover_config.go (1)
11-22: LGTM for structure and initialization.The type definitions and constructor are well-structured. The
planCachetype alias improves readability, andNewSwitchoverConfigcorrectly initializes with a disabled cache state.router/core/router.go (4)
93-93: LGTM for field addition.The
switchoverConfigfield is properly added to the Router struct to maintain state across config changes.
599-611: Clean integration of switchover config in server creation.The switchover config is correctly passed to
newGraphServerandCleanupFeatureFlagsis appropriately called after the server swap to prune stale feature flag caches.
1177-1181: LGTM for initialization logic.The lazy initialization of
switchoverConfigfor test scenarios is appropriate, and the comment explaining this is helpful (and the typo from the previous review has been fixed).
2057-2061: LGTM for new option function.The
WithSwitchoverConfigoption follows the established pattern for router configuration options.router-tests/cache_warmup_test.go (3)
1100-1106: Correct usage ofassertinsideEventuallyWithTcallback.Using
assert.Equalinstead ofrequire.Equalinside theEventuallyWithTcallback is correct—this allows the retry mechanism to work properly by collecting failures rather than immediately aborting.
1064-1108: No action needed. ThewriteTestConfighelper function is defined inrouter-tests/config_hot_reload_test.goat line 579 and is accessible from the cache_warmup_test.go file since both are in the same package. The function is properly defined and available for use.
919-971: Code is correct as-is—no undefined types.
ConfigPollerMockandwriteTestConfigare defined inconfig_hot_reload_test.goin the same package (integration), making them accessible throughoutcache_warmup_test.go. The test implementation is valid.Likely an incorrect or invalid review comment.
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
router/pkg/config/config.schema.json (1)
2387-2422: Allow emptycache_warmup.sourceor update the description.The description says an empty source falls back to CDN, but the
oneOfcurrently rejects{}. This makes configs withsource: {}invalid despite the documented behavior.🔧 Suggested schema tweak
"oneOf": [ { "required": ["filesystem"] }, + { + "required": ["in_memory_switchover"] + }, + { + "maxProperties": 0 + } - { - "required": ["in_memory_switchover"] - } ]
🤖 Fix all issues with AI agents
In `@router/core/restart_switchover_config.go`:
- Around line 113-126: In InMemorySwitchOverCache.processOnConfigChangeRestart,
avoid the pre-lock enabled check (data race) and the unchecked assertion
data.(planCache) which panics on repeated calls: acquire c.mu.Lock() first,
check c.enabled while holding the lock, then iterate c.queriesForFeatureFlag and
use a type switch (or type check) to handle both planCache and
[]*nodev1.Operation values—call convertToNodeOperation only when the value is
planCache, leave already-converted []*nodev1.Operation as-is (or copy them) and
assign into the new switchoverMap; this prevents panics on repeated invocations
and removes the race on enabled.
In `@router/core/router_config.go`:
- Around line 315-320: The code currently sets usage["cache_warmup_source"] to
"cdn" when InMemorySwitchoverFallback is false even if an explicit in-memory
source block exists; update the branching in the cache warmup usage logic to
inspect the configured source block directly (e.g., check
c.cacheWarmup.Source.InMemorySwitchover != nil or the appropriate
Source.InMemorySwitchover field) before falling back to CDN, ensuring you detect
and set "in_memory_switchover" when the in-memory source is present, still
preferring "filesystem" when c.cacheWarmup.Source.Filesystem != nil and using
InMemorySwitchoverFallback only as the fallback indicator.
♻️ Duplicate comments (3)
router/core/restart_switchover_config.go (3)
70-73: Data race onenabledflag — check should be moved inside the lock.The
enabledcheck at line 71 occurs before acquiring the read lock at line 75. Sinceenabledis written under the lock inupdateStateFromConfig, this creates a TOCTOU race if the config is updated concurrently.Proposed fix
func (c *InMemorySwitchOverCache) getPlanCacheForFF(featureFlagKey string) []*nodev1.Operation { - if !c.enabled { - return nil - } - c.mu.RLock() defer c.mu.RUnlock() + if !c.enabled { + return nil + } switch cache := c.queriesForFeatureFlag[featureFlagKey].(type) {
93-111: Same data race pattern insetPlanCacheForFFanddeletePlanCacheForFF.Both methods check
enabledbefore acquiring the lock. Apply the same fix as forgetPlanCacheForFF.Proposed fix
func (c *InMemorySwitchOverCache) setPlanCacheForFF(featureFlagKey string, cache planCache) { - if !c.enabled || cache == nil { + if cache == nil { return } c.mu.Lock() defer c.mu.Unlock() + if !c.enabled { + return + } c.queriesForFeatureFlag[featureFlagKey] = cache } func (c *InMemorySwitchOverCache) deletePlanCacheForFF(featureFlagKey string) { - if !c.enabled { - return - } - c.mu.Lock() defer c.mu.Unlock() + if !c.enabled { + return + } delete(c.queriesForFeatureFlag, featureFlagKey) }
128-154: Same data race pattern incleanupUnusedFeatureFlags.The
enabledcheck at line 129 occurs before acquiring the lock at line 137.Proposed fix
func (c *InMemorySwitchOverCache) cleanupUnusedFeatureFlags(routerCfg *nodev1.RouterConfig) { - if !c.enabled { - return - } - if routerCfg.FeatureFlagConfigs == nil { return } c.mu.Lock() defer c.mu.Unlock() + if !c.enabled { + return + } ffNames := make(map[string]struct{})
🧹 Nitpick comments (3)
router-tests/cache_warmup_test.go (1)
956-958: Avoid indefinite hangs waiting onConfigPollerMock.ready.Blocking on
<-pm.readywithout a timeout can hang CI if the poller never signals. A small helper with a timeout keeps tests fail-fast and debuggable.✅ Safer wait helper
func TestInMemorySwitchoverCaching(t *testing.T) { t.Parallel() + + waitForPollerReady := func(t *testing.T, ch <-chan struct{}) { + t.Helper() + select { + case <-ch: + case <-time.After(2 * time.Second): + t.Fatal("config poller did not become ready in time") + } + } t.Run("Verify the plan is cached on config restart when in memory switchover is enabled", func(t *testing.T) { t.Parallel() @@ - <-pm.ready + waitForPollerReady(t, pm.ready) @@ - <-pm.ready + waitForPollerReady(t, pm.ready) @@ - <-pm.ready + waitForPollerReady(t, pm.ready)Also applies to: 1001-1003, 1045-1047
router/core/restart_switchover_config.go (1)
156-165: Consider nil-check for safety.
convertToNodeOperationdoesn't check ifdatais nil before callingIterValues. While callers should ensure non-nil values, a defensive nil-check would prevent panics from unexpected states.Suggested improvement
func convertToNodeOperation(data planCache) []*nodev1.Operation { + if data == nil { + return make([]*nodev1.Operation, 0) + } items := make([]*nodev1.Operation, 0) data.IterValues(func(v *planWithMetaData) (stop bool) {router/core/graph_server.go (1)
1310-1311: Potential unsynchronized read ofenabledfield.This directly accesses
opts.SwitchoverConfig.inMemorySwitchOverCache.enabledwithout holding the lock. While this likely occurs during initialization when no concurrent updates are expected, it's inconsistent with the mutex-protected access pattern used elsewhere.Consider exposing a synchronized getter method on
SwitchoverConfigorInMemorySwitchOverCacheto ensure thread-safe access.Suggested approach
Add a synchronized getter to
InMemorySwitchOverCache:func (c *InMemorySwitchOverCache) isEnabled() bool { c.mu.RLock() defer c.mu.RUnlock() return c.enabled }Then update the call:
-operationPlanner := NewOperationPlanner(executor, gm.planCache, opts.SwitchoverConfig.inMemorySwitchOverCache.enabled) +operationPlanner := NewOperationPlanner(executor, gm.planCache, opts.SwitchoverConfig.inMemorySwitchOverCache.isEnabled())
…ement-cache-warmer-from-existing-caches
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@router/core/restart_switchover_config.go`:
- Around line 54-72: The write to InMemorySwitchOverCache.enabled is racy;
change enabled to an atomic flag (e.g., sync/atomic or atomic.Bool) and replace
all reads/writes to use atomic Load/Store semantics: update the assignment in
updateStateFromConfig to use c.enabled.Store(newVal) and update the reader
methods getPlanCacheForFF, setPlanCacheForFF, processOnConfigChangeRestart, and
cleanupUnusedFeatureFlags to use c.enabled.Load() instead of direct field access
(or alternatively move both the write and any pre-check reads inside the
existing mu lock); this ensures no TOCTOU race when checking or updating
enabled.
♻️ Duplicate comments (1)
router/core/restart_switchover_config.go (1)
74-95: Data race:enabledread before lock acquisition.The check at line 75 reads
c.enabledbefore the RLock is acquired at line 79. IfupdateStateFromConfigruns concurrently and modifiesenabled, this method may see an inconsistent state. The same pattern exists insetPlanCacheForFF(line 98),processOnConfigChangeRestart(line 108), andcleanupUnusedFeatureFlags(line 125).Proposed fix: move enabled check inside the lock
func (c *InMemorySwitchOverCache) getPlanCacheForFF(featureFlagKey string) []*nodev1.Operation { - if !c.enabled { - return nil - } - c.mu.RLock() defer c.mu.RUnlock() + if !c.enabled { + return nil + } switch cache := c.queriesForFeatureFlag[featureFlagKey].(type) {Apply the same pattern to
setPlanCacheForFF,processOnConfigChangeRestart, andcleanupUnusedFeatureFlags.
🧹 Nitpick comments (1)
router/core/restart_switchover_config_test.go (1)
373-457: Consider adding a concurrency test.The tests cover functional behavior well, but given the data race concerns in the implementation, consider adding a test that exercises concurrent access to validate thread safety once the locking issues are addressed.
Example concurrency test
func TestInMemorySwitchOverCache_ConcurrentAccess(t *testing.T) { t.Parallel() cache := &InMemorySwitchOverCache{ enabled: true, queriesForFeatureFlag: make(map[string]any), logger: zap.NewNop(), } var wg sync.WaitGroup // Concurrent reads for i := 0; i < 10; i++ { wg.Go(func() { _ = cache.getPlanCacheForFF("test-ff") }) } // Concurrent writes for i := 0; i < 10; i++ { wg.Go(func() { cache.updateStateFromConfig(&Config{ cacheWarmup: &config.CacheWarmupConfiguration{ Enabled: true, InMemorySwitchoverFallback: true, }, }, false) }) } wg.Wait() }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
router/core/graph_server.go (1)
295-461: PropagateCosmoCacheWarmerEnabledto feature-flag muxes.Line 456 builds feature-flag muxes without setting
CosmoCacheWarmerEnabled, so the warmup switch at Line 1367 can incorrectly choose in-memory fallback even when CDN warmup is enabled. This changes behavior for feature-flag caches compared to the base graph.✅ Suggested fix
- multiGraphHandler, err := s.buildMultiGraphHandler(ctx, gm.mux, featureFlagConfigMap, switchoverConfig) + multiGraphHandler, err := s.buildMultiGraphHandler(ctx, gm.mux, featureFlagConfigMap, switchoverConfig, cosmoCacheWarmerEnabled)-func (s *graphServer) buildMultiGraphHandler( +func (s *graphServer) buildMultiGraphHandler( ctx context.Context, baseMux *chi.Mux, featureFlagConfigs map[string]*nodev1.FeatureFlagRouterExecutionConfig, - switchoverConfig *SwitchoverConfig, + switchoverConfig *SwitchoverConfig, + cosmoCacheWarmerEnabled bool, ) (http.HandlerFunc, error) {gm, err := s.buildGraphMux(ctx, BuildGraphMuxOptions{ FeatureFlagName: featureFlagName, RouterConfigVersion: executionConfig.GetVersion(), EngineConfig: executionConfig.GetEngineConfig(), ConfigSubgraphs: executionConfig.Subgraphs, SwitchoverConfig: switchoverConfig, + CosmoCacheWarmerEnabled: cosmoCacheWarmerEnabled, })
♻️ Duplicate comments (1)
router/core/restart_switchover_config.go (1)
110-124: Preserve already-converted entries on repeated restart processing.Line 119 only copies entries that are still
planCache. IfprocessOnConfigChangeRestartis called again after a prior conversion, existing[]*nodev1.Operationentries are silently dropped, resulting in an empty warmup set.✅ Suggested fix
- for k, v := range c.queriesForFeatureFlag { - if cache, ok := v.(planCache); ok { - switchoverMap[k] = convertToNodeOperation(cache) - } - } + for k, v := range c.queriesForFeatureFlag { + switch cache := v.(type) { + case planCache: + switchoverMap[k] = convertToNodeOperation(cache) + case []*nodev1.Operation: + // Already converted + switchoverMap[k] = cache + default: + c.logger.Error("unexpected type") + } + }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@router/core/graph_server.go`:
- Around line 1313-1314: The check for switchover enabled reads shared state
without synchronization causing a data race; add a helper method on
InMemorySwitchOverCache (e.g., isEnabled()) that takes an RLock, nil-guards
queriesForFeatureFlag and returns whether it is non-nil, then replace the direct
access in graph_server.go (the line setting storeQueryContent using
opts.SwitchoverConfig.inMemorySwitchOverCache) with a call to that helper (also
nil-guard SwitchoverConfig itself before calling), so updateStateFromConfig’s
mutations and this read are synchronized via the cache’s mutex.
In `@router/core/restart_switchover_config.go`:
- Around line 148-154: convertToNodeOperation may miss buffered writes because
IterValues is non-atomic; call the cache's Wait() to flush pending asynchronous
sets before iterating. In convertToNodeOperation (which receives data planCache)
invoke the appropriate Wait() method on the planCache (or its underlying
Ristretto cache field) immediately before data.IterValues(...) so pending writes
are flushed and iteration sees a consistent snapshot. Ensure you use the same
Wait() approach as the tests (or implement an explicit snapshot method) to avoid
missing entries.
♻️ Duplicate comments (1)
router/core/restart_switchover_config.go (1)
115-121: Preserve already-converted switchover entries on repeated restarts.On a second config-change restart,
queriesForFeatureFlagwill already contain[]*nodev1.Operation. The current loop only keepsplanCachevalues, silently dropping converted entries and losing warmup candidates. This defeats the in-memory warmup on successive restarts.🔧 Proposed fix (keep already-converted entries)
for k, v := range c.queriesForFeatureFlag { - if cache, ok := v.(planCache); ok { - switchoverMap[k] = convertToNodeOperation(cache) - } + switch cache := v.(type) { + case planCache: + switchoverMap[k] = convertToNodeOperation(cache) + case []*nodev1.Operation: + switchoverMap[k] = cache + default: + c.logger.Error("unexpected type") + } }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@router/core/restart_switchover_config.go`:
- Around line 113-128: The extractQueriesAndOverridePlanCache method currently
only preserves entries of type planCache and drops already-converted operation
slices; update the loop in
InMemorySwitchOverCache.extractQueriesAndOverridePlanCache to handle both types
by checking the value's type (e.g., planCache and []*nodev1.Operation or []
*nodev1.Operation) and copying existing operation slices into switchoverMap
unchanged while converting planCache entries via convertToNodeOperation(cache);
ensure you preserve other non-matching entries as-is or ignore explicitly
depending on intended semantics so repeated calls do not lose
previously-converted data.
♻️ Duplicate comments (1)
router/core/restart_switchover_config.go (1)
18-24: Add documentation for exported types and functions.The exported
SwitchoverConfigtype has a brief comment, but the exported functions (NewSwitchoverConfig,UpdateSwitchoverConfig,CleanupFeatureFlags,ProcessOnConfigChangeRouterInstanceRestart) lack godoc comments describing their purpose and usage.
🧹 Nitpick comments (2)
router/core/supervisor.go (1)
160-163: Consider adding a nil check for defensive coding.If
switchoverConfigis ever nil (e.g., in tests or alternative router factory implementations), this will panic. While the current wiring always provides aSwitchoverConfig, a nil guard would make this more robust.🛡️ Suggested defensive check
if !shutdown { + if rs.router.switchoverConfig != nil { rs.router.switchoverConfig.ProcessOnConfigChangeRouterInstanceRestart() + } }router/core/restart_switchover_config_test.go (1)
340-413: Add test case for callingextractQueriesAndOverridePlanCachetwice.The current tests only verify the first conversion. A test that calls the method twice would verify that already-converted
[]*nodev1.Operationentries are preserved (which currently they're not—see related issue in production code).🧪 Suggested additional test case
t.Run("preserves already-converted operation slices on second call", func(t *testing.T) { t.Parallel() mockCache, err := ristretto.NewCache(&ristretto.Config[uint64, *planWithMetaData]{ MaxCost: 10000, NumCounters: 10000000, IgnoreInternalCost: true, BufferItems: 64, }) require.NoError(t, err) query1 := "query { test1 }" mockCache.Set(1, &planWithMetaData{content: query1}, 1) mockCache.Wait() // Pre-existing converted operations existingOps := []*nodev1.Operation{ {Request: &nodev1.OperationRequest{Query: "query { existing }"}}, } cache := &InMemorySwitchOverCache{ queriesForFeatureFlag: make(map[string]any), } cache.queriesForFeatureFlag["ff1"] = mockCache cache.queriesForFeatureFlag["ff2"] = existingOps cache.extractQueriesAndOverridePlanCache() // Both should be preserved require.Len(t, cache.queriesForFeatureFlag, 2) require.IsType(t, []*nodev1.Operation{}, cache.queriesForFeatureFlag["ff1"]) require.IsType(t, []*nodev1.Operation{}, cache.queriesForFeatureFlag["ff2"]) ff2Ops := cache.queriesForFeatureFlag["ff2"].([]*nodev1.Operation) require.Equal(t, "query { existing }", ff2Ops[0].Request.Query) })
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
router/core/graph_server.go (1)
1373-1394: Potential nil pointer dereference onSwitchoverConfigin in-memory switchover fallback case.Lines 1386-1387 access
opts.SwitchoverConfig.inMemorySwitchOverCachewithout verifying thatopts.SwitchoverConfigis non-nil. If the configuration enablesInMemorySwitchoverFallbackbutSwitchoverConfigwas not initialized (e.g., due to a configuration mismatch), this will cause a panic.🐛 Proposed fix: Add nil check to the switch case condition
- case s.cacheWarmup.InMemorySwitchoverFallback && (s.selfRegister == nil || !opts.CosmoCacheWarmerEnabled): + case s.cacheWarmup.InMemorySwitchoverFallback && opts.SwitchoverConfig != nil && (s.selfRegister == nil || !opts.CosmoCacheWarmerEnabled):
🤖 Fix all issues with AI agents
In `@router/core/graph_server.go`:
- Around line 1322-1323: The call to
opts.SwitchoverConfig.inMemorySwitchOverCache.IsEnabled() can panic if
SwitchoverConfig is nil; change the construction of operationPlanner
(NewOperationPlanner(executor, gm.planCache, ...)) to compute the boolean safely
by first checking if opts.SwitchoverConfig != nil and only then calling
inMemorySwitchOverCache.IsEnabled(), otherwise default to false, and pass that
computed value into NewOperationPlanner to avoid a nil pointer dereference.
Summary by CodeRabbit
New Features
Configuration
Behavioral
API
Tests
Chores
✏️ Tip: You can customize this high-level summary in your review settings.
Description
This PR introduces a new mechanism to warm the cache on restarts due to schema changes (or whenever the router execution config changes). Config to enable this
If in_memory_switchover is not present or false, we default to the current cdn cache warmer (with cosmo), this preserves backwards compatibility.
This is achieved by first adding a new
switchoverConfigwhich can keep references across restarts, we use this to store a reference to the planner cache. When the router restarts due to an execution config change, we then loop through the elements in the ristretto planner cache.Right now ristretto does not support iterating, so we have opened a PR here dgraph-io/ristretto#475. For this PR (which we need to discuss before merging) we have pointed ristretto to the fork from where the PR was opened.
We create a new provider for the cache warmer, for
in_memory_switcherwhich extracts the queries from the planner cache upon startup. Note that with this method there is no cache warming upon startup, only on subsequent restarts due to execution config changes.We also support persisting across restarts on hot config reloading. However we ran into a problem of shutdown occuring first before the new graph mux was created (which is the opposite for schema change updates), which means that the current planner cache gets emptied before we can extract the queries from it. As a solution we preemptively extract the queries from the planner cache before its closed whenever a hot config reloading occurs.
This PR also changes how cache warmer is enabled when used with the cosmo CDN, previously it requested the manifest from the CDN if the cache warmer was enabled in the router
config.yaml, this meant that ex-cache warmer customers could still access old manifest files fron the cdn (as long as it was present), however now its only enabled if the feature is enabled in cosmo (upon startup we get if cache warmer is enabled from cosmo).Checklist