Skip to content

Conversation

@devsergiy
Copy link
Member

@devsergiy devsergiy commented Nov 5, 2025

Use cache for variables normalization and variables remapping.

Closes ENG-8477

Summary by CodeRabbit

  • Performance

    • Added dedicated caches for variable normalization and variable remapping to speed request processing.
    • New debug response headers reporting variable-normalization and variable-remapping HIT/MISS.
    • Cache-hit info is recorded in request telemetry and exposed via new cache-specific metrics for better observability.
  • Tests

    • Added comprehensive tests validating normalization/remapping cache behavior, cache-hit headers, telemetry, and response correctness across varied scenarios.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link

coderabbitai bot commented Nov 5, 2025

Walkthrough

Adds dedicated caches for variables normalization and variables remapping, wires them into the graph server and operation processor, changes NormalizeVariables/RemapVariables to use/read/write caches and return cache-hit flags, surfaces those hits via headers and OTel attributes, updates call sites and tests, and removes operationID from the old normalization cache entry.

Changes

Cohort / File(s) Summary
Tests
router-tests/normalization_cache_test.go, router-tests/telemetry/telemetry_test.go
Adds cacheHit type, assertCacheHeaders helper, and TestVarsNormalizationRemappingCaches with parallel subtests asserting normalization/variables-normalization/remapping cache HIT/MISS headers and response bodies; expands telemetry tests to expect variables_normalization and remap_variables cache datapoints and span attributes.
Graph server & cache lifecycle
router/core/graph_server.go
Adds variablesNormalizationCache and remapVariablesCache to GraphMux; initializes them in buildOperationCaches, exposes via OperationProcessorOptions, emits metrics, and closes them in Shutdown.
Operation processor & cache types
router/core/operation_processor.go, router/core/operation_processor_test.go
Removes operationID from prior normalization cache entry; adds VariablesNormalizationCacheEntry and RemapVariablesCacheEntry; extends OperationProcessorOptions and OperationCache with new caches; adds cache-key helpers; changes NormalizeVariables and RemapVariables signatures to return cached flags and read/write the new caches; updates tests to match new return arities.
Call-site updates / warmup
router/core/cache_warmup.go, router/core/websocket.go
Updates callers to new return arities: NormalizeVariables now returns three values, RemapVariables returns two; captures cache-hit booleans and adjusts error handling and flow accordingly.
Request context & prehandler
router/core/context.go, router/core/graphql_prehandler.go
Adds variablesNormalizationCacheHit and variablesRemappingCacheHit to operationContext; records cache-hit booleans on spans and stores them in the request/operation context; preserves uploads path remapping logic while propagating cache-hit state.
HTTP headers & telemetry attributes
router/core/graphql_handler.go, router/pkg/otel/attributes.go
Adds VariablesNormalizationCacheHeader and VariablesRemappingCacheHeader; refactors debug header setting to unify HIT/MISS formatting; adds OTel attribute keys WgVariablesNormalizationCacheHit and WgVariablesRemappingCacheHit.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

  • Areas needing extra attention:
    • All changed signatures for NormalizeVariables and RemapVariables and every call site update.
    • Removal of operationID from the previous normalization cache entry and its implications for persisted-operation paths.
    • Correctness and collision risk of new cache-key generation (normalizeVariablesCacheKey, remapVariablesCacheKey).
    • Concurrency, initialization, sizing, and safe shutdown of the new ristretto caches in GraphMux.
    • Consistency between cache-hit flags, emitted HTTP headers, span attributes, and updated tests that assert those headers/attributes.

Possibly related PRs

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 42.86% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'fix: cache vars and remapping normalization' directly matches the PR's main objective of implementing caching for variables normalization and remapping, as confirmed by the detailed changes across multiple files.
✨ Finishing touches
  • 📝 Generate docstrings

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9e0a7ac and 9121581.

📒 Files selected for processing (1)
  • router/core/operation_processor_test.go (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-01T20:39:16.113Z
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2252
File: router-tests/telemetry/telemetry_test.go:9684-9693
Timestamp: 2025-10-01T20:39:16.113Z
Learning: Repo preference: In router-tests/telemetry/telemetry_test.go, keep strict > 0 assertions for request.operation.*Time (parsingTime, normalizationTime, validationTime, planningTime) in telemetry-related tests; do not relax to >= 0 unless CI flakiness is observed.

Applied to files:

  • router/core/operation_processor_test.go
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: image_scan
  • GitHub Check: build_push_image (nonroot)
  • GitHub Check: image_scan (nonroot)
  • GitHub Check: build_push_image
  • GitHub Check: integration_test (./telemetry)
  • GitHub Check: build_test
  • GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
  • GitHub Check: integration_test (./events)
  • GitHub Check: build_test
  • GitHub Check: Analyze (go)
  • GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (1)
router/core/operation_processor_test.go (1)

261-266: NormalizeVariables call correctly updated for new triple-return signature

Using two blank identifiers and binding only err aligns with the updated NormalizeVariables API and keeps this test focused on the behavior it actually asserts (normalized representation and variables). No further changes needed here.

Tip

📝 Customizable high-level summaries are now available in beta!

You can now customize how CodeRabbit generates the high-level summary in your pull requests — including its content, structure, tone, and formatting.

  • Provide your own instructions using the high_level_summary_instructions setting.
  • Format the summary however you like (bullet lists, tables, multi-section layouts, contributor stats, etc.).
  • Use high_level_summary_in_walkthrough to move the summary from the description to the walkthrough section.

Example instruction:

"Divide the high-level summary into five sections:

  1. 📝 Description — Summarize the main change in 50–60 words, explaining what was done.
  2. 📓 References — List relevant issues, discussions, documentation, or related PRs.
  3. 📦 Dependencies & Requirements — Mention any new/updated dependencies, environment variable changes, or configuration updates.
  4. 📊 Contributor Summary — Include a Markdown table showing contributions:
    | Contributor | Lines Added | Lines Removed | Files Changed |
  5. ✔️ Additional Notes — Add any extra reviewer context.
    Keep each section concise (under 200 words) and use bullet or numbered lists for clarity."

Note: This feature is currently in beta for Pro-tier users, and pricing will be announced later.


Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link

github-actions bot commented Nov 5, 2025

Router-nonroot image scan passed

✅ No security vulnerabilities found in image:

ghcr.io/wundergraph/cosmo/router:sha-42c22c85ba23818195da816e632223dba7407bc9-nonroot

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
router/core/operation_processor.go (1)

944-979: Copy cached variables before storing entries

VariablesNormalizationCacheEntry.variables currently aliases o.parsedOperation.Request.Variables, which is backed by the pooled parseKit buffer. After the entry is cached, the kit gets reused (OperationProcessor.freeKit), so later requests mutate that shared backing array, corrupting the cached payload and any future hits.

Please copy the slice before caching so the entry owns its data. Apply the same fix in both places where we construct VariablesNormalizationCacheEntry.

@@
-		entry := VariablesNormalizationCacheEntry{
+		cachedVariables := append([]byte(nil), o.parsedOperation.Request.Variables...)
+		entry := VariablesNormalizationCacheEntry{
 			uploadsMapping:           uploadsMapping,
 			id:                       o.parsedOperation.ID,
 			normalizedRepresentation: o.parsedOperation.NormalizedRepresentation,
-			variables:                o.parsedOperation.Request.Variables,
+			variables:                cachedVariables,
 			reparse:                  false,
 		}
@@
-		entry := VariablesNormalizationCacheEntry{
+		cachedVariables := append([]byte(nil), o.parsedOperation.Request.Variables...)
+		entry := VariablesNormalizationCacheEntry{
 			uploadsMapping:           uploadsMapping,
 			id:                       o.parsedOperation.ID,
 			normalizedRepresentation: o.parsedOperation.NormalizedRepresentation,
-			variables:                o.parsedOperation.Request.Variables,
+			variables:                cachedVariables,
 			reparse:                  true,
 		}
🧹 Nitpick comments (1)
router/core/graph_server.go (1)

707-731: Expose cache metrics for the new caches

The new variables/remap caches are now part of the mux, but we never register them with metricInfos, so their hit/miss stats stay invisible in OTLP/Prometheus. Please append both caches to metricInfos alongside the existing ones so operators can monitor them.

@@
 	if s.normalizationCache != nil {
 		metricInfos = append(metricInfos, rmetric.NewCacheMetricInfo("query_normalization", srv.engineExecutionConfiguration.NormalizationCacheSize, s.normalizationCache.Metrics))
 	}
 
+	if s.variablesNormalizationCache != nil {
+		metricInfos = append(metricInfos, rmetric.NewCacheMetricInfo("variables_normalization", srv.engineExecutionConfiguration.NormalizationCacheSize, s.variablesNormalizationCache.Metrics))
+	}
+
+	if s.remapVariablesCache != nil {
+		metricInfos = append(metricInfos, rmetric.NewCacheMetricInfo("variables_remap", srv.engineExecutionConfiguration.NormalizationCacheSize, s.remapVariablesCache.Metrics))
+	}
+
 	if s.persistedOperationCache != nil {
 		metricInfos = append(metricInfos, rmetric.NewCacheMetricInfo("persisted_query_normalization", 1024, s.persistedOperationCache.Metrics))
 	}
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f865487 and 5096314.

📒 Files selected for processing (3)
  • router-tests/normalization_cache_test.go (1 hunks)
  • router/core/graph_server.go (4 hunks)
  • router/core/operation_processor.go (10 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-10-01T20:39:16.113Z
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2252
File: router-tests/telemetry/telemetry_test.go:9684-9693
Timestamp: 2025-10-01T20:39:16.113Z
Learning: Repo preference: In router-tests/telemetry/telemetry_test.go, keep strict > 0 assertions for request.operation.*Time (parsingTime, normalizationTime, validationTime, planningTime) in telemetry-related tests; do not relax to >= 0 unless CI flakiness is observed.

Applied to files:

  • router-tests/normalization_cache_test.go
📚 Learning: 2025-09-17T20:55:39.456Z
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2172
File: router/core/graph_server.go:0-0
Timestamp: 2025-09-17T20:55:39.456Z
Learning: The Initialize method in router/internal/retrytransport/manager.go has been updated to properly handle feature-flag-only subgraphs by collecting subgraphs from both routerConfig.GetSubgraphs() and routerConfig.FeatureFlagConfigs.ConfigByFeatureFlagName, ensuring all subgraphs receive retry configuration.

Applied to files:

  • router/core/graph_server.go
🧬 Code graph analysis (2)
router-tests/normalization_cache_test.go (2)
router-tests/testenv/testenv.go (4)
  • Run (105-122)
  • Config (284-341)
  • Environment (1731-1767)
  • GraphQLRequest (1907-1915)
router/core/graphql_handler.go (1)
  • NormalizationCacheHeader (39-39)
router/core/graph_server.go (1)
router/core/operation_processor.go (3)
  • NormalizationCacheEntry (755-759)
  • VariablesNormalizationCacheEntry (761-767)
  • RemapVariablesCacheEntry (769-773)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: build-router
  • GitHub Check: build_test
  • GitHub Check: integration_test (./events)
  • GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
  • GitHub Check: image_scan
  • GitHub Check: integration_test (./telemetry)
  • GitHub Check: image_scan (nonroot)
  • GitHub Check: build_push_image
  • GitHub Check: build_push_image (nonroot)
  • GitHub Check: build_test
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: Analyze (go)

@ysmolski ysmolski self-assigned this Nov 11, 2025
@ysmolski ysmolski changed the title fix: normalization cache fix: cache vars and remapping normalization Nov 13, 2025
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
router/core/operation_processor.go (2)

813-817: Consider handling or documenting jsonparser.Delete errors.

The jsonparser.Delete calls on Line 816 can return errors, but they're being silently ignored. While delete operations on valid JSON should rarely fail, consider either:

  1. Handling the error (log or return it)
  2. Adding a comment explaining why it's safe to ignore

Example:

 for _, varName := range skipIncludeVariableNames {
-	o.parsedOperation.Request.Variables = jsonparser.Delete(o.parsedOperation.Request.Variables, varName)
+	var err error
+	o.parsedOperation.Request.Variables, err = jsonparser.Delete(o.parsedOperation.Request.Variables, varName)
+	if err != nil {
+		// Log or handle error
+	}
 }

943-950: Address or remove the TODO-style comment.

The comment on Line 943 suggests uncertainty about keyGen reset discipline: "should not be needed if we properly reset after use - check do we have any remaining places where we do not reset keygen - maybe wrap into a type which will reset once we got key".

After reviewing the code, the resets appear properly placed (lines 885, 950, 995, 1045, 1053, 1193, 1202). Consider either:

  1. Removing the comment if the concern has been addressed
  2. Creating a wrapper type as suggested for better encapsulation
  3. Converting this to a proper TODO with a tracking ticket
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a2bf4a4 and dcbf441.

📒 Files selected for processing (1)
  • router/core/operation_processor.go (12 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
router/core/operation_processor.go (1)
router-tests/testenv/testenv.go (1)
  • GraphQLRequest (1909-1917)
🔇 Additional comments (7)
router/core/operation_processor.go (7)

48-78: LGTM! Documentation improvements.

The enhanced comments provide clearer explanations of the field purposes and when they're available.


106-128: LGTM! Cache fields properly added.

The new cache fields follow the established naming conventions and type patterns used for other caches in the struct.


163-176: LGTM! Cache fields properly integrated.

The cache fields are correctly added to the OperationCache struct following existing patterns.


767-794: LGTM! Well-documented cache entry types.

The cache entry structures are well-designed with clear documentation. The reparse flag in VariablesNormalizationCacheEntry is a smart optimization to avoid unnecessary document reparsing when the normalized form hasn't changed.


889-990: LGTM! Well-implemented caching logic with smart optimization.

The caching implementation is solid:

  • Cache key generation appropriately uses both variables and normalized representation
  • The reparse flag optimization avoids unnecessary document reparsing when the normalized form is unchanged (lines 954-963)
  • Proper error handling and cache miss fallback
  • Upload mappings are correctly preserved in cache entries

992-1073: LGTM! RemapVariables caching correctly implemented.

The caching logic is well-implemented:

  • Cache key appropriately uses only the normalized representation (line 993), since variable remapping depends on operation structure, not input variable values
  • Correct handling of cache hits with document reparsing (lines 1008-1010)
  • Proper error handling and fallback
  • keyGen properly reset after each use

1432-1441: LGTM! Cache initialization properly wired.

The new caches are correctly integrated into the NewOperationProcessor initialization:

  • Condition check includes the new cache types (lines 1432-1433)
  • Fields properly assigned from options (lines 1439-1440)
  • Follows the established pattern for other caches

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
router/core/operation_processor.go (1)

944-944: Consider removing the TODO-style comment.

The comment suggests checking whether keyGen is properly reset after use, but inspection shows keyGen.Reset() is consistently called after each use throughout the file (lines 885, 951, 996, 1047, 1055). This appears to be a leftover development note that could be removed for code clarity.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dcbf441 and 11e6116.

📒 Files selected for processing (2)
  • router/core/graphql_prehandler.go (3 hunks)
  • router/core/operation_processor.go (12 hunks)
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-09-02T12:52:27.677Z
Learnt from: endigma
Repo: wundergraph/cosmo PR: 2181
File: router/core/operation_processor.go:0-0
Timestamp: 2025-09-02T12:52:27.677Z
Learning: Hash validation for persisted queries with query bodies is performed in router/core/graphql_prehandler.go in the handleOperation function, not in the APQ processing logic in operation_processor.go. This validates that extensions.persistedQuery.sha256Hash matches the query body before any APQ operations occur.

Applied to files:

  • router/core/graphql_prehandler.go
📚 Learning: 2025-09-02T12:52:27.677Z
Learnt from: endigma
Repo: wundergraph/cosmo PR: 2181
File: router/core/operation_processor.go:0-0
Timestamp: 2025-09-02T12:52:27.677Z
Learning: Hash validation for persisted queries with query bodies is performed in router/core/graphql_prehandler.go in the handleOperation function at lines 571-578, not in the APQ processing logic in operation_processor.go. This validates that extensions.persistedQuery.sha256Hash matches the computed query hash before any APQ operations occur. There's also a test case that verifies this behavior.

Applied to files:

  • router/core/graphql_prehandler.go
📚 Learning: 2025-10-01T20:39:16.113Z
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2252
File: router-tests/telemetry/telemetry_test.go:9684-9693
Timestamp: 2025-10-01T20:39:16.113Z
Learning: Repo preference: In router-tests/telemetry/telemetry_test.go, keep strict > 0 assertions for request.operation.*Time (parsingTime, normalizationTime, validationTime, planningTime) in telemetry-related tests; do not relax to >= 0 unless CI flakiness is observed.

Applied to files:

  • router/core/graphql_prehandler.go
🧬 Code graph analysis (2)
router/core/graphql_prehandler.go (1)
router/pkg/otel/attributes.go (2)
  • WgVariablesNormalizationCacheHit (39-39)
  • WgVariablesRemappingCacheHit (40-40)
router/core/operation_processor.go (1)
router-tests/testenv/testenv.go (1)
  • GraphQLRequest (1909-1917)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: Analyze (go)
  • GitHub Check: build_test
  • GitHub Check: image_scan (nonroot)
  • GitHub Check: image_scan
  • GitHub Check: integration_test (./events)
  • GitHub Check: build_push_image
  • GitHub Check: build_push_image (nonroot)
  • GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
  • GitHub Check: integration_test (./telemetry)
  • GitHub Check: build_test
🔇 Additional comments (8)
router/core/operation_processor.go (5)

767-794: Well-documented cache entry types.

The new cache entries are clearly documented with helpful comments explaining each field's purpose. The reparse flag in VariablesNormalizationCacheEntry is a smart optimization that avoids unnecessary document reparsing when the normalized form hasn't changed.

Based on learnings: This addresses a past review comment requesting comments and descriptions for these structures.


812-817: Correct handling of skip/include variables after cache hit.

This logic properly removes skip/include variables when retrieving a cached normalized operation, ensuring consistency with the non-cached path where these variables are removed during normalization.


880-991: Solid caching implementation for variable normalization.

The cache key appropriately includes both variables and normalized representation since normalization depends on both. The reparse flag optimization is well-designed: when the normalized form doesn't change, it avoids the expensive setAndParseOperationDoc() call on cache retrieval.


993-1075: Appropriate caching implementation for variable remapping.

The cache key correctly includes only the normalized representation (not variable values) since remapping depends on operation structure. Unlike NormalizeVariables, this always calls setAndParseOperationDoc() after a cache hit, which is appropriate since remapping modifies the operation structure.


1434-1443: Proper wiring of new caches into OperationProcessor.

The new caches are correctly integrated following the existing pattern for other cache types.

router/core/graphql_prehandler.go (3)

786-786: Correct integration of variables normalization cache.

The cached boolean is properly captured from NormalizeVariables and propagated to both the OTEL span and the request context for telemetry tracking.

Also applies to: 801-802


804-831: Excellent documentation for file upload path mapping.

The enhanced comments with concrete examples make the complex upload path remapping logic much clearer. The distinction between direct argument passing vs. nested object usage is now well-explained.


833-836: Correct integration of variable remapping cache.

The cached boolean is properly captured and propagated to telemetry. The updated comment at lines 833-835 helpfully clarifies the key-value relationship in the remapVariables map (new names as keys, old names as values).

Also applies to: 852-853

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (4)
router-tests/normalization_cache_test.go (4)

87-106: Clarify or remove misleading comment.

The comment on line 90 states "Test with unused variables that should be removed," but the test doesn't include any unused variables. The $id variable is used in both queries. Consider removing this comment or clarifying what aspect of variable handling is being tested.


108-131: Add t.Parallel() for consistency.

This subtest doesn't call t.Parallel(), unlike the other subtests in this test function. For consistency and to improve test execution performance, consider adding it unless there's a specific reason for sequential execution.

Apply this diff:

 	t.Run("Cache key isolation - different operations don't collide", func(t *testing.T) {
+		t.Parallel()
 		testenv.Run(t, &testenv.Config{}, func(t *testing.T, xEnv *testenv.Environment) {

64-85: Consider adding explanatory comments for cache hit patterns.

Line 83 expects cacheHit{false, false, true} (normalization MISS, variables MISS, remapping HIT) when the inline value changes from id: 1 to id: 2. While the logic is correct, a brief comment explaining why remapping hits while the others miss would improve readability and make the test's intent clearer for future maintainers.


133-151: Consider adding explanatory comments for cache hit patterns.

Line 149 expects cacheHit{true, false, true} (normalization HIT, variables MISS, remapping HIT) when the variable value changes. A brief comment explaining this pattern would help clarify that normalization hits because the query structure is unchanged, variables misses because the value differs, and remapping hits because the structure remains the same.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 11e6116 and ca894b4.

📒 Files selected for processing (1)
  • router-tests/normalization_cache_test.go (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-01T20:39:16.113Z
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2252
File: router-tests/telemetry/telemetry_test.go:9684-9693
Timestamp: 2025-10-01T20:39:16.113Z
Learning: Repo preference: In router-tests/telemetry/telemetry_test.go, keep strict > 0 assertions for request.operation.*Time (parsingTime, normalizationTime, validationTime, planningTime) in telemetry-related tests; do not relax to >= 0 unless CI flakiness is observed.

Applied to files:

  • router-tests/normalization_cache_test.go
🧬 Code graph analysis (1)
router-tests/normalization_cache_test.go (2)
router-tests/testenv/testenv.go (2)
  • TestResponse (1924-1928)
  • GraphQLRequest (1909-1917)
router/core/graphql_handler.go (3)
  • NormalizationCacheHeader (39-39)
  • VariablesNormalizationCacheHeader (40-40)
  • VariablesRemappingCacheHeader (41-41)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: image_scan (nonroot)
  • GitHub Check: image_scan
  • GitHub Check: integration_test (./events)
  • GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
  • GitHub Check: build_push_image
  • GitHub Check: build_push_image (nonroot)
  • GitHub Check: integration_test (./telemetry)
  • GitHub Check: build_test
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: Analyze (go)
🔇 Additional comments (1)
router-tests/normalization_cache_test.go (1)

22-38: LGTM! Well-designed test helper.

The assertCacheHeaders helper effectively reduces duplication and provides clear, specific error messages for each cache stage. Good use of t.Helper() for accurate test failure reporting.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
router/core/operation_processor.go (1)

1430-1439: Cache wiring is correct; consider extracting condition for readability.

The new caches are properly wired into the OperationCache. The conditional on lines 1430-1431 checks all six cache options, which is functionally correct.

The multi-condition check could be extracted for improved readability:

+func (opts *OperationProcessorOptions) hasCaching() bool {
+	return opts.NormalizationCache != nil || opts.ValidationCache != nil || 
+		opts.QueryDepthCache != nil || opts.OperationHashCache != nil || 
+		opts.EnablePersistedOperationsCache || opts.VariablesNormalizationCache != nil || 
+		opts.RemapVariablesCache != nil
+}
+
-	if opts.NormalizationCache != nil || opts.ValidationCache != nil || opts.QueryDepthCache != nil || opts.OperationHashCache != nil || opts.EnablePersistedOperationsCache ||
-		opts.VariablesNormalizationCache != nil || opts.RemapVariablesCache != nil {
+	if opts.hasCaching() {
		processor.operationCache = &OperationCache{
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ca894b4 and 47893d8.

📒 Files selected for processing (2)
  • router-tests/normalization_cache_test.go (1 hunks)
  • router/core/operation_processor.go (11 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-01T20:39:16.113Z
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2252
File: router-tests/telemetry/telemetry_test.go:9684-9693
Timestamp: 2025-10-01T20:39:16.113Z
Learning: Repo preference: In router-tests/telemetry/telemetry_test.go, keep strict > 0 assertions for request.operation.*Time (parsingTime, normalizationTime, validationTime, planningTime) in telemetry-related tests; do not relax to >= 0 unless CI flakiness is observed.

Applied to files:

  • router-tests/normalization_cache_test.go
🧬 Code graph analysis (2)
router-tests/normalization_cache_test.go (2)
router-tests/testenv/testenv.go (2)
  • TestResponse (1924-1928)
  • GraphQLRequest (1909-1917)
router/core/graphql_handler.go (3)
  • NormalizationCacheHeader (39-39)
  • VariablesNormalizationCacheHeader (40-40)
  • VariablesRemappingCacheHeader (41-41)
router/core/operation_processor.go (1)
router-tests/testenv/testenv.go (1)
  • GraphQLRequest (1909-1917)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: integration_test (./events)
  • GitHub Check: image_scan
  • GitHub Check: image_scan (nonroot)
  • GitHub Check: integration_test (./telemetry)
  • GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
  • GitHub Check: build_push_image (nonroot)
  • GitHub Check: build_test
  • GitHub Check: build_push_image
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: Analyze (go)
🔇 Additional comments (7)
router-tests/normalization_cache_test.go (2)

14-38: LGTM - Clean test infrastructure.

The cacheHit struct and assertCacheHeaders helper provide a clean abstraction for validating cache behavior. The error messages in the assertions are descriptive and will aid debugging if tests fail.


40-157: Excellent test coverage for cache interactions.

The test suite comprehensively validates normalization, variables normalization, and remapping cache behavior across multiple scenarios. The comment on lines 150-152 is particularly helpful in explaining why specific caches hit or miss during list coercion.

The expected cache patterns align correctly with the cache key generation logic:

  • Normalization cache keys include query structure
  • Variables normalization keys include both normalized representation and variable values
  • Remapping keys include only normalized representation
router/core/operation_processor.go (5)

767-794: Well-designed cache entry structures with clear documentation.

The cache entry types appropriately capture the state needed for each normalization phase:

  • VariablesNormalizationCacheEntry includes the reparse flag optimization to avoid unnecessary document reparsing
  • RemapVariablesCacheEntry captures the internal ID used for downstream caches
  • Documentation clearly explains each field's purpose

880-887: Cache key generation correctly reflects dependencies.

The cache key design appropriately captures dependencies:

  • normalizeVariablesCacheKey includes both variables and normalized representation, since variable normalization depends on actual values
  • remapVariablesCacheKey includes only normalized representation, since remapping depends only on query structure

This design maximizes cache hits while maintaining correctness.

Also applies to: 991-996


890-989: Solid cache integration with smart reparse optimization.

The NormalizeVariables cache implementation is well-designed:

  1. Cache key includes both variables and normalized representation - correctly captures dependencies
  2. Reparse optimization (lines 952-965) avoids expensive document reparsing when only variable values changed but structure remained unchanged
  3. Error handling is consistent with existing patterns
  4. Return signature now includes cached flag and mapping for upload path handling

The logic correctly handles both cache hit and miss paths while maintaining all necessary state.


999-1071: RemapVariables cache integration follows consistent pattern.

The RemapVariables cache implementation correctly:

  1. Uses normalized representation as cache key - remapping depends only on query structure, not variable values
  2. Always reparses after cache hit (line 1008) - necessary since downstream processing needs the parsed document
  3. Calculates InternalID - used as the cache key for validation, complexity, and planner caches downstream
  4. Returns cached flag for observability (used in headers and telemetry)

The logic is consistent with the NormalizeVariables pattern and correctly maintains all required state.


813-817: Correct cleanup of skip/include variables after cache hit.

This code ensures consistency between cache hit and miss paths:

  • On cache miss: normalization removes skip/include variables from doc.Input.Variables, which is copied to request variables (line 838)
  • On cache hit: the normalized representation is loaded without skip/include, but the original request variables still have them

The manual deletion ensures both paths produce identical results.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
router-tests/telemetry/telemetry_test.go (1)

3758-3758: Attribute count bumps on “Operation - Normalize” spans may be brittle

Bumping require.Len(t, sn[2].Attributes(), …) to account for the two new normalization/remapping cache‑hit attributes is correct now, but these tests are sensitive to any future attribute additions. Consider asserting the presence (and values) of the new cache‑related attributes explicitly and relaxing the length check (e.g. >= baseline) to reduce churn when span metadata evolves.

Also applies to: 4480-4480, 4935-4935, 7097-7097, 9554-9554

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between faacc11 and 9e0a7ac.

📒 Files selected for processing (1)
  • router-tests/telemetry/telemetry_test.go (31 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-10-01T20:39:16.113Z
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2252
File: router-tests/telemetry/telemetry_test.go:9684-9693
Timestamp: 2025-10-01T20:39:16.113Z
Learning: Repo preference: In router-tests/telemetry/telemetry_test.go, keep strict > 0 assertions for request.operation.*Time (parsingTime, normalizationTime, validationTime, planningTime) in telemetry-related tests; do not relax to >= 0 unless CI flakiness is observed.

Applied to files:

  • router-tests/telemetry/telemetry_test.go
📚 Learning: 2025-11-12T18:38:46.619Z
Learnt from: StarpTech
Repo: wundergraph/cosmo PR: 2327
File: graphqlmetrics/core/metrics_service.go:435-442
Timestamp: 2025-11-12T18:38:46.619Z
Learning: In graphqlmetrics/core/metrics_service.go, the opGuardCache TTL (30 days) is intentionally shorter than the gql_metrics_operations database retention period (90 days). This design reduces memory usage while relying on database constraints to prevent duplicate inserts after cache expiration.

Applied to files:

  • router-tests/telemetry/telemetry_test.go
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: build_push_image
  • GitHub Check: image_scan (nonroot)
  • GitHub Check: build_push_image (nonroot)
  • GitHub Check: image_scan
  • GitHub Check: integration_test (./events)
  • GitHub Check: integration_test (./telemetry)
  • GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
  • GitHub Check: build_test
  • GitHub Check: Analyze (go)
  • GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (6)
router-tests/telemetry/telemetry_test.go (6)

581-612: New cache metrics for variables_normalization/remap_variables match existing patterns

The added hit/key/cost/max datapoints for cache_type = variables_normalization and remap_variables are consistent with the existing cache families (plan/query_normalization/validation) and correctly use the same base attributes and capacities for the default, no–feature-flag scenario.

Also applies to: 704-749, 839-868, 923-936


1081-1112: Persisted/non‑persisted cache telemetry extended coherently to new cache types

For the mixed persisted/non‑persisted case, the hit/miss, keys added, and cost totals for variables_normalization and remap_variables align with the request sequence and mirror how the tests treat the existing cache types. The expectations look internally consistent.

Also applies to: 1204-1249, 1339-1368, 1423-1436


1545-1576: Prometheus+OTLP cache telemetry keeps new cache types in sync

In the Prometheus-enabled scenario, the additional variables_normalization/remap_variables datapoints are wired exactly like the OTLP-only case (same attributes, same counts), so both exporters see a consistent cache story.

Also applies to: 1668-1713, 1803-1832, 1887-1900


2011-2042: Small validation cache eviction metrics cover new cache types without skewing semantics

The “small validation cache” test now reports request/key/cost/max metrics for variables_normalization and remap_variables with added/evicted/updated values that stay in step with the other caches and don’t disturb the validation‑cache eviction expectations.

Also applies to: 2134-2179, 2269-2298, 2353-2366


2476-2515: Feature‑flag cache telemetry correctly duplicates new cache types across main/FF dimensions

For the feature‑flag scenario, variables_normalization and remap_variables are added everywhere the other cache types appear (requests/keys/cost/max), with parallel counts for mainAttributes and featureFlagAttributes. That keeps the attribute surface symmetric across feature‑flagged vs non‑flagged traffic.

Also applies to: 2491-2505, 2507-2521, 2523-2537, 2603-2634, 2728-2775, 2874-2921, 3015-3046, 3112-3143, 3201-3214, 3244-3257


11704-11816: Exclusion filtering updated to cover new cache_type labels

In the custom‑exporter exclusion test, including variables_normalization and remap_variables in the expected router.graphql.cache.requests.stats datapoints ensures label‑filtering behavior is validated for all cache families, not just the original ones. Values are intentionally ignored, so the added datapoints are safe.

@ysmolski ysmolski requested a review from Noroth November 20, 2025 13:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants