Skip to content

Conversation

@endigma
Copy link
Member

@endigma endigma commented Nov 14, 2025

Summary by CodeRabbit

  • New Features

    • Added FlightRecorder mode to capture detailed request traces when latency exceeds a configurable threshold.
    • New command-line entrypoint to run the FlightRecorder mode.
  • Configuration

    • New settings: output directory, latency threshold (ms), and single vs multiple trace captures.
  • Documentation

    • Usage guide with examples, run/build instructions, configuration details, and environment setup notes.

✏️ Tip: You can customize this high-level summary in your review settings.

Checklist

  • I have discussed my proposed changes in an issue and have received approval to proceed.
  • I have followed the coding standards of the project.
  • Tests or benchmarks have been added or updated.
  • Documentation has been updated on https://github.com/wundergraph/cosmo-docs.
  • I have read the Contributors Guide.

@coderabbitai
Copy link

coderabbitai bot commented Nov 14, 2025

Walkthrough

Adds a new FlightRecorder module that records request traces to timestamped files when request latency exceeds a configured threshold, a dedicated Go executable entrypoint that imports the module for side effects, and accompanying README documentation.

Changes

Cohort / File(s) Summary
Documentation
router/cmd/flightrecorder/README.md
Adds README describing module usage, configuration fields (outputPath, requestLatencyRecordThreshold, recordMultiple), minimal example main, run/build commands, environment setup, and references to integration tests.
Entrypoint
router/cmd/flightrecorder/main.go
New Go executable entrypoint that empty-imports the flightrecorder module for side effects and delegates to routercmd.Main().
Module Implementation
router/cmd/flightrecorder/module/module.go
Adds FlightRecorder type and init() registration. Implements Provision, Cleanup, RouterOnRequest, and RecordTrace. Validates config, computes duration threshold, prepares OutputPath, configures and starts a trace.FlightRecorder, records traces to timestamped files when latency exceeds RequestLatencyRecordThreshold, and optionally stops after a single record when RecordMultiple is false. Adds public fields OutputPath, RecordMultiple, RequestLatencyRecordThreshold, interface guards, and module metadata.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Review conversion/validation of RequestLatencyRecordThreshold to time.Duration.
  • Verify file/directory creation, permissions, and error handling for writing timestamped trace files under OutputPath.
  • Inspect lifecycle, concurrency, and shutdown handling of the trace.FlightRecorder, especially when RecordMultiple == false and when stopping the recorder from request-time logic.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and accurately summarizes the main change: adding a new custom module that implements flight recorder functionality.
✨ Finishing touches
  • 📝 Generate docstrings

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link

github-actions bot commented Nov 14, 2025

Router image scan passed

✅ No security vulnerabilities found in image:

ghcr.io/wundergraph/cosmo/router:sha-a857fa78251847fc944424651f2b5a2aafe78560

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (4)
router/cmd/flightrecorder/README.md (1)

5-17: Replace hard tabs with spaces in code example.

The code example uses hard tabs on lines 9, 10, 11, and 15, which should be replaced with spaces to align with standard Go formatting conventions.

Apply this diff:

 package main
 
 import (
-	routercmd "github.com/wundergraph/cosmo/router/cmd"
-	// Import your modules here
-	_ "github.com/wundergraph/cosmo/router/cmd/flightrecorder/module"
+    routercmd "github.com/wundergraph/cosmo/router/cmd"
+    // Import your modules here
+    _ "github.com/wundergraph/cosmo/router/cmd/flightrecorder/module"
 )
 
 func main() {
-	routercmd.Main()
+    routercmd.Main()
 }
router/cmd/flightrecorder/module/module.go (3)

29-29: Remove template comment.

This comment appears to be leftover from scaffolding and should be removed.

Apply this diff:

     requestLatencyRecordThresholdDuration time.Duration
 
-    // Add a new property here
     fl *trace.FlightRecorder

100-100: Fix error message format.

The error message uses a format string with %w but should rely on zap.Error for structured logging.

Apply this diff:

-        m.Logger.Error("failed to create trace file: %w", zap.Error(err))
+        m.Logger.Error("failed to create trace file", zap.Error(err))

92-113: Add file cleanup mechanism and sanitize operation names.

Two operational concerns:

  1. No file cleanup/rotation: Trace files accumulate indefinitely in the output directory, which could eventually exhaust disk space. Consider implementing a retention policy (e.g., delete files older than N days, or keep only the last N files).

  2. Unsanitized operation names in filenames: Operation names are used directly in filenames (line 94), but may contain characters that are invalid for filenames (/, \, :, *, ?, ", <, >, |). This could cause os.Create to fail.

Example sanitization approach:

import (
    "regexp"
    // ... other imports
)

var invalidFilenameChars = regexp.MustCompile(`[<>:"/\\|?*]`)

func sanitizeOperationName(name string) string {
    // Replace invalid characters with underscores
    sanitized := invalidFilenameChars.ReplaceAllString(name, "_")
    // Limit length to avoid filesystem limits
    if len(sanitized) > 100 {
        sanitized = sanitized[:100]
    }
    return sanitized
}

func (m *FlightRecorder) RecordTrace(operationName string) {
    sanitized := sanitizeOperationName(operationName)
    filename := fmt.Sprintf("trace-%s-%s.out", sanitized, time.Now().Format("2006-01-02-15-04-05"))
    // ... rest of the method
}

For file cleanup, consider adding a cleanup routine in the Provision method or as a background goroutine.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f94a099 and 12c2f7d.

📒 Files selected for processing (3)
  • router/cmd/flightrecorder/README.md (1 hunks)
  • router/cmd/flightrecorder/main.go (1 hunks)
  • router/cmd/flightrecorder/module/module.go (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
router/cmd/flightrecorder/main.go (1)
router/cmd/main.go (1)
  • Main (42-277)
router/cmd/flightrecorder/module/module.go (2)
router/core/modules.go (7)
  • RegisterModule (56-73)
  • ModuleContext (160-164)
  • Module (52-54)
  • ModuleInfo (44-50)
  • RouterOnRequestHandler (113-115)
  • Provisioner (149-152)
  • Cleaner (154-157)
router/core/context.go (1)
  • RequestContext (61-142)
🪛 markdownlint-cli2 (0.18.1)
router/cmd/flightrecorder/README.md

9-9: Hard tabs
Column: 1

(MD010, no-hard-tabs)


10-10: Hard tabs
Column: 1

(MD010, no-hard-tabs)


11-11: Hard tabs
Column: 1

(MD010, no-hard-tabs)


15-15: Hard tabs
Column: 1

(MD010, no-hard-tabs)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: build-router
  • GitHub Check: build_push_image
  • GitHub Check: build_push_image (nonroot)
  • GitHub Check: image_scan
  • GitHub Check: image_scan (nonroot)
  • GitHub Check: build_test
  • GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
  • GitHub Check: integration_test (./events)
  • GitHub Check: integration_test (./telemetry)
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: Analyze (go)
🔇 Additional comments (1)
router/cmd/flightrecorder/main.go (1)

1-10: LGTM!

The entrypoint correctly delegates to the router's main function and imports the module for side-effect registration. This is the standard pattern for module initialization.

@endigma endigma force-pushed the jesse/eng-8466-implement-flight-recorder-telemetry branch from 67237dd to e0770d1 Compare November 20, 2025 14:04
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (2)
router/cmd/flightrecorder/README.md (2)

21-21: Clarify which configuration file contains the modules section.

Line 21 references "the main router configuration file," but doesn't specify the filename (e.g., config.yaml, router.yaml). Provide the explicit filename to reduce ambiguity for users.


45-45: Provide more context for .env.example setup.

The note mentions copying .env.example but doesn't explain what environment variables are needed or what the router should connect to. Either expand this section with required env vars or reference the main router documentation where this is covered.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 12c2f7d and e0770d1.

📒 Files selected for processing (3)
  • router/cmd/flightrecorder/README.md (1 hunks)
  • router/cmd/flightrecorder/main.go (1 hunks)
  • router/cmd/flightrecorder/module/module.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • router/cmd/flightrecorder/main.go
  • router/cmd/flightrecorder/module/module.go
🧰 Additional context used
🪛 markdownlint-cli2 (0.18.1)
router/cmd/flightrecorder/README.md

9-9: Hard tabs
Column: 1

(MD010, no-hard-tabs)


10-10: Hard tabs
Column: 1

(MD010, no-hard-tabs)


11-11: Hard tabs
Column: 1

(MD010, no-hard-tabs)


15-15: Hard tabs
Column: 1

(MD010, no-hard-tabs)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: build-router
  • GitHub Check: integration_test (./telemetry)
  • GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
  • GitHub Check: integration_test (./events)
  • GitHub Check: build_test
  • GitHub Check: build_push_image (nonroot)
  • GitHub Check: build_push_image
  • GitHub Check: Analyze (go)

@endigma endigma force-pushed the jesse/eng-8466-implement-flight-recorder-telemetry branch from e0770d1 to de3f5eb Compare November 20, 2025 15:24
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (2)
router/cmd/flightrecorder/README.md (2)

9-16: Replace hard tabs with spaces in the Go code block.

The fenced Go example still uses hard tabs on the import and routercmd.Main() lines, which violates the repo’s markdownlint rule (MD010) and can render inconsistently. Please replace those tabs with the project’s standard indentation (e.g., 4 spaces).


3-3: Clarify wording: reference the runtime/trace FlightRecorder API, not a standalone module.

The text calls this the "new flightrecorder module from Go 1.25". To match Go’s terminology, this should point to the FlightRecorder API in the runtime/trace package (the blog link is fine), e.g., “…uses the new FlightRecorder API in the runtime/trace package (Go 1.25)…”.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e0770d1 and de3f5eb.

📒 Files selected for processing (3)
  • router/cmd/flightrecorder/README.md (1 hunks)
  • router/cmd/flightrecorder/main.go (1 hunks)
  • router/cmd/flightrecorder/module/module.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • router/cmd/flightrecorder/main.go
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-10-01T20:39:16.113Z
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2252
File: router-tests/telemetry/telemetry_test.go:9684-9693
Timestamp: 2025-10-01T20:39:16.113Z
Learning: Repo preference: In router-tests/telemetry/telemetry_test.go, keep strict > 0 assertions for request.operation.*Time (parsingTime, normalizationTime, validationTime, planningTime) in telemetry-related tests; do not relax to >= 0 unless CI flakiness is observed.

Applied to files:

  • router/cmd/flightrecorder/module/module.go
  • router/cmd/flightrecorder/README.md
📚 Learning: 2025-08-20T22:13:25.222Z
Learnt from: StarpTech
Repo: wundergraph/cosmo PR: 2157
File: router-tests/go.mod:16-16
Timestamp: 2025-08-20T22:13:25.222Z
Learning: github.com/mark3labs/mcp-go v0.38.0 has regressions and should not be used in the wundergraph/cosmo project. v0.36.0 is the stable version that should be used across router-tests and other modules.

Applied to files:

  • router/cmd/flightrecorder/README.md
🧬 Code graph analysis (1)
router/cmd/flightrecorder/module/module.go (2)
router/core/modules.go (7)
  • RegisterModule (56-73)
  • ModuleContext (160-164)
  • Module (52-54)
  • ModuleInfo (44-50)
  • RouterOnRequestHandler (113-115)
  • Provisioner (149-152)
  • Cleaner (154-157)
router/core/context.go (1)
  • RequestContext (61-142)
🪛 markdownlint-cli2 (0.18.1)
router/cmd/flightrecorder/README.md

9-9: Hard tabs
Column: 1

(MD010, no-hard-tabs)


10-10: Hard tabs
Column: 1

(MD010, no-hard-tabs)


11-11: Hard tabs
Column: 1

(MD010, no-hard-tabs)


15-15: Hard tabs
Column: 1

(MD010, no-hard-tabs)

🔇 Additional comments (1)
router/cmd/flightrecorder/module/module.go (1)

83-97: Router hook logic looks sound.

The request hook wraps next.ServeHTTP, measures duration, and only records when Enabled() is true and the latency threshold is exceeded. This is a clean, low-overhead integration point and aligns with the module’s purpose.

Copy link
Contributor

@StarpTech StarpTech left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@endigma endigma force-pushed the jesse/eng-8466-implement-flight-recorder-telemetry branch from a12a5b4 to c438034 Compare November 24, 2025 13:12
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
router/cmd/flightrecorder/README.md (1)

5-17: Replace hard tabs with spaces in the code example.

The code block contains hard tab characters that violate Markdown linting standards and may render inconsistently across viewers.

Apply this diff to replace tabs with spaces:

 ```go
 package main
 
 import (
-	routercmd "github.com/wundergraph/cosmo/router/cmd"
-	// Import your modules here
-	_ "github.com/wundergraph/cosmo/router/cmd/flightrecorder/module"
+    routercmd "github.com/wundergraph/cosmo/router/cmd"
+    // Import your modules here
+    _ "github.com/wundergraph/cosmo/router/cmd/flightrecorder/module"
 )
 
 func main() {
-	routercmd.Main()
+    routercmd.Main()
 }

</blockquote></details>

</blockquote></details>

<details>
<summary>🧹 Nitpick comments (2)</summary><blockquote>

<details>
<summary>router/cmd/flightrecorder/module/module.go (2)</summary><blockquote>

`54-68`: **Clarify MaxBytes calculation logic.**

The comment states "10MB/s of MinAge" but the implementation only scales MaxBytes when the threshold exceeds 512ms. For thresholds ≤512ms, a 2-second MinAge gets only the 10MB minimum, which is 5MB/s—not the stated 10MB/s.

Consider either:
1. Updating the comment to reflect the actual behavior (10MB minimum with scaling for larger thresholds)
2. Adjusting the formula to consistently provide ~10MB per second of MinAge

Current behavior:
- Threshold 100ms → MinAge 200ms → MaxBytes 10MB (50MB/s, way above target)
- Threshold 500ms → MinAge 1000ms → MaxBytes 10MB (10MB/s, matches target)
- Threshold 1000ms → MinAge 2000ms → MaxBytes ~19.5MB (~9.75MB/s, close to target)



Example for consistent 10MB/s:

```diff
-	// 10MB minimum
-	var maxBytes uint64 = 10 * 1024 * 1024
-
-	// We actually want ~10MB/s of MinAge
-	// 1000ms = 1 second, 1000 is close enough to 1024
-	// sub in the uint milliseconds count for one of the factors
-	// if it would result in a value greater than default maxBytes
-	if m.RequestLatencyRecordThreshold*2 > 1024 {
-		maxBytes = (m.RequestLatencyRecordThreshold * 2) * 1024 * 10
-	}
+	// Target: 10MB per second of MinAge (doubled threshold)
+	// MinAge is in milliseconds: (threshold * 2) ms
+	// 10MB/s = 10MB per 1000ms
+	// maxBytes = (threshold * 2) * 10MB / 1000
+	maxBytes := (m.RequestLatencyRecordThreshold * 2 * 10 * 1024 * 1024) / 1000
+	// Enforce 10MB minimum
+	if maxBytes < 10*1024*1024 {
+		maxBytes = 10 * 1024 * 1024
+	}

99-119: Verify RecordMultiple behavior with concurrent requests.

When RecordMultiple is false, line 117 stops the recorder after writing a trace. However, if multiple slow requests trigger concurrently before the first Stop() completes, multiple RecordTrace calls may execute, potentially recording multiple traces despite the intent to record only once.

Combined with the concurrency issue flagged in RouterOnRequest, consider:

  1. Using the mutex suggested in the previous comment to serialize all RecordTrace calls
  2. Adding a flag to track whether a trace has been recorded when RecordMultiple is false

Example:

 type FlightRecorder struct {
 	OutputPath                    string `mapstructure:"outputPath"`
 	RecordMultiple                bool   `mapstructure:"recordMultiple"`
 	RequestLatencyRecordThreshold uint64 `mapstructure:"requestLatencyRecordThreshold"`
 
 	requestLatencyRecordThresholdDuration time.Duration
 
 	fl *trace.FlightRecorder
+	recordMu sync.Mutex
+	recorded bool
 
 	Logger *zap.Logger
 }

Then in RecordTrace:

 func (m *FlightRecorder) RecordTrace(operationName string) {
+	if !m.RecordMultiple && m.recorded {
+		return
+	}
+
 	// Generate timestamped filename
 	filename := fmt.Sprintf("trace-%s-%s.out", operationName, time.Now().Format("2006-01-02-15-04-05"))
 
 	// Create the file
 	file, err := os.Create(filepath.Join(m.OutputPath, filename))
 	if err != nil {
 		m.Logger.Error("failed to create trace file", zap.Error(err))
 		return
 	}
 	defer file.Close()
 
 	_, err = m.fl.WriteTo(file)
 	if err != nil {
 		m.Logger.Error("Failed to record request", zap.Error(err))
+		return
 	}
 
+	m.recorded = true
+
 	if !m.RecordMultiple {
 		m.fl.Stop()
 	}
 }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4ddd0c3 and c438034.

📒 Files selected for processing (3)
  • router/cmd/flightrecorder/README.md (1 hunks)
  • router/cmd/flightrecorder/main.go (1 hunks)
  • router/cmd/flightrecorder/module/module.go (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-08-20T22:13:25.222Z
Learnt from: StarpTech
Repo: wundergraph/cosmo PR: 2157
File: router-tests/go.mod:16-16
Timestamp: 2025-08-20T22:13:25.222Z
Learning: github.com/mark3labs/mcp-go v0.38.0 has regressions and should not be used in the wundergraph/cosmo project. v0.36.0 is the stable version that should be used across router-tests and other modules.

Applied to files:

  • router/cmd/flightrecorder/README.md
📚 Learning: 2025-10-01T20:39:16.113Z
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2252
File: router-tests/telemetry/telemetry_test.go:9684-9693
Timestamp: 2025-10-01T20:39:16.113Z
Learning: Repo preference: In router-tests/telemetry/telemetry_test.go, keep strict > 0 assertions for request.operation.*Time (parsingTime, normalizationTime, validationTime, planningTime) in telemetry-related tests; do not relax to >= 0 unless CI flakiness is observed.

Applied to files:

  • router/cmd/flightrecorder/README.md
  • router/cmd/flightrecorder/module/module.go
🧬 Code graph analysis (2)
router/cmd/flightrecorder/main.go (1)
router/cmd/main.go (1)
  • Main (42-277)
router/cmd/flightrecorder/module/module.go (1)
router/core/modules.go (7)
  • RegisterModule (56-73)
  • ModuleContext (160-164)
  • Module (52-54)
  • ModuleInfo (44-50)
  • RouterOnRequestHandler (113-115)
  • Provisioner (149-152)
  • Cleaner (154-157)
🪛 markdownlint-cli2 (0.18.1)
router/cmd/flightrecorder/README.md

9-9: Hard tabs
Column: 1

(MD010, no-hard-tabs)


10-10: Hard tabs
Column: 1

(MD010, no-hard-tabs)


11-11: Hard tabs
Column: 1

(MD010, no-hard-tabs)


15-15: Hard tabs
Column: 1

(MD010, no-hard-tabs)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: build-router
  • GitHub Check: build_test
  • GitHub Check: integration_test (./telemetry)
  • GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
  • GitHub Check: Analyze (go)
🔇 Additional comments (4)
router/cmd/flightrecorder/main.go (1)

1-10: LGTM! Clean entrypoint implementation.

The implementation correctly uses a side-effect import to register the FlightRecorder module and delegates to the existing router main function. This is a clean pattern for extending the router with custom modules.

router/cmd/flightrecorder/module/module.go (3)

1-31: LGTM! Clean module structure and registration.

The module structure follows the core module system conventions correctly with proper registration in init() and appropriate field definitions.


77-81: LGTM! Proper cleanup implementation.

The Cleanup method correctly stops the flight recorder when the module shuts down.


121-136: LGTM! Proper module metadata and interface guards.

The Module() method correctly returns the module metadata, and the interface guards ensure compile-time verification of interface implementation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants