Skip to content

Conversation

@graphite-app
Copy link
Contributor

@graphite-app graphite-app bot commented Jan 7, 2026

This draft PR was created by the Graphite merge queue.
Trunk will be fast forwarded to the HEAD of this PR when CI passes, and the original PRs will be closed.

The following PRs are included in this draft PR:

@railway-app
Copy link

railway-app bot commented Jan 7, 2026

This PR was not deployed automatically as @graphite-app[bot] does not have access to the Railway project.

In order to get automatic PR deploys, please add @graphite-app[bot] to your workspace on Railway.

@railway-app
Copy link

railway-app bot commented Jan 7, 2026

This PR was not deployed automatically as @graphite-app[bot] does not have access to the Railway project.

In order to get automatic PR deploys, please add @graphite-app[bot] to your workspace on Railway.

@graphite-app graphite-app bot closed this Jan 7, 2026
@graphite-app graphite-app bot deleted the gtmq_spec_750493_1767819465462-ffbd3aae-d98d-49f3-9dcf-750ca34e3845 branch January 7, 2026 20:58
@claude
Copy link

claude bot commented Jan 7, 2026

PR Review - Graphite Merge Queue (PRs #3762, #3764, #3774)

Overview

This is a merge queue PR combining three separate PRs. The changes include:

  • Grafana dashboard updates (guard, gasoline, operation dashboards)
  • Metrics improvements and bug fixes
  • API schema updates for runner configuration
  • Serverless connection tracking enhancements

Issues Found

🔴 Critical: Duplicate Metric Name

Location: engine/packages/pegboard/src/metrics.rs:26-31

There's a copy-paste error where two different metrics share the same name:

pub static ref RUNNER_VERSION_UPGRADE_DRAIN_TOTAL: IntCounterVec = register_int_counter_vec_with_registry!(
    "pegboard_runner_version_upgrade_drain_total",
    "Count of runners drained due to version upgrade.",
    &["namespace_id", "runner_name"],
    *REGISTRY
).unwrap();

pub static ref SERVERLESS_OUTBOUND_REQ_TOTAL: IntCounterVec = register_int_counter_vec_with_registry!(
    "pegboard_runner_version_upgrade_drain_total",  // ❌ Wrong name!
    "Count of serverless outbound requests made.",
    &["namespace_id", "runner_name"],
    *REGISTRY
).unwrap();

Impact: The second metric SERVERLESS_OUTBOUND_REQ_TOTAL will fail to register at runtime because the name pegboard_runner_version_upgrade_drain_total is already taken. This will cause a panic on startup due to the .unwrap().

Fix: Change line 27 to:

"pegboard_serverless_outbound_req_total",

Best Practices & Observations

✅ Positive Changes

  1. Improved Prometheus Query (gasoline.json): Fixed timestamp calculation by dividing by 1000 to convert milliseconds to seconds for Prometheus time() function compatibility
  2. Better Dashboard Organization: Added row groupings (TCP, Proxy Requests, Cache) in guard dashboard for improved readability
  3. New Error Rate Metric: Added error tracking in guard dashboard to monitor proxy request failures
  4. Metrics Usage: Proper use of metrics in conn.rs:401-403 for tracking serverless outbound requests

⚠️ Minor Observations

  1. Operation Dashboard Label Change: Changed from error_type to error label - ensure this matches the actual metric label in the codebase
  2. Grafana Dashboard IDs: Dashboard IDs changed (e.g., guard.json id from 115 to 5) - this is typically auto-generated and shouldn't cause issues, but worth noting for fresh imports

Code Quality

Positive aspects:

  • Clean structured logging patterns maintained throughout
  • Proper error handling with context in serverless connection workflow
  • Good use of type safety with custom error types
  • Follows repository conventions (imports at top, structured logging)

No issues found with:

  • Error handling patterns
  • Security practices
  • Performance considerations
  • Test coverage (changes are primarily monitoring/config updates)

Recommendations

  1. CRITICAL: Fix the duplicate metric name before merging
  2. Verify the error_typeerror label change in operation dashboard matches actual metric labels
  3. Consider adding a test to prevent duplicate metric names in the future
  4. After merging, verify dashboards import correctly with the new IDs

Summary

This PR has one critical bug that will cause runtime failures. The duplicate metric name must be fixed before merging. Otherwise, the changes look good and follow the codebase conventions well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants