Skip to content

Conversation

@MasterPtato
Copy link
Contributor

@MasterPtato MasterPtato commented Jan 17, 2026

No description provided.

@railway-app
Copy link

railway-app bot commented Jan 17, 2026

🚅 Deployed to the rivet-pr-3952 environment in rivet-frontend

Service Status Web Updated (UTC)
frontend-inspector 😴 Sleeping (View Logs) Web Jan 20, 2026 at 7:00 pm
frontend-cloud 😴 Sleeping (View Logs) Web Jan 20, 2026 at 6:59 pm
website 😴 Sleeping (View Logs) Web Jan 20, 2026 at 6:57 pm

@railway-app railway-app bot temporarily deployed to rivet-frontend / rivet-pr-3952 January 17, 2026 00:30 Destroyed
Copy link
Contributor Author


How to use the Graphite Merge Queue

Add the label merge-queue to this PR to add it to the merge queue.

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has enabled the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite. Learn more about stacking.

@MasterPtato MasterPtato changed the title fix: day 1 patch, add dashboards feat: add actor and kv metrics Jan 17, 2026
@MasterPtato MasterPtato force-pushed the 01-15-feat_add_actor_and_kv_metrics2 branch from 67c4a86 to 8b7ad2b Compare January 20, 2026 18:48
@claude
Copy link

claude bot commented Jan 20, 2026

PR Review: Add Actor and KV Metrics

This PR adds comprehensive metrics tracking for Rivet Actors and KV operations. The implementation is generally well-structured, but there are several issues that should be addressed.


Critical Issues

1. Duplicate Metric Increment in setup.rs ⚠️

File: engine/packages/pegboard/src/workflows/actor/setup.rs:114-127

The TotalActors metric is incremented twice in the same transaction:

// Update metrics
keys::ns::metric::inc(
    &tx,
    input.namespace_id,
    keys::ns::metric::Metric::TotalActors(input.name.clone()),
    1,
);

// Update metrics  // <- Duplicate comment and code
keys::ns::metric::inc(
    &tx,
    input.namespace_id,
    keys::ns::metric::Metric::TotalActors(input.name.clone()),
    1,
);

Impact: This will cause the total actors count to be inflated by 2x, making the metric inaccurate.

Fix: Remove one of the duplicate increments.


Major Issues

2. Potential Integer Overflow in Metric Calculations

Files:

  • engine/packages/pegboard/src/actor_kv/mod.rs:127-133, 239-246, 278-287, 350-357

The metric calculation uses .try_into().unwrap_or_default() which silently converts overflow errors to 0:

let total_size_chunked = (total_size as u64).div_ceil(util::metric::KV_BILLABLE_CHUNK)
    * util::metric::KV_BILLABLE_CHUNK;
keys::ns::metric::inc(
    &tx.with_subspace(keys::subspace()),
    recipient.namespace_id,
    keys::ns::metric::Metric::KvRead(recipient.name.clone()),
    total_size_chunked.try_into().unwrap_or_default(),  // <- Silent failure
);

Impact: If total_size_chunked exceeds i64::MAX, the metric will be recorded as 0 instead of the actual value or producing an error.

Recommendation:

  • Use .try_into()? to propagate the error, or
  • Add explicit logging when conversion fails, or
  • Use saturating_as() if you want to cap at i64::MAX

3. Inconsistent Error Handling in EntryBuilder

File: engine/packages/pegboard/src/actor_kv/entry.rs:31-36

The append_chunk method silently ignores chunks that arrive out of order:

pub fn append_chunk(&mut self, idx: usize, chunk: &[u8]) {
    if idx >= self.next_idx {  // Only appends if idx is sequential
        self.value.extend(chunk);
        self.next_idx = idx + 1;
    }
}

Impact: If chunks arrive out of order (which shouldn't happen but could indicate a bug), they'll be silently dropped, leading to corrupted data without any error indication.

Recommendation: Either:

  • Log a warning when a chunk is out of order, or
  • Return a Result to propagate the error

Code Quality Issues

4. Missing Documentation on Metric Aggregation

File: engine/packages/pegboard/src/keys/ns/metric.rs:278-284

The inc function uses atomic operations but lacks documentation about:

  • Whether metrics are gauges or counters
  • How metric values should be interpreted
  • The expected lifetime/retention of metrics

Recommendation: Add comprehensive documentation explaining:

/// Atomically increments a namespace metric.
///
/// # Metrics Model
/// - Metrics are stored as atomic counters using LE byte encoding
/// - Values accumulate over time and are never decremented automatically
/// - For gauge metrics (like ActorAwake), callers must manage increments/decrements
///
/// # Parameters
/// - `by`: The amount to increment (can be negative for decrements)

5. Unclear Metric Semantics

File: engine/packages/pegboard/src/keys/ns/metric.rs:5-27

Some metrics have unclear semantics:

  • ActorAwake(String) - Comment says "Seconds" but it's unclear if this is cumulative awake time or current count
  • KvStorageUsed(String) - Is this updated on every operation or calculated periodically?

Recommendation: Improve documentation with examples:

pub enum Metric {
    /// Cumulative seconds actors have been awake (counter, incremented when actors sleep)
    ActorAwake(String),
    /// Current number of actors (gauge, incremented on create, decremented on destroy)
    TotalActors(String),
    /// Total bytes stored in KV (gauge, updated on put/delete operations)
    KvStorageUsed(String),
    // ...
}

6. Test Coverage Missing Metric Verification

File: engine/packages/pegboard/tests/kv_operations.rs

The test file comprehensively tests KV operations but doesn't verify that metrics are being recorded correctly. This is a significant gap given that the PR's primary purpose is adding metrics.

Recommendation: Add test cases that:

  • Verify metrics are incremented on KV operations
  • Check that read/write byte counts are accurate
  • Validate that metrics use proper billable chunk rounding

Minor Issues

7. Magic Numbers

File: engine/packages/util/src/metric.rs:1

pub const KV_BILLABLE_CHUNK: u64 = crate::size::kibibytes(4);

This constant is used throughout but lacks documentation explaining:

  • Why 4 KiB was chosen
  • Whether this aligns with billing/pricing models
  • Any performance implications

8. Potential Performance Impact

Files: Multiple KV operation files

Every KV operation now performs additional metric updates within the same transaction. While this ensures consistency, it adds overhead to every operation.

Observations:

  • Good: Metrics are part of the same transaction (ACID guarantees)
  • Concern: No batching mechanism for high-throughput scenarios
  • Concern: Atomic operations on hot keys could cause contention

Recommendation: Consider:

  • Documenting the performance trade-offs
  • Adding metrics to track metric update latency
  • Future optimization: batch metric updates asynchronously

Positive Aspects ✅

  1. Well-structured key design - The metric key hierarchy is clean and extensible
  2. Comprehensive KV test coverage - Tests thoroughly exercise edge cases
  3. Proper use of atomic operations - Metrics use MutationType::Add correctly
  4. Billable chunk rounding - Correctly implements rounding up for billing
  5. Actor lifecycle tracking - Metrics properly track actor state transitions

Security Considerations

No major security concerns identified. The metrics system:

  • Uses proper namespace isolation
  • Doesn't expose sensitive data
  • Follows the existing access control patterns

Summary

Recommendation: Request changes before merging.

Must Fix:

  1. Remove duplicate TotalActors increment
  2. Handle integer overflow in metric calculations properly

Should Fix:
3. Improve EntryBuilder error handling
4. Add metric verification to tests
5. Document metric semantics and aggregation behavior

Nice to Have:
6. Document billable chunk size rationale
7. Consider future performance optimizations

The core implementation is solid, but the duplicate metric increment is a critical bug that will corrupt metric data. Once that's fixed and overflow handling is improved, this will be a valuable addition to the codebase.


Generated by Claude Code

@pkg-pr-new
Copy link

pkg-pr-new bot commented Jan 20, 2026

More templates

@rivetkit/cloudflare-workers

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/cloudflare-workers@3952

@rivetkit/db

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/db@3952

@rivetkit/framework-base

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/framework-base@3952

@rivetkit/next-js

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/next-js@3952

@rivetkit/react

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/react@3952

rivetkit

pnpm add https://pkg.pr.new/rivet-dev/rivet/rivetkit@3952

@rivetkit/sql-loader

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/sql-loader@3952

@rivetkit/virtual-websocket

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/virtual-websocket@3952

@rivetkit/engine-runner

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/engine-runner@3952

@rivetkit/engine-runner-protocol

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/engine-runner-protocol@3952

commit: b47e44c

@MasterPtato MasterPtato force-pushed the 01-15-feat_add_actor_and_kv_metrics2 branch from 8b7ad2b to b47e44c Compare January 20, 2026 20:53
@railway-app railway-app bot temporarily deployed to rivet-frontend / rivet-pr-3952 January 20, 2026 20:53 Destroyed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants