Skip to content

feat: Track background blob event processing#3278

Open
sadhansood wants to merge 2 commits intomainfrom
sadhan/oldest-unfinished-event
Open

feat: Track background blob event processing#3278
sadhansood wants to merge 2 commits intomainfrom
sadhan/oldest-unfinished-event

Conversation

@sadhansood
Copy link
Copy Markdown
Contributor

Description

At epoch boundaries, EpochChangeStart waits for background blob event processing to drain, but current metrics do not show how far those workers have progressed through event indexes. This metric gives a direct high water mark for worker-side processing so we can tell whether the boundary is still blocked on background event handling versus something later.

Test plan

cargo test -p walrus-service tracks_highest_processed_event_index


Release notes

Check each box that your changes affect. If none of the boxes relate to your changes, release notes aren't required.
For each box you select, include information after the relevant heading that describes the impact of your changes that
a user might notice and any actions they must take to implement updates. (Add release notes after the colon for each item)

  • Storage node:
  • Aggregator:
  • Publisher:
  • CLI:

Copilot AI review requested due to automatic review settings April 8, 2026 23:02
@sadhansood sadhansood changed the title Sadhan/oldest unfinished event feat: Track background blob event processing Apr 8, 2026
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a storage-node metric that exposes the high-water mark of background blob-event processing completion, helping diagnose epoch-boundary blocking when EpochChangeStart waits for background workers to drain.

Changes:

  • Introduce a new metric (event_highest_background_processed_index) to report the highest background-processed event index (or -1 if none).
  • Track/update this high-water mark from background event workers via PendingEventCounter.
  • Add a unit test validating monotonic tracking behavior for the new metric.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File Description
crates/walrus-service/src/node/metrics.rs Defines the new metric and exposes helper setters/resetters (plus test-only getter).
crates/walrus-service/src/node/blob_event_processor/pending_events.rs Extends PendingEventCounter to track highest processed event index and adds a focused unit test.
crates/walrus-service/src/node/blob_event_processor.rs Wires the counter into background workers and resets the metric on processor initialization.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +41 to +43
let event_index = event_index
.try_into()
.expect("event index should fit in i64");
Copy link

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

observe_processed_event converts event_index: u64 into i64 with expect(...), which can panic and take down the node if the event index ever exceeds i64::MAX (even if that’s unlikely). Metrics updates should be panic-free; consider either (a) switching the metric to a u64-backed gauge and representing the “unset” state separately, or (b) clamping/saturating the value at i64::MAX (and optionally logging a warn) instead of panicking.

Suggested change
let event_index = event_index
.try_into()
.expect("event index should fit in i64");
let event_index = match i64::try_from(event_index) {
Ok(event_index) => event_index,
Err(_) => {
tracing::warn!(
event_index,
max_supported_event_index = i64::MAX,
"event index exceeds i64 range; clamping highest processed event index metric"
);
i64::MAX
}
};

Copilot uses AI. Check for mistakes.
@sadhansood sadhansood force-pushed the sadhan/oldest-unfinished-event branch from dfac675 to f469f45 Compare April 9, 2026 00:10
#[derive(Debug, Clone)]
pub struct PendingEventCounter {
inner: Arc<AtomicU32>,
highest_processed_event_index: Arc<AtomicI64>,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's unclear that this needs to be an I64, vs a U64, with a default value of 0. This might eliminate the need for the expect. It seems fine to be hand-wavy about the 0 case for metrics.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants