actor: add drop counter and first-drop log to BackpressureMailbox #10761

Open
gijswijs wants to merge 3 commits into lightningnetwork:master from gijswijs:backpressure-mailbox-drop-counter

@gijswijs
Collaborator

Summary

Adds two observability primitives to actor.BackpressureMailbox so operators can see when an actor's mailbox starts shedding load:

  • Dropped() returns the total count of predicate rejections since the mailbox was created.
  • FirstDropClaim() is a one-shot CAS flag that succeeds exactly once, and only after at least one real drop has occurred. Intended for call sites that want to emit a one-shot log or metric on the first rejection.

Also introduces BackpressureMailboxCfg, an extensible config struct for optional settings. When Name is set, the mailbox itself emits a single info-level log on the first predicate drop via an internal firstLog flag that is independent of FirstDropClaim. This lets the internal auto-log and an external caller-driven log coexist without racing for the same flag.
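
The bookkeeping described above can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: the `dropTracker` type, its field names, the `recordDrop` helper, and the `uint64` return type of `Dropped()` are all assumptions made for the example. It shows why the caller-facing claim flag and the internal first-log flag can be two separate atomics that never contend with each other.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// dropTracker is a hypothetical stand-in for the drop bookkeeping inside a
// backpressure mailbox. It is not the lnd implementation.
type dropTracker struct {
	name     string        // optional; empty disables the auto-log
	dropped  atomic.Uint64 // total predicate rejections
	claimed  atomic.Bool   // caller-facing one-shot flag
	firstLog atomic.Bool   // internal one-shot flag for the auto-log
}

// recordDrop counts one rejection and, when a name is configured, prints
// the first-drop line exactly once.
func (d *dropTracker) recordDrop(queueLen int) {
	d.dropped.Add(1)

	if d.name != "" && d.firstLog.CompareAndSwap(false, true) {
		fmt.Printf("Mailbox(%s): first message dropped "+
			"(queue_depth=%d)\n", d.name, queueLen)
	}
}

// Dropped returns the number of rejections recorded so far.
func (d *dropTracker) Dropped() uint64 {
	return d.dropped.Load()
}

// FirstDropClaim succeeds for exactly one caller, and only once at least
// one drop has been recorded.
func (d *dropTracker) FirstDropClaim() bool {
	if d.dropped.Load() == 0 {
		return false
	}

	return d.claimed.CompareAndSwap(false, true)
}

func main() {
	d := &dropTracker{name: "onion-message"}

	fmt.Println(d.FirstDropClaim()) // false: nothing dropped yet
	d.recordDrop(50)
	d.recordDrop(50)
	fmt.Println(d.Dropped())        // 2
	fmt.Println(d.FirstDropClaim()) // true: first claim after a drop
	fmt.Println(d.FirstDropClaim()) // false: already claimed
}
```

Because `firstLog` and `claimed` are distinct flags in this sketch, the auto-log firing never consumes the claim that an external caller may want to make later.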

The onion message actor's mailbox is wired up as the first consumer (Name: "onion-message"), complementing the per-peer and global onion-message rate-limiter first-drop logs introduced in #10713.

Commits

  1. actor: — the mailbox API additions and tests (actor module only).
  2. docs: — new release-notes-0.21.1.md with a Code Health entry under Technical and Architectural Updates.
  3. build+onionmessage: — temporary replace github.com/lightningnetwork/lnd/actor => ./actor plus the onionmessage call-site update. Squashed into one commit because the replace directive points the root module at the 4-param NewBackpressureMailbox signature in local ./actor, which would break the 3-param call site in onionmessage/actor.go if landed on its own. The replace directive follows the existing pattern for queue and sqldb, and is intended to be dropped once the actor module is tagged with a release containing BackpressureMailboxCfg.

Test plan

  • go test -race ./... passes in ./actor/
  • go build ./... passes at the repo root
  • go test -count=1 ./onionmessage/ passes at the repo root
  • Every intermediate commit builds cleanly (verified with git checkout <sha> && go build ./... at both actor module and root)
  • CI green on PR

Add two observability primitives to BackpressureMailbox so operators
can see when load shedding kicks in:

  - Dropped() returns the total count of predicate rejections since
    the mailbox was created.
  - FirstDropClaim() is a one-shot CAS flag that succeeds exactly
    once, and only after at least one real drop has occurred. It is
    intended for call sites that want to emit a one-shot log or
    metric on the first rejection.

Also introduce BackpressureMailboxCfg, an extensible config struct
for optional settings. When Name is set, the mailbox itself emits a
single info-level log on the first predicate drop via an internal
firstLog flag that is independent of FirstDropClaim. This lets the
internal auto-log and an external caller-driven log coexist without
racing for the same flag.

Introduce the release notes file for 0.21.1 using the standard
template, and add a Code Health entry under Technical and
Architectural Updates describing the new BackpressureMailbox
drop counter and first-drop log signal.

Add a replace directive so the root module picks up the local ./actor
sources that contain the new BackpressureMailboxCfg type, and
simultaneously adopt it at the onion message actor's mailbox
construction site by passing Name="onion-message". The two changes
are squashed into a single commit because the replace directive
points the root module at the 4-param NewBackpressureMailbox
signature in local ./actor, which would break the existing 3-param
call site in onionmessage/actor.go if landed on its own.

This commit is intended to be dropped once the actor module is
tagged with a release that contains BackpressureMailboxCfg. At that
point, update the actor dependency version in go.mod and remove the
replace directive; the onionmessage call site can stay as is.
@github-actions bot added the severity-medium (Focused review required) label on Apr 17, 2026
@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the actor.BackpressureMailbox by introducing new observability primitives. It provides mechanisms for operators to gain deeper insights into an actor's mailbox behavior under backpressure, specifically by tracking the total number of dropped messages and signaling the very first instance of a message being dropped. These additions aim to improve system monitoring and debugging capabilities when mailboxes start shedding load.

Highlights

  • Enhanced Observability: Introduced Dropped() method to BackpressureMailbox to return the total count of messages rejected by the drop predicate, providing a clear metric for load shedding.
  • First-Drop Signaling: Added FirstDropClaim() method to BackpressureMailbox, offering a one-shot CAS flag that succeeds only after the first message drop, enabling single-event logging or metrics for initial backpressure events.
  • Configurable Mailbox Behavior: Implemented BackpressureMailboxCfg for optional configuration, allowing a Name to be set for automatic info-level logging on the first message drop, independent of FirstDropClaim().
  • Onion Message Actor Integration: Integrated the new observability features into the onion message actor's mailbox, setting its name to "onion-message" for automatic first-drop logging, complementing existing rate-limiter logs.


@github-actions

PR Severity: MEDIUM

Automated classification | 5 files | 184 lines changed

Medium (4 files):

  • actor/backpressure_mailbox.go - Uncategorized Go package; actor/mailbox utility code
  • onionmessage/actor.go - Uncategorized Go package; onion message actor integration
  • go.mod - Dependency update
  • go.sum - Dependency checksum update

Low (1 file):

  • docs/release-notes/release-notes-0.21.1.md - Release notes documentation

Analysis

This PR modifies the actor package (backpressure_mailbox.go) and onionmessage/actor.go, neither of which falls into the CRITICAL or HIGH severity categories. The changes cover actor infrastructure code, onion messaging, dependency updates, and release notes.

Severity bump checks:

  • Files changed (excluding tests/generated): 5 - does not exceed 20
  • Lines changed (excluding tests/generated): 184 - does not exceed 500
  • No multiple distinct critical packages touched

Verdict: MEDIUM - focused review by a Go-familiar engineer is appropriate.


To override, add a severity-override-{critical,high,medium,low} label.

@gemini-code-assist bot left a comment

Code Review

This pull request enhances the BackpressureMailbox with observability features, including an atomic counter for dropped messages and a one-shot logging mechanism for the first drop. It introduces a BackpressureMailboxCfg struct to allow naming mailboxes and adds Dropped() and FirstDropClaim() methods for monitoring. The onion message actor is updated to use these new capabilities. Review feedback suggests transitioning to structured logging using the log/slog package and log.InfoS to comply with the repository's style guide.

@@ -9,19 +9,59 @@ import (
"github.com/lightningnetwork/lnd/queue"

medium

To support structured logging as required by the repository style guide, the log/slog package should be imported.

	"log/slog"

	"github.com/lightningnetwork/lnd/queue"

Comment on lines +108 to +110
log.Infof("Mailbox(%s): first message "+
"dropped (queue_depth=%d)",
mb.name, queueLen)

medium

The Repository Style Guide (lines 237-253) requires using structured logging for static messages. Instead of log.Infof with a formatted string, use log.InfoS with key-value pairs and slog attributes. Note that structured log lines are an exception to the 80-character rule.

Suggested change
- log.Infof("Mailbox(%s): first message "+
- 	"dropped (queue_depth=%d)",
- 	mb.name, queueLen)
+ log.InfoS(mb.actorCtx, "Mailbox first message dropped",
+ 	slog.String("name", mb.name),
+ 	slog.Int("queue_depth", queueLen))
References
  1. Static messages should use key-value pairs instead of formatted strings for the msg parameter. (link)
  2. Structured log lines are an exception to the 80-character rule. Use one line per key-value pair for multiple attributes. (link)
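
For readers unfamiliar with the pattern, the difference between a formatted message and a static message with attributes can be sketched with the standard library's `log/slog` alone. This is only an approximation of the suggestion above: it substitutes `slog.Logger.LogAttrs` for the repository's own `log.InfoS`, and the `logFirstDrop` and `renderFirstDrop` helpers are hypothetical names introduced for the example.

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"log/slog"
)

// logFirstDrop emits the first-drop event as a static message plus typed
// key-value attributes, one attribute per line.
func logFirstDrop(ctx context.Context, logger *slog.Logger, name string,
	queueLen int) {

	logger.LogAttrs(ctx, slog.LevelInfo, "Mailbox first message dropped",
		slog.String("name", name),
		slog.Int("queue_depth", queueLen),
	)
}

// renderFirstDrop logs into an in-memory buffer and returns the resulting
// line. The timestamp is stripped so the output is stable.
func renderFirstDrop(name string, queueLen int) string {
	var buf bytes.Buffer

	handler := slog.NewTextHandler(&buf, &slog.HandlerOptions{
		ReplaceAttr: func(groups []string, a slog.Attr) slog.Attr {
			// Returning a zero Attr discards the attribute.
			if len(groups) == 0 && a.Key == slog.TimeKey {
				return slog.Attr{}
			}

			return a
		},
	})

	logFirstDrop(context.Background(), slog.New(handler), name, queueLen)

	return buf.String()
}

func main() {
	fmt.Print(renderFirstDrop("onion-message", 50))
}
```

Keeping the message constant means log aggregation can group on it, while `name` and `queue_depth` remain individually queryable fields.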

@haanhvu left a comment

Thanks for the PR! I have a couple of suggestions around the tests, mainly to cover some concurrent scenarios that might be closer to production.

Comment on lines +445 to +448
require.True(t, mbox.FirstDropClaim(),
"first call after a drop should claim the flag")
require.False(t, mbox.FirstDropClaim(),
"second call must return false")

This works for the sequential case, but in production it’s likely that multiple goroutines hit FirstDropClaim() around the same time. It may be worth adding a concurrent test in which multiple goroutines call it simultaneously and only one of them gets true.

Another case that might also be worth testing is when drops and FirstDropClaim() calls happen at the same time. In production, senders could be triggering drops while other goroutines are calling FirstDropClaim(), so it would be useful to simulate that interleaving and still verify that exactly one claim succeeds.
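
A rough shape for such a test, written against a stand-in type rather than the real mailbox, might look like the sketch below. The `claimFlag` type, its `recordDrop` method, and the `raceClaims` helper are hypothetical, and an actual test in the actor package would presumably use `require` and `*testing.T` instead of a returned count.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// claimFlag is a hypothetical stand-in for the mailbox's drop counter and
// one-shot claim flag.
type claimFlag struct {
	dropped atomic.Uint64
	claimed atomic.Bool
}

// recordDrop simulates one predicate rejection.
func (c *claimFlag) recordDrop() { c.dropped.Add(1) }

// FirstDropClaim succeeds once, and only after at least one drop.
func (c *claimFlag) FirstDropClaim() bool {
	return c.dropped.Load() > 0 && c.claimed.CompareAndSwap(false, true)
}

// raceClaims releases n dropper goroutines and n claimer goroutines at the
// same moment and returns how many claims succeeded in total.
func raceClaims(n int) int64 {
	var (
		c     claimFlag
		wg    sync.WaitGroup
		wins  atomic.Int64
		start = make(chan struct{})
	)

	for i := 0; i < n; i++ {
		wg.Add(2)

		go func() {
			defer wg.Done()
			<-start
			c.recordDrop()
		}()

		go func() {
			defer wg.Done()
			<-start
			if c.FirstDropClaim() {
				wins.Add(1)
			}
		}()
	}

	// Closing the gate lets droppers and claimers interleave freely.
	close(start)
	wg.Wait()

	// Every claimer may have run before the first drop landed. A final
	// claim after all drops succeeds only if nobody has claimed yet, so
	// the total must come out as exactly one either way.
	if c.FirstDropClaim() {
		wins.Add(1)
	}

	return wins.Load()
}

func main() {
	fmt.Println(raceClaims(64)) // 1
}
```

Running a test of this shape under `go test -race` would also exercise the data-race detector against the interleaving.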

@lightninglabs-deploy
Collaborator

@gijswijs, remember to re-request review from reviewers when ready
