actor: add drop counter and first-drop log to BackpressureMailbox#10761
actor: add drop counter and first-drop log to BackpressureMailbox#10761gijswijs wants to merge 3 commits intolightningnetwork:masterfrom
Conversation
Add two observability primitives to BackpressureMailbox so operators
can see when load shedding kicks in:
- Dropped() returns the total count of predicate rejections since
the mailbox was created.
- FirstDropClaim() is a one-shot CAS flag that succeeds exactly
once, and only after at least one real drop has occurred. It is
intended for call sites that want to emit a one-shot log or
metric on the first rejection.
Also introduce BackpressureMailboxCfg, an extensible config struct
for optional settings. When Name is set, the mailbox itself emits a
single info-level log on the first predicate drop via an internal
firstLog flag that is independent of FirstDropClaim. This lets the
internal auto-log and an external caller-driven log coexist without
racing for the same flag.
Introduce the release notes file for 0.21.1 using the standard template, and add a Code Health entry under Technical and Architectural Updates describing the new BackpressureMailbox drop counter and first-drop log signal.
Add a replace directive so the root module picks up the local ./actor sources that contain the new BackpressureMailboxCfg type, and simultaneously adopt it at the onion message actor's mailbox construction site by passing Name="onion-message". The two changes are squashed into a single commit because the replace directive points the root module at the 4-param NewBackpressureMailbox signature in local ./actor, which would break the existing 3-param call site in onionmessage/actor.go if landed on its own. This commit is intended to be dropped once the actor module is tagged with a release that contains BackpressureMailboxCfg. At that point, update the actor dependency version in go.mod and remove the replace directive; the onionmessage call site can stay as is.
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
PR Severity: MEDIUM
Medium (4 files):
Low (1 file):
AnalysisThis PR modifies the Severity bump checks:
Verdict: MEDIUM - focused review by a Go-familiar engineer is appropriate. To override, add a |
There was a problem hiding this comment.
Code Review
This pull request enhances the BackpressureMailbox with observability features, including an atomic counter for dropped messages and a one-shot logging mechanism for the first drop. It introduces a BackpressureMailboxCfg struct to allow naming mailboxes and adds Dropped() and FirstDropClaim() methods for monitoring. The onion message actor is updated to use these new capabilities. Review feedback suggests transitioning to structured logging using the log/slog package and log.InfoS to comply with the repository's style guide.
| @@ -9,19 +9,59 @@ import ( | |||
| "github.com/lightningnetwork/lnd/queue" | |||
| log.Infof("Mailbox(%s): first message "+ | ||
| "dropped (queue_depth=%d)", | ||
| mb.name, queueLen) |
There was a problem hiding this comment.
The Repository Style Guide (lines 237-253) requires using structured logging for static messages. Instead of log.Infof with a formatted string, use log.InfoS with key-value pairs and slog attributes. Note that structured log lines are an exception to the 80-character rule.
| log.Infof("Mailbox(%s): first message "+ | |
| "dropped (queue_depth=%d)", | |
| mb.name, queueLen) | |
| log.InfoS(mb.actorCtx, "Mailbox first message dropped", | |
| slog.String("name", mb.name), | |
| slog.Int("queue_depth", queueLen)) |
haanhvu
left a comment
There was a problem hiding this comment.
Thanks for the PR! I have a couple of suggestions around the tests, mainly to cover some concurrent scenarios that might be closer to production.
| require.True(t, mbox.FirstDropClaim(), | ||
| "first call after a drop should claim the flag") | ||
| require.False(t, mbox.FirstDropClaim(), | ||
| "second call must return false") |
There was a problem hiding this comment.
This works for the sequential case, but in production it’s likely that multiple goroutines hit FirstDropClaim() around the same time. Maybe worth adding a concurrent test, i.e., multiple goroutines call it at the same time and check that only one returns true?
One more case that might be also worth testing is when drops and FirstDropClaim() happen at the same time. In production, senders could be triggering drops while other goroutines are calling FirstDropClaim(), so it might be useful to simulate that interleaving and still verify that exactly one claim succeeds.
|
@gijswijs, remember to re-request review from reviewers when ready |
Summary
Adds two observability primitives to
actor.BackpressureMailboxso operators can see when an actor's mailbox starts shedding load:Dropped()returns the total count of predicate rejections since the mailbox was created.FirstDropClaim()is a one-shot CAS flag that succeeds exactly once, and only after at least one real drop has occurred. Intended for call sites that want to emit a one-shot log or metric on the first rejection.Also introduces
BackpressureMailboxCfg, an extensible config struct for optional settings. WhenNameis set, the mailbox itself emits a single info-level log on the first predicate drop via an internalfirstLogflag that is independent ofFirstDropClaim. This lets the internal auto-log and an external caller-driven log coexist without racing for the same flag.The onion message actor's mailbox is wired up as the first consumer (
Name: "onion-message"), complementing the per-peer and global onion-message rate-limiter first-drop logs introduced in #10713.Commits
actor:— the mailbox API additions and tests (actor module only).docs:— newrelease-notes-0.21.1.mdwith a Code Health entry under Technical and Architectural Updates.build+onionmessage:— temporaryreplace github.com/lightningnetwork/lnd/actor => ./actorplus the onionmessage call-site update. Squashed into one commit because the replace directive points the root module at the 4-paramNewBackpressureMailboxsignature in local./actor, which would break the 3-param call site inonionmessage/actor.goif landed on its own. The replace directive follows the existing pattern forqueueandsqldb, and is intended to be dropped once the actor module is tagged with a release containingBackpressureMailboxCfg.Test plan
go test -race ./...passes in./actor/go build ./...passes at the repo rootgo test -count=1 ./onionmessage/passes at the repo rootgit checkout <sha> && go build ./...at both actor module and root)