keychain: cache derived priv key in BtcWalletKeyRing.ECDH to avoid per-call DB txn#10779
Conversation
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request optimizes the performance of onion-message processing by caching the derived private key within the PubKeyECDH structure. Previously, every ECDH operation triggered a read-write database transaction, causing significant overhead due to disk I/O. By caching the key at construction time, these operations are now performed in-memory, resulting in substantial throughput improvements in production environments. Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
🔴 PR Severity: CRITICAL
🔴 Critical (1 file)
🟢 Low / Excluded (1 file)
AnalysisThis PR modifies The second file ( Severity bump check: Only 1 non-test file and 59 lines changed — no bump conditions triggered. Expert review is required given the cryptographic sensitivity of the keychain package. To override, add a |
There was a problem hiding this comment.
Code Review
This pull request optimizes ECDH operations by caching derived private keys in PubKeyECDH when a local secret key ring is used, which avoids redundant database transactions. It also refactors shared ECDH logic into a new helper function and adds a comprehensive benchmark for the onion message pipeline. I have no feedback to provide.
dacf4a8 to
6579171
Compare
6579171 to
25b406c
Compare
Measures the end-to-end deliver path through OnionPeerActor.Receive using the same wiring as server.go: a real BtcWalletKeyRing backing keychain.PubKeyECDH as the sphinx router's onion key. This gives a prod-shaped per-goroutine throughput ceiling and exposes the cost (or savings) of changes to the keychain ECDH path on a realistic load, not a synthetic in-memory shortcut.
25b406c to
969f717
Compare
Done. Thanks! I've also moved the cache to |
Each call to BtcWalletKeyRing.ECDH went through DerivePrivKey, which opens a read-write wallet DB transaction and forces a bbolt meta-page write plus fdatasync per call. Memoize the derived private key on the keyring, keyed by the descriptor's compressed public key, so repeated ECDH operations against the same key (every brontide handshake against the node identity key, every onion message decrypt, every watchtower session ECDH) stay entirely in memory after the first call. The cache is a neutrino LRU bounded at 1000 entries. ECDH callers in practice use a small set of keys (the node identity key on the onion hot path plus per-channel revocation roots, base-encryption keys, and signrpc-supplied descriptors), so 1000 comfortably covers the working set while keeping memory bounded against any caller that drives an unusually wide range of descriptors. A wrapper type carries the private key and reports Size 1 so the LRU bounds entry count rather than bytes. The cache lives on BtcWalletKeyRing rather than on PubKeyECDH so the private material stays behind the type that already owns it and every ECDHRing.ECDH caller benefits, not just those that go through the PubKeyECDH wrapper. Keying by the serialized compressed public key keeps lookups correct regardless of whether DerivePrivKey takes the path-based or the PubKey-scan branch: the same priv.PubKey() always maps to the same cache slot, with no cross-key collision risk. Descriptors without a PubKey (uncommon on the ECDH hot path) bypass the cache and forward to DerivePrivKey unchanged. Remote-signer deployments use RPCKeyRing instead and are not affected.
969f717 to
d72e89c
Compare

BtcWalletKeyRing.ECDHis on the hot path of every onion-message hop:the sphinx router holds the local node's
keychain.PubKeyECDHas itsonionKeyand calls into it fromProcessOnionPacket,DecryptBlindedHopData, andNextEphemeral, three calls per messagein the deliver path. Each call dispatches through
ECDHRing.ECDHtoBtcWalletKeyRing.ECDH, which callsDerivePrivKey. BecausenodeKeyDesc.PubKeyis set inserver.go, the in-memorywaddrmgrfast path is skipped and the call falls through to a
walletdb.Updateread-write transaction. Each transaction forces a bbolt meta-page write
and an
fdatasyncper ECDH operation.The derivation is deterministic for a given key descriptor, so this PR
memoizes the derived private key on the keyring itself, keyed by the
descriptor's compressed-serialized public key. After the first ECDH
against a given key every subsequent call (every brontide handshake
against the node identity key, every onion message decrypt, every
watchtower session ECDH) stays entirely in memory.
The cache lives on
BtcWalletKeyRingrather than onPubKeyECDHsothe private material stays behind the type that already owns it: the
wrapper stays a pure
ECDHRingadapter, and any direct caller ofECDHRing.ECDHbenefits without going through the wrapper. Keying bythe serialized compressed public key is unambiguous regardless of
whether
DerivePrivKeytakes the path-based or thePubKey-scanbranch, the same
priv.PubKey()always maps to the same cache slot,with no cross-key collision risk. Descriptors without a
PubKey(uncommon on the ECDH hot path) bypass the cache and forward to
DerivePrivKeyunchanged. Remote-signer deployments useRPCKeyRing,whose
ECDHcontinues to forward over RPC; the cache lives on thewatch-only
BtcWalletKeyRingit wraps and is never hit on that path.The cache is a neutrino LRU bounded at 1000 entries with a wrapper
type that reports Size 1 so the cap is on entry count rather than
bytes. The bound is defense-in-depth: cache keys are only ever
populated from descriptors the local node itself supplies (the node
identity key, per-channel revocation roots, base-encryption keys,
locally-allocated watchtower session indices, signrpc-supplied
descriptors), and are never derived from remote input, so there is
no remote-driven growth vector. 1000 entries comfortably covers the
working set across every ECDH caller in the codebase while keeping
memory bounded against any future caller that drives an unusually
wide range of descriptors. On capacity the LRU evicts the least
recently used entry, so the onion-message hot-path keys (which are
touched on every hop) stay resident.
The benchmark commit (
onionmessage: add OnionPeerActor.Receive pipeline benchmark) lands first so the before/after can be reproducedby checking out either commit.
Measured impact
Microbenchmark (
BenchmarkOnionMessagePipeline, deliver path, realBtcWalletKeyRing+keychain.PubKeyECDH, single goroutine)The ~1,032 alloc/op reduction is the structural fingerprint of three
bbolt R/W transactions per message disappearing, confirming the three
keychain-routed ECDH call sites collapse to the in-memory fast path.
Raw
go testoutputBefore:
After:
End-to-end (regtest forwarding scenario,
monitor-drops.sh alice, sustained ~28 s window, 0.0% drops in both runs)Raw
monitor-drops.sh aliceoutputBefore:
After:
The end-to-end gain exceeds the microbenchmark gain because the bench's
b.TempDir()sits on tmpfs, wherefdatasyncis effectively a no-op.Prod runs on durable storage, so every bbolt read-write transaction
pays a real
fdatasyncsyscall and with the bbolt single-writer lockserializing the three per-message ECDH txns against concurrent wallet
writers (chain sync, channel state), the syscall overhead dominates.
Caching the derived priv key removes those transactions entirely,
which is why prod sees a much larger speedup than the in-memory
microbench.
Syscall profile during the onion-message blast
Before:
After:
Thanks to @morehouse for reporting the performance issues with LND onion message routing.