
keychain: cache derived priv key in BtcWalletKeyRing.ECDH to avoid per-call DB txn#10779

Open
erickcestari wants to merge 2 commits into lightningnetwork:master from
erickcestari:improve-onion-processing-performance

Conversation

@erickcestari
Collaborator

@erickcestari erickcestari commented Apr 29, 2026

BtcWalletKeyRing.ECDH is on the hot path of every onion-message hop:
the sphinx router holds the local node's keychain.PubKeyECDH as its
onionKey and calls into it from ProcessOnionPacket,
DecryptBlindedHopData, and NextEphemeral, three calls per message
in the deliver path. Each call dispatches through ECDHRing.ECDH to
BtcWalletKeyRing.ECDH, which calls DerivePrivKey. Because
nodeKeyDesc.PubKey is set in server.go, the in-memory waddrmgr
fast path is skipped and the call falls through to a walletdb.Update
read-write transaction, so every ECDH operation pays a bbolt meta-page
write and an fdatasync.

The derivation is deterministic for a given key descriptor, so this PR
memoizes the derived private key on the keyring itself, keyed by the
descriptor's compressed-serialized public key. After the first ECDH
against a given key every subsequent call (every brontide handshake
against the node identity key, every onion message decrypt, every
watchtower session ECDH) stays entirely in memory.

The cache lives on BtcWalletKeyRing rather than on PubKeyECDH so
the private material stays behind the type that already owns it: the
wrapper stays a pure ECDHRing adapter, and any direct caller of
ECDHRing.ECDH benefits without going through the wrapper. Keying by
the serialized compressed public key is unambiguous regardless of
whether DerivePrivKey takes the path-based or the PubKey-scan
branch: the same priv.PubKey() always maps to the same cache slot,
with no cross-key collision risk. Descriptors without a PubKey
(uncommon on the ECDH hot path) bypass the cache and forward to
DerivePrivKey unchanged. Remote-signer deployments use RPCKeyRing,
whose ECDH continues to forward over RPC; the cache lives on the
watch-only BtcWalletKeyRing it wraps and is never hit on that path.
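The memoization described above can be sketched with a stdlib-only stand-in (privKey, keyRing, and the plain map here are illustrative; the real code uses btcec key types and neutrino's LRU, and goes through the full DerivePrivKey path on a miss):

```go
package main

import (
	"fmt"
	"sync"
)

// privKey stands in for the real *btcec.PrivateKey.
type privKey struct{ d [32]byte }

// keyRing mirrors the shape of BtcWalletKeyRing: a slow DerivePrivKey
// (one wallet-DB txn per call in the real code) fronted by an
// in-memory cache keyed by the descriptor's compressed public key.
type keyRing struct {
	mu        sync.Mutex
	privCache map[[33]byte]*privKey
	dbCalls   int // counts simulated wallet-DB transactions
}

// derivePrivKeySlow simulates the walletdb.Update round trip.
func (k *keyRing) derivePrivKeySlow(pub [33]byte) *privKey {
	k.dbCalls++
	var p privKey
	copy(p.d[:], pub[:32]) // placeholder derivation
	return &p
}

// ecdh returns the private key for pub, hitting the DB only on a miss.
func (k *keyRing) ecdh(pub [33]byte) *privKey {
	k.mu.Lock()
	defer k.mu.Unlock()
	if priv, ok := k.privCache[pub]; ok {
		return priv
	}
	priv := k.derivePrivKeySlow(pub)
	k.privCache[pub] = priv
	return priv
}

func main() {
	k := &keyRing{privCache: make(map[[33]byte]*privKey)}
	var nodeKey [33]byte
	nodeKey[0] = 0x02
	for i := 0; i < 1000; i++ {
		k.ecdh(nodeKey) // three calls/msg in the real deliver path
	}
	fmt.Println("db transactions:", k.dbCalls) // 1, not 1000
}
```

After the first miss every repeat against the same descriptor stays in memory, which is the whole effect being measured below.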

The cache is a neutrino LRU bounded at 1000 entries with a wrapper
type that reports Size 1 so the cap is on entry count rather than
bytes. The bound is defense-in-depth: cache keys are only ever
populated from descriptors the local node itself supplies (the node
identity key, per-channel revocation roots, base-encryption keys,
locally-allocated watchtower session indices, signrpc-supplied
descriptors), and are never derived from remote input, so there is
no remote-driven growth vector. 1000 entries comfortably covers the
working set across every ECDH caller in the codebase while keeping
memory bounded against any future caller that drives an unusually
wide range of descriptors. On capacity the LRU evicts the least
recently used entry, so the onion-message hot-path keys (which are
touched on every hop) stay resident.
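The entry-count bound and eviction order can be illustrated with a minimal LRU sketch (container/list-based; the real implementation is neutrino's lru.Cache, and the Size-1 wrapper mirrors the one described above):

```go
package main

import (
	"container/list"
	"fmt"
)

// cacheableKey wraps the derived private key bytes and reports Size 1,
// so the LRU cap counts entries rather than bytes.
type cacheableKey struct{ priv [32]byte }

func (cacheableKey) Size() (uint64, error) { return 1, nil }

// entryLRU is a minimal count-bounded LRU: front = most recently used.
type entryLRU struct {
	cap   int
	order *list.List
	items map[[33]byte]*list.Element
}

type lruItem struct {
	key [33]byte
	val cacheableKey
}

func newEntryLRU(capacity int) *entryLRU {
	return &entryLRU{
		cap:   capacity,
		order: list.New(),
		items: make(map[[33]byte]*list.Element),
	}
}

func (c *entryLRU) Put(k [33]byte, v cacheableKey) {
	if el, ok := c.items[k]; ok {
		c.order.MoveToFront(el)
		el.Value.(*lruItem).val = v
		return
	}
	if c.order.Len() == c.cap { // evict least recently used
		back := c.order.Back()
		delete(c.items, back.Value.(*lruItem).key)
		c.order.Remove(back)
	}
	c.items[k] = c.order.PushFront(&lruItem{key: k, val: v})
}

func (c *entryLRU) Get(k [33]byte) (cacheableKey, bool) {
	el, ok := c.items[k]
	if !ok {
		return cacheableKey{}, false
	}
	c.order.MoveToFront(el) // a hit refreshes recency
	return el.Value.(*lruItem).val, true
}

func main() {
	c := newEntryLRU(2)
	a, b, d := [33]byte{1}, [33]byte{2}, [33]byte{3}
	c.Put(a, cacheableKey{})
	c.Put(b, cacheableKey{})
	c.Get(a)                 // touch a: b becomes least recently used
	c.Put(d, cacheableKey{}) // at capacity, evicts b
	_, okA := c.Get(a)
	_, okB := c.Get(b)
	fmt.Println(okA, okB) // true false
}
```

This is why the hot-path keys stay resident: every hop touches them, so they are never the eviction candidate.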

The benchmark commit (onionmessage: add OnionPeerActor.Receive pipeline benchmark) lands first so the before/after can be reproduced
by checking out either commit.

Measured impact

Microbenchmark (BenchmarkOnionMessagePipeline, deliver path, real BtcWalletKeyRing + keychain.PubKeyECDH, single goroutine)

Metric             Before      After      Delta
ns/op              3,008,424   732,200    4.11× faster
B/op               110,574     27,750     4.0× smaller
allocs/op          1,177       145        8.1× fewer
msg/s/goroutine    ~332        ~1,366     +311%

The ~1,032 alloc/op reduction is the structural fingerprint of three
bbolt R/W transactions per message disappearing, confirming the three
keychain-routed ECDH call sites collapse to the in-memory fast path.

Raw go test output

Before:

goos: linux
goarch: amd64
pkg: github.com/lightningnetwork/lnd/onionmessage
cpu: 12th Gen Intel(R) Core(TM) i5-12500H
BenchmarkOnionMessagePipeline        799           3008424 ns/op          110574 B/op       1177 allocs/op
PASS
ok      github.com/lightningnetwork/lnd/onionmessage    2.732s

1 s / 3,008,424 ns ≈ 332 msg/s

After:

goos: linux
goarch: amd64
pkg: github.com/lightningnetwork/lnd/onionmessage
cpu: 12th Gen Intel(R) Core(TM) i5-12500H
BenchmarkOnionMessagePipeline       3255            732200 ns/op           27750 B/op        145 allocs/op
PASS
ok      github.com/lightningnetwork/lnd/onionmessage    2.488s

1 s / 732,200 ns ≈ 1,366 msg/s

End-to-end (regtest forwarding scenario, monitor-drops.sh alice, sustained ~28 s window, 0.0% drops in both runs)

Metric             Before          After             Delta
recv/s (avg)       ~69             ~870              ~12.6× faster
Sustained range    63–73 recv/s    830–930 recv/s

Raw monitor-drops.sh alice output

Before:

elapsed   time                     recv/s      fwd/s fwd_fail/s   drop_b/s  drop_nb/s   drop%         Δrecv        Δfwd   Δfwd_fail     Δdrop_b    Δdrop_nb
1.0    s  2026-04-29 15:13:59       69.38      69.38       0.00       0.00       0.00    0.0%             70           70            0            0            0
2.0    s  2026-04-29 15:14:00       62.96      62.96       0.00       0.00       0.00    0.0%            134          134            0            0            0
3.0    s  2026-04-29 15:14:01       73.23      73.23       0.00       0.00       0.00    0.0%            208          208            0            0            0
4.0    s  2026-04-29 15:14:02       71.13      71.13       0.00       0.00       0.00    0.0%            280          280            0            0            0
5.1    s  2026-04-29 15:14:03       69.98      69.98       0.00       0.00       0.00    0.0%            351          351            0            0            0
6.1    s  2026-04-29 15:14:04       69.74      69.74       0.00       0.00       0.00    0.0%            422          422            0            0            0
7.1    s  2026-04-29 15:14:05       69.06      69.06       0.00       0.00       0.00    0.0%            492          492            0            0            0
8.1    s  2026-04-29 15:14:06       68.96      68.96       0.00       0.00       0.00    0.0%            562          562            0            0            0
9.1    s  2026-04-29 15:14:08       70.14      70.14       0.00       0.00       0.00    0.0%            634          634            0            0            0
10.1   s  2026-04-29 15:14:09       73.03      73.03       0.00       0.00       0.00    0.0%            708          708            0            0            0
11.2   s  2026-04-29 15:14:10       69.21      69.21       0.00       0.00       0.00    0.0%            778          778            0            0            0
12.2   s  2026-04-29 15:14:11       65.07      65.07       0.00       0.00       0.00    0.0%            844          844            0            0            0
13.2   s  2026-04-29 15:14:12       71.36      71.36       0.00       0.00       0.00    0.0%            916          916            0            0            0
14.2   s  2026-04-29 15:14:13       67.07      67.07       0.00       0.00       0.00    0.0%            984          984            0            0            0
15.2   s  2026-04-29 15:14:14       69.21      69.21       0.00       0.00       0.00    0.0%           1054         1054            0            0            0
16.2   s  2026-04-29 15:14:15       67.06      67.06       0.00       0.00       0.00    0.0%           1122         1122            0            0            0
17.2   s  2026-04-29 15:14:16       67.91      67.91       0.00       0.00       0.00    0.0%           1191         1191            0            0            0
18.3   s  2026-04-29 15:14:17       69.07      69.07       0.00       0.00       0.00    0.0%           1261         1261            0            0            0
19.3   s  2026-04-29 15:14:18       66.01      66.01       0.00       0.00       0.00    0.0%           1328         1328            0            0            0
20.3   s  2026-04-29 15:14:19       63.13      63.13       0.00       0.00       0.00    0.0%           1392         1392            0            0            0
21.3   s  2026-04-29 15:14:20       68.81      68.81       0.00       0.00       0.00    0.0%           1462         1462            0            0            0
22.3   s  2026-04-29 15:14:21       69.26      69.26       0.00       0.00       0.00    0.0%           1532         1532            0            0            0
23.3   s  2026-04-29 15:14:22       68.12      68.12       0.00       0.00       0.00    0.0%           1601         1601            0            0            0
24.3   s  2026-04-29 15:14:23       71.15      71.15       0.00       0.00       0.00    0.0%           1673         1673            0            0            0
25.3   s  2026-04-29 15:14:24       67.01      67.01       0.00       0.00       0.00    0.0%           1741         1741            0            0            0
26.4   s  2026-04-29 15:14:25       70.26      70.26       0.00       0.00       0.00    0.0%           1813         1813            0            0            0
27.4   s  2026-04-29 15:14:26       67.10      67.10       0.00       0.00       0.00    0.0%           1881         1881            0            0            0

After:

./lightning-scenarios/monitor-drops.sh alice
elapsed   time                     recv/s      fwd/s fwd_fail/s   drop_b/s  drop_nb/s   drop%         Δrecv        Δfwd   Δfwd_fail     Δdrop_b    Δdrop_nb
1.0    s  2026-04-29 14:11:30      865.12     865.12       0.00       0.00       0.00    0.0%            873          873            0            0            0
2.0    s  2026-04-29 14:11:31      930.86     930.86       0.00       0.00       0.00    0.0%           1813         1813            0            0            0
3.0    s  2026-04-29 14:11:32      875.45     875.45       0.00       0.00       0.00    0.0%           2698         2698            0            0            0
4.0    s  2026-04-29 14:11:33      872.01     872.01       0.00       0.00       0.00    0.0%           3579         3579            0            0            0
5.1    s  2026-04-29 14:11:34      883.89     883.89       0.00       0.00       0.00    0.0%           4472         4472            0            0            0
6.1    s  2026-04-29 14:11:35      867.01     867.01       0.00       0.00       0.00    0.0%           5348         5348            0            0            0
7.1    s  2026-04-29 14:11:36      836.89     836.89       0.00       0.00       0.00    0.0%           6194         6194            0            0            0
8.1    s  2026-04-29 14:11:37      905.46     905.46       0.00       0.00       0.00    0.0%           7109         7109            0            0            0
9.1    s  2026-04-29 14:11:38      878.38     878.38       0.00       0.00       0.00    0.0%           7997         7997            0            0            0
10.1   s  2026-04-29 14:11:39      888.11     888.11       0.00       0.00       0.00    0.0%           8894         8894            0            0            0
11.1   s  2026-04-29 14:11:40      845.53     845.53       0.00       0.00       0.00    0.0%           9747         9747            0            0            0
12.1   s  2026-04-29 14:11:41      885.22     885.22       0.00       0.00       0.00    0.0%          10640        10640            0            0            0
13.1   s  2026-04-29 14:11:42      874.99     874.99       0.00       0.00       0.00    0.0%          11524        11524            0            0            0
14.1   s  2026-04-29 14:11:43      889.28     889.28       0.00       0.00       0.00    0.0%          12423        12423            0            0            0
15.2   s  2026-04-29 14:11:44      832.37     832.37       0.00       0.00       0.00    0.0%          13264        13264            0            0            0
16.2   s  2026-04-29 14:11:45      878.71     878.71       0.00       0.00       0.00    0.0%          14152        14152            0            0            0
17.2   s  2026-04-29 14:11:46      863.66     863.66       0.00       0.00       0.00    0.0%          15025        15025            0            0            0
18.2   s  2026-04-29 14:11:47      882.69     882.69       0.00       0.00       0.00    0.0%          15916        15916            0            0            0
19.2   s  2026-04-29 14:11:48      831.92     831.92       0.00       0.00       0.00    0.0%          16757        16757            0            0            0
20.2   s  2026-04-29 14:11:49      835.85     835.85       0.00       0.00       0.00    0.0%          17602        17602            0            0            0
21.2   s  2026-04-29 14:11:50      832.98     832.98       0.00       0.00       0.00    0.0%          18444        18444            0            0            0
22.2   s  2026-04-29 14:11:51      841.40     841.40       0.00       0.00       0.00    0.0%          19293        19293            0            0            0
23.2   s  2026-04-29 14:11:52      887.89     887.89       0.00       0.00       0.00    0.0%          20190        20190            0            0            0
24.2   s  2026-04-29 14:11:53      845.71     845.71       0.00       0.00       0.00    0.0%          21044        21044            0            0            0
25.3   s  2026-04-29 14:11:54      864.52     864.52       0.00       0.00       0.00    0.0%          21916        21916            0            0            0
26.3   s  2026-04-29 14:11:55      885.28     885.28       0.00       0.00       0.00    0.0%          22809        22809            0            0            0
27.3   s  2026-04-29 14:11:56      868.00     868.00       0.00       0.00       0.00    0.0%          23685        23685            0            0            0
28.3   s  2026-04-29 14:11:57      888.71     888.71       0.00       0.00       0.00    0.0%          24583        24583            0            0            0

The end-to-end gain exceeds the microbenchmark gain because the bench's
b.TempDir() sits on tmpfs, where fdatasync is effectively a no-op.
Production runs on durable storage, so every bbolt read-write
transaction pays a real fdatasync syscall, and the bbolt single-writer
lock serializes the three per-message ECDH transactions against
concurrent wallet writers (chain sync, channel state), so the syscall
overhead dominates. Caching the derived private key removes those
transactions entirely, which is why production sees a much larger
speedup than the in-memory microbenchmark.

Syscall profile during the onion-message blast

Before:

Screenshot From 2026-04-28 16-55-44

After:

Screenshot From 2026-04-29 13-57-19

Thanks to @morehouse for reporting the performance issues with LND onion message routing.

@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request optimizes the performance of onion-message processing by caching the derived private key within the PubKeyECDH structure. Previously, every ECDH operation triggered a read-write database transaction, causing significant overhead due to disk I/O. By caching the key at construction time, these operations are now performed in-memory, resulting in substantial throughput improvements in production environments.

Highlights

  • Performance Optimization: Implemented caching of the derived private key in PubKeyECDH to eliminate redundant read-write database transactions on the onion-message hot path.
  • Code Consolidation: Extracted ECDH math into a shared helper function, ecdhFromPriv, to unify the code path for both cached and non-cached ECDH operations.
  • Benchmarking: Added a new benchmark, BenchmarkOnionMessagePipeline, to measure the performance impact of the changes, demonstrating a significant reduction in allocations and latency.


@github-actions github-actions Bot added the severity-critical Requires expert review - security/consensus critical label Apr 29, 2026
@github-actions

🔴 PR Severity: CRITICAL

Classified from changed files | 1 non-test file | 59 lines changed

🔴 Critical (1 file)
  • keychain/ecdh.go - Private key derivation and management (keychain/*)
🟢 Low / Excluded (1 file)
  • onionmessage/actor_bench_test.go - Test file, excluded from severity classification

Analysis

This PR modifies keychain/ecdh.go, which falls under the keychain/* package responsible for private key derivation and management — a CRITICAL security-sensitive area. Changes here affect ECDH operations which are foundational to cryptographic operations in the node (e.g., key exchange, onion encryption).

The second file (onionmessage/actor_bench_test.go) is a benchmark test file and is excluded from severity classification per policy.

Severity bump check: Only 1 non-test file and 59 lines changed — no bump conditions triggered.

Expert review is required given the cryptographic sensitivity of the keychain package.


To override, add a severity-override-{critical,high,medium,low} label.


@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request optimizes ECDH operations by caching derived private keys in PubKeyECDH when a local secret key ring is used, which avoids redundant database transactions. It also refactors shared ECDH logic into a new helper function and adds a comprehensive benchmark for the onion message pipeline. I have no feedback to provide.

@erickcestari erickcestari self-assigned this Apr 29, 2026
@erickcestari erickcestari force-pushed the improve-onion-processing-performance branch from dacf4a8 to 6579171 Compare April 29, 2026 19:59
@yyforyongyu yyforyongyu added the enhancement Improvements to existing features / behaviour label Apr 30, 2026
@yyforyongyu yyforyongyu added this to the v0.22.0 milestone Apr 30, 2026
@erickcestari erickcestari force-pushed the improve-onion-processing-performance branch from 6579171 to 25b406c Compare April 30, 2026 13:00
@saubyk saubyk added this to lnd v0.22 Apr 30, 2026
@github-project-automation github-project-automation Bot moved this to Backlog in lnd v0.22 Apr 30, 2026
@saubyk saubyk moved this from Backlog to In progress in lnd v0.22 Apr 30, 2026
Contributor

@Abdulkbk Abdulkbk left a comment


tACK

One nit: I think we can swap the commits so the benchmark comes first, making it easier to test before applying the changes in ecdh.

Comment thread keychain/ecdh.go
Measures the end-to-end deliver path through OnionPeerActor.Receive
using the same wiring as server.go: a real BtcWalletKeyRing backing
keychain.PubKeyECDH as the sphinx router's onion key. This gives a
prod-shaped per-goroutine throughput ceiling and exposes the cost
(or savings) of changes to the keychain ECDH path on a realistic
load, not a synthetic in-memory shortcut.
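The benchmark wiring described above follows the standard Go benchmark shape. As an illustrative skeleton (deliverOne is a placeholder for the real OnionPeerActor.Receive pipeline; the real benchmark lives in onionmessage/actor_bench_test.go), testing.Benchmark even lets the same loop run outside `go test`:

```go
package main

import (
	"fmt"
	"testing"
)

// deliverOne stands in for the real deliver path: in the actual
// benchmark each iteration exercises ProcessOnionPacket,
// DecryptBlindedHopData, and NextEphemeral against a real keyring.
func deliverOne() int {
	sum := 0
	for i := 0; i < 1000; i++ {
		sum += i
	}
	return sum
}

func main() {
	// testing.Benchmark runs a benchmark function programmatically,
	// handy for quick before/after comparisons like the ones in
	// this PR description.
	res := testing.Benchmark(func(b *testing.B) {
		b.ReportAllocs()
		for i := 0; i < b.N; i++ {
			deliverOne()
		}
	})
	fmt.Println(res.N > 0, res.NsPerOp() >= 0)
}
```

Because the harness uses the same wiring as server.go, regressions on the keychain ECDH path show up directly in ns/op and allocs/op, as in the tables above.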
@erickcestari erickcestari changed the title keychain: cache derived priv key in PubKeyECDH to avoid per-call DB txn keychain: cache derived priv key in BtcWalletKeyRing.ECDH to avoid per-call DB txn May 7, 2026
@erickcestari erickcestari force-pushed the improve-onion-processing-performance branch from 25b406c to 969f717 Compare May 7, 2026 19:21
@erickcestari
Collaborator Author

One nit: I think we can swap the commits so the benchmark comes first, making it easier to test before applying the changes in ecdh.

Done. Thanks! I've also moved the cache to BtcWalletKeyRing.ECDH. This should maintain the same behaviour while improving the code quality.

Each call to BtcWalletKeyRing.ECDH went through DerivePrivKey, which
opens a read-write wallet DB transaction and forces a bbolt meta-page
write plus fdatasync per call. Memoize the derived private key on the
keyring, keyed by the descriptor's compressed public key, so repeated
ECDH operations against the same key (every brontide handshake against
the node identity key, every onion message decrypt, every watchtower
session ECDH) stay entirely in memory after the first call.

The cache is a neutrino LRU bounded at 1000 entries. ECDH callers in
practice use a small set of keys (the node identity key on the onion
hot path plus per-channel revocation roots, base-encryption keys, and
signrpc-supplied descriptors), so 1000 comfortably covers the working
set while keeping memory bounded against any caller that drives an
unusually wide range of descriptors. A wrapper type carries the
private key and reports Size 1 so the LRU bounds entry count rather
than bytes.

The cache lives on BtcWalletKeyRing rather than on PubKeyECDH so the
private material stays behind the type that already owns it and every
ECDHRing.ECDH caller benefits, not just those that go through the
PubKeyECDH wrapper. Keying by the serialized compressed public key
keeps lookups correct regardless of whether DerivePrivKey takes the
path-based or the PubKey-scan branch: the same priv.PubKey() always
maps to the same cache slot, with no cross-key collision risk.
Descriptors without a PubKey (uncommon on the ECDH hot path) bypass
the cache and forward to DerivePrivKey unchanged. Remote-signer
deployments use RPCKeyRing instead and are not affected.
@erickcestari erickcestari force-pushed the improve-onion-processing-performance branch from 969f717 to d72e89c Compare May 8, 2026 12:48
@github-actions github-actions Bot added severity-critical Requires expert review - security/consensus critical and removed severity-critical Requires expert review - security/consensus critical labels May 8, 2026

Labels

enhancement Improvements to existing features / behaviour
severity-critical Requires expert review - security/consensus critical

Projects

Status: In progress

Development

Successfully merging this pull request may close these issues.

5 participants