-
Notifications
You must be signed in to change notification settings - Fork 959
bkpr first use after large migration issues: the key fixes #8618
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
rustyrussell
wants to merge
3
commits into
ElementsProject:master
Choose a base branch
from
rustyrussell:guilt/bkpr-post-migration
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
bkpr first use after large migration issues: the key fixes #8618
rustyrussell
wants to merge
3
commits into
ElementsProject:master
from
rustyrussell:guilt/bkpr-post-migration
+42
−42
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
…own on large numbers of requests. Note that we create a destructor on the command to reset request->cmd pointer if the cmd is freed (so we know not to call the callback). But attaching hundreds of thousands of them is slow: it's a single-linked list, which is iterated in several places. But that's redundant: the request is now allocated off the cmd, so freeing the command will free the request anyway. Hacking in something to print progress to a file, here's the number of requests processed every 10 seconds before and after: Before: $ while sleep 10; do wc -l /tmp/bkpr-progress; done 181529 /tmp/bkpr-progress 195994 /tmp/bkpr-progress 207083 /tmp/bkpr-progress 226336 /tmp/bkpr-progress 234319 /tmp/bkpr-progress 241514 /tmp/bkpr-progress 247421 /tmp/bkpr-progress 255292 /tmp/bkpr-progress 261367 /tmp/bkpr-progress 269085 /tmp/bkpr-progress 276953 /tmp/bkpr-progress 282233 /tmp/bkpr-progress 286193 /tmp/bkpr-progress 290930 /tmp/bkpr-progress 295276 /tmp/bkpr-progress 301086 /tmp/bkpr-progress After: 169505 /tmp/bkpr-progress 196010 /tmp/bkpr-progress 219370 /tmp/bkpr-progress 235671 /tmp/bkpr-progress 244242 /tmp/bkpr-progress 255362 /tmp/bkpr-progress 265636 /tmp/bkpr-progress 276966 /tmp/bkpr-progress 284451 /tmp/bkpr-progress 288836 /tmp/bkpr-progress 296578 /tmp/bkpr-progress 304571 /tmp/bkpr-progress Signed-off-by: Rusty Russell <[email protected]>
c4b321a
to
f04241b
Compare
This significantly speeds up the query which bookkeeper often does: "SELECT created_index" " FROM channelmoves" " WHERE payment_hash = X'%s'" " AND credit_msat = %"PRIu64 " AND created_index <= %"PRIu64, On large databases this scan is expensive, and a payment_hash index cuts it down a great deal. It does take longer to load the channelmoves in the first place though (about 3x). Before: $ while sleep 10; do wc -l /tmp/bkpr-progress; done 169505 /tmp/bkpr-progress 196010 /tmp/bkpr-progress 219370 /tmp/bkpr-progress 235671 /tmp/bkpr-progress 244242 /tmp/bkpr-progress 255362 /tmp/bkpr-progress 265636 /tmp/bkpr-progress 276966 /tmp/bkpr-progress 284451 /tmp/bkpr-progress 288836 /tmp/bkpr-progress 296578 /tmp/bkpr-progress 304571 /tmp/bkpr-progress After: $ while sleep 10; do wc -l /tmp/bkpr-progress; done 161421 /tmp/bkpr-progress 238273 /tmp/bkpr-progress 281185 /tmp/bkpr-progress 305787 /tmp/bkpr-progress Signed-off-by: Rusty Russell <[email protected]> Changelog-Changed: plugins: the sql plugin now keeps an index on `channelmoves` by `payment_hash`.
If we read all of them, we might get 1.6M at once (after initial migration). Then we submit a few hundred thousand simultaneous requests to lightningd, and it gets upset, queueing them all on the xpay command hook and running out of memory. Signed-off-by: Rusty Russell <[email protected]> Changelog-Fixed: plugins: bookkeeper first invocation after migration from prior to 25.09 with very large databases will not crash.
f04241b
to
bd8c907
Compare
sangbida
approved these changes
Oct 19, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me :) I did have a couple of questions:
- Do folks who have already migrated have to do anything extra (another migration?) to get the new index, does a migration just happen when a database change is made?
- Are the changes to the trace array not required after all?
- Is it worth writing a test for the read listchannelmoves 1000 entries at a time? Just wondering if there's a chance we could hit funny edge cases here in a case of a database rollback/deletion? I remember you mentioned that the cln database is not just appened only, not sure if that's entirely relevant here or not though.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Fixes: #8614
When you have 1.6 million channelmoves (migrated from accounts.db), the first call to any bookkeeper function will cause it to digest them all. This proved very slow, and eventually crashed the node.
These three fix this.