-
Notifications
You must be signed in to change notification settings - Fork 2.2k
Improve SendPaymentV2 performance #10406
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Improve SendPaymentV2 performance #10406
Conversation
Replace the (bandwidth, bool) return signature with (bandwidth, error) to provide more context about why bandwidth is unavailable. This allows callers to distinguish between: - Channel not found in local channels map (ErrLocalChannelNotFound) - Channel found but unusable (offline, HTLC limits, etc.) The new error-based approach improves error handling throughout the routing package: - pathfind: Use capacity fallback only for channels which are not found in the local graph map, which can happen when channels were opened and activated during the payment process started for example. - unified_edges: Skip unusable local channels instead of using max bandwidth - Tests updated to use checkErrIs/checkErrContains for precise validation This fixes TODO comments about unclear bandwidth hint failures and improves channel selection by avoiding attempts to route through channels that cannot carry payments.
ce40b53 to
745bcf5
Compare
Roasbeef
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 🪘
| bandwidth = lnwire.NewMSatFromSatoshis( | ||
| channel.Capacity, | ||
| ) | ||
| } else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah ok, so before we'd assume the capacity and potentially try to route over a channel that may not actually be online. This would be caught later in the switch, but it's better to catch this issue earlier.
| "this channel", shortID, bandwidth) | ||
| if err != nil { | ||
| // If the channel is not in our local channels map, use | ||
| // the channel capacity as a fallback. This can happen |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep this is accurate, as this is a per session snapshot. We could try to make this update dynanmically with newly mined channels, but I don't think that's really worth the extra effort.
I analysed a pg related OOM issue and analysing the logs I saw a LOT of this error messages:
what we see above is the reason because we would not distinguish between a channel which is just not in our map or which is currently unusable. That let to a lot of unnecessary retries and pathfinding calls because we would assume full bandwidth when we get this error.
This problem was already documented in 2 TODOs which are now done and removed.
The reason why it only got fixed now is because we changed some logging in this area which now made this problem obvious.
Log-Entries showed like 22000 similar log lines in 5 days (node does a lot of heavy rebalancing)