-
Notifications
You must be signed in to change notification settings - Fork 2.2k
switchrpc: allow remote cleanup of Switch attempt store #10484
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: elle-base-branch-payment-service
Are you sure you want to change the base?
switchrpc: allow remote cleanup of Switch attempt store #10484
Conversation
Summary of ChangesHello @calvinrzachman, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces a new remote procedure call (RPC) to the Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a CleanStore RPC to the switchrpc sub-system, allowing for remote cleanup of the Switch's attempt store. The implementation adds a coarse-grained lock to the networkResultStore to prevent race conditions between the new cleanup operation and other per-attempt operations. The changes are well-tested with an expanded integration test that covers the full idempotency lifecycle of a payment attempt.
My review focuses on two main points:
- The locking strategy in
htlcswitch/payment_result.goappears to be overly restrictive, potentially impacting performance. I've suggested using read locks instead of write locks for several per-attempt operations to allow for more concurrency. - The
CleanStoreRPC response includes adeleted_attemptsfield which is currently not populated. I've proposed changes to return this count from the underlying implementation to make the RPC more informative for the caller.
Overall, this is a solid addition with a very detailed description of the trade-offs. Addressing the feedback will improve performance and the usability of the new RPC.
This will cleanup information on attempts from the Switch's network result/attempt store. NOTE: Until the HTLC attempt ID space is separated, it is only safe to allow a *single* router (whether local or remote) to dispatch and cleanup HTLC attempts.
This prevents CleanStore from running concurrently with other store operations.
Demonstrate the behavior of the duplicate protection for attempt IDs with respect to the full life-cycle of a payment attempt - from SendOnion, initialization, dispatch, receipt of a final result (settle/fail) from the network, and the interaction with CleanStore.
b030a13 to
aa191d9
Compare
Change Description
We expand the
switchrpcserver to include support for remote deletion of attempt information within the Switch's underlying attempt stores. In creating this PR, we have conceptualized two potential behaviors for deletion:CleanStore(keepSet []ID): deletes everything except that which it's told to keep.DeleteAttempts(deleteSet []ID): deletes only what is explicitly specified to delete.There are some differences to the two approaches, but for now, we have opted to retain the approach used by lnd historically via the implementation of a CleanStore endpoint. Remote entities responsible for payment life-cycle management which submit onion payments for direct delivery to the network can use this RPC to perform cleanup of HTLC attempt information when the attempts are no longer in-flight on the network.
This change expands the
switchrpcserver to include aCleanStoreRPC. The client provides a set of all its known in-flight attempt IDs, and the server deletes all other results from the store.Extends: #9489
Warnings on Use
The
CleanStore(keepSet)RPC operates on a "snapshot-and-delete" model, which creates a somewhat fragile contract between the client and server. Any client making use of this endpoint needs to contend with the following issues:Stale
keepSetRace ConditionThis occurs when the client's view of its own in-flight payments becomes outdated between the time it builds its
keepSetand when the server executes theCleanStorecommand.keepSet. At this moment, X has not yet been durably registered.CleanStoreRPC with the now-stalekeepSet(which is missing X).SendOnionfor X, initializing it in its database.CleanStoreRPC and, because X is not in thekeepSet, deletes it.Network Request Re-ordering
A
CleanStore(keepSet)style RPC mandates strict client-side enforcement of a quiescent state to prevent race conditions between deletion requests and the creation/dispatch of new payment attempts. A client cannot tolerate uncertainty in backend responses and proceed to exit the quiescent state. This carries the potential for request re-ordering, which risks deleting active payment attempts. Such strict adherence to a quiescent state harms the availability of the service, as the client must pause normal operation until all backend state is reconciled.The contract requires that a client be (1) synchronous in performing its cleanup during a quiescent state (e.g., on startup) and that it (2) durably writes ahead intent to dispatch an attempt prior to actually doing so. The first condition mirrors the behavior of the
ChannelRouter's resumePayments function. The second condition ensures that anySendOnionrequests made prior to a client crash would be included in thekeepSetcreated following a restart and the client avoids issue with re-ordering ofSendOnionandCleanStorerequest processing on the server.Single Router Limitation
NOTE: It is not safe to have multiple HTLC dispatching entities independently calling CleanStore without strict coordination. In a multi-client setting, the
CleanStoreapproach to deletion requires an explicit identity-aware mapping of attempt IDs on the server. Without this, the Switch cannot safely differentiate and delete only a specific client's records. It cannot work with multiple clients sharing the same flat attempt ID space.Alternative/Future Work
DeleteAttempts(deleteSet []ID)RPC for explicit deletion is likely the architecturally superior long-term solution for managing payment attempt state. Its primary benefit is enabling true "on-line" deletion, allowing the client to continuously and concurrently garbage collect terminal payment attempts while still actively sending and managing new payments. This completely eliminates the need for cleanup to occur during a disruptive, "quiescent" state at client startup. Advantages of this approach include: