@calvinrzachman
Contributor

Change Description

We expand the switchrpc server to support remote deletion of attempt information from the Switch's underlying attempt stores. In creating this PR, we considered two potential behaviors for deletion:

  1. CleanStore(keepSet []ID): deletes everything except what it is told to keep.
  2. DeleteAttempts(deleteSet []ID): deletes only what is explicitly specified to delete.
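
The difference between the two behaviors can be sketched against a toy in-memory store. This is only an illustration: `attemptStore`, its methods, and the `uint64` ID type are hypothetical stand-ins, not the types used by lnd or by this PR.

```go
package main

import (
	"fmt"
	"sort"
)

// attemptStore is a hypothetical in-memory stand-in for the Switch's
// attempt store, keyed by attempt ID.
type attemptStore map[uint64]string

// cleanStore deletes every record whose ID is NOT in keepSet and returns
// the number of records removed.
func (s attemptStore) cleanStore(keepSet []uint64) int {
	keep := make(map[uint64]struct{}, len(keepSet))
	for _, id := range keepSet {
		keep[id] = struct{}{}
	}

	deleted := 0
	for id := range s {
		if _, ok := keep[id]; !ok {
			delete(s, id)
			deleted++
		}
	}
	return deleted
}

// deleteAttempts deletes only the records named in deleteSet and returns
// the number of records removed. Unknown IDs are ignored.
func (s attemptStore) deleteAttempts(deleteSet []uint64) int {
	deleted := 0
	for _, id := range deleteSet {
		if _, ok := s[id]; ok {
			delete(s, id)
			deleted++
		}
	}
	return deleted
}

// ids returns the stored attempt IDs in ascending order.
func (s attemptStore) ids() []uint64 {
	out := make([]uint64, 0, len(s))
	for id := range s {
		out = append(out, id)
	}
	sort.Slice(out, func(i, j int) bool { return out[i] < out[j] })
	return out
}

func main() {
	a := attemptStore{1: "settled", 2: "in-flight", 3: "failed"}
	a.cleanStore([]uint64{2})
	fmt.Println(a.ids()) // [2]

	b := attemptStore{1: "settled", 2: "in-flight", 3: "failed"}
	b.deleteAttempts([]uint64{1})
	fmt.Println(b.ids()) // [2 3]
}
```

With cleanStore, anything the caller forgets to list is lost; with deleteAttempts, anything the caller forgets to list simply lingers.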

The two approaches differ in some respects, but for now we have opted to retain the approach lnd has used historically by implementing a CleanStore endpoint. Remote entities that manage the payment life-cycle and submit onion payments for direct delivery to the network can use this RPC to clean up HTLC attempt information once the attempts are no longer in flight on the network.

This change expands the switchrpc server to include a CleanStore RPC. The client provides a set of all its known in-flight attempt IDs, and the server deletes all other results from the store.

Extends: #9489

Warnings on Use

The CleanStore(keepSet) RPC operates on a "snapshot-and-delete" model, which creates a somewhat fragile contract between the client and server. Any client making use of this endpoint needs to contend with the following issues:

Stale keepSet Race Condition

This occurs when the client's view of its own in-flight payments becomes outdated between the time it builds its keepSet and when the server executes the CleanStore command.

  1. A client's payment-sending logic decides to initiate a new attempt, X.
  2. Simultaneously, a cleanup process on the client queries the list of active payments to build a keepSet. At this moment, X has not yet been durably registered.
  3. The cleanup process sends the CleanStore RPC with the now-stale keepSet (which is missing X).
  4. The server processes the SendOnion for X, initializing it in its database.
  5. The server then processes the CleanStore RPC and, because X is not in the keepSet, deletes it.
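
The interleaving above can be replayed deterministically. The sketch below is a single-threaded simulation, assuming a hypothetical `store` type and attempt IDs 7 and 42; it does not use the real switchrpc API.

```go
package main

import "fmt"

// store is a hypothetical stand-in for the server-side attempt store.
type store map[uint64]struct{}

// sendOnion registers an attempt on the server.
func (s store) sendOnion(id uint64) { s[id] = struct{}{} }

// cleanStore deletes every attempt that is not in keepSet.
func (s store) cleanStore(keepSet []uint64) {
	keep := make(map[uint64]struct{}, len(keepSet))
	for _, id := range keepSet {
		keep[id] = struct{}{}
	}
	for id := range s {
		if _, ok := keep[id]; !ok {
			delete(s, id)
		}
	}
}

// replayStaleKeepSet replays steps 1-5 in order and reports whether
// attempt X is still present on the server afterwards.
func replayStaleKeepSet() bool {
	const x = uint64(42)

	server := store{7: {}}        // attempt 7 is already in flight
	clientInFlight := []uint64{7} // the client's durable view

	// Step 1: the sender decides to initiate X, but X is not yet
	// durably registered on the client.

	// Step 2: the cleanup process snapshots the in-flight set.
	keepSet := append([]uint64(nil), clientInFlight...)

	// Step 3: CleanStore(keepSet) is sent; X is missing from it.

	// Step 4: the server processes SendOnion for X first.
	server.sendOnion(x)

	// Step 5: the server processes CleanStore and deletes X.
	server.cleanStore(keepSet)

	_, survived := server[x]
	return survived
}

func main() {
	fmt.Println("attempt X survived:", replayStaleKeepSet()) // false
}
```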

Network Request Re-ordering

A CleanStore(keepSet) style RPC requires the client to strictly enforce a quiescent state so that deletion requests cannot race with the creation or dispatch of new payment attempts. The client cannot leave the quiescent state while the outcome of a backend request is still uncertain (for example, after a timeout): doing so allows requests to be re-ordered, so that a delayed CleanStore may be processed after a newer SendOnion and delete an active payment attempt. Enforcing the quiescent state this strictly harms the availability of the service, as the client must pause normal operation until all backend state is reconciled.

The contract requires that a client (1) perform its cleanup synchronously during a quiescent state (e.g., on startup) and (2) durably record its intent to dispatch an attempt before actually doing so. The first condition mirrors the behavior of the ChannelRouter's resumePayments function. The second condition ensures that any SendOnion request made before a client crash is included in the keepSet built after a restart, so the client avoids issues with re-ordering of SendOnion and CleanStore processing on the server.
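
A minimal sketch of a client honoring both conditions follows. All names here (`backend`, `intentLog`, `dispatch`, `resolve`, `startupCleanup`) are hypothetical, and a map that outlives the simulated crash stands in for a real on-disk write-ahead log.

```go
package main

import (
	"fmt"
	"sort"
)

// backend is a hypothetical stand-in for the remote Switch attempt store.
type backend map[uint64]struct{}

func (b backend) sendOnion(id uint64) { b[id] = struct{}{} }

func (b backend) cleanStore(keepSet []uint64) {
	keep := make(map[uint64]struct{}, len(keepSet))
	for _, id := range keepSet {
		keep[id] = struct{}{}
	}
	for id := range b {
		if _, ok := keep[id]; !ok {
			delete(b, id)
		}
	}
}

// intentLog is a hypothetical durable write-ahead log mapping an attempt
// ID to whether the attempt is still in flight.
type intentLog map[uint64]bool

// dispatch records intent BEFORE contacting the backend (condition 2).
func dispatch(wal intentLog, b backend, id uint64) {
	wal[id] = true
	b.sendOnion(id)
}

// resolve marks an attempt as terminal once its final result is known.
func resolve(wal intentLog, id uint64) { wal[id] = false }

// startupCleanup runs synchronously before any new dispatch
// (condition 1) and returns the keepSet it sent.
func startupCleanup(wal intentLog, b backend) []uint64 {
	var keepSet []uint64
	for id, inFlight := range wal {
		if inFlight {
			keepSet = append(keepSet, id)
		}
	}
	sort.Slice(keepSet, func(i, j int) bool { return keepSet[i] < keepSet[j] })

	b.cleanStore(keepSet)
	return keepSet
}

func main() {
	wal, b := intentLog{}, backend{}

	dispatch(wal, b, 1)
	resolve(wal, 1)     // attempt 1 reached a final result
	dispatch(wal, b, 2) // the client "crashes" right after this call

	// After the restart, only the durable log survives on the client.
	keepSet := startupCleanup(wal, b)

	_, kept := b[2]
	_, stale := b[1]
	fmt.Println(keepSet, kept, stale) // [2] true false
}
```

Because the intent for attempt 2 was recorded before SendOnion was issued, the keepSet rebuilt after the restart still contains it.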

Single Router Limitation

NOTE: It is not safe to have multiple HTLC dispatching entities independently calling CleanStore without strict coordination. In a multi-client setting, the CleanStore approach to deletion requires an explicit identity-aware mapping of attempt IDs on the server. Without this, the Switch cannot safely differentiate and delete only a specific client's records. It cannot work with multiple clients sharing the same flat attempt ID space.
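
The hazard can be illustrated with two clients, A and B, sharing one store. The `owner` value and the identity-aware `cleanStoreFor` variant below are hypothetical: they sketch what an identity-aware mapping might look like and are not part of this PR.

```go
package main

import "fmt"

// store maps an attempt ID to the client that owns it. The owner field is
// hypothetical: a flat attempt ID space carries no such information.
type store map[uint64]string

func toSet(ids []uint64) map[uint64]struct{} {
	set := make(map[uint64]struct{}, len(ids))
	for _, id := range ids {
		set[id] = struct{}{}
	}
	return set
}

// cleanStoreFlat ignores ownership, as a flat ID space forces it to.
func (s store) cleanStoreFlat(keepSet []uint64) int {
	keep, deleted := toSet(keepSet), 0
	for id := range s {
		if _, ok := keep[id]; !ok {
			delete(s, id)
			deleted++
		}
	}
	return deleted
}

// cleanStoreFor is a hypothetical identity-aware variant: it only
// considers records owned by the calling client.
func (s store) cleanStoreFor(owner string, keepSet []uint64) int {
	keep, deleted := toSet(keepSet), 0
	for id, o := range s {
		if o != owner {
			continue
		}
		if _, ok := keep[id]; !ok {
			delete(s, id)
			deleted++
		}
	}
	return deleted
}

func main() {
	// Client A knows about attempts 1 and 2; client B owns attempt 10.
	flat := store{1: "A", 2: "A", 10: "B"}
	flat.cleanStoreFlat([]uint64{1}) // A's cleanup also deletes B's 10

	scoped := store{1: "A", 2: "A", 10: "B"}
	scoped.cleanStoreFor("A", []uint64{1}) // only A's attempt 2 is deleted

	fmt.Println(len(flat), len(scoped)) // 1 2
}
```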

Alternative/Future Work

  • A DeleteAttempts(deleteSet []ID) RPC for explicit deletion is likely the architecturally superior long-term solution for managing payment attempt state. Its primary benefit is enabling true "on-line" deletion: the client can continuously and concurrently garbage collect terminal payment attempts while still sending and managing new payments. This eliminates the need for cleanup to occur during a disruptive "quiescent" state at client startup. Advantages of this approach include:
    • shorter restarts: no startup time is spent cleaning up state on remote Switch RPC servers.
    • no new mutex: it can be implemented using the existing fine-grained multi-mutex.
    • enhanced multi-client safety: a simpler API contract that inherently prevents race conditions in a distributed environment, without relying on client-side quiescent state enforcement.

@gemini-code-assist

Summary of Changes

Hello @calvinrzachman, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a new remote procedure call (RPC) to the switchrpc server, allowing external payment lifecycle managers to perform cleanup operations on HTLC attempt data. The new CleanStore RPC facilitates the removal of stale or completed payment attempt records, improving the efficiency and state management of the Switch. The implementation includes robust concurrency controls to ensure data integrity during cleanup and is accompanied by comprehensive tests to validate its behavior and interaction with existing payment idempotency features.

Highlights

  • New RPC Endpoint: CleanStore: A new CleanStore RPC has been added to the switchrpc server, enabling remote entities to clean up HTLC attempt information within the Switch's underlying attempt stores. This RPC allows clients to specify a keepSet of attempt IDs, and all other records will be deleted.
  • Concurrency Control for Store Operations: A sync.RWMutex named storeMtx has been introduced in htlcswitch/payment_result.go to protect the entire networkResultStore during global operations like CleanStore. Fine-grained per-attempt operations now acquire read or write locks on this mutex to prevent race conditions.
  • Enhanced Idempotency Testing: The integration test testSendOnionTwice has been refactored and renamed to testSendOnionIdempotencyLifecycle. This expanded test now thoroughly verifies the full lifecycle of SendOnion's idempotency guarantees, including scenarios where CleanStore is used to explicitly preserve and then delete attempt records.
  • Warnings on CleanStore Usage: The CleanStore RPC operates on a 'snapshot-and-delete' model, which introduces potential race conditions (stale keepSet), issues with network request re-ordering, and a limitation to a single HTLC dispatching entity for safe operation. These warnings are explicitly documented.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a CleanStore RPC to the switchrpc sub-system, allowing for remote cleanup of the Switch's attempt store. The implementation adds a coarse-grained lock to the networkResultStore to prevent race conditions between the new cleanup operation and other per-attempt operations. The changes are well-tested with an expanded integration test that covers the full idempotency lifecycle of a payment attempt.

My review focuses on two main points:

  1. The locking strategy in htlcswitch/payment_result.go appears to be overly restrictive, potentially impacting performance. I've suggested using read locks instead of write locks for several per-attempt operations to allow for more concurrency.
  2. The CleanStore RPC response includes a deleted_attempts field which is currently not populated. I've proposed changes to return this count from the underlying implementation to make the RPC more informative for the caller.

Overall, this is a solid addition with a very detailed description of the trade-offs. Addressing the feedback will improve performance and the usability of the new RPC.

@saubyk saubyk added this to v0.21 Jan 8, 2026
@saubyk saubyk moved this to In progress in v0.21 Jan 8, 2026

This will clean up information on attempts from the
Switch's network result/attempt store.

NOTE: Until the HTLC attempt ID space is separated,
it is only safe to allow a *single* router (whether
local or remote) to dispatch and clean up HTLC attempts.

This prevents CleanStore from running concurrently
with other store operations.

Demonstrate the behavior of the duplicate protection
for attempt IDs with respect to the full life-cycle
of a payment attempt: from SendOnion, initialization,
dispatch, and receipt of a final result (settle/fail)
from the network, to the interaction with CleanStore.