-
Notifications
You must be signed in to change notification settings - Fork 2.2k
Attributable failures #9888
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Attributable failures #9888
Conversation
|
Important Review skippedAuto reviews are limited to specific labels. 🏷️ Labels to auto review (1)
Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
f713f3c to
6640a01
Compare
|
Bumped lightning-onion and rebased on |
6640a01 to
1dd8765
Compare
|
The linter failure is only about the |
|
/gemini review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces support for attributable failures, which is a significant and complex change to the error handling and encryption logic. The core refactoring to separate shared secret extraction from error encrypter creation is well-executed and improves the design.
I've found a few issues related to the implementation:
- An incorrect unit calculation for HTLC hold times, which deviates from the BOLT specification.
- A logic error in handling failures for introduction nodes in a blinded path, where the wrong message type is created and attribution data is discarded.
- A minor opportunity for code simplification in the new logging logic.
Overall, the changes are well-structured, but the identified issues, particularly the correctness ones, should be addressed. The PR description is clear about the work-in-progress nature, and this review should help in finalizing the implementation.
Co-authored-by: Joost Jager <[email protected]>
Preparation for the instantiation of an attributable error encrypter. This gets rid of the sphinx encrypter instantiation in OnionProcessor. This would otherwise be problematic when an attributable encrypter would need to be created there without having access to the error structure. Co-authored-by: Joost Jager <[email protected]>
We add some simple helpers for the purpose of parsing the attribution data from and to the wire format.
When propagating a failure backwards along a route we want to persist the received ExtraData of the downstream link so that any handling can occur on our node before returning the updated ExtraData to our upstream link.
We now call into the new encryption/decryption methods and provide the received attribution data as an argument. The result is then also returned to our upstream peer.
1dd8765 to
1fc33bd
Compare
|
Rebased |
|
@GeorgeTsagk, remember to re-request review from reviewers when ready |
Description
Adds support for Attributable failures (feature 36/37). This changes the error encryption and the data encoding for the
update_fail_htlcmessage. We now include the attribution structure in the extra data of the peer message, which helps with assigning blame across the route and also extracting the reported HTLC hold times for each node across the route.For now we default to using non-strict attribution decryption, which means that we're basically preserving the old behavior in order to give time to nodes to upgrade. Otherwise this could lead to premature blame of our direct peers, possibly blacklisting all of our channels early in the payment lifecycle.
This is based on the lightning-onion PR which is also updated to reflect the latest state of attributable failures Bolt PR.
Testing
Current unit/integration tests seem to be passing, which validates that we are preserving the old behavior.
Todo:
Even though the todo list above is important to consider this PR ready to merge, this PR at current state is still good for a first round of review.