-
Couldn't load subscription status.
- Fork 420
Penalize failed channels #1144
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Penalize failed channels #1144
Conversation
Codecov Report
@@ Coverage Diff @@
## main #1144 +/- ##
==========================================
- Coverage 90.51% 90.42% -0.10%
==========================================
Files 69 70 +1
Lines 35865 39419 +3554
==========================================
+ Hits 32462 35643 +3181
- Misses 3403 3776 +373
Continue to review full report at Codecov.
|
371b971 to
0edc550
Compare
f538e01 to
a90a9e5
Compare
lightning/src/routing/scorer.rs
Outdated
| #[cfg(not(feature = "no-std"))] | ||
| fn decay_from(penalty_msat: u64, last_failure: &SystemTime, decay_interval: Duration) -> u64 { | ||
| let decays = last_failure.elapsed().ok().map_or(0, |elapsed| { | ||
| elapsed.as_secs() / decay_interval.as_secs() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note to self: handle divide by zero 😛
lightning-invoice/src/payment.rs
Outdated
| where | ||
| P::Target: Payer, | ||
| R: Router, | ||
| S: routing::Score, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As an alternative to an owned Score, should we consider a "two-layer" trait with a Score and LockedScore where router takes the second?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I kinda prefer this way as it just uses plain references. Could be convinced otherwise, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess I'm looking at this from two directions:
- in the case of bindings, we cannot serialize from a reference to a trait - all traits have to be concretized into a
dyn Traitand from there you can't (type-safe) go back to the original type to serialize it. We'd have to add support for that, which would be highly language specific and checked at runtime or makeScorerequireWriteable, which is somewhat awkward, though not insane. - in the general case, its a bit awkward for users to have to get their
InvoicePayerto serialize theirScoredata - it ends up dictating a chunk of user layout, instead of taking a reference which gives the user more flexibility.
Obviously its somewhat awkward in that we end up forcing users into a two-layer wrapper thinggy, but luckily:
- Our implementation does works against it without the user having to write any additional characters,
- we can implement the parent trait for
Thing<Deref<Target=Mutex<ThingB: subtrait>>>(or implement it forThing: Deref<Target=subtrait>directly with no-std), making it somewhat transparent at least, - bindings users can use the API, at least kinda, though GC waiting to release the lock kinda sucks too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess I'm looking at this from two directions:
- in the case of bindings, we cannot serialize from a reference to a trait - all traits have to be concretized into a
dyn Traitand from there you can't (type-safe) go back to the original type to serialize it. We'd have to add support for that, which would be highly language specific and checked at runtime or makeScorerequireWriteable, which is somewhat awkward, though not insane.
Just a thought, we could have S: routing::Score + Clone and have the accessor return a copy. Seems like it would save a lot of trouble using multiple traits for a small cost. Would this solve the bindings issue at least an interim solution?
- in the general case, its a bit awkward for users to have to get their
InvoicePayerto serialize theirScoredata - it ends up dictating a chunk of user layout, instead of taking a reference which gives the user more flexibility.
FWIW, they will already need to use Arc<InvoicePayer> as it needs to be passed to the BackgroundProcessor for event handling in addition to being used to make payments. So having, say, some persister utility wrapping Arc<InvoicePayer> wouldn't be horrible. And, if written in Rust, wouldn't it just be a simple method call to persist not involving any references at the bindings level?
Obviously its somewhat awkward in that we end up forcing users into a two-layer wrapper thinggy, but luckily:
- Our implementation does works against it without the user having to write any additional characters,
- we can implement the parent trait for
Thing<Deref<Target=Mutex<ThingB: subtrait>>>(or implement it forThing: Deref<Target=subtrait>directly with no-std), making it somewhat transparent at least,- bindings users can use the API, at least kinda, though GC waiting to release the lock kinda sucks too.
I understand the problem and am trying to grok the trait-subtrait solution your proposing. I guess the point of it is so we can pass find_route a LockedScore. But the relationship between LockedScore and Score (which is the supertrait of which?) and what methods each has isn't so clear. Could you elaborate?
Wonder if using an associate type in some manner would make this simpler?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a thought, we could have S: routing::Score + Clone and have the accessor return a copy.
Yea, that's something I'd been thinking about for a different reason (cause java does that a ton anyway), and I guess its okay? I'm not really a huge fan of clone'ing a hashmap that may get an entry for every channel in the graph (which may be the case for users doing probing), though, that could be a lot of data.
FWIW, they will already need to use Arc as it needs to be passed to the BackgroundProcessor for event handling in addition to being used to make payments. So having, say, some persister utility wrapping Arc wouldn't be horrible.
Ah, that's a good point re: user code complexity.
And, if written in Rust, wouldn't it just be a simple method call to persist not involving any references at the bindings level?
So we'd make Score : Writeable and add a utility method to InvoicePayer to just write the scores out directly? I'm not 100% sure where you're going with this.
I understand the problem and am trying to grok the trait-subtrait solution your proposing. I guess the point of it is so we can pass find_route a LockedScore. But the relationship between LockedScore and Score (which is the supertrait of which?) and what methods each has isn't so clear. Could you elaborate?
Sorry, I was using "supertrait" liberally (and incorrectly). What I was thinking of was (but with better naming):
trait Score {
type Scorer: LockedScore;
fn get_scorer(&self) -> Scorer;
}
trait LockedScore {
fn get_score(&self, scid: u64, ..) -> u64;
}
#[cfg(not(no_std))]
impl<LS: LockedScore, T: Deref<Target=Mutex<LockedScore>>> Score for T {
...
}
#[cfg(no_std)]
impl<LS: LockedScore, T: Deref<Target=LockedScore>> Score for T {
...
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmmmm, that could work. I guess it does feel a lot less explicit (in the sense that users lose the flexibility of selecting how locking works) and fairly different from our existing APIs which are built around Derefs.
If we go this route, to make the bindings sane, we'd probably want to create a WriteableScore trait that is just pub trait WriteableScore : Score + Writeable {} and use that, then create a constructor for ScorePersister that mirrors the InvoicePayer constructor and just takes a WriteableScore instead of a Score.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tried your solution with some renaming based on our offline discussion. Ran into some lifetime hell while doing so... 😕 Pushed a commit that almost works. Any chance you could see where I've gone wrong?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, the MutexGuard is making this tricky
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Grrrrrrrrr, right, so the direct thing you want to do here requires GATs, which may land in, like, the next version of rust or something. However, in trying to fix this I learned of a new rust syntax which seems to be exactly the subset of GATs that we want here - HRTB. I pushed a cargo check'ing branch at https://git.bitcoin.ninja/index.cgi?p=rust-lightning;a=log;h=refs/heads/1144-magic
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Got it working now. Only annoying thing about the blanket impl is that I get a conflicting implementation error if I try to implement it with T: Deref<Target=RefCell<S>>, which I was hoping to do for tests instead of using a Mutex. Which I guess means any Deref must reference a Mutex? Might be some way to work around it by introducing another type parameter? Don't need to worry about this now, though.
lightning/src/routing/scorer.rs
Outdated
| base_penalty_msat: u64, | ||
| params: ScoringParameters, | ||
| #[cfg(not(feature = "no-std"))] | ||
| channel_failures: HashMap<u64, (u64, SystemTime)>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be an Instant instead, no? We want a monotonic clock.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instant is opaque, so I'd imagine it would be difficult to persist, whereas with SystemTime we can get a Duration since the unix epoch. IIUC, any decrease in time as used with elapsed would be before the first decay, so it shouldn't have any effect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can persist the SystemTime::now() - Instant.elapsed()?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmmm... but we can't deserialize it back into Instant since the only way to create one is Instant::now or from another Instant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, right, yuck. Ummmmmmmm Instant::now() - (SystemTime::now() - deserialized_systemtime)? Its gross, but more technically correct...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I think I convinced myself this is possible by serializing in terms of a Duration since the UNIX epoch, and then upon deserialization do a similar adjustment.
let time = std::time::Instant::now();
let duration_since_epoch = std::time::SystemTime::now().duration_since(std::time::SystemTime::UNIX_EPOCH).unwrap();
let serialized_time = duration_since_epoch - time.elapsed();
println!("time: {:?}", time);
println!("duration_since_epoch: {:?}", duration_since_epoch);
println!("serialized_time: {:?}", serialized_time);
std::thread::sleep(std::time::Duration::from_secs(1));
let duration_since_epoch = std::time::SystemTime::now().duration_since(std::time::SystemTime::UNIX_EPOCH).unwrap();
let duration_since_instant = duration_since_epoch - serialized_time;
let deserialized_time = std::time::Instant::now() - duration_since_instant;
println!("duration_since_epoch: {:?}", duration_since_epoch);
println!("duration_since_instant: {:?}", duration_since_instant);
println!("deserialized_time: {:?}", deserialized_time);dbedf14 to
1281d49
Compare
lightning-invoice/src/payment.rs
Outdated
|
|
||
| /// A utility for paying [`Invoice]`s. | ||
| pub struct InvoicePayer<P: Deref, R, L: Deref, E> | ||
| pub struct InvoicePayer<P: Deref, R, S, L: Deref, E> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should S be a Deref to a LockableScore?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We implement LockableScore for T: Deref<Target=Mutex<S>>, so I don't think we do unless there is another reason you're thinking of? See last commit for use with BackgroundProcessor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmmm, right, does it work if we implement LockableScore for Mutex<S> instead? It seems like a Deref here is more consistent with our API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. Done. Also, implemented for RefCell now that it is possible.
27e7947 to
21cbd03
Compare
lightning-invoice/src/payment.rs
Outdated
| /// Finds a [`Route`] between `payer` and `payee` for a payment with the given values. | ||
| fn find_route( | ||
| &self, payer: &PublicKey, params: &RouteParameters, first_hops: Option<&[&ChannelDetails]> | ||
| fn find_route<S: routing::Score>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The S bound should probably be on the trait itself, no? ie if a user always constructs a InvoicePayer with CustomLocalUserRouter then find_route should take a CustomLocalUserRouter and not a S. If the complexity of the type annotations blows up as a result of this, feel free to ignore :).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I quite follow. Why would find_route take itself (in addition to self) instead of a Score?
I think you're trying to say that the type S: routing::Score parameter used in Router should be the same Score as required by LockableScore<'a>::Locked. The compiler happily leads me to use the following ugly syntax and to use PhantomData in DefaultRouter.
R: for <'a> Router<<<S as Deref>::Target as routing::LockableScore<'a>>::Locked>,But it seems I'm just doing what the compiler is already inferring, no? Did I misunderstand? I suppose internally, we could accidentally call find_route with an entirely different Score than what the user parameterized InvoicePayer with. But I don't think a user could do so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of the current code, it'd be
pub trait Router<S: routing::Score> {
fn find_route(&self, ...&S)
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, we're on the same page as discussed offline. Also, turns out I didn't need to use PhantomData. I was being overzealous in where I was adding type parameters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other than the above two comments, LGTM.
21cbd03 to
269a85a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that some of the fixups on Notify scorer of failing payment path probably should be on Parameterize InvoicePayer by routing::Score. I'd be totally fine if those both just get squashed into one commit, though.
lightning/src/routing/mod.rs
Outdated
| } | ||
| } | ||
|
|
||
| impl<S: Score, T: Deref<Target=S> + DerefMut<Target=S>> Score for T { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DerefMut extends Deref so you should be able to drop the Deref bound.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shaping up IMO! Mostly docs requests
| //! | ||
| //! // Or use a custom channel penalty. | ||
| //! let scorer = Scorer::new(1_000); | ||
| //! // Or use custom channel penalties. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since this says custom channel "penalties," could we change it to either say "penalty" or have multiple penalties be custom?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added another custom penalty.
| } | ||
| } | ||
|
|
||
| /// Creates a new scorer using `penalty_msat` as a fixed channel penalty. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could specify that this will only have a fixed base channel penalty
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whoops, missed this comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, the other penalty is zero so the overall penalty will be fixed.
| } | ||
|
|
||
| #[test] | ||
| fn scores_failed_channel() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we add a high-level comment? I'm a bit confused what this tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a comment. There is an expectation set on TestScorer upon creation that it is called with a specific short_channel_id. It will fail if it is not called at all or called with a different short_channel_id.
| } | ||
|
|
||
| #[cfg(not(feature = "no-std"))] | ||
| fn decay_from(penalty_msat: u64, last_failure: &Instant, half_life: Duration) -> u64 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
High level doc for the return value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is refactored in #1146 into a method called decayed_penalty, so will leave as is to avoid a lengthy rebase process. Happy to address any concerns in that PR. Probably should include units in the name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gotcha, just noticed a lot of this is rewritten in #1146. Thanks!
|
|
||
| #[cfg(not(feature = "no-std"))] | ||
| fn decay_from(penalty_msat: u64, last_failure: &Instant, half_life: Duration) -> u64 { | ||
| let decays = last_failure.elapsed().as_secs().checked_div(half_life.as_secs()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we test that it won't decay if less than half_life has passed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nothing is tested at the moment... 🙂 Will do in a follow-up. Confirmed this results in no shifts because checked_div returns Some(0).
269a85a to
5c8466b
Compare
Upon receiving a PaymentPathFailed event, the failing payment may be retried on a different path. To avoid using the channel responsible for the failure, a scorer should be notified of the failure before being used to find a new route. Add a payment_path_failed method to routing::Score and call it in InvoicePayer's event handler. Introduce a LockableScore parameterization to InvoicePayer so the scorer is locked only once before calling find_route.
As payments fail, the channel responsible for the failure may be penalized. Implement Scorer::payment_path_failed to penalize the failed channel using a configured penalty. As time passes, the penalty is reduced using exponential decay, though penalties will accumulate if the channel continues to fail. The decay interval is also configurable.
Proof of concept showing InvoicePayer can be used with an Arc<ChannelManager> passed to BackgroundProcessor. Likely do not need to merge this commit.
5c8466b to
db05a14
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Squashed without diff from Val's ACK, will land after CI:
$ git diff-tree -U1 5c8466b4 db05a14a
$
Adds a new
payment_path_failedmethod torouting::Scorefor penalizing failed channels, which is called byInvoicePayerbefore retrying failed payments in the course of handlingPaymentPathFailedevents.Implements the new method in
Scorerby applying a configurable penalty on top of the base penalty. The new penalty decays over time but may be further increased if the channel continues to fail.TODO: Test new
Scorerbehavior.