perf(relayer): optimize timeout processing with drain and Vec #4426
base: master
Conversation
Heavy refactoring is necessary.
P.S.:
Leaving this review as a member of the @nolus-protocol dev team.
```rust
use std::str::FromStr;
use std::time::Duration;

use criterion::{black_box, criterion_group, criterion_main, BatchSize, BenchmarkId, Criterion};
```
criterion::black_box is deprecated; its documentation says to use std::hint::black_box instead.
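A minimal sketch of the replacement, using only the standard library (the criterion benchmark harness itself is left out; the numbers here are illustrative):

```rust
use std::hint::black_box;

fn main() {
    // black_box is an identity function the optimizer must treat as opaque,
    // so benchmark inputs and results are not constant-folded away.
    let x = black_box(21);
    let y = x * 2;
    // Passing the result through black_box keeps the computation alive.
    println!("{}", black_box(y)); // prints 42
}
```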
```diff
 } = gm;

 match &event_with_height.event {
 for gm in odata.batch.drain(..) {
```
Capacity is not reused later; the batch is instead replaced (L1739, R1737) with an empty Vec, whose construction is purely computational (a const fn) and does not allocate.
That replacement can be done via std::mem::take, which also allows current or future versions of std::vec::Vec to take advantage of the fact that the whole container is consumed.
```diff
-for gm in odata.batch.drain(..) {
+for gm in std::mem::take(&mut odata.batch) {
```
Using Vec::with_capacity there is intentional: we need to take ownership of every TransitMessage so we can move “timed-out” ones into timed_out and keep only the survivors.
The current flow (for gm in odata.batch.drain(..) { … } followed by odata.batch = retain_batch;) gives us owned gms and preserves ordering; the price is allocating a fresh buffer for retain_batch.
Replacing this with std::mem::take doesn’t actually remove that allocation: once you take the batch you still need a second vector to store the retained items (because retain_mut/retain only hand out &mut references, which we can’t move), so you’d end up allocating again for that second vector or cloning messages.
In short, there's no practical win here: mem::take can't simultaneously give us ownership of the drained messages and let us reuse the original buffer without deeper restructuring, so the current approach at relay_path.rs (lines 1688-1739) is fine as-is.
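The flow described above can be sketched on toy data (plain integers standing in for TransitMessage; names are illustrative):

```rust
fn main() {
    // Toy stand-in for odata.batch: drain(..) yields every element by value,
    // so "timed-out" items can be moved into `timed_out` while survivors go
    // into a freshly allocated retain_batch, preserving order.
    let mut batch = vec![1, 2, 3, 4, 5];
    let mut timed_out = Vec::new();
    let mut retain_batch = Vec::with_capacity(batch.len());
    for item in batch.drain(..) {
        if item % 2 == 0 {
            timed_out.push(item); // "timed out"
        } else {
            retain_batch.push(item); // survivor, order preserved
        }
    }
    batch = retain_batch; // the price: one fresh allocation per batch
    println!("{batch:?} {timed_out:?}"); // prints "[1, 3, 5] [2, 4]"
}
```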
I'm sorry, but I'll have to strongly disagree. Please look at the signature and documentation of std::mem::take: it temporarily needs the given mutable reference, replaces the value with Default::default(), and returns the old value. It's a convenience wrapper around std::mem::replace.
The code base is already in an unacceptable state, and we at @nolus-protocol are currently rewriting parts of it, with the possibility of upstreaming, so I personally would rather not add more unnecessary complexity.
I have already removed the need for retain_batch, as it's very error-prone, and instead push survivors directly back into odata.batch.
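A toy sketch of that pattern (integers standing in for TransitMessage): mem::take hands out the old contents by value, and survivors are pushed straight back into the original binding, with no auxiliary retain_batch vector.

```rust
use std::mem;

fn main() {
    // mem::take swaps in Default::default() (an empty Vec, which does not
    // allocate) and returns the old contents by value, so every element is
    // owned and survivors can go directly back into `batch`.
    let mut batch = vec![1, 2, 3, 4, 5];
    let mut timed_out = Vec::new();
    for item in mem::take(&mut batch) {
        if item % 2 == 0 {
            timed_out.push(item); // "timed out"
        } else {
            batch.push(item); // survivor, pushed back in order
        }
    }
    println!("{batch:?} {timed_out:?}"); // prints "[1, 3, 5] [2, 4]"
}
```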
Leaving our implementation here; it can be used as a reference and a proof of concept that this approach works.
Link to the commit: nolus-protocol/hermes@52479c3
```diff
 let mut all_dst_odata = self.dst_operational_data.clone_vec();

-let mut timed_out: HashMap<usize, OperationalData> = HashMap::default();
+let mut timed_out: Vec<Option<OperationalData>> = vec![None; all_dst_odata.len()];
```
[timed_out 1/4]
This only wastes computational time and resources, as it isn't necessary to allocate the vector with the same length. On top of that, pre-filling it with None instances is also redundant: all_dst_odata is traversed sequentially from start to end, so elements can just be inserted at the back.
```diff
-let mut timed_out: Vec<Option<OperationalData>> = vec![None; all_dst_odata.len()];
+let mut timed_out = vec![];
```
If proven necessary, it could be allocated with max(1, half-capacity), as I believe it is highly unlikely under normal circumstances to grow past half the capacity:
```rust
let mut timed_out = Vec::with_capacity(1.max(all_dst_odata.len() >> 1));
```
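A toy illustration of the difference (illustrative values; u32 standing in for OperationalData): pre-filling writes every slot up front, while an empty Vec grows only as results are actually produced.

```rust
fn main() {
    let n = 4;
    // vec![None; n] allocates and writes n None values up front.
    let prefilled: Vec<Option<u32>> = vec![None; n];
    // An empty Vec allocates nothing until the first push, and sequential
    // traversal means results can simply be appended in order.
    let mut appended: Vec<u32> = Vec::new();
    for i in 0..n as u32 {
        if i % 2 == 0 {
            appended.push(i);
        }
    }
    println!("{} {:?}", prefilled.len(), appended); // prints "4 [0, 2]"
}
```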
```diff
 let mut timed_out: Vec<Option<OperationalData>> = vec![None; all_dst_odata.len()];

 // For each operational data targeting the destination chain...
 for (odata_pos, odata) in all_dst_odata.iter_mut().enumerate() {
```
[timed_out 2/4]
```diff
-for (odata_pos, odata) in all_dst_odata.iter_mut().enumerate() {
+for odata in &mut all_dst_odata {
```
```rust
let slot = timed_out
    .get_mut(odata_pos)
    .expect("timed_out vector is sized to match operational data");
slot.get_or_insert_with(|| {
    OperationalData::new(
        dst_current_height,
        OperationalDataTarget::Source,
        odata_tracking_id,
        self.channel.connection_delay,
    )
})
.push(TransitMessage {
    event_with_height: gm.event_with_height.clone(),
    msg: new_msg,
});
```
[timed_out 3/4]
```diff
-let slot = timed_out
-    .get_mut(odata_pos)
-    .expect("timed_out vector is sized to match operational data");
-slot.get_or_insert_with(|| {
-    OperationalData::new(
-        dst_current_height,
-        OperationalDataTarget::Source,
-        odata_tracking_id,
-        self.channel.connection_delay,
-    )
-})
-.push(TransitMessage {
-    event_with_height: gm.event_with_height.clone(),
-    msg: new_msg,
-});
+let mut odata = OperationalData::new(
+    dst_current_height,
+    OperationalDataTarget::Source,
+    odata_tracking_id,
+    self.channel.connection_delay,
+);
+odata.push(TransitMessage { msg: new_msg, ..gm });
+timed_out.push(odata);
```
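The ..gm struct update syntax in the suggestion is what removes the .clone() call: it moves the remaining fields out of gm instead of copying them. A toy sketch (illustrative field types, not the real TransitMessage definition):

```rust
// Illustrative stand-in for the real TransitMessage type.
#[derive(Debug)]
struct TransitMessage {
    event_with_height: String,
    msg: u32,
}

fn main() {
    let gm = TransitMessage {
        event_with_height: "send_packet@42".into(),
        msg: 1,
    };
    let new_msg = 7;
    // Struct update syntax moves `event_with_height` out of `gm` by value;
    // no clone of the String is performed.
    let timed_out_msg = TransitMessage { msg: new_msg, ..gm };
    println!("{} {}", timed_out_msg.event_with_height, timed_out_msg.msg);
}
```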
```diff
 // Schedule new operational data targeting the source chain
-for (_, new_od) in timed_out.into_iter() {
+for new_od in timed_out.into_iter().flatten() {
```
[timed_out 4/4]
```diff
-for new_od in timed_out.into_iter().flatten() {
+for new_od in timed_out {
```
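The two loop shapes side by side, on toy data: a Vec<Option<T>> needs .flatten() to skip the None slots during consuming iteration, while a plain Vec<T> has nothing to skip.

```rust
fn main() {
    // Vec<Option<T>>: .flatten() discards the None slots.
    let with_slots: Vec<Option<u32>> = vec![Some(1), None, Some(3)];
    let flattened: Vec<u32> = with_slots.into_iter().flatten().collect();
    // Plain Vec<T>: every element is already a value.
    let plain: Vec<u32> = vec![1, 3];
    assert_eq!(flattened, plain);
    println!("{flattened:?}"); // prints "[1, 3]"
}
```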
This micro-benchmark only tests the difference between std::vec::Vec and std::collections::HashMap; it does NOT test against the implementation actually used, but rather a dummy, out-of-sync version of the algorithm.
That makes the micro-benchmark wholly redundant: the algorithm I have provided neither allocates without necessity nor allocates the whole index range.
Closes: #XXX
Description
The optimization provides a 40-44% performance improvement across all batch sizes by:
- Replacing HashMap with Vec for sequential index access
- Eliminating cloning by using drain() instead of iter() + clone()
- Pre-allocating vector capacity
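The three listed changes can be sketched together on toy data (u32 in place of OperationalData/TransitMessage; all names here are illustrative, not the relayer's actual code):

```rust
fn main() {
    let mut batches: Vec<Vec<u32>> = vec![vec![1, 2], vec![3, 4, 5]];
    // 1. A Vec indexed by position replaces HashMap<usize, _>.
    // 3. Its capacity is pre-allocated to the known upper bound.
    let mut timed_out: Vec<Vec<u32>> = Vec::with_capacity(batches.len());
    for batch in &mut batches {
        let mut kept = Vec::with_capacity(batch.len());
        let mut expired = Vec::new();
        // 2. drain() yields owned values, so no clone() is required.
        for msg in batch.drain(..) {
            if msg % 2 == 0 {
                expired.push(msg);
            } else {
                kept.push(msg);
            }
        }
        *batch = kept;
        timed_out.push(expired);
    }
    println!("{batches:?} {timed_out:?}"); // prints "[[1], [3, 5]] [[2], [4]]"
}
```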
PR author checklist:
Reviewer checklist: mircea-c
- Reviewed "Files changed" in the GitHub PR explorer.