Optimise read buf initialization performance #524
alexheretic wants to merge 3 commits into snapview:master from
Conversation
979d592 to 8e0f19d
```rust
impl AsRef<[u8]> for InitAwareBuf {
    #[inline]
    fn as_ref(&self) -> &[u8] {
        &self.bytes
    }
}

impl Deref for InitAwareBuf {
    type Target = [u8];

    #[inline]
    fn deref(&self) -> &[u8] {
        &self.bytes
    }
}

impl AsMut<[u8]> for InitAwareBuf {
    #[inline]
    fn as_mut(&mut self) -> &mut [u8] {
        &mut self.bytes
    }
}

impl DerefMut for InitAwareBuf {
    #[inline]
    fn deref_mut(&mut self) -> &mut [u8] {
        &mut self.bytes
    }
}
```
I came from the bytes issue. It's taking me some time to review this because of the manual slicing that happens in the other files. I'm not a fan of it: in a way it leaks implementation internals and leaves the other modules to do the slicing themselves. Although it's an internal API, it doesn't feel robust.
Why not copy the read_buf API (both the initial one, which you can see in tokio, and the current one in std) by having methods like:
```rust
impl InitAwareBuf {
    // the region of memory that contains user data
    pub fn filled(&self) -> &[u8] {}

    // the region of memory that does not contain user data, but has been initialized
    pub fn init_mut(&mut self) -> &mut [u8] {}

    // mark `filled_len` bytes, that were written into the slice returned by `init_mut`, as filled
    pub fn advance_mut(&mut self, filled_len: usize) {}
}
```
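For context, a minimal sketch of how such a read_buf-style API is typically used. The struct fields, method bodies, and `read_some` helper below are assumptions for illustration, following the semantics of tokio's ReadBuf and std's BorrowedBuf; they are not code from this PR.

```rust
use bytes::BytesMut;
use std::io::Read;

// Hypothetical minimal implementation of the proposed methods.
struct InitAwareBuf {
    bytes: BytesMut, // the whole length of `bytes` is initialised
    filled: usize,   // prefix of `bytes` that contains real user data
}

impl InitAwareBuf {
    /// The region of memory that contains user data.
    pub fn filled(&self) -> &[u8] {
        &self.bytes[..self.filled]
    }

    /// The initialised region that does not yet contain user data.
    pub fn init_mut(&mut self) -> &mut [u8] {
        &mut self.bytes[self.filled..]
    }

    /// Mark `filled_len` bytes written via `init_mut` as filled.
    pub fn advance_mut(&mut self, filled_len: usize) {
        assert!(self.filled + filled_len <= self.bytes.len());
        self.filled += filled_len;
    }
}

// A typical read cycle against such an API:
fn read_some<R: Read>(reader: &mut R, buf: &mut InitAwareBuf) -> std::io::Result<usize> {
    let n = reader.read(buf.init_mut())?; // write into the initialised spare space
    buf.advance_mut(n);                   // record how much of it is now data
    Ok(n)
}
```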
I envisioned the wrapper as transparent bytes, but with extra info: the initialised capacity. Slice access, even mutable, is fine as it doesn't grow or shrink the buffer.
This style simplifies the overall change, as usage of the buf (previously a plain BytesMut) is fairly unchanged.
We just need to ensure the new wrapper itself is sound.
I updated the "init-aware-buf2" benchmarks in the description to reflect the reworked implementation. The conclusions are the same.
daniel-abramov left a comment
Thanks @alexheretic for working on these optimizations. And thanks @paolobarbolini for reviewing it!
I have not spotted any issues so far; the only thing I'm wondering about is whether the additional complexity results in noticeable performance improvements. Judging by the benchmarks, there is a noticeable improvement when the read buffer size is set to 8 MiB, but at the same time increasing the buffer size to 8 MiB does not look all that useful even for the 1 GiB benchmark, because the improved buffer with an 8 MiB read buffer performs about the same as master with the default read buffer size.
P.S.: Btw, I updated the rust-version in master so that CI/CD does not fail. You might want to rebase :)
Particularly improves large read_buffer_size performance
Co-authored-by: Daniel Abramov <inetcrack2@gmail.com>
2345f5f to ef3ea34
Yes, I agree with this analysis. It's also kind of why I didn't rush to make this optimisation initially: the optimisation itself makes sense, but we're missing a compelling use case for it. Considering the added complexity, I'm ok with keeping this PR unmerged until there is a better use case that benefits from it. On the other hand, perhaps fixing performance for large configured buffers is desirable on its own. Or perhaps that could come later, if we figure out why large-message performance is worse than 256KiB-message performance. I'm ok with whatever you want to do with this.
Add new InitAwareBuf wrapper logic that optimises repetitive zero-initialization of the read buffer when receiving messages. Particularly improves larger-than-default read_buffer_size performance. Also see previous analysis.
This optimisation works by InitAwareBuf(BytesMut) keeping track of how much of the spare capacity has been previously initialised. This means that when we resize + read + truncate, we only need to actually zero each region of uninitialized bytes once.
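A minimal sketch of the idea (the field and method names here are assumptions for illustration, not the PR's exact code): the wrapper remembers how far into the buffer bytes have ever been initialised, so regrowing over a previously used region skips the memset.

```rust
use bytes::BytesMut;

struct InitAwareBuf {
    bytes: BytesMut,
    /// How many bytes from the start of `bytes` have ever been initialised
    /// (zeroed or written). Sketch invariant: `initialized <= bytes.capacity()`,
    /// which holds as long as the wrapper controls all length/capacity changes.
    initialized: usize,
}

impl InitAwareBuf {
    /// Grow the length to `new_len`, zeroing only bytes that were never
    /// initialised before.
    fn resize_for_read(&mut self, new_len: usize) {
        if new_len <= self.initialized {
            // This whole region was zeroed/written in an earlier
            // resize + read + truncate cycle: restoring the length is enough.
            // SAFETY: new_len <= initialized <= capacity, and those bytes
            // were previously initialised.
            unsafe { self.bytes.set_len(new_len) };
        } else {
            // Re-expose the already-initialised prefix without touching it...
            unsafe { self.bytes.set_len(self.initialized) };
            // ...then zero-fill only the never-initialised tail (this may
            // also reallocate if the capacity is insufficient).
            self.bytes.resize(new_len, 0);
            self.initialized = new_len;
        }
    }

    /// Shrink back down after a read without forgetting what was initialised.
    fn truncate(&mut self, len: usize) {
        self.bytes.truncate(len);
    }
}
```

With something like this, a message assembled over many reads zeroes each spare byte at most once, instead of once per read cycle.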
Alternatives
I didn't find many better options than this custom optimisation wrapping BytesMut. I asked upstream and they don't plan on providing anything for this. Also related: if/when we get rust-lang/rust#78485 we can probably switch to using that and remove the InitAwareBuf wrapper.
Benchmarks
Benchmarks using the default 128KiB read buffer don't really change. The improvement is clear, though, if we set a large read_buffer_size, e.g. 8MiB, as this amplifies the amount of zeroing the current logic does. So we could see this as a kind of performance fix for larger buffers. In theory it should optimise the default 128KiB buffer for small messages too, I just don't see it come across in our current benches.
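For illustration, the repeated-zeroing pattern being avoided looks roughly like this (a sketch, not the actual read-path code): every cycle zero-fills the whole spare read_buffer_size region again, so a larger configured buffer means proportionally more memset work per read.

```rust
use bytes::BytesMut;

// Naive cycle: zeroes `read_buffer_size` bytes on every call, even though the
// same region was already zeroed (and mostly unused) on the previous call.
fn naive_read_cycle(buf: &mut BytesMut, read_buffer_size: usize, bytes_read: usize) {
    let start = buf.len();
    buf.resize(start + read_buffer_size, 0); // memset of the whole spare region
    // ... a read() would fill some prefix of &mut buf[start..] here ...
    buf.truncate(start + bytes_read); // discard the zeroed-but-unused tail
}
```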
Default read buffer (128KiB)
No noticeable difference.
8MiB read buffer
A significant improvement that fixes the performance regression of using larger read buffers.