
Optimise read buf initialization performance#524

Open
alexheretic wants to merge 3 commits into snapview:master from alexheretic:init-aware-read-buf

Conversation


@alexheretic (Contributor) commented Nov 23, 2025

Add new InitAwareBuf wrapper logic that optimises repetitive zero-initialization of the read buffer when receiving messages. It particularly improves performance with a larger-than-default read_buffer_size.

Also see previous analysis.

This optimisation works by having InitAwareBuf(BytesMut) keep track of how much of the spare capacity has previously been initialised. This means that across resize + read + truncate cycles we only need to actually zero each region of uninitialized bytes once.
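The mechanism can be sketched roughly as follows. This is a simplified, hypothetical illustration over a plain Vec<u8> (the PR itself wraps bytes::BytesMut); the field and method names here are invented for the sketch, and the key trick is the unsafe set_len over memory that was already zeroed on an earlier cycle:

```rust
// Hypothetical sketch of the init-tracking idea (not the PR's actual code).
struct InitAwareBuf {
    bytes: Vec<u8>,
    // Number of bytes beyond `bytes.len()` that are known to be
    // initialised within the current allocation.
    initialized: usize,
}

impl InitAwareBuf {
    fn new() -> Self {
        Self { bytes: Vec::new(), initialized: 0 }
    }

    /// Grow the visible length to `new_len` for a read, zeroing only
    /// bytes that have never been initialised before.
    fn resize_for_read(&mut self, new_len: usize) {
        if new_len > self.bytes.capacity() {
            // A reallocation yields fresh, uninitialised spare capacity,
            // so the tracked region no longer applies.
            self.initialized = 0;
        }
        let init_end = self.bytes.len() + self.initialized;
        if new_len <= init_end {
            // Fast path: the whole region was initialised earlier.
            // SAFETY: all bytes below `init_end` (<= capacity) are
            // initialised in the current allocation.
            unsafe { self.bytes.set_len(new_len) };
            self.initialized = init_end - new_len;
        } else {
            // Skip over the already-initialised prefix, then let
            // `resize` zero only the genuinely fresh tail.
            // SAFETY: as above, bytes below `init_end` are initialised.
            unsafe { self.bytes.set_len(init_end) };
            self.bytes.resize(new_len, 0);
            self.initialized = 0;
        }
    }

    /// Shrink to `len` filled bytes; the cut-off tail stays initialised.
    fn truncate(&mut self, len: usize) {
        if len < self.bytes.len() {
            self.initialized += self.bytes.len() - len;
            self.bytes.truncate(len);
        }
    }
}

fn main() {
    let mut buf = InitAwareBuf::new();
    buf.resize_for_read(8); // zeroes 8 fresh bytes
    buf.bytes[..4].copy_from_slice(b"abcd");
    buf.truncate(4);        // the 4 cut-off bytes stay initialised
    buf.resize_for_read(8); // fast path: no re-zeroing needed
    assert_eq!(&buf.bytes[..4], b"abcd");
}
```

On the second resize_for_read the fast path exposes the already-initialised tail via set_len, which is exactly the repeated zeroing the wrapper avoids.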

Alternatives

I didn't find many better options than this custom optimisation wrapping BytesMut. I asked upstream and they don't plan on providing anything for this.

Also related if/when we get rust-lang/rust#78485 we can probably switch to using that and remove the InitAwareBuf wrapper.

Benchmarks

Benchmarks using the default 128KiB read buffer don't really change. The improvement is clear though if we set a large, e.g. 8MiB, read_buffer_size as this amplifies the amount of zeroing the current logic does.

So we could see this as a kind of performance fix for larger buffers. In theory it should optimise the default 128KiB buffer for small messages too, I just don't see it come across in our current benches.

Default read buffer (128KiB)

No noticeable difference.

group                init-aware-buf2                        master
-----                ---------------                        ------
send+recv/512 B      1.00     12.8±0.16µs    76.5 MB/sec    1.00     12.8±0.03µs    76.2 MB/sec
send+recv/4 KiB      1.00     14.7±0.21µs   529.8 MB/sec    1.02     15.1±0.38µs   518.9 MB/sec
send+recv/32 KiB     1.07     28.2±0.18µs     2.2 GB/sec    1.00     26.3±0.19µs     2.3 GB/sec
send+recv/256 KiB    1.08    115.3±0.13µs     4.2 GB/sec    1.00    106.8±0.97µs     4.6 GB/sec
send+recv/2 MiB      1.00   937.5±36.06µs     4.2 GB/sec    1.00   940.9±29.48µs     4.2 GB/sec
send+recv/16 MiB     1.09     15.4±0.42ms     2.0 GB/sec    1.00     14.2±0.41ms     2.2 GB/sec
send+recv/128 MiB    1.00    196.7±0.66ms  1301.6 MB/sec    1.00    197.2±7.01ms  1298.3 MB/sec
send+recv/1 GiB      1.07  1344.7±50.32ms  1523.1 MB/sec    1.00  1262.0±60.64ms  1622.9 MB/sec

8MiB read buffer

A significant improvement fixing the performance regression of using larger buffers.

group                init-aware-buf2-8mb                    master-8mb
-----                -------------------                    ----------
send+recv/512 B      1.00     12.5±0.25µs    77.9 MB/sec    6.12     76.6±2.42µs    12.7 MB/sec
send+recv/4 KiB      1.00     15.1±0.06µs   518.9 MB/sec    2.97     44.7±0.06µs   174.9 MB/sec
send+recv/32 KiB     1.00     26.8±0.17µs     2.3 GB/sec    1.91     51.2±1.27µs  1221.2 MB/sec
send+recv/256 KiB    1.00    120.6±1.32µs     4.0 GB/sec    1.55    187.0±2.86µs     2.6 GB/sec
send+recv/2 MiB      1.00  1125.5±33.24µs     3.5 GB/sec    1.05   1177.8±4.16µs     3.3 GB/sec
send+recv/16 MiB     1.06     14.5±0.09ms     2.2 GB/sec    1.00     13.7±0.07ms     2.3 GB/sec
send+recv/128 MiB    1.00    189.9±0.66ms  1347.8 MB/sec    1.04    197.7±2.72ms  1294.6 MB/sec
send+recv/1 GiB      1.00  1286.7±26.05ms  1591.6 MB/sec    1.00  1285.7±26.50ms  1592.9 MB/sec

Comment on lines 112 to 140
impl AsRef<[u8]> for InitAwareBuf {
    #[inline]
    fn as_ref(&self) -> &[u8] {
        &self.bytes
    }
}

impl Deref for InitAwareBuf {
    type Target = [u8];

    #[inline]
    fn deref(&self) -> &[u8] {
        &self.bytes
    }
}

impl AsMut<[u8]> for InitAwareBuf {
    #[inline]
    fn as_mut(&mut self) -> &mut [u8] {
        &mut self.bytes
    }
}

impl DerefMut for InitAwareBuf {
    #[inline]
    fn deref_mut(&mut self) -> &mut [u8] {
        &mut self.bytes
    }
}

@paolobarbolini commented Nov 23, 2025


I came from the bytes issue. It's taking me some time to review this because of the manual slicing that happens in the other files. I'm not a fan of them, because in a way they leak implementation internals and leave the other modules to do the slicing by themselves. Although it's an internal API, it doesn't feel robust.

Why not copy the read_buf API (both the initial one which you can see in tokio, and the current one in std) by having methods like:

impl InitAwareBuf {
    // the region of memory that contains user data
    pub fn filled(&self) -> &[u8] {}

    // the region of memory that does not contain user data, but has been initialized
    pub fn init_mut(&mut self) -> &mut [u8] {}

    // mark `filled_len` bytes, that were written into the slice returned by `init_mut`, as filled
    pub fn advance_mut(&mut self, filled_len: usize) {}
}
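For reference, the suggested methods could be filled in roughly like this over a plain Vec<u8> that is kept fully initialised. This is a hypothetical sketch of the proposed API shape, not code from the PR (which tracks initialisation over BytesMut spare capacity):

```rust
// Hypothetical sketch of the suggested read_buf-style API.
struct InitAwareBuf {
    bytes: Vec<u8>, // fully initialised up to `bytes.len()`
    filled: usize,  // bytes [0..filled] contain user data
}

impl InitAwareBuf {
    fn with_capacity(n: usize) -> Self {
        Self { bytes: vec![0; n], filled: 0 }
    }

    /// The region of memory that contains user data.
    fn filled(&self) -> &[u8] {
        &self.bytes[..self.filled]
    }

    /// The region that holds no user data yet but has been initialised,
    /// so callers may write into it directly.
    fn init_mut(&mut self) -> &mut [u8] {
        &mut self.bytes[self.filled..]
    }

    /// Mark `filled_len` bytes, written via `init_mut`, as filled.
    fn advance_mut(&mut self, filled_len: usize) {
        assert!(filled_len <= self.bytes.len() - self.filled);
        self.filled += filled_len;
    }
}

fn main() {
    let mut buf = InitAwareBuf::with_capacity(8);
    // A reader would write into `init_mut` and report how much it wrote.
    buf.init_mut()[..4].copy_from_slice(b"data");
    buf.advance_mut(4);
    assert_eq!(buf.filled(), b"data");
}
```

With this shape the other modules never slice the buffer themselves; they only see the filled/unfilled split the wrapper maintains.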

@alexheretic (author) replied:


I envisioned the wrapper as transparent bytes but with extra info, the initialised capacity. Slice access, even mut, is fine as it doesn't grow or shrink the buffer.

This style simplifies the overall change, as usage of the buf (previously a plain BytesMut) is largely unchanged.

We just need to ensure the new wrapper itself is sound.

@alexheretic (author)

I updated the benchmarks in the description "init-aware-buf2" to reflect the reworked implementation. The conclusions are the same.

@daniel-abramov (Member) left a comment


Thanks @alexheretic for working on these optimizations. And thanks @paolobarbolini for reviewing it!

I have not spotted any issues so far. The only thing I'm wondering about is whether the additional complexity results in noticeable performance improvements: judging by the benchmarks, there is a clear improvement when the read buffer size is set to 8 MiB, but at the same time increasing the buffer size to 8 MiB does not seem very useful even for the 1 GiB benchmark, because the improved buffer with an 8 MiB read buffer performs similarly to master with the default read buffer size.

P.S.: Btw, I updated the rust-version in master so that CI/CD does not fail. You might want to rebase :)

alexheretic and others added 3 commits January 12, 2026 15:19
Particularly improves large read_buffer_size performance
Co-authored-by: Daniel Abramov <inetcrack2@gmail.com>
@alexheretic (author)

I see that while there is a noticeable performance improvement when the read buffer size is set to 8 MiB, it looks like increasing the buffer size to 8 MiB is not really that useful even for the 1 GiB benchmark, because the improved buffer with an 8 MiB read buffer has similar performance to the master buffer with the default read buffer size.

Yes, I agree with this analysis. It is also partly why I didn't rush to make this optimisation initially. The optimisation itself does make sense, but we're missing a compelling use case for it. Considering the added complexity, I'm ok with keeping this PR unmerged until there is a better use case that benefits.

On the other hand, perhaps fixing perf for large configured buffers is desirable on its own. Or perhaps it could be later, if we figure out why large-message perf is worse than 256 KiB message perf.

I'm ok with whatever you want to do with this.
