
Conversation

@max-hoffman (Contributor) commented Dec 19, 2024

ByteBuffer is a class that lets us avoid most allocations when spooling values to the wire. Notably, the object is responsible for doubling the backing array when appropriate, and a Grow(n int) interface is necessary to track when this should happen. Letting the runtime do all of this would be preferable, but the runtime doubles based on the slice's capacity, not the full backing array, and the refactors required to make that workable are more invasive. We pay for two mallocs on doubling, because the first one is never big enough. Failing to call Grow after writing, or growing by a length smaller than the bytes actually written, will stomp previously written memory.

As long as we track bytes used through the Grow interface, this works smoothly and shaves ~30% off table scans.

perf here: dolthub/dolt#8693

@max-hoffman max-hoffman changed the title session write buffer reuse wire write buffer Dec 19, 2024
@max-hoffman max-hoffman changed the title reuse wire write buffer pool wire write buffer Dec 19, 2024
@zachmu (Member) left a comment

Idea is sound, see comments


type ByteBuffer struct {
Member:

This idea has merit and is surely better than allocating a buffer for each request but the way you're managing the memory is suboptimal. Also good to use the same backing array for multiple values in a request.

In the main use of this object in the handler, you're getting a zero-length slice (which has some larger backing array) and then calling append on it byte by byte. This will grow the backing array in some cases, but it's not being done under your deliberate control. Rather, you're then calling Double if the backing array is low on space after the appends have already happened.

Basically: in these methods, you are referring to the len of the byte slice, when your concern is usually the cap. It's fine to let append happen byte by byte as long as it isn't doubling the backing array too often; the doubling is the expensive bit.
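The len-versus-cap point can be seen in a short sketch (values are illustrative):

```go
package main

import "fmt"

func main() {
	// A zero-length slice over a larger backing array: len is 0,
	// but appends are cheap until len would exceed cap.
	s := make([]byte, 0, 8)
	s = append(s, 1, 2, 3)      // writes in place; no reallocation
	fmt.Println(len(s), cap(s)) // 3 8

	// Only an append that pushes len past cap triggers the
	// expensive part: allocating a larger backing array.
}
```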

I think this would probably work slightly better if you scrapped the explicit capacity management altogether and let the Go runtime manage it automatically. Either that, or always manage it explicitly yourself, i.e. before you serialize a value with all those append calls. But it's not clear to me that manual management is actually any better if you use the same strategy the Go runtime does (double once we're full).

@max-hoffman (Contributor, Author):

I agree with all of this, but there are two caveats that limit our ability to let the runtime handle this for us. (1) The runtime chooses doubling based on the cap of the slice, not the full backing array, so for the current setup the doubled array is usually actually smaller than the original backing array. (2) Doubled arrays are not reference-swapped; we need a handle to the new buffer to know when to grow.

I'm not aware of a way to override the default runtime growth behavior to ignore the slice cap and instead double based on the backing array's cap. So ByteBuffer still does the doubling, with a Grow(n int) interface to track when it should happen. We pay for two mallocs on doubling, because the first one is never big enough. Failing to call Grow after writing, or growing by less than the bytes actually written, will stomp previously written memory.
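Caveat (1) can be demonstrated directly: the runtime sizes the new array from the slice's own cap, which for a slice taken near the end of a large buffer is small. An illustrative sketch:

```go
package main

import "fmt"

func main() {
	backing := make([]byte, 1024)

	// A slice taken near the end of the backing array only "sees"
	// the 24-byte tail: len 0, cap 24.
	s := backing[1000:1000]

	// Appending 32 bytes exceeds that cap, so the runtime allocates
	// a new array sized from the old cap and the needed length,
	// far smaller than the original 1024-byte backing array.
	s = append(s, make([]byte, 32)...)
	fmt.Println(cap(s) < 1024) // true
}
```

Note also that the reallocation happens inside `append`, so nothing else holding `backing` ever learns about the new array; that is caveat (2).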

@zachmu (Member) left a comment

LGTM, although a few tests of using the Get() / Grow() sequence across some boundaries would be great.

// are responsible for accurately reporting which bytes
// they expect to be protected.
func (b *ByteBuffer) Grow(n int) {
if b.i+n > len(b.buf) {
Member:

Actually, is there an off-by-one error here? A test at the boundary would be good.

@max-hoffman (Contributor, Author) commented Dec 20, 2024:

Yeah, maybe not an error, but it's definitely better to preemptively double when we know the next Grow will exceed the buffer.
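The boundary case being discussed can be pinned down with a minimal stand-in that mirrors the quoted `b.i+n > len(b.buf)` check (hypothetical, not the merged code):

```go
package main

import "fmt"

// buffer mirrors the condition quoted in the review: reserve n
// more bytes, doubling only when the reservation would overflow.
type buffer struct {
	buf []byte
	i   int
}

func (b *buffer) grow(n int) {
	if b.i+n > len(b.buf) {
		next := make([]byte, 2*(b.i+n))
		copy(next, b.buf[:b.i])
		b.buf = next
	}
	b.i += n
}

func main() {
	b := &buffer{buf: make([]byte, 8)}
	b.grow(8) // i+n == len(buf): exact fit, no doubling
	fmt.Println(len(b.buf)) // 8
	b.grow(1) // overflows: doubles to 2*(8+1)
	fmt.Println(len(b.buf)) // 18
}
```

At the exact boundary (`i+n == len(buf)`) the strict `>` does not double and the buffer is left completely full; that is the case worth asserting in a test.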

@max-hoffman max-hoffman merged commit 999a371 into main Dec 20, 2024
8 checks passed
@max-hoffman max-hoffman deleted the max/session-write-buffer branch December 20, 2024 23:27