refactor(protobuf): replace chunk-based BinaryWriter with growable Uint8Array buffer and in-place varint writes#1108
Conversation
…nt8Array buffer and in-place varint writes
|
Thanks for the PR! We'll allocate time to give this a closer look. |
timostamm
left a comment
There was a problem hiding this comment.
Left a couple of comments below.
I like this change - it's a bit cleaner, and should also make it easier to move to resizable array buffers in the future.
Looking at perf:
# before
toBinary perf-payload.bin x 5,680 ops/sec ±0.33% (96 runs sampled)
toBinary tiny example.User x 1,176,788 ops/sec ±0.19% (100 runs sampled)
toBinary normal example.User x 203,325 ops/sec ±0.54% (94 runs sampled)
toBinary scalar values x 292,358 ops/sec ±0.65% (98 runs sampled)
toBinary repeated scalar values x 101,041 ops/sec ±0.57% (96 runs sampled)
toBinary map with scalar keys and values x 69,991 ops/sec ±1.12% (99 runs sampled)
toBinary repeated field with 1000 messages x 3,812 ops/sec ±2.65% (96 runs sampled)
toBinary map field with 1000 messages x 771 ops/sec ±2.20% (94 runs sampled)
# after
toBinary perf-payload.bin x 5,162 ops/sec ±0.33% (99 runs sampled)
toBinary tiny example.User x 1,252,113 ops/sec ±0.50% (94 runs sampled)
toBinary normal example.User x 244,426 ops/sec ±1.18% (92 runs sampled)
toBinary scalar values x 353,611 ops/sec ±0.45% (99 runs sampled)
toBinary repeated scalar values x 129,307 ops/sec ±0.43% (99 runs sampled)
toBinary map with scalar keys and values x 89,141 ops/sec ±0.46% (96 runs sampled)
toBinary repeated field with 1000 messages x 7,059 ops/sec ±0.29% (100 runs sampled)
toBinary map field with 1000 messages x 1,126 ops/sec ±0.22% (98 runs sampled)
# ran with
cd packages/protobuf-test
npx turbo run build
npx tsx src/perf.ts benchmark 'toBinary'
Nice improvement overall, with a 10% regression on perf-payload.bin. We've used this case for performance optimization in the past (for example #836), so it's unfortunate that they are getting slower with this change.
I think the payload fields repeated_long_string_field and repeated_long_bytes_field (see perf-payload.txt) are responsible. Would be great to understand why, and whether it can be improved.
| /** | ||
| * Writes a tag (field number and wire type). | ||
| * | ||
| * Equivalent to `uint32( (fieldNo << 3 | type) >>> 0 )`. | ||
| * | ||
| * Generated code should compute the tag ahead of time and call `uint32()`. | ||
| */ |
There was a problem hiding this comment.
Please restore the doc comment.
| /** | ||
| * Write a `int32` value, a signed 32 bit varint. | ||
| */ |
There was a problem hiding this comment.
Please restore the doc comment.
| /** | ||
| * Write a `float` value, 32-bit floating point number. | ||
| */ |
There was a problem hiding this comment.
Please restore the doc comment.
| const tmp: number[] = []; | ||
| varint32write(value, tmp); | ||
| this.raw(Uint8Array.from(tmp)); |
There was a problem hiding this comment.
Not now - we have enough moving parts - but this is worth a closer look later:
Instead of creating an Array and a Uint8Array, we can allocate the max varint size (5 bytes for uint, 10 bytes for int), and encode directly into the buffer. varint32write is not exported from the package and we are free to change the signature.
| const out = this.buffer.subarray(0, this.pos); | ||
| // Return a copy to avoid mutation if writer is reused | ||
| const result = new Uint8Array(out); |
There was a problem hiding this comment.
| const out = this.buffer.subarray(0, this.pos); | |
| // Return a copy to avoid mutation if writer is reused | |
| const result = new Uint8Array(out); | |
| const result = this.buffer.slice(0, this.pos); |
See https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/TypedArray/slice
| this.ensureCapacity(4); | ||
| new DataView( | ||
| this.buffer.buffer, | ||
| this.buffer.byteOffset, | ||
| this.buffer.byteLength, | ||
| ).setInt32(this.pos, value, true); | ||
| this.pos += 4; |
There was a problem hiding this comment.
Nice. Can you apply the same to sfixed64 and fixed64?
This refactor moves away from “chunks + push‐to‐array + concat at the end” toward a single, growable
Uint8Arraybuffer with explicit capacity management and in‐place writes. The main benefits are:• Amortized O(1) writes → by doubling the buffer when it’s full (
ensureCapacity), you avoidfrequent small allocations or large concat operations.
• Lower GC pressure → you no longer build many tiny
Uint8Arrayslices or intermediate JS arrays.• Faster varint encoding → the hot path for single‐byte values now early‐returns, and the multi‑byte loop
writes directly into the buffer instead of an intermediate array.
• Simpler fork/join → length‑delimited framing is done by shifting bytes in place rather than flushing/collecting chunks.
• More predictable memory layout → everything lives contiguously in one buffer, so slice/subarray calls are just views.
Together these yield better throughput, reduced pauses for garbage collection, and (often) smaller peak working sets at runtime.
its like #964 but less api changes