Optimize decode finalize to skip the full per-field scan #93

Merged
ghostdogpr merged 1 commit into main from optimize-decode-finalize
Jun 22, 2026
@ghostdogpr

Owner

Summary

After decoding a message, finalize() ran a full O(nFields) second pass over every message to (a) set defaults for absent fields and (b) convert repeated/map builders to their final collections. perfasm profiling of SimpleBenchmark showed this as the third-hottest method (~10% of encode+decode samples) — a non-obvious cost, since most fields are typically present and need no finalize work.

Change

  • Bitmask path (≤64 fields): walk only the missing bits — missing = fullMask & ~visitedBits, iterated via Long.numberOfTrailingZeros — so defaults are set only for genuinely absent fields. Builders are converted only for a precomputed builderFieldIndices list. When all fields are present (the common case), the default loop runs zero iterations.
  • >64-field path: unchanged linear scan (builderFieldIndices is not built for it).
  • Per-field default and builder-conversion logic extracted into setDefaultField / convertBuilderField. Behavior is identical to the previous scan.
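
The bitmask path described above can be sketched as follows. This is a minimal illustration in Java, not the PR's actual code: the class `FinalizeSketch`, the methods `fullMask`, `defaultsByFullScan`, `defaultsByMissingBits`, and `convertBuilders`, and the signatures given to `setDefaultField` / `convertBuilderField` are all assumptions made for the example. The helpers only record which field index they were called for, so the two strategies can be compared.

```java
import java.util.ArrayList;
import java.util.List;

public class FinalizeSketch {
    // Hypothetical stand-ins for the PR's setDefaultField / convertBuilderField.
    // They only record the field index they were invoked for.
    static void setDefaultField(int index, List<Integer> defaulted) { defaulted.add(index); }
    static void convertBuilderField(int index, List<Integer> converted) { converted.add(index); }

    /** Mask with the low nFields bits set; valid for 1 <= nFields <= 64. */
    static long fullMask(int nFields) {
        // 1L << 64 wraps around to 1L in Java, so 64 needs a special case.
        return nFields == 64 ? -1L : (1L << nFields) - 1L;
    }

    /** Old shape: visit every field and test its presence bit. */
    static List<Integer> defaultsByFullScan(int nFields, long visitedBits) {
        List<Integer> defaulted = new ArrayList<>();
        for (int i = 0; i < nFields; i++) {
            if ((visitedBits & (1L << i)) == 0) setDefaultField(i, defaulted);
        }
        return defaulted;
    }

    /** New shape: iterate only over the set bits of `missing`. */
    static List<Integer> defaultsByMissingBits(int nFields, long visitedBits) {
        List<Integer> defaulted = new ArrayList<>();
        long missing = fullMask(nFields) & ~visitedBits;
        while (missing != 0) {
            int i = Long.numberOfTrailingZeros(missing); // index of lowest absent field
            setDefaultField(i, defaulted);
            missing &= missing - 1;                      // clear that bit
        }
        return defaulted;
    }

    /** Builder conversion touches only the precomputed indices. */
    static List<Integer> convertBuilders(int[] builderFieldIndices) {
        List<Integer> converted = new ArrayList<>();
        for (int index : builderFieldIndices) convertBuilderField(index, converted);
        return converted;
    }

    public static void main(String[] args) {
        long visitedBits = 0b10101L; // fields 0, 2 and 4 were seen while decoding
        System.out.println(defaultsByMissingBits(5, visitedBits)); // prints [1, 3]
        System.out.println(defaultsByMissingBits(5, 0b11111L));    // prints []
        System.out.println(convertBuilders(new int[] {2, 4}));     // prints [2, 4]
    }
}
```

When every field is present, `missing` is zero and the `while` loop body never runs, which matches the zero-iteration common case claimed above.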

Results

Measured with JMH (SimpleBenchmark.proteus, encode→decode round-trip), 2 forks × 6 iterations:

          throughput
before    2.457 ± 0.067 ops/us
after     2.587 ± 0.034 ops/us

  • ~5% faster round-trip, with non-overlapping error bars. Encode is untouched, so the relative gain on the decode side alone is larger than 5%.
  • finalize no longer appears in the perfasm hot-method profile.
  • Allocation unchanged: gc.alloc.rate.norm = 704 B/op.
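
The speedup and the error-bar claim can be checked from the two throughput figures, reading ± as a symmetric error around each score (an assumption about how the numbers were reported):

```latex
\[
\frac{2.587 - 2.457}{2.457} = \frac{0.130}{2.457} \approx 0.053 \quad (\approx 5\%)
\]
\[
\underbrace{2.457 + 0.067}_{\text{before, upper bound}} = 2.524
\;<\;
2.553 = \underbrace{2.587 - 0.034}_{\text{after, lower bound}}
\]
```

The upper bound of the "before" interval sits below the lower bound of the "after" interval, so the two intervals do not overlap.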

All 137 core tests pass.

After decoding a message, finalize() ran a full O(nFields) second pass on
every message to set defaults for absent fields and convert repeated/map
builders. perfasm showed this accounting for ~10% of encode+decode samples
(third-hottest method) on SimpleBenchmark.

Replace the scan with a missing-bits walk on the bitmask path: defaults are
set only for absent fields (fullMask & ~visitedBits, via numberOfTrailingZeros)
and builders are converted only for the precomputed builderFieldIndices. The
>64-field path keeps the linear scan. Behavior is unchanged; the per-field
default and builder-conversion logic is factored into setDefaultField and
convertBuilderField.

finalize disappears from the perfasm profile and round-trip throughput on
SimpleBenchmark improves ~5% (2.457 -> 2.587 ops/us, non-overlapping error
bars); allocation is unchanged at 704 B/op.
ghostdogpr merged commit 856e488 into main on Jun 22, 2026
3 checks passed
ghostdogpr deleted the optimize-decode-finalize branch on June 22, 2026 01:45