Skip to content

Add Binary::make_subbinary_term for zero-overhead sub-binary Term creation#718

Closed
jeffhuen wants to merge 2 commits intorusterlium:masterfrom
jeffhuen:add-make-subbinary-term
Closed

Add Binary::make_subbinary_term for zero-overhead sub-binary Term creation#718
jeffhuen wants to merge 2 commits intorusterlium:masterfrom
jeffhuen:add-make-subbinary-term

Conversation

@jeffhuen
Copy link

@jeffhuen jeffhuen commented Feb 2, 2026

Summary

  • Adds Binary::make_subbinary_term(offset, length) -> NifResult<Term<'a>> — a safe, bounds-checked method that returns a Term directly instead of constructing an intermediate Binary struct
  • Same semantics as make_subbinary, but avoids the buf.add(offset) pointer arithmetic and struct construction when only the term representation is needed

Motivation

In hot paths that create many sub-binaries (e.g. CSV parsers using zero-copy sub-binary references), the existing make_subbinary forces callers to either:

  1. Accept the overhead of constructing a Binary struct they immediately discard (via .to_term()), or
  2. Call enif_make_sub_binary directly via unsafe code

Benchmarking on Apple M1 Pro shows the Binary struct construction adds ~15% overhead per call (1.3 ns/call), which is significant in tight loops creating hundreds of thousands of sub-binaries.

make_subbinary_term gives callers a safe API with no performance penalty over the raw FFI call.

Test plan

  • Added subbinary_as_term NIF exercising the new method
  • Tests for correct slicing (start, middle, empty, full binary)
  • Tests for out-of-bounds rejection (overflow, past-end offset)
  • Equivalence test verifying identical output to existing make_subbinary
  • All 14 binary tests pass (11 existing + 3 new)

🤖 Generated with Claude Code

…ation

The existing make_subbinary returns a Binary struct, which requires
constructing buf/size fields even when only the Term is needed. In
hot paths that create many sub-binaries (e.g. CSV parsers using
zero-copy sub-binary references), this forces callers to either accept
the overhead or call enif_make_sub_binary directly via unsafe code.

make_subbinary_term performs the same bounds check as make_subbinary
but returns Term<'a> directly, avoiding the intermediate Binary struct
construction. This gives callers a safe API with no performance penalty.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@filmor
Copy link
Member

filmor commented Feb 2, 2026

Can you provide your benchmark code? I find it very hard to believe that constructing a struct on the stack from existing values actually has a 15% impact.

Benchee benchmark creating 1M sub-binaries in a tight loop (no Vec/GC
overhead). Results on Apple M1 Pro show make_subbinary_term is 1.18x
faster by avoiding intermediate Binary struct construction.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@jeffhuen
Copy link
Author

jeffhuen commented Feb 2, 2026

You're right to ask — here's an isolated benchmark. Two NIFs looping 1M iterations internally in Rust, returning a single term (no Vec allocation or GC noise):

Name                                        ips        average  deviation         median         99th %
make_subbinary_term (Term direct)        108.65        9.20 ms     ±9.34%        8.99 ms       10.99 ms
make_subbinary (Binary struct)            92.06       10.86 ms     ±8.90%       10.72 ms       12.52 ms

Comparison:
make_subbinary_term (Term direct)        108.65
make_subbinary (Binary struct)            92.06 - 1.18x slower +1.66 ms

Benchmark code | NIF source

The measurable difference is there, but the stronger motivation is safety and API completeness. Currently, callers who only need a Term (not a Binary struct) have two options: accept the overhead of constructing and discarding a Binary, or call enif_make_sub_binary directly via unsafe code with manual bounds checking. make_subbinary_term fills that gap with a safe, bounds-checked API.

@filmor
Copy link
Member

filmor commented Feb 2, 2026

Please write your messages yourself, I do not want to talk to a machine. Translation is perfectly fine, but I don't want to see prose about how this is about "safety and API completeness".

Running the benchmark on my machine gives massively varying results per run, in particular if I adjust the order in which the benchmarks are run. Sometimes the direct term variant is "faster", sometimes the existing function is.

I looked at the assembly code of the benchmark functions. They are very similar, once make_binary_unchecked is marked as inline (the fact that it wasn't before is an oversight). The only difference is that to_term has a code path (not relevant here) to check whether the given object needs to be "moved" to the Env.

Term is also missing a From<Binary<'a>> implementation that skips the move check.

I will create an MR to add these two changes, but I would not like to merge this approach as it is the complete opposite direction I would like to go with the library (more types, not more Term) without a provable performance advantage. These are benchmark results on my machine:

Name IPS Average Deviation Median 99th % Comparison
make_subbinary_unconverted (Binary struct) 82.91 12.06 ms ±4.64% 11.92 ms 14.64 ms Baseline
make_subbinary (Binary struct) 81.03 12.34 ms ±5.45% 12.19 ms 14.85 ms 1.02x slower (+0.28 ms)
make_subbinary_into (Term direct conv) 80.20 12.47 ms ±8.30% 12.16 ms 17.13 ms 1.03x slower (+0.41 ms)
make_subbinary_term (Term direct) 79.80 12.53 ms ±9.90% 12.09 ms 17.24 ms 1.04x slower (+0.47 ms)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants