Skip to content

Conversation

kzemek
Copy link
Contributor

@kzemek kzemek commented Jul 23, 2025

io_lib:format/2 has very significant overhead over binary:encode_hex/2. This has shown up prominently in our profiling of small spanned functions (think a happy-path Cache.get()).

Benchmark results

# io_lib:format/2
Name                ips        average  deviation         median         99th %
test_span      159.15 K        6.28 μs   ±321.29%        4.40 μs       52.21 μs

# binary:to_hex/2
Name                ips        average  deviation         median         99th %
test_span      374.44 K        2.67 μs  ±1132.93%        1.70 μs       17.35 μs

Benchee code

require OpenTelemetry.Tracer

defmodule TestSpan do
  def test(_) do
    OpenTelemetry.Tracer.with_span "test" do
      :ok
    end
  end
end

Benchee.run(%{test_span: &TestSpan.test/1},
  time: 10,
  before_scenario: fn _ ->
    "scenario"
    |> OpenTelemetry.Tracer.start_span()
    |> OpenTelemetry.Tracer.set_current_span()
  end
)

Flamegraph overview

image

@kzemek kzemek requested a review from a team as a code owner July 23, 2025 10:30
Copy link

linux-foundation-easycla bot commented Jul 23, 2025

CLA Signed

The committers listed above are authorized under a signed CLA.

@kzemek
Copy link
Contributor Author

kzemek commented Jul 23, 2025

Given how dominating binary:encode_hex/2 still is in otel_ctx:attach/1 and otel_ctx:detach/1, it might make sense to go a step further and precompute the encodings into #span_ctx.

# precomputed
Name                ips        average  deviation         median         99th %
test_span      519.81 K        1.92 μs  ±1355.00%        1.34 μs       12.64 μs
image

@tsloughter
Copy link
Member

Thanks! Though I feel like there is a reason we didn't use this already... I think OTP version support.

Ah yea, encode_hex/2 is only supported since 26. Can you wrap that in a ifdef to check for what version of OTP it is compiled with.

Copy link

codecov bot commented Jul 23, 2025

Codecov Report

❌ Patch coverage is 66.66667% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 16.90%. Comparing base (1f5e352) to head (7fb96fb).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
apps/opentelemetry_api/src/otel_utils.erl 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #874      +/-   ##
==========================================
- Coverage   17.53%   16.90%   -0.64%     
==========================================
  Files          24       24              
  Lines         713      710       -3     
==========================================
- Hits          125      120       -5     
- Misses        588      590       +2     
Flag Coverage Δ
api 16.90% <66.66%> (-0.64%) ⬇️
elixir 16.90% <66.66%> (-0.64%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@tsloughter
Copy link
Member

I'd be curious if this also saves you any time: #590

@tsloughter
Copy link
Member

Agree on precomputing. It'd also be nice if attach didn't always add those encodings to the log context but now that it does I'm not sure we can remove it without a major version bump... :(

@kzemek
Copy link
Contributor Author

kzemek commented Jul 23, 2025

Thanks! Though I feel like there is a reason we didn't use this already... I think OTP version support.

Ah yea, encode_hex/2 is only supported since 26. Can you wrap that in a ifdef to check for what version of OTP it is compiled with.

I tried a more involved version with:

Zeroes = <<"00000000000000000000000000000000">>,
Tail = string:lowercase(integer_to_binary(TraceId, 16)),
ZeroesLen = 32 - byte_size(Tail),
<<Zeroes::ZeroesLen/binary, Tail/binary>>.

...and got ~250k ips. Much better than io_lib but much worse than binary:encode_hex/2. ifdefs might be the way to go if you're OK with that.

I'd be curious if this also saves you any time: #590

Yes! that's something I've been thinking of, but even slightly farther - make the hex-encoded representation the canonical one :D I applied your patch over the release version; it benchmarks slightly faster than just binary_to_hex:

Name                ips        average  deviation         median         99th %
test_span      404.90 K        2.47 μs  ±2114.72%        1.59 μs       15.84 μs
Flamegraph screenshot image

Agree on precomputing.

The way I did it adds two fields to #span_ctx — is this something we can do without a major release?

@tsloughter
Copy link
Member

Yes, I think we can add to span_ctx without a major version bump. Maybe theoretically we can't but I don't think its stopped us before :), and those should only be used through functions on a local node, not passed between nodes or accessed with element(N, SpanCtx).

@kzemek kzemek changed the title Use binary:encode_hex/2 instead of io_lib:format/2 Precompute hex encodings of SpanID and TraceID Jul 25, 2025
@kzemek
Copy link
Contributor Author

kzemek commented Jul 25, 2025

Added precomputing code. Here's a quick benchmark between >=OTP26, >=OTP24, <OTP24 encoding routines. This is still for the whole with_span

# >=OTP26
Name                ips        average  deviation         median         99th %
test_span      496.58 K        2.01 μs  ±1867.46%        1.35 μs       13.15 μs

# >=OTP24
Name                ips        average  deviation         median         99th %
test_span      420.95 K        2.38 μs  ±1526.01%        1.63 μs       16.49 μs

# <OTP24
Name                ips        average  deviation         median         99th %
test_span      369.14 K        2.71 μs  ±1639.21%        1.93 μs       17.33 μs

@bryannaegele
Copy link
Contributor

I'd be curious if this also saves you any time: #590

This is still really needed for logs. It's a nightmare dealing with it encoded.

@kzemek
Copy link
Contributor Author

kzemek commented Jul 31, 2025

Can we merge this as-is while better (and more involved) future approaches are considered?

@kzemek
Copy link
Contributor Author

kzemek commented Jul 31, 2025

Thanks for the run, will get the :undefined oversight fixed today

@kzemek
Copy link
Contributor Author

kzemek commented Jul 31, 2025

Done.

There's an ordering problem with otel_span_ets tests. Since opentelemetry depends on opentelemetry_api ~> 1.4, the tests pull the current version which doesn't define the new fields (yet).

src/otel_span_ets.erl:105:15: field hex_trace_id undefined in record span_ctx
src/otel_span_ets.erl:107:15: field hex_span_id undefined in record span_ctx

@tsloughter
Copy link
Member

@kzemek sorry about dropping the ball on this. Can you rebase so we can merge?

@kzemek
Copy link
Contributor Author

kzemek commented Sep 21, 2025

@tsloughter Rebased ☺️

ollien added a commit to discord/opentelemetry-erlang that referenced this pull request Sep 30, 2025
Squashed commit of the following:

commit 7fb96fb
Author: Konrad Zemek <[email protected]>
Date:   Thu Jul 31 19:51:04 2025 +0200

    Prefill hex IDs in remaining span_ctx constructors

commit ca05789
Author: Konrad Zemek <[email protected]>
Date:   Fri Jul 25 12:05:50 2025 +0200

    Precompute hex trace and span ids

commit 0564be2
Author: Konrad Zemek <[email protected]>
Date:   Wed Jul 23 11:36:23 2025 +0200

    Use `binary:encode_hex/2` instead of `io_lib:format/2`

    `io_lib:format/2` has very significant overhead over
    `binary:encode_hex/2`.
@tsloughter tsloughter merged commit 4e60b64 into open-telemetry:main Oct 5, 2025
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants