@andrewlock andrewlock commented Jan 8, 2026

Summary of changes

Update Activity-based benchmarks to reduce variability and make comparisons easier

Reason for change

The various Activity-related benchmarks call the global Tracer instance, so we should make sure to configure it with our default benchmarking settings (basically disabling background jobs like telemetry/discovery/remote config) to reduce variation. Also added a "baseline" job for comparison and a version that uses hierarchical IDs instead of W3C IDs. This was a prerequisite for a bunch of other work.

Implementation details

  • Configure Benchmarks.OpenTelemetry.InstrumentedApi project and Benchmarks.Trace/ActivityBenchmark to set up the global tracer with telemetry etc. disabled
  • Add two extra benchmarks to Benchmarks.Trace/ActivityBenchmark
    • StartStopWithChild_Baseline, which is the same as StartStopWithChild but without the DD integration
    • StartStopWithChild_Hierarchical, which is the same as StartStopWithChild but uses hierarchical ID format
    • Don't run either of these in CI for now (to avoid extra load), just for local comparisons
  • Simplify the benchmark to just do explicit duck typing (which is closer to what we do normally anyway, and removes a bunch of code)
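
For readers less familiar with the two ID formats being compared, here is a minimal sketch using only the public `System.Diagnostics.Activity` API. It is not the benchmark's code: `ActivityIdFormatDemo` and its method are hypothetical names, and the sketch only shows how a parent/child pair can be started in either format.

```csharp
using System.Diagnostics;

// Hypothetical helper for illustration only; not part of the benchmark project.
public static class ActivityIdFormatDemo
{
    // Starts a parent and a child Activity in the requested ID format,
    // stops both, and returns the child's Id string.
    public static string StartStopWithChild(ActivityIdFormat format)
    {
        // SetIdFormat must be called before Start for it to take effect.
        var parent = new Activity("parent").SetIdFormat(format).Start();

        // The child picks up the parent from Activity.Current and, unless
        // Activity.ForceDefaultIdFormat is set, inherits the parent's ID format.
        var child = new Activity("child").Start();
        var childId = child.Id ?? string.Empty;

        child.Stop();
        parent.Stop();
        return childId;
    }
}
```

Hierarchical IDs are strings that begin with `|`, while W3C IDs follow the `00-<trace-id>-<span-id>-<flags>` layout.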

Test coverage

Running these benchmarks locally shows the improvements we need to make, and highlights that we clearly have a bug with hierarchical IDs 😬

| Method | Runtime | Mean | Error | StdDev | Gen0 | Gen1 | Allocated |
| ------------------------------- | -------------------- | ----------: | ----------: | ----------: | -----: | -----: | --------: |
| StartStopWithChild_Baseline | .NET 6.0 | 672.9 ns | 23.23 ns | 66.66 ns | 0.0038 | - | 1.09 KB |
| StartStopWithChild_Hierarchical | .NET 6.0 | 35,478.1 ns | 1,035.96 ns | 3,021.94 ns | - | - | 10.47 KB |
| StartStopWithChild | .NET 6.0 | 3,681.7 ns | 50.23 ns | 44.53 ns | 0.0153 | - | 4.87 KB |
| StartStopWithChild_Baseline | .NET 8.0 | 570.8 ns | 16.96 ns | 49.22 ns | 0.0029 | - | 1.09 KB |
| StartStopWithChild_Hierarchical | .NET 8.0 | 32,478.5 ns | 1,215.04 ns | 3,466.57 ns | - | - | 10.48 KB |
| StartStopWithChild | .NET 8.0 | 3,021.4 ns | 59.24 ns | 163.16 ns | 0.0153 | - | 4.77 KB |
| StartStopWithChild_Baseline | .NET Core 3.1 | 752.6 ns | 15.06 ns | 30.09 ns | 0.0038 | - | 1.19 KB |
| StartStopWithChild_Hierarchical | .NET Core 3.1 | 34,170.1 ns | 572.80 ns | 478.31 ns | - | - | 10.75 KB |
| StartStopWithChild | .NET Core 3.1 | 4,905.0 ns | 31.63 ns | 29.58 ns | 0.0153 | - | 5.05 KB |
| StartStopWithChild_Baseline | .NET Framework 4.7.2 | 770.7 ns | 6.03 ns | 5.64 ns | 0.2012 | - | 1.24 KB |
| StartStopWithChild_Hierarchical | .NET Framework 4.7.2 | 37,386.2 ns | 542.73 ns | 453.21 ns | 1.8921 | 0.1221 | 11.77 KB |
| StartStopWithChild | .NET Framework 4.7.2 | 5,884.5 ns | 64.54 ns | 60.37 ns | 0.8621 | - | 5.3 KB |

Other details

https://datadoghq.atlassian.net/browse/LANGPLAT-915

Part of a stack working to improve OTel performance

@andrewlock andrewlock added area:benchmarks Benchmarks, throughput tests, Crank, Bombardier, etc area:opentelemetry OpenTelemetry support labels Jan 8, 2026
@andrewlock andrewlock changed the title Update Activity benchmarks for stability Update Activity benchmarks for stability Jan 8, 2026
@github-actions github-actions bot added the area:tests unit tests, integration tests label Jan 8, 2026
@dd-trace-dotnet-ci-bot

Execution-Time Benchmarks Report ⏱️

Execution-time results for samples comparing This PR (8036) and master.

✅ No regressions detected - check the details below

Full Metrics Comparison

FakeDbCommand

| Metric | Master (Mean ± 95% CI) | Current (Mean ± 95% CI) | Change | Status |
| --- | --- | --- | ---: | --- |
| **.NET Framework 4.8 - Baseline** | | | | |
| duration | 68.02 ± (68.08 - 68.28) ms | 68.26 ± (68.21 - 68.42) ms | +0.3% | ✅⬆️ |
| **.NET Framework 4.8 - Bailout** | | | | |
| duration | 71.95 ± (71.88 - 72.08) ms | 72.20 ± (72.18 - 72.41) ms | +0.4% | ✅⬆️ |
| **.NET Framework 4.8 - CallTarget+Inlining+NGEN** | | | | |
| duration | 1003.26 ± (1004.17 - 1009.93) ms | 1005.93 ± (1011.87 - 1020.44) ms | +0.3% | ✅⬆️ |
| **.NET Core 3.1 - Baseline** | | | | |
| process.internal_duration_ms | 22.00 ± (21.96 - 22.04) ms | 21.93 ± (21.89 - 21.97) ms | -0.3% | |
| process.time_to_main_ms | 78.68 ± (78.54 - 78.82) ms | 78.83 ± (78.67 - 78.99) ms | +0.2% | ✅⬆️ |
| runtime.dotnet.exceptions.count | 0 ± (0 - 0) | 0 ± (0 - 0) | +0.0% | |
| runtime.dotnet.mem.committed | 10.91 ± (10.90 - 10.91) MB | 10.90 ± (10.90 - 10.90) MB | -0.0% | |
| runtime.dotnet.threads.count | 12 ± (12 - 12) | 12 ± (12 - 12) | +0.0% | |
| **.NET Core 3.1 - Bailout** | | | | |
| process.internal_duration_ms | 21.88 ± (21.86 - 21.90) ms | 21.81 ± (21.79 - 21.83) ms | -0.3% | |
| process.time_to_main_ms | 79.66 ± (79.56 - 79.76) ms | 79.96 ± (79.86 - 80.05) ms | +0.4% | ✅⬆️ |
| runtime.dotnet.exceptions.count | 0 ± (0 - 0) | 0 ± (0 - 0) | +0.0% | |
| runtime.dotnet.mem.committed | 10.95 ± (10.95 - 10.96) MB | 10.94 ± (10.94 - 10.95) MB | -0.1% | |
| runtime.dotnet.threads.count | 13 ± (13 - 13) | 13 ± (13 - 13) | +0.0% | |
| **.NET Core 3.1 - CallTarget+Inlining+NGEN** | | | | |
| process.internal_duration_ms | 228.37 ± (224.62 - 232.13) ms | 239.43 ± (235.30 - 243.57) ms | +4.8% | ✅⬆️ |
| process.time_to_main_ms | 469.63 ± (469.13 - 470.13) ms | 471.55 ± (471.02 - 472.09) ms | +0.4% | ✅⬆️ |
| runtime.dotnet.exceptions.count | 0 ± (0 - 0) | 0 ± (0 - 0) | +0.0% | |
| runtime.dotnet.mem.committed | 48.24 ± (48.21 - 48.26) MB | 48.24 ± (48.22 - 48.26) MB | +0.0% | ✅⬆️ |
| runtime.dotnet.threads.count | 28 ± (28 - 28) | 28 ± (28 - 28) | +0.7% | ✅⬆️ |
| **.NET 6 - Baseline** | | | | |
| process.internal_duration_ms | 20.51 ± (20.48 - 20.53) ms | 20.72 ± (20.69 - 20.75) ms | +1.0% | ✅⬆️ |
| process.time_to_main_ms | 67.85 ± (67.72 - 67.97) ms | 68.25 ± (68.14 - 68.37) ms | +0.6% | ✅⬆️ |
| runtime.dotnet.exceptions.count | 0 ± (0 - 0) | 0 ± (0 - 0) | +0.0% | |
| runtime.dotnet.mem.committed | 10.62 ± (10.61 - 10.62) MB | 10.64 ± (10.64 - 10.65) MB | +0.3% | ✅⬆️ |
| runtime.dotnet.threads.count | 10 ± (10 - 10) | 10 ± (10 - 10) | +0.0% | |
| **.NET 6 - Bailout** | | | | |
| process.internal_duration_ms | 20.53 ± (20.50 - 20.55) ms | 20.71 ± (20.69 - 20.74) ms | +0.9% | ✅⬆️ |
| process.time_to_main_ms | 68.88 ± (68.82 - 68.94) ms | 69.08 ± (69.03 - 69.14) ms | +0.3% | ✅⬆️ |
| runtime.dotnet.exceptions.count | 0 ± (0 - 0) | 0 ± (0 - 0) | +0.0% | |
| runtime.dotnet.mem.committed | 10.71 ± (10.69 - 10.72) MB | 10.75 ± (10.74 - 10.75) MB | +0.4% | ✅⬆️ |
| runtime.dotnet.threads.count | 11 ± (11 - 11) | 11 ± (11 - 11) | +0.0% | |
| **.NET 6 - CallTarget+Inlining+NGEN** | | | | |
| process.internal_duration_ms | 241.40 ± (238.73 - 244.06) ms | 244.57 ± (242.16 - 246.98) ms | +1.3% | ✅⬆️ |
| process.time_to_main_ms | 438.63 ± (438.17 - 439.09) ms | 442.78 ± (442.32 - 443.24) ms | +0.9% | ✅⬆️ |
| runtime.dotnet.exceptions.count | 0 ± (0 - 0) | 0 ± (0 - 0) | +0.0% | |
| runtime.dotnet.mem.committed | 48.67 ± (48.64 - 48.70) MB | 48.69 ± (48.66 - 48.72) MB | +0.0% | ✅⬆️ |
| runtime.dotnet.threads.count | 28 ± (28 - 28) | 28 ± (28 - 28) | -0.5% | |
| **.NET 8 - Baseline** | | | | |
| process.internal_duration_ms | 18.80 ± (18.77 - 18.83) ms | 18.79 ± (18.75 - 18.83) ms | -0.0% | |
| process.time_to_main_ms | 67.01 ± (66.91 - 67.11) ms | 67.10 ± (67.00 - 67.20) ms | +0.1% | ✅⬆️ |
| runtime.dotnet.exceptions.count | 0 ± (0 - 0) | 0 ± (0 - 0) | +0.0% | |
| runtime.dotnet.mem.committed | 7.66 ± (7.65 - 7.67) MB | 7.69 ± (7.68 - 7.70) MB | +0.4% | ✅⬆️ |
| runtime.dotnet.threads.count | 10 ± (10 - 10) | 10 ± (10 - 10) | +0.0% | |
| **.NET 8 - Bailout** | | | | |
| process.internal_duration_ms | 18.83 ± (18.79 - 18.86) ms | 18.89 ± (18.86 - 18.91) ms | +0.3% | ✅⬆️ |
| process.time_to_main_ms | 68.09 ± (68.03 - 68.16) ms | 68.32 ± (68.26 - 68.38) ms | +0.3% | ✅⬆️ |
| runtime.dotnet.exceptions.count | 0 ± (0 - 0) | 0 ± (0 - 0) | +0.0% | |
| runtime.dotnet.mem.committed | 7.72 ± (7.71 - 7.73) MB | 7.73 ± (7.72 - 7.74) MB | +0.1% | ✅⬆️ |
| runtime.dotnet.threads.count | 11 ± (11 - 11) | 11 ± (11 - 11) | +0.0% | |
| **.NET 8 - CallTarget+Inlining+NGEN** | | | | |
| process.internal_duration_ms | 178.05 ± (177.20 - 178.91) ms | 181.24 ± (180.06 - 182.43) ms | +1.8% | ✅⬆️ |
| process.time_to_main_ms | 424.87 ± (424.22 - 425.51) ms | 425.81 ± (425.19 - 426.44) ms | +0.2% | ✅⬆️ |
| runtime.dotnet.exceptions.count | 0 ± (0 - 0) | 0 ± (0 - 0) | +0.0% | |
| runtime.dotnet.mem.committed | 36.34 ± (36.31 - 36.37) MB | 36.33 ± (36.30 - 36.36) MB | -0.0% | |
| runtime.dotnet.threads.count | 27 ± (27 - 27) | 27 ± (27 - 27) | -0.1% | |

HttpMessageHandler

| Metric | Master (Mean ± 95% CI) | Current (Mean ± 95% CI) | Change | Status |
| --- | --- | --- | ---: | --- |
| **.NET Framework 4.8 - Baseline** | | | | |
| duration | 192.32 ± (192.27 - 193.08) ms | 193.43 ± (193.48 - 194.31) ms | +0.6% | ✅⬆️ |
| **.NET Framework 4.8 - Bailout** | | | | |
| duration | 196.25 ± (196.03 - 196.44) ms | 196.44 ± (196.37 - 197.02) ms | +0.1% | ✅⬆️ |
| **.NET Framework 4.8 - CallTarget+Inlining+NGEN** | | | | |
| duration | 1107.64 ± (1110.58 - 1117.53) ms | 1112.84 ± (1114.90 - 1122.76) ms | +0.5% | ✅⬆️ |
| **.NET Core 3.1 - Baseline** | | | | |
| process.internal_duration_ms | 187.79 ± (187.41 - 188.17) ms | 188.77 ± (188.36 - 189.19) ms | +0.5% | ✅⬆️ |
| process.time_to_main_ms | 80.26 ± (80.07 - 80.45) ms | 80.57 ± (80.33 - 80.81) ms | +0.4% | ✅⬆️ |
| runtime.dotnet.exceptions.count | 3 ± (3 - 3) | 3 ± (3 - 3) | +0.0% | |
| runtime.dotnet.mem.committed | 16.05 ± (16.03 - 16.07) MB | 16.06 ± (16.04 - 16.09) MB | +0.1% | ✅⬆️ |
| runtime.dotnet.threads.count | 20 ± (19 - 20) | 20 ± (20 - 20) | +0.2% | ✅⬆️ |
| **.NET Core 3.1 - Bailout** | | | | |
| process.internal_duration_ms | 186.91 ± (186.60 - 187.21) ms | 188.83 ± (188.37 - 189.29) ms | +1.0% | ✅⬆️ |
| process.time_to_main_ms | 81.40 ± (81.30 - 81.51) ms | 82.16 ± (81.97 - 82.35) ms | +0.9% | ✅⬆️ |
| runtime.dotnet.exceptions.count | 3 ± (3 - 3) | 3 ± (3 - 3) | +0.0% | |
| runtime.dotnet.mem.committed | 16.23 ± (16.20 - 16.26) MB | 16.17 ± (16.15 - 16.20) MB | -0.3% | |
| runtime.dotnet.threads.count | 21 ± (21 - 21) | 21 ± (20 - 21) | -0.2% | |
| **.NET Core 3.1 - CallTarget+Inlining+NGEN** | | | | |
| process.internal_duration_ms | 418.30 ± (414.95 - 421.66) ms | 417.22 ± (413.73 - 420.72) ms | -0.3% | |
| process.time_to_main_ms | 472.39 ± (471.82 - 472.95) ms | 474.89 ± (474.21 - 475.58) ms | +0.5% | ✅⬆️ |
| runtime.dotnet.exceptions.count | 3 ± (3 - 3) | 3 ± (3 - 3) | +0.0% | |
| runtime.dotnet.mem.committed | 58.72 ± (58.61 - 58.83) MB | 58.72 ± (58.60 - 58.85) MB | +0.0% | ✅⬆️ |
| runtime.dotnet.threads.count | 29 ± (29 - 30) | 30 ± (29 - 30) | +0.1% | ✅⬆️ |
| **.NET 6 - Baseline** | | | | |
| process.internal_duration_ms | 193.12 ± (192.67 - 193.57) ms | 192.60 ± (192.22 - 192.99) ms | -0.3% | |
| process.time_to_main_ms | 69.81 ± (69.63 - 70.00) ms | 69.64 ± (69.48 - 69.80) ms | -0.2% | |
| runtime.dotnet.exceptions.count | 4 ± (4 - 4) | 4 ± (4 - 4) | +0.0% | |
| runtime.dotnet.mem.committed | 16.32 ± (16.25 - 16.38) MB | 16.30 ± (16.21 - 16.40) MB | -0.1% | |
| runtime.dotnet.threads.count | 19 ± (19 - 19) | 19 ± (19 - 19) | -0.2% | |
| **.NET 6 - Bailout** | | | | |
| process.internal_duration_ms | 191.32 ± (191.03 - 191.61) ms | 193.87 ± (193.33 - 194.41) ms | +1.3% | ✅⬆️ |
| process.time_to_main_ms | 70.49 ± (70.39 - 70.58) ms | 71.33 ± (71.20 - 71.46) ms | +1.2% | ✅⬆️ |
| runtime.dotnet.exceptions.count | 4 ± (4 - 4) | 4 ± (4 - 4) | +0.0% | |
| runtime.dotnet.mem.committed | 16.16 ± (16.03 - 16.30) MB | 16.45 ± (16.43 - 16.48) MB | +1.8% | ✅⬆️ |
| runtime.dotnet.threads.count | 20 ± (19 - 20) | 20 ± (20 - 20) | +1.9% | ✅⬆️ |
| **.NET 6 - CallTarget+Inlining+NGEN** | | | | |
| process.internal_duration_ms | 456.08 ± (453.84 - 458.32) ms | 454.34 ± (451.95 - 456.74) ms | -0.4% | |
| process.time_to_main_ms | 444.85 ± (444.33 - 445.37) ms | 447.44 ± (446.92 - 447.97) ms | +0.6% | ✅⬆️ |
| runtime.dotnet.exceptions.count | 4 ± (4 - 4) | 4 ± (4 - 4) | +0.0% | |
| runtime.dotnet.mem.committed | 58.23 ± (58.11 - 58.35) MB | 58.14 ± (58.02 - 58.25) MB | -0.2% | |
| runtime.dotnet.threads.count | 29 ± (29 - 30) | 30 ± (29 - 30) | +0.1% | ✅⬆️ |
| **.NET 8 - Baseline** | | | | |
| process.internal_duration_ms | 190.26 ± (189.94 - 190.58) ms | 191.42 ± (190.98 - 191.86) ms | +0.6% | ✅⬆️ |
| process.time_to_main_ms | 69.38 ± (69.17 - 69.59) ms | 69.52 ± (69.28 - 69.76) ms | +0.2% | ✅⬆️ |
| runtime.dotnet.exceptions.count | 4 ± (4 - 4) | 4 ± (4 - 4) | +0.0% | |
| runtime.dotnet.mem.committed | 11.75 ± (11.73 - 11.78) MB | 11.68 ± (11.65 - 11.70) MB | -0.6% | |
| runtime.dotnet.threads.count | 18 ± (18 - 18) | 18 ± (18 - 18) | -0.1% | |
| **.NET 8 - Bailout** | | | | |
| process.internal_duration_ms | 189.73 ± (189.42 - 190.04) ms | 190.98 ± (190.48 - 191.48) ms | +0.7% | ✅⬆️ |
| process.time_to_main_ms | 70.01 ± (69.91 - 70.12) ms | 70.69 ± (70.51 - 70.86) ms | +1.0% | ✅⬆️ |
| runtime.dotnet.exceptions.count | 4 ± (4 - 4) | 4 ± (4 - 4) | +0.0% | |
| runtime.dotnet.mem.committed | 11.84 ± (11.81 - 11.87) MB | 11.76 ± (11.73 - 11.79) MB | -0.7% | |
| runtime.dotnet.threads.count | 19 ± (19 - 19) | 19 ± (19 - 19) | +0.3% | ✅⬆️ |
| **.NET 8 - CallTarget+Inlining+NGEN** | | | | |
| process.internal_duration_ms | 362.07 ± (360.71 - 363.44) ms | 363.73 ± (362.01 - 365.46) ms | +0.5% | ✅⬆️ |
| process.time_to_main_ms | 427.99 ± (427.31 - 428.67) ms | 429.77 ± (429.08 - 430.45) ms | +0.4% | ✅⬆️ |
| runtime.dotnet.exceptions.count | 4 ± (4 - 4) | 4 ± (4 - 4) | +0.0% | |
| runtime.dotnet.mem.committed | 47.98 ± (47.94 - 48.02) MB | 47.98 ± (47.94 - 48.01) MB | -0.0% | |
| runtime.dotnet.threads.count | 29 ± (29 - 29) | 29 ± (29 - 29) | -0.1% | |

Comparison explanation

Execution-time benchmarks measure the whole time it takes to execute a program, and are intended to measure the one-off costs. Cases where the execution time results for the PR are worse than latest master results are highlighted in **red**. The following thresholds were used for comparing the execution times:

  • Welch's t-test at a 5% significance level
  • Only results indicating a difference greater than 5% and 5 ms are considered.

Note that these results are based on a single point-in-time result for each branch. For full results, see the dashboard.

Graphs show the p99 interval based on the mean and StdDev of the test run, as well as the mean value of the run (shown as a diamond below the graph).

Duration charts
FakeDbCommand (.NET Framework 4.8)
gantt
    title Execution time (ms) FakeDbCommand (.NET Framework 4.8)
    dateFormat  x
    axisFormat %Q
    todayMarker off
    section Baseline
    This PR (8036) - mean (68ms)  : 67, 70
    master - mean (68ms)  : 67, 70

    section Bailout
    This PR (8036) - mean (72ms)  : 71, 73
    master - mean (72ms)  : 71, 73

    section CallTarget+Inlining+NGEN
    This PR (8036) - mean (1,016ms)  : 953, 1080
    master - mean (1,007ms)  : 966, 1048

FakeDbCommand (.NET Core 3.1)
gantt
    title Execution time (ms) FakeDbCommand (.NET Core 3.1)
    dateFormat  x
    axisFormat %Q
    todayMarker off
    section Baseline
    This PR (8036) - mean (106ms)  : 103, 109
    master - mean (106ms)  : 103, 109

    section Bailout
    This PR (8036) - mean (107ms)  : 106, 108
    master - mean (107ms)  : 105, 108

    section CallTarget+Inlining+NGEN
    This PR (8036) - mean (737ms)  : 674, 800
    master - mean (723ms)  : 668, 779

FakeDbCommand (.NET 6)
gantt
    title Execution time (ms) FakeDbCommand (.NET 6)
    dateFormat  x
    axisFormat %Q
    todayMarker off
    section Baseline
    This PR (8036) - mean (94ms)  : 91, 96
    master - mean (93ms)  : 91, 95

    section Bailout
    This PR (8036) - mean (94ms)  : 94, 95
    master - mean (94ms)  : 93, 95

    section CallTarget+Inlining+NGEN
    This PR (8036) - mean (712ms)  : 676, 747
    master - mean (703ms)  : 657, 749

FakeDbCommand (.NET 8)
gantt
    title Execution time (ms) FakeDbCommand (.NET 8)
    dateFormat  x
    axisFormat %Q
    todayMarker off
    section Baseline
    This PR (8036) - mean (92ms)  : 89, 95
    master - mean (92ms)  : 89, 94

    section Bailout
    This PR (8036) - mean (93ms)  : 92, 94
    master - mean (93ms)  : 92, 94

    section CallTarget+Inlining+NGEN
    This PR (8036) - mean (636ms)  : 619, 653
    master - mean (632ms)  : 616, 647

HttpMessageHandler (.NET Framework 4.8)
gantt
    title Execution time (ms) HttpMessageHandler (.NET Framework 4.8)
    dateFormat  x
    axisFormat %Q
    todayMarker off
    section Baseline
    This PR (8036) - mean (194ms)  : 189, 198
    master - mean (193ms)  : 189, 197

    section Bailout
    This PR (8036) - mean (197ms)  : 193, 200
    master - mean (196ms)  : 194, 198

    section CallTarget+Inlining+NGEN
    This PR (8036) - mean (1,119ms)  : 1062, 1175
    master - mean (1,114ms)  : 1064, 1164

HttpMessageHandler (.NET Core 3.1)
gantt
    title Execution time (ms) HttpMessageHandler (.NET Core 3.1)
    dateFormat  x
    axisFormat %Q
    todayMarker off
    section Baseline
    This PR (8036) - mean (278ms)  : 271, 285
    master - mean (276ms)  : 271, 281

    section Bailout
    This PR (8036) - mean (279ms)  : 273, 285
    master - mean (276ms)  : 273, 280

    section CallTarget+Inlining+NGEN
    This PR (8036) - mean (925ms)  : 878, 972
    master - mean (921ms)  : 865, 976

HttpMessageHandler (.NET 6)
gantt
    title Execution time (ms) HttpMessageHandler (.NET 6)
    dateFormat  x
    axisFormat %Q
    todayMarker off
    section Baseline
    This PR (8036) - mean (270ms)  : 266, 275
    master - mean (271ms)  : 264, 278

    section Bailout
    This PR (8036) - mean (273ms)  : 267, 280
    master - mean (270ms)  : 267, 273

    section CallTarget+Inlining+NGEN
    This PR (8036) - mean (930ms)  : 891, 969
    master - mean (930ms)  : 894, 966

HttpMessageHandler (.NET 8)
gantt
    title Execution time (ms) HttpMessageHandler (.NET 8)
    dateFormat  x
    axisFormat %Q
    todayMarker off
    section Baseline
    This PR (8036) - mean (271ms)  : 265, 277
    master - mean (269ms)  : 264, 275

    section Bailout
    This PR (8036) - mean (271ms)  : 264, 278
    master - mean (269ms)  : 265, 273

    section CallTarget+Inlining+NGEN
    This PR (8036) - mean (824ms)  : 806, 843
    master - mean (821ms)  : 802, 840


@zacharycmontoya zacharycmontoya left a comment

Nice, thanks for the improvements!

pr-commenter bot commented Jan 9, 2026

Benchmarks

Benchmark execution time: 2026-01-09 10:54:56

Comparing candidate commit efb97cb in PR branch andrew/otel/benchmarks with baseline commit 43d4334 in branch master.

Found 8 performance improvements and 5 performance regressions! Performance is the same for 161 metrics, 12 unstable metrics.

scenario:Benchmarks.Trace.ActivityBenchmark.StartStopWithChild net472

  • 🟩 throughput [+21580.346op/s; +22540.952op/s] or [+48.048%; +50.187%]

scenario:Benchmarks.Trace.ActivityBenchmark.StartStopWithChild net6.0

  • 🟩 execution_time [-15.269ms; -10.959ms] or [-7.226%; -5.186%]

scenario:Benchmarks.Trace.Asm.AppSecBodyBenchmark.AllCycleSimpleBody net6.0

  • 🟩 execution_time [-34.939ms; -28.544ms] or [-15.516%; -12.677%]

scenario:Benchmarks.Trace.CharSliceBenchmark.OptimizedCharSlice net6.0

  • 🟩 execution_time [-141.135µs; -129.012µs] or [-9.447%; -8.635%]
  • 🟩 throughput [+63.694op/s; +69.307op/s] or [+9.516%; +10.354%]

scenario:Benchmarks.Trace.CharSliceBenchmark.OriginalCharSlice net472

  • 🟩 execution_time [-169.942µs; -164.685µs] or [-6.249%; -6.056%]
  • 🟩 throughput [+23.724op/s; +24.487op/s] or [+6.452%; +6.659%]

scenario:Benchmarks.Trace.CharSliceBenchmark.OriginalCharSlice net6.0

  • 🟥 execution_time [+190.263µs; +199.617µs] or [+9.831%; +10.315%]
  • 🟥 throughput [-48.463op/s; -46.117op/s] or [-9.379%; -8.925%]

scenario:Benchmarks.Trace.ElasticsearchBenchmark.CallElasticsearch net6.0

  • 🟥 execution_time [+11.012ms; +16.795ms] or [+5.553%; +8.470%]

scenario:Benchmarks.Trace.ElasticsearchBenchmark.CallElasticsearchAsync netcoreapp3.1

  • 🟩 throughput [+24103.726op/s; +32537.178op/s] or [+5.866%; +7.919%]

scenario:Benchmarks.Trace.Log4netBenchmark.EnrichedLog netcoreapp3.1

  • 🟥 execution_time [+12.197ms; +14.683ms] or [+7.614%; +9.167%]

scenario:Benchmarks.Trace.NLogBenchmark.EnrichedLog netcoreapp3.1

  • 🟥 execution_time [+86.505ms; +88.818ms] or [+79.078%; +81.192%]

@andrewlock

I'm not sure what's up with this:

🟩 throughput [+21580.346op/s; +22540.952op/s] or [+48.048%; +50.187%]

These are the results reported in CI:

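| Method | Job | Runtime | Mean | Error | StdDev | Ratio | Gen0 | Allocated | Alloc Ratio |
| ------------------ | -------------------- | -------------------- | -------: | -------: | -------: | ----: | -----: | --------: | ----------: |
| StartStopWithChild | .NET 6.0 | .NET 6.0 | 10.11 us | 0.058 us | 0.054 us | 0.68 | - | 4.87 KB | 0.92 |
| StartStopWithChild | .NET Core 3.1 | .NET Core 3.1 | 14.36 us | 0.100 us | 0.093 us | 0.96 | - | 5.05 KB | 0.95 |
| StartStopWithChild | .NET Framework 4.7.2 | .NET Framework 4.7.2 | 14.92 us | 0.039 us | 0.034 us | 1.00 | 0.8224 | 5.3 KB | 1.00 |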

and they look "fine" to me, so I think we're ok 🤷‍♂️ I think the issue was that our benchmark was previously heavily skewed against .NET FX (something to do with our mocks, I'm guessing?) 🤔

@andrewlock andrewlock merged commit 72be34c into master Jan 9, 2026
151 of 153 checks passed
@andrewlock andrewlock deleted the andrew/otel/benchmarks branch January 9, 2026 11:09
@github-actions github-actions bot added this to the vNext-v3 milestone Jan 9, 2026
andrewlock added a commit that referenced this pull request Jan 9, 2026
## Summary of changes

Reduce allocation of `ActivityHandlerCommon` by removing `string`
concatenation

## Reason for change

The `ActivityHandlerCommon.ActivityStarted` and `ActivityStopped`
methods need to store and retrieve `Activity` instances from a
`ConcurrentDictionary<>`. Today they're doing that by concatenating the
`Activity`'s `TraceId` and `SpanId`, or by using its `Id`. All that
concatenation causes a bunch of allocation, so instead introduce a
simple `struct` to use as the key.

## Implementation details

Introduce `ActivityKey`, which is essentially `internal readonly record
struct ActivityKey(string TraceId, string SpanId)`, and use that for all
of the dictionary lookups, which avoids all the string concatenation
allocations.
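
As a rough illustration of the idea (not the tracer's actual code), the sketch below uses a two-string `readonly record struct` as a `ConcurrentDictionary<>` key. The struct shape comes from the description above; `ActivityKeyDemo`, `Store`, and `Find` are hypothetical names, and `object` stands in for whatever the real dictionary stores.

```csharp
#nullable enable
using System;
using System.Collections.Concurrent;

// Shape taken from the description above. The real type is internal to the
// tracer and may differ (for example in how it handles missing IDs).
public readonly record struct ActivityKey(string TraceId, string SpanId);

// Hypothetical helper, only here to show the lookup pattern.
public static class ActivityKeyDemo
{
    private static readonly ConcurrentDictionary<ActivityKey, object> Map = new();

    // The key is a value type wrapping two existing string references, so
    // building it does not allocate a new concatenated string.
    public static void Store(string traceId, string spanId, object value)
        => Map[new ActivityKey(traceId, spanId)] = value;

    // Record structs get compiler-generated, value-based Equals/GetHashCode,
    // so an equal (TraceId, SpanId) pair finds the same entry.
    public static object? Find(string traceId, string spanId)
        => Map.TryGetValue(new ActivityKey(traceId, spanId), out var value) ? value : null;
}
```

The trade-off is that the key is now two references wide instead of one, which seems a small price for removing a string allocation on every start and stop.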

## Test coverage

Added some unit tests for `ActivityKey`, but correctness is mostly covered
by existing integration tests. Benchmarks show a significant
improvement over [the previous
results](#8036),
particularly for hierarchical IDs, which clearly were buggy


| Method | Runtime | Mean | StdDev | Gen0 | Allocated | Compared to #8036 |
| ------------------------------- | -------------------- | -------: | --------: | -----: | --------: | ----------------: |
| StartStopWithChild_Hierarchical | .NET 6.0 | 4.217 us | 0.3227 us | 0.0153 | 4.09 KB | -6.38 KB |
| StartStopWithChild_Hierarchical | .NET 8.0 | 3.413 us | 0.2505 us | 0.0076 | 4.09 KB | -6.39 KB |
| StartStopWithChild_Hierarchical | .NET Core 3.1 | 5.676 us | 0.4636 us | 0.0153 | 4.32 KB | -6.43 KB |
| StartStopWithChild_Hierarchical | .NET Framework 4.7.2 | 6.813 us | 0.4969 us | 0.7324 | 4.53 KB | -7.24 KB |
| | | | | | | |
| StartStopWithChild | .NET 6.0 | 4.105 us | 0.2677 us | 0.0153 | 4.3 KB | -0.57 KB |
| StartStopWithChild | .NET 8.0 | 3.475 us | 0.1570 us | 0.0114 | 4.2 KB | -0.57 KB |
| StartStopWithChild | .NET Core 3.1 | 5.647 us | 0.3129 us | 0.0153 | 4.48 KB | -0.57 KB |
| StartStopWithChild | .NET Framework 4.7.2 | 6.842 us | 0.2992 us | 0.7629 | 4.69 KB | -0.61 KB |



## Other details

https://datadoghq.atlassian.net/browse/LANGPLAT-915

Part of a stack working to improve OTel performance

- #8036 
- #8037 👈
- #8038
- #8039
- #8040
- #8041
- #8042
andrewlock added a commit that referenced this pull request Jan 9, 2026
## Summary of changes

Fix incorrect nullable annotations on `Activity` duck types

## Reason for change

While working on other performance things, I noticed that the nullable
annotations often declared values as non-nullable when they could
actually be null.

A particularly confusing part is the `TraceId` and `SpanId` values in
`IW3CActivity`. These were marked non-nullable because when you call
`Activity.Start()` then these will always be non-null, but _only_ if
you're using W3C IDs. If you're using hierarchical IDs ([the default in
<.NET
5](https://learn.microsoft.com/en-us/dotnet/core/compatibility/core-libraries/5.0/default-activityidformat-changed))
then these values _will_ be null.

As a side note, I suspect this explains the "we saw errors about these
being null in error tracking but don't understand why" scenarios 😄

Also, I think we should rename `IW3CActivity` to `IActivity3` instead.
W3C _implies_ that it's a W3C activity, but that's not necessarily the
case, and is essentially the source of the above confusion I think.
`IActivity3` would then be named consistently with the `IActivity5` and
`IActivity6` types we already have.

## Implementation details

Add nullable annotations to values that _can_ be null, and fix the
fallout (mostly in `ActivityKey`)

## Test coverage

Covered by existing tests sufficiently I think

## Other details

https://datadoghq.atlassian.net/browse/LANGPLAT-915

Part of a stack working to improve OTel performance

- #8036
- #8037
- #8038 👈
- #8039
- #8040
- #8041
- #8042
andrewlock added a commit that referenced this pull request Jan 9, 2026
## Summary of changes

Avoids allocating a closure in .NET Core if we can avoid it

## Reason for change

.NET Core's `ConcurrentDictionary.GetOrAdd()` method has an overload that
accepts a "state" argument, which is passed through to the value factory.
Using this overload avoids allocating a closure every time the method is
hit, and we can pass the state using a value tuple to avoid additional
allocation there
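
A minimal sketch of the difference, using the real `GetOrAdd<TArg>(key, valueFactory, factoryArgument)` overload. Everything else here (`GetOrAddDemo`, the method names, the string cache) is hypothetical and only illustrates the pattern; the PR itself selects between the two forms with `#if`/`#else`, which the sketch leaves out by showing both side by side.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical example type; not the tracer's code.
public static class GetOrAddDemo
{
    private static readonly ConcurrentDictionary<string, string> Cache = new();

    // Closure version: the lambda captures `prefix` and `suffix`, so the
    // compiler allocates a closure object and a delegate on each call,
    // even when the key is already present.
    public static string WithClosure(string key, string prefix, string suffix)
        => Cache.GetOrAdd(key, k => prefix + k + suffix);

    // State version: the lambda is static (captures nothing), so its delegate
    // can be cached, and the extra inputs travel in a value tuple, which is a
    // struct and does not need its own heap allocation.
    public static string WithState(string key, string prefix, string suffix)
        => Cache.GetOrAdd<(string Prefix, string Suffix)>(
            key,
            static (k, state) => state.Prefix + k + state.Suffix,
            (prefix, suffix));
}
```

Both methods return the same values; the difference only shows up in allocation profiles on the hot path where the key already exists.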

## Implementation details

`#if`/`#else` to glory 

## Test coverage

Functionality is covered by existing tests, benchmarks show an
incremental improvement over #8037 for .NET Core, as expected:


| Method | Runtime | Mean | StdDev | Gen0 | Allocated | Compared to #8037 |
| ------------------------------- | -------------------- | -------: | --------: | -----: | --------: | ----------------: |
| StartStopWithChild_Hierarchical | .NET 6.0 | 3.746 us | 0.1128 us | 0.0114 | 3.81 KB | -0.28 KB |
| StartStopWithChild_Hierarchical | .NET 8.0 | 2.759 us | 0.0374 us | 0.0114 | 3.81 KB | -0.28 KB |
| StartStopWithChild_Hierarchical | .NET Core 3.1 | 4.762 us | 0.0584 us | 0.0153 | 4.04 KB | -0.28 KB |
| StartStopWithChild_Hierarchical | .NET Framework 4.7.2 | 5.651 us | 0.0717 us | 0.7324 | 4.54 KB | 0.01 KB (noise) |
| | | | | | | |
| StartStopWithChild | .NET 6.0 | 3.607 us | 0.0508 us | 0.0153 | 4.02 KB | -0.28 KB |
| StartStopWithChild | .NET 8.0 | 2.921 us | 0.0617 us | 0.0114 | 3.91 KB | -0.29 KB |
| StartStopWithChild | .NET Core 3.1 | 4.922 us | 0.0407 us | 0.0153 | 4.2 KB | -0.28 KB |
| StartStopWithChild | .NET Framework 4.7.2 | 6.008 us | 0.0979 us | 0.7629 | 4.69 KB | 0 |


## Other details

https://datadoghq.atlassian.net/browse/LANGPLAT-915

Part of a stack working to improve OTel performance

- #8036
- #8037
- #8038
- #8039 👈
- #8040
- #8041
- #8042
pablomartinezbernardo pushed a commit that referenced this pull request Jan 10, 2026
andrewlock added a commit that referenced this pull request Jan 12, 2026
## Summary of changes

Various optimizations in the `Activity` handling code to reduce
allocations and execution time

## Reason for change

Some properties are tricksy, as they do a bunch of allocation, so we
should avoid them if we can. The changes in here _look_ bigger than they
are diff-wise; I've added comments to aid review.

## Implementation details

- Don't call `ParentId` until we definitely need it.
  - This property does a bunch of allocation to generate a "valid" value,
    so we should avoid it if we can. This makes the conditions a bit harder
    to read, but delays calling `ParentId` until we're sure we don't have
    something better already
- Avoid calling `Tracer.Instance.ActiveScope?.Span` until we know we
  need it (very minor optimisations but why not 🤷‍♂️)
- Extract `StopActivitySlow` to a separate method, as we _shouldn't_ hit
  this now, so should help the JIT out with things like code size etc of
  the calling method (conjecture, not tested, but I think it's better from
  code understanding PoV too)
- Simplify `ShouldIgnoreByOperationName` which also improves execution
  time 10x from ~30us to ~3us.

## Test coverage

Functionally covered by existing tests. Benchmarks compared to #8039
show improvements. Allocations listed below, but execution time is also
improved:

I re-ran the numbers after making the updates, and the benchmarks are
actually better:

<details><summary>Benchmarks for original PR</summary>
<p>


| Method | Runtime | Mean | StdDev | Gen0 | Allocated | Compared to #8039 |
| ------------------------------- | -------------------- | -------: | --------: | -----: | --------: | ----------------: |
| StartStopWithChild_Hierarchical | .NET 6.0 | 3.410 us | 0.0658 us | 0.0153 | 3.79 KB | -0.02 KB |
| StartStopWithChild_Hierarchical | .NET 8.0 | 2.582 us | 0.0443 us | 0.0114 | 3.79 KB | -0.02 KB |
| StartStopWithChild_Hierarchical | .NET Core 3.1 | 4.272 us | 0.0731 us | 0.0153 | 4.02 KB | -0.02 KB |
| StartStopWithChild_Hierarchical | .NET Framework 4.7.2 | 5.165 us | 0.1245 us | 0.7324 | 4.51 KB | -0.03 KB |
| | | | | | | |
| StartStopWithChild | .NET 6.0 | 3.312 us | 0.0266 us | 0.0153 | 3.78 KB | -0.04 KB |
| StartStopWithChild | .NET 8.0 | 2.648 us | 0.0306 us | 0.0114 | 3.78 KB | -0.13 KB |
| StartStopWithChild | .NET Core 3.1 | 4.344 us | 0.0555 us | 0.0076 | 3.97 KB | -0.23 KB |
| StartStopWithChild | .NET Framework 4.7.2 | 5.234 us | 0.1568 us | 0.7095 | 4.39 KB | -0.30 KB |


</p>
</details> 


| Method | Runtime | Mean | StdDev | Gen0 | Allocated | Compared to #8039 |
| ------------------------------- | -------------------- | -------: | --------: | -----: | --------: | ----------------: |
| StartStopWithChild_Hierarchical | .NET 6.0 | 3.770 us | 0.3101 us | 0.0076 | 3.58 KB | -0.23 KB |
| StartStopWithChild_Hierarchical | .NET 8.0 | 3.039 us | 0.2377 us | 0.0076 | 3.58 KB | -0.23 KB |
| StartStopWithChild_Hierarchical | .NET Core 3.1 | 4.490 us | 0.1135 us | 0.0076 | 3.8 KB | -0.22 KB |
| StartStopWithChild_Hierarchical | .NET Framework 4.7.2 | 5.303 us | 0.2469 us | 0.6943 | 4.3 KB | -0.24 KB |
| | | | | | | |
| StartStopWithChild | .NET 6.0 | 3.386 us | 0.0971 us | 0.0114 | 3.59 KB | -0.43 KB |
| StartStopWithChild | .NET 8.0 | 2.661 us | 0.0218 us | 0.0114 | 3.59 KB | -0.32 KB |
| StartStopWithChild | .NET Core 3.1 | 4.540 us | 0.1625 us | 0.0076 | 3.78 KB | -0.42 KB |
| StartStopWithChild | .NET Framework 4.7.2 | 5.563 us | 0.1946 us | 0.6790 | 4.2 KB | -0.49 KB |


## Other details

https://datadoghq.atlassian.net/browse/LANGPLAT-915

Part of a stack working to improve OTel performance

- #8036
- #8037
- #8038
- #8039
- #8040 👈
- #8041
- #8042
andrewlock added a commit that referenced this pull request Jan 13, 2026
## Summary of changes

Update code that enumerates properties of `IActivity` duck types to use
allocation-free enumeration

## Reason for change

Our duck types all use the public APIs of `Activity` to grab tags,
events, and links. These types are generally implemented internally as
custom linked list objects, but are exposed as `IEnumerable<>` types.
The internal types implement a `struct`-based enumerator, but as the
types aren't exposed directly, the compiler can't use those and ends up
allocating. That's... annoying.

To work around that, this adds a "hack" that OTel were [actually using
themselves](https://github.com/open-telemetry/opentelemetry-dotnet/blob/73bff75ef653f81fe6877299435b21131be36dc0/src/OpenTelemetry/Internal/EnumerationHelper.cs#L58)
(and [which has been discussed
elsewhere](https://www.macrosssoftware.com/2020/07/13/enumerator-performance-surprises/)),
of building a dynamic method that finds the concrete type's `GetEnumerator` method and
calls it directly. We can't use it everywhere because we don't have the types
available (e.g. links and events), but we _can_ use it for tags. There's
essentially a trade-off of startup time vs runtime allocation, which is
worth it in this case IMO.

Additionally, the `Activity` implementation uses an empty array as the
return value when there _are_ no tags/links/events, so we can avoid the
enumerator allocation in those cases by specifically looking for an
empty array and bailing if so.
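
The empty-array check could look roughly like the sketch below. `EmptyTagsFastPath.VisitTags` is a hypothetical helper, not the PR's `EnumerationHelper`, and string-valued tags are assumed purely to keep the example short.

```csharp
using System;
using System.Collections.Generic;

public static class EmptyTagsFastPath
{
    // Hypothetical helper: invokes visitor for each tag and returns how many were visited.
    public static int VisitTags(
        IEnumerable<KeyValuePair<string, string>> tags,
        Action<KeyValuePair<string, string>> visitor)
    {
        // An empty array has nothing to enumerate, so return before
        // GetEnumerator() is ever called.
        if (tags is KeyValuePair<string, string>[] array && array.Length == 0)
        {
            return 0;
        }

        // Fallback path: enumerating through the interface may allocate.
        var count = 0;
        foreach (var tag in tags)
        {
            visitor(tag);
            count++;
        }

        return count;
    }
}
```

The type test is cheap compared with obtaining an enumerator, so the common "no tags" case pays almost nothing.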

## Implementation details

I chose to add the `EnumerationHelper` as an explicitly
`Activity`-related thing for now, but we can consider exposing this more
widely if we get confidence.

I also avoided the concurrent dictionary approach that is the "default"
for this, and instead added an additional helper to read a static field
that we populate once. We have to wait until we have a "real" activity
to do the population, so that we get the right type, which is a bit
annoying, but it works...
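
As a rough illustration of the "populate once" idea (a sketch under assumptions, not the PR's actual helper), the code below builds a delegate from the runtime type of the first instance it sees and stores it in a static field. `FirstSeenTypeCache`, `Describe`, and `Build` are hypothetical names, and the sketch assumes every later instance has the same runtime type as the first.

```csharp
using System;
using System.Threading;

public static class FirstSeenTypeCache
{
    // Null until the first real instance arrives; written at most once.
    private static Func<object, string> _describe;
    private static int _buildCount;

    // Number of times Build ran; can exceed 1 only if threads race on first use.
    public static int BuildCount => Volatile.Read(ref _buildCount);

    public static string Describe(object instance)
    {
        var cached = Volatile.Read(ref _describe);
        if (cached is null)
        {
            var built = Build(instance.GetType());

            // Only the first writer wins; a racing thread discards its own
            // delegate and uses the one that is already stored.
            cached = Interlocked.CompareExchange(ref _describe, built, null) ?? built;
        }

        return cached(instance);
    }

    // Stand-in for the expensive, type-specific work (e.g. emitting a dynamic method).
    private static Func<object, string> Build(Type runtimeType)
    {
        Interlocked.Increment(ref _buildCount);
        var name = runtimeType.Name;
        return obj => name + ":" + obj;
    }
}
```

After the first call, the hot path is a single field read and a delegate invocation, with no dictionary lookup or hashing of a `Type` key.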

## Test coverage

Added a couple of unit tests for the result and confirmed it works as
expected. Benchmarks vs #8040 show a nice allocation improvement:


| Method | Runtime | Mean | StdDev | Gen0 | Gen1 | Allocated | Compared to #8040 |
| ------------------------------- | -------------------- | ---------: | --------: | -----: | -----: | --------: | ----------------: |
| StartStopWithChild_Hierarchical | .NET 6.0 | 3,248.6 ns | 28.71 ns | 0.0114 | 0.0038 | 3.6 KB | -0.19 KB |
| StartStopWithChild_Hierarchical | .NET 8.0 | 2,651.1 ns | 80.80 ns | 0.0114 | - | 3.6 KB | -0.19 KB |
| StartStopWithChild_Hierarchical | .NET Core 3.1 | 4,482.2 ns | 263.51 ns | 0.0153 | - | 3.83 KB | -0.19 KB |
| StartStopWithChild_Hierarchical | .NET Framework 4.7.2 | 4,915.8 ns | 63.81 ns | 0.7019 | - | 4.33 KB | -0.18 KB |
| StartStopWithChild | .NET 6.0 | 3,176.8 ns | 37.73 ns | 0.0114 | - | 3.59 KB | -0.19 KB |
| StartStopWithChild | .NET 8.0 | 2,598.0 ns | 40.05 ns | 0.0114 | - | 3.59 KB | -0.19 KB |
| StartStopWithChild | .NET Core 3.1 | 4,260.7 ns | 59.98 ns | 0.0076 | - | 3.78 KB | -0.19 KB |
| StartStopWithChild | .NET Framework 4.7.2 | 5,097.7 ns | 46.98 ns | 0.6790 | - | 4.2 KB | -0.19 KB |

## Other details

https://datadoghq.atlassian.net/browse/LANGPLAT-915

Part of a stack working to improve OTel performance

- #8036
- #8037
- #8038
- #8039
- #8040
- #8041 👈
- #8042

---------

Co-authored-by: Lucas Pimentel <lucas.pimentel@datadoghq.com>
andrewlock added a commit that referenced this pull request Jan 13, 2026
…8042)

## Summary of changes

Improve performance around the "population" of tags from an `Activity`
into a `Span`

## Reason for change

Currently we do a _lot_ of allocation in the
`OtlpHelpers.AgentConvertSpan` method. The `OpenTelemetryTags` object
also has a lot of properties on it that we never set, and are only used
for _reading_ tags in the `OperationMapper`, even though many may never
even be read. This increases the size of the tags object.

This PR aims to do various optimizations around the "close activity"
paths:
- Reduce the size of the `OpenTelemetryTags` object by removing
properties only used by `OperationMapper`
- Add properties to `OpenTelemetryTags` for values that we explicitly set
in `OtlpHelpers.AgentConvertSpan`

## Implementation details

- Currently we're always creating the `OpenTelemetryTags`, even though
we _may_ throw it away if the activity is ignored, so that's an easy
win.
- I think we can assume that the tags object passed to
`AgentConvertSpan` is only ever `OpenTelemetryTags` based on the call
sites, so we can interact with the tags object directly where possible.
- "Simplify" some of the code paths by avoiding `span.SetTag` and
favouring `tags.SetTag` when we know it's not a "special" tag.
- Inline a few method calls in places where they will often not be
called, at the expense of some clarity
- Refactor `OperationMapper` to use the `GetTag()` API, and cache the
values at various points. The resulting code is uglier, but overall
it shaves ~100 bytes off every span, so I think it's worth it

## Test coverage

Functionality is covered by existing tests; this is just a refactoring, and
there are good tests for `OperationMapper` currently. Benchmarks vs
#8041 show a nice
allocation improvement, at the expense of slower execution (this was
local though, so not 100% on that, will see what the CI results show).
If there is a significant slowdown, I'll try to isolate it, but I think
the allocation improvements are probably worth it either way


| Method | Runtime | Mean | StdDev | Gen0 | Allocated | Compared to #8041 |
| ------------------------------- | -------------------- | -------: | --------: | -----: | --------: | ----------------: |
| StartStopWithChild_Hierarchical | .NET 6.0 | 4.438 us | 0.7347 us | 0.0076 | 3.2 KB | -0.40 KB |
| StartStopWithChild_Hierarchical | .NET 8.0 | 3.216 us | 0.4260 us | 0.0076 | 3.2 KB | -0.40 KB |
| StartStopWithChild_Hierarchical | .NET Core 3.1 | 5.293 us | 0.8316 us | 0.0076 | 3.42 KB | -0.41 KB |
| StartStopWithChild_Hierarchical | .NET Framework 4.7.2 | 5.735 us | 0.5592 us | 0.6332 | 3.92 KB | -0.41 KB |
| | | | | | | |
| StartStopWithChild | .NET 6.0 | 3.848 us | 0.5325 us | 0.0076 | 3.19 KB | -0.40 KB |
| StartStopWithChild | .NET 8.0 | 3.080 us | 0.2698 us | 0.0076 | 3.19 KB | -0.40 KB |
| StartStopWithChild | .NET Core 3.1 | 5.094 us | 0.4982 us | 0.0076 | 3.38 KB | -0.40 KB |
| StartStopWithChild | .NET Framework 4.7.2 | 6.172 us | 0.5948 us | 0.6104 | 3.79 KB | -0.41 KB |


## Other details

Added a couple of extra tests that were missing from the previous PR in
the stack too

https://datadoghq.atlassian.net/browse/LANGPLAT-915

Part of a stack working to improve OTel performance

- #8036
- #8037
- #8038
- #8039
- #8040
- #8041
- #8042 👈
Labels

area:benchmarks Benchmarks, throughput tests, Crank, Bombardier, etc area:opentelemetry OpenTelemetry support area:tests unit tests, integration tests
