
Fix flaky benchmark validation tests by skipping fork in validation mode #5353

Merged

bm1549 merged 2 commits into master from brian.marks/fix-flakes on Feb 12, 2026

Conversation


bm1549 (Contributor) commented Feb 12, 2026

Summary

  • Skip forking in `run_benchmark` when `VALIDATE_BENCHMARK_MODE` is true in `benchmarks/tracing_trace.rb` and `benchmarks/error_tracking_simple.rb`
  • Fixes flaky timeout in "Tracing benchmarks tracing_trace runs without raising errors" on master

Root cause

The `expect_in_fork(timeout_seconds: 20)` test forks once, then loads the benchmark file. The old `run_benchmark` forks again for each individual benchmark (7 in tracing, 8 in error tracking). Each nested fork lazily initializes the entire Datadog component stack and tears it down on exit.
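
For concreteness, a simplified sketch of the old shape (illustrative only, not the exact harness code):

```ruby
# OLD structure (simplified): every benchmark pays a full child lifecycle.
def run_benchmark(&block)
  pid = fork do
    block.call # lazily initializes the entire Datadog component stack
  end          # child exit then runs the full component shutdown
  Process.wait2(pid) # parent blocks until the child finishes tearing down
end

# Under VALIDATE_BENCHMARK=true, the spec already wraps the whole file in
# expect_in_fork(timeout_seconds: 20), so each of these calls is a fork
# inside a fork.
```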

Two problems compound:

  1. Cumulative overhead — Each fork+init+shutdown cycle costs ~265ms on a fast machine, so 7 cycles cost ~1.9s locally. On CI VMs running 3-10x slower, that grows to roughly 5.6-18.7s, right at the 20s timeout boundary.

  2. Unbounded thread.join — The remote config worker shutdown at `lib/datadog/core/remote/worker.rb:58` calls `thread.join` with no timeout. If any of the 7 forked children hangs there, `Process.wait2(pid)` in `run_benchmark` blocks indefinitely, guaranteeing a timeout (see the sketch after this list).
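
A schematic of the hang, assuming a worker shutdown shaped like the one at that line (not the actual worker code):

```ruby
# Simplified shutdown path: Thread#join with no argument waits forever.
def stop
  @thread.join # unbounded: a stuck worker thread never lets the child exit
  # @thread.join(timeout) would instead return nil after `timeout` seconds,
  # letting shutdown proceed.
end

# While the child is stuck here, the parent's Process.wait2(pid) in
# run_benchmark also blocks, so the outer 20s test timeout fires.
```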

Fix

In validation mode, call the benchmark block directly instead of forking. Forking exists to prevent monkey-patching leakage between benchmarks, but validation mode only checks for crashes (0.001s runtime, no results saved) — isolation is unnecessary.

Normal (non-validation) benchmark mode is completely unaffected.
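
A sketch of the fixed shape (simplified; `validation_mode?` is a stand-in for however the benchmark files read the flag):

```ruby
# NEW structure (simplified): fork only outside validation mode.
def run_benchmark(&block)
  if validation_mode?
    block.call # inline: validation only checks that nothing raises
  else
    pid = fork { block.call } # normal mode keeps per-benchmark isolation
    Process.wait2(pid)
  end
end

# Assumed mapping to the env var used in the test commands below.
def validation_mode?
  ENV['VALIDATE_BENCHMARK'] == 'true'
end
```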

Test results

Validation mode (the flaky tests):

$ VALIDATE_BENCHMARK=true bundle exec rspec spec/datadog/tracing/validate_benchmarks_spec.rb
2 examples, 0 failures — Finished in 0.22 seconds

$ VALIDATE_BENCHMARK=true bundle exec rspec spec/datadog/error_tracking/validate_benchmarks_spec.rb
2 examples, 0 failures — Finished in 0.21 seconds

Normal benchmark mode (confirms forking still works):

$ bundle exec ruby benchmarks/tracing_trace.rb
# All 7 benchmarks ran in separate forked processes (distinct PIDs), completed successfully

$ bundle exec ruby benchmarks/error_tracking_simple.rb
# All 8 benchmarks ran in separate forked processes (distinct PIDs), completed successfully

Per-fork overhead measurement (old vs new):

OLD — 7 nested fork+init+shutdown cycles:
  Fork 1: 274ms, Fork 2: 261ms, Fork 3: 261ms, Fork 4: 264ms,
  Fork 5: 269ms, Fork 6: 269ms, Fork 7: 268ms
  Total on local Mac: 1865ms
  Projected at 10x CI slowdown: 18.7s (timeout is 20s)

NEW — 0 nested forks, inline execution:
  Total: 474ms
  Projected at 10x CI slowdown: 4.7s
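
For reference, a minimal sketch of how per-fork overhead like the above could be measured (an assumed approach; the PR does not include the measurement script, and the 'noop' trace is just a placeholder workload):

```ruby
require 'datadog' # load the gem so each child pays full init + shutdown

7.times do |i|
  t0 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  pid = fork do
    Datadog::Tracing.trace('noop') {} # forces lazy component initialization
  end
  Process.wait2(pid)
  elapsed_ms = ((Process.clock_gettime(Process::CLOCK_MONOTONIC) - t0) * 1000).round
  puts "Fork #{i + 1}: #{elapsed_ms}ms"
end
```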

Test plan

  • VALIDATE_BENCHMARK=true bundle exec rspec spec/datadog/tracing/validate_benchmarks_spec.rb passes
  • VALIDATE_BENCHMARK=true bundle exec rspec spec/datadog/error_tracking/validate_benchmarks_spec.rb passes
  • bundle exec ruby benchmarks/tracing_trace.rb still forks each benchmark
  • bundle exec ruby benchmarks/error_tracking_simple.rb still forks each benchmark

Change log entry

None.

🤖 Generated with Claude Code

In validation mode (VALIDATE_BENCHMARK=true), `run_benchmark` no longer
forks each individual benchmark. This eliminates the root cause of timeout
flakiness: 7-8 nested fork+init+shutdown cycles inside the already-forked
`expect_in_fork` test, where each cycle pays ~265ms of Datadog stack
init/teardown overhead and risks hanging on an unbounded `thread.join` in
the remote config worker shutdown path.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
bm1549 requested a review from a team as a code owner on February 12, 2026 15:43
github-actions bot commented Feb 12, 2026

Thank you for updating Change log entry section 👏

Visited at: 2026-02-12 16:17:00 UTC

github-actions bot added the dev/testing label (Involves testing processes, e.g. RSpec) on Feb 12, 2026
bm1549 marked this pull request as draft on February 12, 2026 15:44
bm1549 added the AI Generated label (Largely based on code generated by an AI or LLM; the same across all dd-trace-* repos) on Feb 12, 2026
ivoanjo (Member) left a comment

👍 LGTM. Agreed that validation mode is about making sure the code hasn't bitrotted and is less about isolation, so I agree with these changes.

(I wonder if these benchmarks are still providing value; that is, should we remove them instead? But not a blocker for merging this change in)

bm1549 marked this pull request as ready for review on February 12, 2026 16:19

pr-commenter bot commented Feb 12, 2026

Benchmarks

Benchmark execution time: 2026-02-12 16:50:03

Comparing candidate commit d0c2d33 in PR branch brian.marks/fix-flakes with baseline commit 1507cc7 in branch master.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 44 metrics, 2 unstable metrics.

bm1549 (Contributor, Author) commented Feb 12, 2026

I spot-checked a few previously merged PRs, and the few tests that timed out there were not treated as blockers for merging. Since these timeouts are almost certainly unrelated to this change, I'll merge this as well.

bm1549 merged commit 7a650cb into master on Feb 12, 2026
1179 of 1218 checks passed
bm1549 deleted the brian.marks/fix-flakes branch on February 12, 2026 21:55
github-actions bot added this to the 2.29.0 milestone on Feb 12, 2026
