
Fix flaky benchmark validation tests by skipping fork in validation mode #5353

Merged

bm1549 merged 2 commits into master from brian.marks/fix-flakes on Feb 12, 2026

Conversation


bm1549 (Contributor) commented Feb 12, 2026

Summary

  • Skip forking in `run_benchmark` when `VALIDATE_BENCHMARK_MODE` is true in `benchmarks/tracing_trace.rb` and `benchmarks/error_tracking_simple.rb`
  • Fixes flaky timeout in "Tracing benchmarks tracing_trace runs without raising errors" on master

Root cause

The `expect_in_fork(timeout_seconds: 20)` test forks once, then loads the benchmark file. The old `run_benchmark` forks again for each individual benchmark (7 in tracing, 8 in error tracking). Each nested fork lazily initializes the entire Datadog component stack and tears it down on exit.
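
For concreteness, a simplified sketch of the old shape (illustrative only, not the exact harness code):

```ruby
# OLD structure (simplified): every benchmark pays a full child lifecycle.
def run_benchmark(&block)
  pid = fork do
    block.call # lazily initializes the entire Datadog component stack
  end          # child exit then runs the full component shutdown
  Process.wait2(pid) # parent blocks until the child finishes tearing down
end

# Under VALIDATE_BENCHMARK=true, the spec already wraps the whole file in
# expect_in_fork(timeout_seconds: 20), so each of these calls is a fork
# inside a fork.
```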

Two problems compound:

  1. Cumulative overhead — Each fork+init+shutdown cycle costs ~265ms on a fast machine, so 7 cycles cost ~1.9s locally. On CI VMs running 3-10x slower, that grows to roughly 5.6-18.7s, right at the 20s timeout boundary.

  2. Unbounded thread.join — The remote config worker shutdown at `lib/datadog/core/remote/worker.rb:58` calls `thread.join` with no timeout. If any of the 7 forked children hangs there, `Process.wait2(pid)` in `run_benchmark` blocks indefinitely, guaranteeing a timeout (see the sketch after this list).
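
A schematic of the hang, assuming a worker shutdown shaped like the one at that line (not the actual worker code):

```ruby
# Simplified shutdown path: Thread#join with no argument waits forever.
def stop
  @thread.join # unbounded: a stuck worker thread never lets the child exit
  # @thread.join(timeout) would instead return nil after `timeout` seconds,
  # letting shutdown proceed.
end

# While the child is stuck here, the parent's Process.wait2(pid) in
# run_benchmark also blocks, so the outer 20s test timeout fires.
```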

Fix

In validation mode, call the benchmark block directly instead of forking. Forking exists to prevent monkey-patching leakage between benchmarks, but validation mode only checks for crashes (0.001s runtime, no results saved) — isolation is unnecessary.

Normal (non-validation) benchmark mode is completely unaffected.
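
A sketch of the fixed shape (simplified; `validation_mode?` is a stand-in for however the benchmark files read the flag):

```ruby
# NEW structure (simplified): fork only outside validation mode.
def run_benchmark(&block)
  if validation_mode?
    block.call # inline: validation only checks that nothing raises
  else
    pid = fork { block.call } # normal mode keeps per-benchmark isolation
    Process.wait2(pid)
  end
end

# Assumed mapping to the env var used in the test commands below.
def validation_mode?
  ENV['VALIDATE_BENCHMARK'] == 'true'
end
```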

Test results

Validation mode (the flaky tests):

$ VALIDATE_BENCHMARK=true bundle exec rspec spec/datadog/tracing/validate_benchmarks_spec.rb
2 examples, 0 failures — Finished in 0.22 seconds

$ VALIDATE_BENCHMARK=true bundle exec rspec spec/datadog/error_tracking/validate_benchmarks_spec.rb
2 examples, 0 failures — Finished in 0.21 seconds

Normal benchmark mode (confirms forking still works):

$ bundle exec ruby benchmarks/tracing_trace.rb
# All 7 benchmarks ran in separate forked processes (distinct PIDs), completed successfully

$ bundle exec ruby benchmarks/error_tracking_simple.rb
# All 8 benchmarks ran in separate forked processes (distinct PIDs), completed successfully

Per-fork overhead measurement (old vs new):

OLD — 7 nested fork+init+shutdown cycles:
  Fork 1: 274ms, Fork 2: 261ms, Fork 3: 261ms, Fork 4: 264ms,
  Fork 5: 269ms, Fork 6: 269ms, Fork 7: 268ms
  Total on local Mac: 1865ms
  Projected at 10x CI slowdown: 18.7s (timeout is 20s)

NEW — 0 nested forks, inline execution:
  Total: 474ms
  Projected at 10x CI slowdown: 4.7s
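
For reference, a minimal sketch of how per-fork overhead like the above could be measured (an assumed approach; the PR does not include the measurement script, and the 'noop' trace is just a placeholder workload):

```ruby
require 'datadog' # load the gem so each child pays full init + shutdown

7.times do |i|
  t0 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  pid = fork do
    Datadog::Tracing.trace('noop') {} # forces lazy component initialization
  end
  Process.wait2(pid)
  elapsed_ms = ((Process.clock_gettime(Process::CLOCK_MONOTONIC) - t0) * 1000).round
  puts "Fork #{i + 1}: #{elapsed_ms}ms"
end
```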

Test plan

  • VALIDATE_BENCHMARK=true bundle exec rspec spec/datadog/tracing/validate_benchmarks_spec.rb passes
  • VALIDATE_BENCHMARK=true bundle exec rspec spec/datadog/error_tracking/validate_benchmarks_spec.rb passes
  • bundle exec ruby benchmarks/tracing_trace.rb still forks each benchmark
  • bundle exec ruby benchmarks/error_tracking_simple.rb still forks each benchmark

Change log entry

None.

🤖 Generated with Claude Code

In validation mode (VALIDATE_BENCHMARK=true), `run_benchmark` no longer
forks each individual benchmark. This eliminates the root cause of timeout
flakiness: 7-8 nested fork+init+shutdown cycles inside the already-forked
`expect_in_fork` test, where each cycle pays ~265ms of Datadog stack
init/teardown overhead and risks hanging on an unbounded `thread.join` in
the remote config worker shutdown path.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
bm1549 requested a review from a team as a code owner on February 12, 2026 15:43
github-actions bot commented Feb 12, 2026

Thank you for updating Change log entry section 👏

Visited at: 2026-02-12 16:17:00 UTC

github-actions bot added the dev/testing label (Involves testing processes, e.g. RSpec) on Feb 12, 2026
bm1549 marked this pull request as draft on February 12, 2026 15:44
bm1549 added the AI Generated label (Largely based on code generated by an AI or LLM; the same across all dd-trace-* repos) on Feb 12, 2026
ivoanjo (Member) left a comment

👍 LGTM. Agreed that validation mode is about making sure the code hasn't bitrotted and is less about isolation, so I agree with these changes.

(I wonder if these benchmarks are still providing value; that is, should we remove them instead? But not a blocker for merging this change in)

bm1549 marked this pull request as ready for review on February 12, 2026 16:19

pr-commenter bot commented Feb 12, 2026

Benchmarks

Benchmark execution time: 2026-02-12 16:50:03

Comparing candidate commit d0c2d33 in PR branch brian.marks/fix-flakes with baseline commit 1507cc7 in branch master.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 44 metrics, 2 unstable metrics.

bm1549 (Contributor, Author) commented Feb 12, 2026

I spot-checked a few previously merged PRs, and the few tests that timed out there were not treated as blockers for merging. Since these timeouts are almost certainly unrelated to this change, I'll merge this as well.

bm1549 merged commit 7a650cb into master on Feb 12, 2026
1179 of 1218 checks passed
bm1549 deleted the brian.marks/fix-flakes branch on February 12, 2026 21:55
github-actions bot added this to the 2.29.0 milestone on Feb 12, 2026
