Fix flaky benchmark validation tests by skipping fork in validation mode #5353
Merged
Conversation
In validation mode (VALIDATE_BENCHMARK=true), `run_benchmark` no longer forks each individual benchmark. This eliminates the root cause of timeout flakiness: 7-8 nested fork+init+shutdown cycles inside the already-forked `expect_in_fork` test, where each cycle pays ~265ms of Datadog stack init/teardown overhead and risks hanging on an unbounded `thread.join` in the remote config worker shutdown path. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
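The hang risk described above comes from `Thread#join` with no timeout: if the joined thread never finishes, the joining process blocks forever. A small standalone illustration of the difference (illustrative only, not dd-trace-rb's worker code):

```ruby
# Illustrative only: why an unbounded Thread#join can hang shutdown.
# A worker thread that never exits (e.g. stuck waiting on a queue):
stuck = Thread.new { sleep }

# stuck.join            # would block this process forever
result = stuck.join(0.1) # bounded join: returns nil on timeout
puts result.inspect      # => nil, so shutdown can proceed
stuck.kill
```

`Thread#join(limit)` returns the thread itself if it finished within `limit` seconds, and `nil` otherwise, which lets a shutdown path give up instead of blocking indefinitely.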
Thank you for updating Change log entry section 👏 Visited at: 2026-02-12 16:17:00 UTC
ivoanjo
approved these changes
Feb 12, 2026
Member
ivoanjo
left a comment
👍 LGTM. Agreed that validation mode is about making sure the code hasn't bitrotted, and less about isolation, so I agree with these changes.
(I wonder if these benchmarks are still providing value; that is, should we remove them instead? But not a blocker for merging this change in)
Contributor
Author
I spot-checked a few previously merged PRs, and the few tests that timed out were not considered as part of the decision to merge. Since these are almost certainly unrelated to this change, I'll merge this as well.
Summary
Skip forking in `run_benchmark` when `VALIDATE_BENCHMARK_MODE` is true, in `benchmarks/tracing_trace.rb` and `benchmarks/error_tracking_simple.rb`.
Root cause
The `expect_in_fork(timeout_seconds: 20)` test forks once, then loads the benchmark file. The old `run_benchmark` forks again for each individual benchmark (7 in tracing, 8 in error tracking). Each nested fork lazily initializes the entire Datadog component stack and tears it down on exit. Two problems compound:
1. Cumulative overhead — each fork+init+shutdown cycle costs ~265ms on a fast machine. On CI VMs (3-10x slower), 7 cycles can reach 9-19s, right at the 20s timeout boundary.
2. Unbounded `thread.join` — the remote config worker shutdown at `lib/datadog/core/remote/worker.rb:58` calls `thread.join` with no timeout. If any of the 7 forked children hangs here, `Process.wait2(pid)` in `run_benchmark` blocks indefinitely, guaranteeing a timeout.
Fix
In validation mode, call the benchmark block directly instead of forking. Forking exists to prevent monkey-patching leakage between benchmarks, but validation mode only checks for crashes (0.001s runtime, no results saved) — isolation is unnecessary.
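A minimal sketch of the shape of this change, using hypothetical helper names (the actual dd-trace-rb benchmark helpers differ in naming and detail):

```ruby
# Hypothetical sketch of run_benchmark with a validation-mode fast path.
# The helper name and structure are illustrative, not dd-trace-rb's code.

def validate_benchmark_mode?
  ENV['VALIDATE_BENCHMARK'] == 'true'
end

def run_benchmark
  # Validation mode only checks that the benchmark code runs without
  # crashing, so run in-process and skip the per-fork init/shutdown cost.
  return yield if validate_benchmark_mode?

  # Normal mode: fork so monkey patches applied by one benchmark
  # cannot leak into the next.
  pid = fork { yield }
  _, status = Process.wait2(pid)
  raise "benchmark exited with status #{status.exitstatus}" unless status.success?
end
```

The validation-mode path returns the block's value directly; the normal path still forks, waits on the child, and fails loudly on a non-zero exit.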
Normal (non-validation) benchmark mode is completely unaffected.
Test results
Validation mode (the flaky tests):
Normal benchmark mode (confirms forking still works):
Per-fork overhead measurement (old vs new):
Test plan
- `VALIDATE_BENCHMARK=true bundle exec rspec spec/datadog/tracing/validate_benchmarks_spec.rb` passes
- `VALIDATE_BENCHMARK=true bundle exec rspec spec/datadog/error_tracking/validate_benchmarks_spec.rb` passes
- `bundle exec ruby benchmarks/tracing_trace.rb` still forks each benchmark
- `bundle exec ruby benchmarks/error_tracking_simple.rb` still forks each benchmark
Change log entry
None.
🤖 Generated with Claude Code