Parallelize microbenchmarks and run them more times#5313
Conversation
Thank you for updating Change log entry section 👏 (Visited at: 2026-02-20 09:48:24 UTC)
✅ Tests 🎉 All green! ❄️ No new flaky tests detected 🎯 Code Coverage (details) 🔗 Commit SHA: 091684c | Docs | Datadog PR Page
Benchmarks

Benchmark execution time: 2026-02-23 08:27:15. Comparing candidate commit 091684c in PR branch. Found 0 performance improvements and 1 performance regression! Performance is the same for 45 metrics, 0 unstable metrics.
.gitlab/benchmarks.yml
Outdated
```yaml
matrix:
  - BENCHMARKS: "profiling_allocation.rb profiling_gc.rb profiling_hold_resume_interruptions.rb profiling_http_transport.rb profiling_memory_sample_serialize.rb profiling_sample_loop_v2.rb profiling_sample_serialize.rb profiling_sample_gvl.rb profiling_string_storage_intern.rb"
  - BENCHMARKS: "error_tracking_simple.rb tracing_trace.rb di_instrument.rb library_gem_loading.rb"
```
Could we keep this in the benchmarks folder?
In particular, it seems very easy to miss that any new benchmarks must always be added here, whereas before it was clear "there's a list of benchmarks in the script, and the script is in the same folder as the benchmarks, and the README documents this".
Hey Ivo! Thanks for the review.
Some alternatives:
1. Defining those lists of benchmarks in something like `benchmarks/execution.yml` and importing them in `benchmarks.yml`.
2. Modifying the README to say that you must add a new benchmark to `.gitlab-ci.yml`, instead of saying you must add a new benchmark to `run_all.sh`.
I prefer alternative 2, since adding new benchmarks with the introduced fixes requires knowing that you should split benchmarks across different jobs to make sure there are enough CPUs for all of them.
I added some changes for alternative 2. I'm happy to revert them if you think other alternatives would be better.
Alternative 1's benchmarks/execution.yml could be something like this:
```yaml
.groups:
  - &profiling >-
    profiling_allocation.rb
    profiling_gc.rb
    profiling_hold_resume_interruptions.rb
    profiling_http_transport.rb
    profiling_memory_sample_serialize.rb
    profiling_sample_loop_v2.rb
    profiling_sample_serialize.rb
    profiling_sample_gvl.rb
    profiling_string_storage_intern.rb
  - &other >-
    error_tracking_simple.rb
    tracing_trace.rb
    di_instrument.rb
    library_gem_loading.rb

.execution:
  variables:
    REPETITIONS: "10"
    CPU_AFFINITY: "24-47"
    CPUS_PER_BENCHMARK: "2"
  parallel:
    matrix:
      - BENCHMARKS:
          - *profiling
          - *other
```

And then on `benchmarks.yml`:

```yaml
include:
  - local: 'benchmarks/execution.yml'

microbenchmarks:
  extends: .execution
  script:
    - ./benchmarks/run_all.sh
```
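Each matrix entry reaches the job as a single space-separated `BENCHMARKS` string. A minimal sketch (variable names assumed here, not taken from the actual `run_all.sh`) of how a runner script could split that string into an array before looping:

```shell
# Hypothetical: each parallel:matrix entry arrives as one space-separated
# BENCHMARKS string; split it on whitespace into an array of filenames.
BENCHMARKS="error_tracking_simple.rb tracing_trace.rb di_instrument.rb library_gem_loading.rb"

read -r -a benchmark_array <<< "$BENCHMARKS"

echo "${#benchmark_array[@]}"   # → 4 (number of benchmarks in this job)
echo "${benchmark_array[0]}"    # → error_tracking_simple.rb
```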
I like the alternative suggestion, but then we need to signal in some docs how we do it 👍🏼
Just to be sure - alternative 1, right? @Strech
I kinda like benchmarks/execution.yml, but yeah I think if it's clear in the README what to update option 2 is probably fine as well
I kept the benchmarks/execution.yml. Turned out to be neater. 406f32f
I think the PR seems in good shape? My only question is -- I see the new jobs coming up in GitLab -- is there a way to check that the way we're exposing the files is still correct? E.g. how do we test that we haven't broken our benchmark reporting code?
benchmarks/run_all.sh
Outdated
```bash
for file in "${benchmark_array[@]}"; do
  local cpus
  cpus=$(get_cpus_for_benchmark "$cpu_ids_str" "$idx" "$cpus_per_benchmark")
  taskset -c "$cpus" bundle exec ruby "$SCRIPT_DIR/$file" &
```
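For context, a helper like `get_cpus_for_benchmark` might slice the contiguous `CPU_AFFINITY` range (e.g. `"24-47"`) into fixed-size per-benchmark chunks. This is a hypothetical sketch; the real helper in `run_all.sh` may compute its slices differently:

```shell
# Hypothetical implementation: given a CPU range string, a benchmark index,
# and a CPUs-per-benchmark count, emit the sub-range for that benchmark.
get_cpus_for_benchmark() {
  local cpu_ids_str=$1 idx=$2 per=$3
  local range_start=${cpu_ids_str%-*}   # "24-47" -> 24
  local range_end=${cpu_ids_str#*-}     # "24-47" -> 47
  local first=$(( range_start + idx * per ))
  local last=$(( first + per - 1 ))
  if (( last > range_end )); then
    echo "benchmark $idx does not fit in CPU range $cpu_ids_str" >&2
    return 1
  fi
  echo "${first}-${last}"
}

get_cpus_for_benchmark "24-47" 0 2   # → 24-25
get_cpus_for_benchmark "24-47" 3 2   # → 30-31
```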
In my testing taskset does not seem to actually work, was this manually tested to function as expected?
And if any of these executions fail, how does the process exit status get tracked and reported? Doesn't look like this is done at all?
> In my testing taskset does not seem to actually work, was this manually tested to function as expected?
Yes. I tested it with some dummy CI jobs.
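For anyone wanting to reproduce such a check: one quick way to sanity-check CPU pinning (assuming a Linux environment; not necessarily how the dummy CI jobs verified it) is to read the effective affinity mask the kernel reports for the current process:

```shell
# On Linux, the kernel exposes the effective CPU affinity of a process in
# /proc/<pid>/status as "Cpus_allowed_list".
grep Cpus_allowed_list /proc/self/status

# With pinning applied, the list shrinks to the requested CPUs, e.g. on a
# machine with 4+ CPUs:
#   taskset -c 2-3 bash -c 'grep Cpus_allowed_list /proc/self/status'
# should report "Cpus_allowed_list: 2-3".
```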
> And if any of these executions fail, how does the process exit status get tracked and reported? Doesn't look like this is done at all?
You're right, this is not done at all! If a benchmark fails, we won't know. I'm working on this tracking.
This was fixed. The test checkboxes in the PR description cover this and link to results.
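One common shell pattern for this kind of tracking (a sketch of the general technique, not necessarily the exact fix that landed) is to record each background PID and `wait` on them individually, so a failure in any child propagates to the script's exit status:

```shell
# Collect the PID of every backgrounded job; `true` and `false` stand in
# for the real benchmark invocations.
pids=()
true &  pids+=($!)
false & pids+=($!)

# Waiting on each PID individually surfaces each child's exit status
# (a bare `wait` with no arguments would discard them).
exit_status=0
for pid in "${pids[@]}"; do
  wait "$pid" || exit_status=1
done

echo "$exit_status"   # → 1, because one child failed
```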
Hey @ivoanjo and @p-datadog, thank you for the reviews! I'll answer Oleg on the conversation thread. To answer Ivo:
Great point, and while reports on the BP UI are working as expected, PR comments from one microbenchmarking job will overwrite the other. Results have to be combined somehow.
Like on other benchmarks
Hi! For visibility, I re-requested reviews from everyone who had already reviewed, since you each pointed out different fixes.
ivoanjo
left a comment
👍 LGTM
We're in the middle of a release, so merging to master is blocked; it should be unblocked later today or by Monday.
At this point I don't see any reason not to give this a try, and if some extra adjustment is needed we'll do it as a follow-up PR.
```diff
 # Reinstall a recent version of the trace to help Docker cache dependencies.
 # Bump this version periodically.
-RUN gem install datadog -v 1.20.0
+RUN gem install datadog -v 2.28.0
```
Lol this was broken before and nobody noticed! There's no datadog 1.20.0, we renamed the library. Thanks for catching it.
#5313 (comment) is awesome. We're comparing benchmarks from this branch against master. Since there were no code changes that should impact performance, we would indeed expect to see 0 improvements/regressions. This corroborates the Ruby microbenchmark stability experiments, which showed that 10 reps took out all flakiness; 6 reps seems to be sufficient.
What does this PR do?
- … (`REPETITIONS` on `benchmarks/execution.yml`) to reduce inter-run variability.
- … `benchmarks/execution.yml` …
- … (`CPUS_PER_BENCHMARK` on the `benchmarks/execution.yml` job).

Motivation:
https://datadoghq.atlassian.net/browse/APMSP-2544
Change log entry
None.
Additional Notes:
How to test the change?
Execution and reporting
Reducing flakiness
The effect of multiple repetitions and CPU isolation on result variability was tested and reported in this document: https://datadoghq.atlassian.net/wiki/x/egJ3cAE
25 out of ~45 scenarios were flaky before fixes, 0 are flaky after fixes.
These tests used 10 repetitions. While this PR introduces 6 repetitions, it should already bring the flakiness down.