Make stress tests independent of batch size #2306

utpilla · 2024-11-18T02:32:25Z

Changes

Current implementation:

The current stress test implementation reports progress in batches instead of reporting it on the fly.
Each thread executes the function under test 1000 times (BATCH_SIZE is set to 1000) and then updates its stats by updating its AtomicU64 value in the WorkerStats vec.
The motivation for reporting progress in batches is mainly to reduce the number of atomic fetch_add operations, so that they do not interfere with the measurements reported for the function under test.
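
For readers who have not seen the existing code, here is a minimal sketch of the batched scheme described above. It is not the actual contents of stress/src/throughput.rs: the function name `run_batched`, the fixed number of batches per thread, and the plain `Vec<AtomicU64>` standing in for the WorkerStats vec are assumptions made for illustration.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

const BATCH_SIZE: u64 = 1000;

/// Hypothetical helper: runs `func` in batches on `num_threads` threads and
/// returns the total number of iterations that were reported.
fn run_batched(num_threads: usize, batches_per_thread: u64, func: fn()) -> u64 {
    // One atomic counter per worker thread (stand-in for the WorkerStats vec).
    let stats: Arc<Vec<AtomicU64>> =
        Arc::new((0..num_threads).map(|_| AtomicU64::new(0)).collect());

    let handles: Vec<_> = (0..num_threads)
        .map(|thread_index| {
            let stats = Arc::clone(&stats);
            thread::spawn(move || {
                for _ in 0..batches_per_thread {
                    for _ in 0..BATCH_SIZE {
                        func();
                    }
                    // A single atomic update per batch, not per call.
                    stats[thread_index].fetch_add(BATCH_SIZE, Ordering::Relaxed);
                }
            })
        })
        .collect();

    // Non-scoped threads have to be joined explicitly.
    for handle in handles {
        handle.join().unwrap();
    }

    stats.iter().map(|c| c.load(Ordering::Relaxed)).sum()
}
```

Between two fetch_add calls a reader sees none of the up to BATCH_SIZE iterations already completed in the current batch, which is the staleness discussed in the motivation.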

Motivation for this PR

Reporting measurements in batches makes the stress test results dependent on BATCH_SIZE. You have to choose an optimal BATCH_SIZE for the function being stress tested. If BATCH_SIZE is too low, you make many more fetch_add calls on WorkerStats, which influences the final throughput. If BATCH_SIZE is too high, you don't report the actual progress in a timely manner.
If the whole idea is to keep the stress test's own bookkeeping from affecting the throughput results, then we can consider avoiding fetch_add calls altogether.
This PR uses some unsafe code so that the update threads report progress with a simple + operation instead of the atomic fetch_add operation.
Another change in this PR is the use of scoped threads, which simplifies things a bit because we no longer have to join on the update thread handles.
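
The idea can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: `Slot` and `run_unbatched` are hypothetical names, the 128-byte alignment is one assumed way to keep each thread's counter on its own cache line, and the sketch only reads the counters after all workers have finished, whereas the PR reads them periodically while the workers are still running.

```rust
use std::cell::UnsafeCell;
use std::thread;

/// Hypothetical per-thread counter. The alignment is an assumption of this
/// sketch, meant to keep neighbouring slots on separate cache lines.
#[repr(align(128))]
struct Slot(UnsafeCell<u64>);

// SAFETY (assumption of this sketch): every slot is written by exactly one
// thread, and it is only read after that thread has finished.
unsafe impl Sync for Slot {}

/// Hypothetical helper: runs `func` `iterations` times on each thread and
/// counts every call with a plain `+= 1` instead of `fetch_add`.
fn run_unbatched<F: Fn() + Sync>(num_threads: usize, iterations: u64, func: F) -> u64 {
    let slots: Vec<Slot> = (0..num_threads)
        .map(|_| Slot(UnsafeCell::new(0)))
        .collect();

    // Scoped threads may borrow `slots` and `func` directly and are joined
    // automatically when the scope ends, so no join handles are kept around.
    thread::scope(|s| {
        for slot in &slots {
            let func = &func;
            s.spawn(move || {
                for _ in 0..iterations {
                    func();
                    // SAFETY: only this thread touches `slot` inside the scope.
                    unsafe { *slot.0.get() += 1 };
                }
            });
        }
    });

    // All workers have been joined here, so reading the counters is safe.
    slots.into_iter().map(|slot| slot.0.into_inner()).sum()
}
```

Calling `run_unbatched(4, 1000, || {})` counts 1000 iterations on each of 4 threads. There is no BATCH_SIZE to tune because each call is counted as it happens.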

Note for reviewers:

Please hide whitespace from the diff for easier reviewing.

Merge requirement checklist

CONTRIBUTING guidelines followed
Unit tests added/updated (if applicable)
Appropriate CHANGELOG.md files updated for non-trivial, user-facing changes
Changes in public API reviewed (if applicable)

codecov · 2024-11-18T02:35:50Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 79.6%. Comparing base (3ac2d9f) to head (f442e41).
Report is 1 commits behind head on main.

Additional details and impacted files

@@          Coverage Diff          @@
##            main   #2306   +/-   ##
=====================================
  Coverage   79.6%   79.6%           
=====================================
  Files        123     123           
  Lines      21263   21263           
=====================================
+ Hits       16938   16940    +2     
+ Misses      4325    4323    -2

lalitb · 2024-11-18T17:29:06Z

stress/src/throughput.rs

+                let current_time = Instant::now();
+                let elapsed = current_time.duration_since(last_collect_time).as_secs();
+                if elapsed >= SLIDING_WINDOW_SIZE {
+                    let total_count_u64 = shared_mutable_stats_slice.sum();


Can this read conflict with the concurrent writes, since UnsafeCell provides no synchronization?

At worst, we'll under-report the numbers?

At worst, we won't get the most up-to-date sum, which is okay?

Yes, at worst under-reporting should be fine. That said, this can produce corrupted values from reading partially updated counters on some 32-bit machines (64-bit writes may not be atomic on all 32-bit machines), but we needn't worry about that for a stress test.
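
For comparison only, and not what this PR does: one way to rule out torn reads while still avoiding fetch_add is to keep an AtomicU64 per thread and have its single writer use a relaxed load followed by a relaxed store. The sketch below assumes that approach. `count_with_relaxed_store` is a hypothetical name, and AtomicU64 itself is not available on every 32-bit target.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;

/// Hypothetical helper: each thread owns one counter and is its only writer.
fn count_with_relaxed_store(num_threads: usize, iterations: u64) -> u64 {
    let counters: Vec<AtomicU64> = (0..num_threads).map(|_| AtomicU64::new(0)).collect();

    thread::scope(|s| {
        for counter in &counters {
            s.spawn(move || {
                for _ in 0..iterations {
                    // Single writer, so a separate load and store cannot lose
                    // updates, and no read-modify-write instruction is needed.
                    let current = counter.load(Ordering::Relaxed);
                    counter.store(current + 1, Ordering::Relaxed);
                }
            });
        }

        // A reader running concurrently may observe a slightly stale sum,
        // but each individual value it loads is never torn.
        let _snapshot: u64 = counters.iter().map(|c| c.load(Ordering::Relaxed)).sum();
    });

    // All writers have finished, so this sum is exact.
    counters.iter().map(|c| c.load(Ordering::Relaxed)).sum()
}
```

Whether this performs the same as the plain + used in the PR would have to be measured on the target hardware.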

cijothomas · 2024-11-18T17:39:25Z

stress/src/throughput.rs

+                unsafe {
+                    shared_mutable_stats_slice.increment(thread_index);
+                }
+                if STOP.load(Ordering::SeqCst) {


Won't we now be forced to check this AtomicBool on every iteration? I think that can still be avoided by checking it only once every BATCH_SIZE iterations?

Since STOP is only ever changed when exiting the stress test, the cache line where STOP resides wouldn't have to be updated during the test. So I don't expect it to have much of a perf implication. (I don't think it suffers from false sharing here, as STOP is static.)

If we still want to avoid it, I'd prefer that we remove the STOP variable altogether and simply exit the process on Ctrl + C.

I ran the stress test with the above-mentioned change, and there isn't much of a difference in the results. I would prefer removing it, as the existing code for using STOP doesn't add much value.

Not a blocker for this PR. Let's revisit removing it altogether in a future PR, if needed.
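
A minimal sketch of the per-iteration check discussed in this thread, assuming a single worker and a timer in place of the Ctrl + C handler. `run_until_stopped` is a hypothetical name. Only the `STOP` flag and the `SeqCst` load are taken from the diff above.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::thread;
use std::time::Duration;

static STOP: AtomicBool = AtomicBool::new(false);

/// Hypothetical helper: spins a worker that checks STOP on every iteration
/// and returns how many iterations it completed before being stopped.
fn run_until_stopped(run_for: Duration) -> u64 {
    STOP.store(false, Ordering::SeqCst);

    thread::scope(|s| {
        let worker = s.spawn(|| {
            let mut iterations: u64 = 0;
            loop {
                // The function under test would be called here.
                iterations += 1;
                // STOP is written only once, at shutdown, so during the run
                // this load is served from a cache line nobody is modifying.
                if STOP.load(Ordering::SeqCst) {
                    break;
                }
            }
            iterations
        });

        // Stand-in for the Ctrl + C handler: stop after a fixed duration.
        thread::sleep(run_for);
        STOP.store(true, Ordering::SeqCst);
        worker.join().unwrap()
    })
}
```

Comparing the count returned by this loop with a variant that drops the check is one way to reproduce the "not much of a difference" observation on a given machine.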

cijothomas · 2024-11-18T18:40:06Z

@utpilla open-telemetry/opentelemetry-dotnet#5985 Can you run an empty fn here too and share the results from the same hardware? Curious to know whether the stress tests in .NET and Rust are comparable when running an empty fn...

utpilla · 2024-11-18T18:45:56Z

> @utpilla open-telemetry/opentelemetry-dotnet#5985 Can you run an empty fn here too and share the results from the same hardware? Curious to know whether the stress tests in .NET and Rust are comparable when running an empty fn...

Yes, it's comparable. In fact, the reason I sent the PR on the .NET repo is that our stress test numbers for an empty function were way better than .NET's. That's when I realized that their stress test suffers from false sharing 😄

For Rust, I stress tested the empty function on the same hardware (WSL though), and the throughput is around 7.7 B iterations per second. For .NET it was ~7B loops per second.

cijothomas · 2024-11-18T18:53:45Z

Perf numbers from my laptop (11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz, 4 cores, 48 GB RAM):

tracing with no subscriber: 4B iterations/sec (.NET is ~35 M/sec)

^ Something I found in my personal notes from a long time ago. The 4B/sec figure was with tracing + no subscriber, which is pretty close to an empty fn()!

utpilla added 2 commits November 15, 2024 02:56

Update stress test

deb07ed

Code changes

4c357f0

utpilla requested a review from a team as a code owner November 18, 2024 02:32

Fix CI

f442e41

lalitb reviewed Nov 18, 2024

cijothomas reviewed Nov 18, 2024

cijothomas approved these changes Nov 18, 2024

lalitb approved these changes Nov 18, 2024

cijothomas merged commit 41afd7f into open-telemetry:main Nov 18, 2024
24 of 25 checks passed

bantonsson pushed a commit to bantonsson/opentelemetry-rust that referenced this pull request Oct 9, 2025

Make stress tests independent of batch size (open-telemetry#2306)

b362e8c
