@utpilla (Contributor) commented Nov 18, 2024

Changes

Current implementation:

  • The current stress test implementation reports progress in batches rather than on the fly.
  • Each thread executes the function under test 1000 times (BATCH_SIZE is set to 1000) and then updates its stats by incrementing its AtomicU64 entry in the WorkerStats Vec.
  • The main motivation for batching is to reduce the number of atomic fetch_add operations, so that they do not interfere with the measurements reported for the function under test.
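A minimal sketch of the batched scheme described above, using only the standard library. The WorkerStats layout (one cache-line-padded AtomicU64 per worker), the run_batched helper, and the fixed `batches` bound are assumptions for illustration; the actual stress test loops until it is stopped, and its WorkerStats definition may differ.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;

const BATCH_SIZE: u64 = 1000;

// Hypothetical layout: one counter per worker, aligned to an assumed
// 64-byte cache line so that workers do not write to the same line.
#[repr(align(64))]
#[derive(Default)]
struct WorkerStats {
    count: AtomicU64,
}

/// Runs `func` on `num_threads` workers. Each worker publishes its progress
/// with one `fetch_add` per BATCH_SIZE calls. `batches` bounds the run so
/// the sketch terminates.
fn run_batched<F: Fn() + Sync>(num_threads: usize, batches: u64, func: F) -> u64 {
    let stats: Vec<WorkerStats> = (0..num_threads).map(|_| WorkerStats::default()).collect();
    thread::scope(|s| {
        for stat in &stats {
            let func = &func;
            s.spawn(move || {
                for _ in 0..batches {
                    for _ in 0..BATCH_SIZE {
                        func();
                    }
                    // One atomic read-modify-write per batch, not per call.
                    stat.count.fetch_add(BATCH_SIZE, Ordering::Relaxed);
                }
            });
        }
    });
    // All workers have been joined by the scope, so this sum is final.
    stats.iter().map(|s| s.count.load(Ordering::Relaxed)).sum()
}
```

The trade-off the PR describes is visible here: reported progress can lag the real work by up to BATCH_SIZE - 1 calls per worker, and shrinking BATCH_SIZE adds more fetch_add calls to the measured loop.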

Motivation for this PR

  • Reporting in batches makes the stress test results depend on BATCH_SIZE, which has to be tuned for each function under test. If BATCH_SIZE is too low, the extra fetch_add calls on WorkerStats skew the measured throughput. If BATCH_SIZE is too high, progress is not reported in a timely manner.
  • If the whole idea is to keep the stress test's own bookkeeping from affecting the throughput results, we can avoid fetch_add calls altogether.
  • This PR uses some unsafe code so that the update threads report progress with a plain, non-atomic + operation instead of an atomic fetch_add.
  • The PR also switches to scoped threads, which simplifies things a bit because the update thread handles no longer have to be joined explicitly.
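A minimal sketch of unsynchronized per-thread counters driven by scoped threads, under stated assumptions. The names UnsafeStats, PaddedCount, and run_unbatched are hypothetical; only the increment(thread_index) and sum() shape mirrors the shared_mutable_stats_slice calls in the PR. One deliberate difference: the PR sums while workers are still running, whereas this sketch only reads after the scope has joined every thread, because a non-atomic read that races with a write is a data race, which Rust treats as undefined behavior.

```rust
use std::cell::UnsafeCell;
use std::thread;

// One slot per thread, aligned to an assumed 64-byte cache line.
#[repr(align(64))]
struct PaddedCount(UnsafeCell<u64>);

struct UnsafeStats {
    slots: Vec<PaddedCount>,
}

// SAFETY: through `&UnsafeStats` the only mutation is the unsafe
// `increment`, whose contract gives each slot a single writer; `sum`
// takes `&mut self`, so no read can overlap a write.
unsafe impl Sync for UnsafeStats {}

impl UnsafeStats {
    fn new(num_threads: usize) -> Self {
        Self {
            slots: (0..num_threads)
                .map(|_| PaddedCount(UnsafeCell::new(0)))
                .collect(),
        }
    }

    /// # Safety
    /// Only the thread that owns `index` may call this, and no other
    /// thread may access slot `index` at the same time.
    unsafe fn increment(&self, index: usize) {
        // Plain load, add, store: no atomic read-modify-write.
        unsafe { *self.slots[index].0.get() += 1 };
    }

    /// Needs exclusive access, so it can only run after all writers finish.
    fn sum(&mut self) -> u64 {
        self.slots.iter_mut().map(|slot| *slot.0.get_mut()).sum()
    }
}

/// Hypothetical driver; `iterations` bounds the run so the sketch terminates.
fn run_unbatched(num_threads: usize, iterations: u64) -> u64 {
    let mut stats = UnsafeStats::new(num_threads);
    // Scoped threads are joined automatically when the scope ends,
    // so no JoinHandles need to be collected and joined by hand.
    thread::scope(|s| {
        for thread_index in 0..num_threads {
            let stats = &stats;
            s.spawn(move || {
                for _ in 0..iterations {
                    // The function under test would be called here.
                    // SAFETY: each thread writes only its own slot.
                    unsafe { stats.increment(thread_index) };
                }
            });
        }
    });
    stats.sum()
}
```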

Note for reviewers:

Please hide whitespace changes in the diff for easier reviewing.

Merge requirement checklist

  • CONTRIBUTING guidelines followed
  • Unit tests added/updated (if applicable)
  • Appropriate CHANGELOG.md files updated for non-trivial, user-facing changes
  • Changes in public API reviewed (if applicable)

@utpilla requested a review from a team as a code owner on November 18, 2024 02:32
@codecov (bot) commented Nov 18, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 79.6%. Comparing base (3ac2d9f) to head (f442e41).
Report is 1 commit behind head on main.

Additional details and impacted files
@@          Coverage Diff          @@
##            main   #2306   +/-   ##
=====================================
  Coverage   79.6%   79.6%           
=====================================
  Files        123     123           
  Lines      21263   21263           
=====================================
+ Hits       16938   16940    +2     
+ Misses      4325    4323    -2     


let current_time = Instant::now();
let elapsed = current_time.duration_since(last_collect_time).as_secs();
if elapsed >= SLIDING_WINDOW_SIZE {
    let total_count_u64 = shared_mutable_stats_slice.sum();
@lalitb (Member), Nov 18, 2024

Can this read conflict with the concurrent writes, given that UnsafeCell provides no synchronization?

Member

At worst, we'll underreport the numbers?

@utpilla (Contributor, Author)

At worst, we won't get the most up-to-date sum, which should be okay?

@lalitb (Member), Nov 18, 2024

Yes, under-reporting should be fine at worst. On some 32-bit machines this could also produce corrupted values from reading partially updated (torn) counters, since 64-bit writes are not atomic on all 32-bit machines, but we needn't worry about that for a stress test.
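For comparison, one way to keep the cost of a plain increment while avoiding both the data race and torn reads is sketched below; it is an alternative the PR did not adopt. Each thread owns one AtomicU64 and updates it with a Relaxed load followed by a Relaxed store instead of fetch_add. Because each counter has a single writer, no read-modify-write is needed, and on x86-64 relaxed loads and stores typically compile to ordinary mov instructions. A reporting thread can read the counters concurrently and may see slightly stale values (under-reporting), but never torn ones. The names PaddedCounter and run_with_concurrent_read are hypothetical, and AtomicU64 is unavailable on targets without 64-bit atomics.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;

// One counter per thread, aligned to an assumed 64-byte cache line.
#[repr(align(64))]
#[derive(Default)]
struct PaddedCounter(AtomicU64);

impl PaddedCounter {
    /// Single-writer increment: a Relaxed load then a Relaxed store.
    /// Not an atomic RMW, so only correct if exactly one thread writes.
    fn increment(&self) {
        let v = self.0.load(Ordering::Relaxed);
        self.0.store(v + 1, Ordering::Relaxed);
    }

    /// Safe to call from any thread at any time; never observes a torn value.
    fn read(&self) -> u64 {
        self.0.load(Ordering::Relaxed)
    }
}

/// Returns (sum read while workers run, final sum after they are joined).
fn run_with_concurrent_read(num_threads: usize, iterations: u64) -> (u64, u64) {
    let counters: Vec<PaddedCounter> =
        (0..num_threads).map(|_| PaddedCounter::default()).collect();
    let snapshot = thread::scope(|s| {
        for counter in &counters {
            s.spawn(move || {
                for _ in 0..iterations {
                    counter.increment();
                }
            });
        }
        // The "reporting" read races with the writers but is well defined.
        counters.iter().map(PaddedCounter::read).sum::<u64>()
    });
    let total = counters.iter().map(PaddedCounter::read).sum();
    (snapshot, total)
}
```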

unsafe {
    shared_mutable_stats_slice.increment(thread_index);
}
if STOP.load(Ordering::SeqCst) {
Member

Won't we now be forced to check this AtomicBool on every iteration? I think that can still be avoided by checking it only once every BATCH_SIZE iterations?
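The reviewer's suggestion can be sketched as follows: poll the stop flag once per batch of BATCH_SIZE calls instead of on every call. This is a sketch of the suggestion, not code from the PR; run_until_stopped is hypothetical, and the flag is passed as a reference (rather than read from a static STOP) only to keep the sketch testable.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

const BATCH_SIZE: u64 = 1000;

/// Calls `func` until `stop` is set, checking the flag only after each full
/// batch. Returns the number of completed calls, which can overshoot the
/// moment `stop` was set by up to BATCH_SIZE - 1 calls.
fn run_until_stopped<F: FnMut()>(stop: &AtomicBool, mut func: F) -> u64 {
    let mut completed = 0u64;
    loop {
        for _ in 0..BATCH_SIZE {
            func();
        }
        completed += BATCH_SIZE;
        // One load of the shared flag per batch rather than per call.
        if stop.load(Ordering::SeqCst) {
            return completed;
        }
    }
}
```

For example, if the flag is set during the 2,500th call, the loop still finishes the third batch and returns 3,000.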

@utpilla (Contributor, Author), Nov 18, 2024

Since STOP is only ever changed when exiting the stress test, the cache line holding STOP is never written during the test, so each core can keep its own shared copy of it. So I don't expect the check to have much of a perf impact. (I don't think it suffers from false sharing here either, as STOP is a static.)

@utpilla (Contributor, Author)

If we still want to avoid it, I'd prefer to remove the STOP variable altogether and simply exit the process on Ctrl+C.

I ran the stress test with that change, and there isn't much difference in the results. I'd prefer removing STOP, as the existing code that uses it doesn't add much value.

Member

Not a blocker for this PR; let's revisit removing it altogether in a future PR, if needed.

@cijothomas (Member)

@utpilla Regarding open-telemetry/opentelemetry-dotnet#5985: could you run an empty function here too and share the results from the same hardware? I'm curious whether the .NET and Rust stress tests are comparable when running an empty fn...

@utpilla (Contributor, Author) commented Nov 18, 2024

Yes, it's comparable. In fact, the reason I sent the PR to the .NET repo is that our stress test numbers for an empty function were way better than .NET's. That's when I realized that their stress test suffers from false sharing 😄
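False sharing happens when counters written by different threads sit on the same cache line, so every write invalidates that line in the other cores' caches even though the threads never touch each other's data. The sketch below shows the usual layout fix with the standard library only, assuming a 64-byte cache line (common on x86-64; some CPUs use 128 bytes): eight unpadded AtomicU64 values fit in one line, while a #[repr(align(64))] wrapper gives each counter a line of its own. The names CACHE_LINE, PlainCounter, PaddedCounter, and counters_per_line are hypothetical and not taken from either repository.

```rust
use std::mem::size_of;
use std::sync::atomic::AtomicU64;

// Assumed cache-line size; not queried from the hardware.
const CACHE_LINE: usize = 64;

/// Unpadded: 8 bytes each, so adjacent counters in a Vec share lines.
type PlainCounter = AtomicU64;

/// Padded: alignment rounds the size up to one full (assumed) cache line.
#[repr(align(64))]
struct PaddedCounter(AtomicU64);

/// How many values of `T` fit in one cache line when stored contiguously.
fn counters_per_line<T>() -> usize {
    CACHE_LINE / size_of::<T>()
}
```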

For Rust, I stress tested the empty function on the same hardware (under WSL, though), and the throughput was around 7.7 B iterations per second. For .NET it was ~7 B loops per second.

@cijothomas (Member)

Perf numbers from my laptop (11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz, 4 cores, 48 GB RAM):

tracing with no subscriber: 4 B iterations/sec (.NET is ~35 M/sec)

^ Something I found in my personal notes from a long time ago. The 4 B/sec figure was for tracing with no subscriber, which is pretty close to an empty fn()!

@cijothomas merged commit 41afd7f into open-telemetry:main on Nov 18, 2024
24 of 25 checks passed
bantonsson pushed a commit to bantonsson/opentelemetry-rust that referenced this pull request Oct 9, 2025