-
Notifications
You must be signed in to change notification settings - Fork 598
Make stress tests independent of batch size #2306
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make stress tests independent of batch size #2306
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #2306 +/- ##
=====================================
Coverage 79.6% 79.6%
=====================================
Files 123 123
Lines 21263 21263
=====================================
+ Hits 16938 16940 +2
+ Misses 4325 4323 -2 ☔ View full report in Codecov by Sentry. 🚨 Try these New Features:
|
| let current_time = Instant::now(); | ||
| let elapsed = current_time.duration_since(last_collect_time).as_secs(); | ||
| if elapsed >= SLIDING_WINDOW_SIZE { | ||
| let total_count_u64 = shared_mutable_stats_slice.sum(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this read operation conflict with the concurrent writes, as there is no safety with UnsafeCell operation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
at worse, we'll underreport the numbers. ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At worst, we won't get the most up-to-date sum which is okay?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes under-reporting should be fine at worse. Though this can result in data corruption due to reading of partially updated values in some 32-bit machines (64-bit writes may not atomic in all 32-bit machines), but we needn't worry about that for stress test.
| unsafe { | ||
| shared_mutable_stats_slice.increment(thread_index); | ||
| } | ||
| if STOP.load(Ordering::SeqCst) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Won't we be now forced to check this AtomicBool for each iteration? I think that still be avoided and instead can be checked every BATCH_SIZE?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the STOP is only ever changed when exiting the stress test, the cache line where STOP resides wouldn't have to be updated during the test. So, I don't expect it to have much of a perf implication. (don't think it suffers from false sharing here as STOP is static).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we still want to avoid it, I'd prefer if remove the STOP variable altogether and simply exit the process on Ctrl + C.
I ran the stress test with the above-mentioned change, and there isn't much of a difference in the results. I would prefer removing it as the existing code for using STOP doesn't add much value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not a blocker for this PR, lets revisit removing it alltogether in a future pr, if needed.
|
@utpilla open-telemetry/opentelemetry-dotnet#5985 Can run an empty fun here too and share the resutls from same hardware? Curious to know the stress test in .NET,Rust is comparable when doing empty fn... |
Yes, it's comparable. In fact, the reason I sent the PR on the .NET repo is that our stress test numbers for an empty function were way better than .NET's. That's when I realized that their stress test suffers from false sharing 😄 For Rust, I stress tested the empty function on the same hardware (WSL though), and the throughput is around 7.7 B iterations per second. For .NET it was ~7B loops per second. |
|
perf numbers from my laptop ( 11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz 4 cores with 48 GB RAM) tracing with no subscriber 4B iterations/sec (.NET is ~35 M/sec) ^ Something I found in my personal note, a long time ago. 4B /sec was with |
Changes
Current implementation:
1000times (BATCH_SIZEis set to1000) and then updates its stats by updating theAtomicU64value inWorkerStatsvec.fetch_addatomic operations so that it does not interfere with the measurements reported for the function under test.Motivation for this PR
BATCH_SIZE. You have to choose an optimalBATCH_SIZEfor the function being stress tested. If theBATCH_SIZEis too low, then you make a lot morefetch_addcalls forWorkerStatswhich would influence the final throughput. If theBATCH_SIZEis too high, then you don't report the actual progress made in a timely manner.fetch_addcalls altogether.unsafecode to have the update threads report progress by making a simple+operation instead of the atomicfetch_addoperation.Note for reviewers:
Please hide whitespace from the diff for easier reviewing:
Merge requirement checklist
CHANGELOG.mdfiles updated for non-trivial, user-facing changes