Commit f613855

Remove duplicated section in benchmark README
1 parent a7379c9 commit f613855

1 file changed

+2
-9
lines changed

benches/README.md

Lines changed: 2 additions & 9 deletions
@@ -111,16 +111,9 @@ comparison fastest │ slowest │ median │ me
 
 ## Analysis
 
-Results are **platform-dependent**:
+The additional atomic RMW in `go_observe` (the `_count` increment) has a measurable cost across all platforms, with the sole exception of Apple M3 in uncontended scenario.
 
-- The **additional atomic RMW** in `go_observe` has a **significant cost** on Ubuntu runners (x86-64 and aarch64), but is **negligible on Apple M3**.
-- **Cache locality** provides **consistent gains across all platforms**, reducing the impact of cache line invalidation from the contending thread.
+Cache locality, enabled by grouping all shard counters in a single cache line, delivers consistent performance improvements across all platforms, significantly reducing the impact of cache line invalidation triggered by the contending thread.
 
 [^1]: On a MacBook Air M3, one `std::hint::spin_loop` call takes ~8 ns.
 [^2]: GitHub Actions workflow run: https://github.com/wyfo/split-histogram/actions/runs/18954432694
-
-## Analysis
-
-The **additional atomic RMW** in `go_observe` (the `_count` increment) has a **measurable cost** across **all platforms**, with the sole exception of Apple M3 in uncontended scenario.
-
-**Cache locality**, enabled by grouping all shard counters in a single cache line, delivers **consistent performance improvements across all platforms**, significantly reducing the impact of cache line invalidation triggered by the contending thread.
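
For readers unfamiliar with the two effects named in the Analysis text, here is a minimal sketch of what "grouping all shard counters in a single cache line" and "the additional atomic RMW" could look like. It is not the crate's actual code: the `Shard` type, its field names, the `observe` method, the bucket count of 7, and the 64-byte line size are all assumptions chosen for illustration (some CPUs, including Apple silicon, use larger cache lines).

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical shard layout: seven bucket counters plus the total count,
/// 8 x 8 bytes = 64 bytes, aligned so they occupy one assumed 64-byte line.
#[repr(align(64))]
pub struct Shard {
    pub buckets: [AtomicU64; 7],
    pub count: AtomicU64,
}

impl Shard {
    pub fn new() -> Self {
        Shard {
            buckets: std::array::from_fn(|_| AtomicU64::new(0)),
            count: AtomicU64::new(0),
        }
    }

    /// Records one observation with two atomic read-modify-write operations:
    /// one on the bucket counter and an additional one on `count`.
    pub fn observe(&self, bucket: usize) {
        self.buckets[bucket].fetch_add(1, Ordering::Relaxed);
        self.count.fetch_add(1, Ordering::Relaxed);
    }
}
```

Because both counters touched by `observe` sit in the same line, a contending thread invalidates at most one line per observation under this layout. The second `fetch_add` is the extra RMW whose cost the benchmark measures.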
