Description
Describe the solution you'd like
I noticed that the circular buffer used by the batch span processor is a lock-free MPSC queue based on CAS. Under heavy multithreaded contention, Compare-And-Swap (CAS) scales worse than Fetch-And-Add (FAA): a failed CAS has to be retried in a loop, while an FAA always completes in a single step. With reference to https://github.com/dbittman/waitfree-mpsc-queue, this PR attempts to implement a wait-free MPSC (Multiple Producer, Single Consumer) queue using FAA. Based on the original benchmark, this approach shows better performance scalability:
Run on (48 X 2593.99 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x24)
  L1 Instruction 32 KiB (x24)
  L2 Unified 256 KiB (x24)
  L3 Unified 30720 KiB (x2)
Load Average: 7.85, 5.70, 4.48
----------------------------------------------------------------
Benchmark                       Time            CPU   Iterations
----------------------------------------------------------------
BM_BaselineBuffer/1      10178537 ns        51528 ns         1000
BM_BaselineBuffer/2       7408646 ns        69828 ns         1000
BM_BaselineBuffer/4       7684772 ns       127549 ns         1000
BM_BaselineBuffer/8       7222459 ns       278660 ns         1000
BM_BaselineBuffer/16      6716972 ns       603712 ns         1215
BM_LockFreeBuffer/1       3915343 ns        53125 ns         1000
BM_LockFreeBuffer/2       4798406 ns        70581 ns         1000
BM_LockFreeBuffer/4       4562709 ns       128493 ns         1000
BM_LockFreeBuffer/8       4935221 ns       289996 ns         1000
BM_LockFreeBuffer/16      5187913 ns       618856 ns         1081
BM_OptimizedBuffer/1      4256507 ns        49970 ns         1000
BM_OptimizedBuffer/2      3398719 ns        67712 ns         1000
BM_OptimizedBuffer/4      3204749 ns       127378 ns         1000
BM_OptimizedBuffer/8      3230722 ns       296507 ns         1000
BM_OptimizedBuffer/16     3859005 ns       769220 ns         1000
For more details, see the draft PR: #3644
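To make the idea concrete, here is a minimal C++ sketch of an FAA-based bounded MPSC queue, loosely modeled on the dbittman design linked above. It is an illustration only and not the code from the draft PR: the names `FaaMpscQueue`, `TryPush`, `TryPop` and `RunDemo` are hypothetical, and it uses the default sequentially consistent ordering for simplicity rather than tuned memory orders.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical illustration of FAA-based slot reservation; not the PR's code.
template <typename T>
class FaaMpscQueue {
 public:
  explicit FaaMpscQueue(std::size_t capacity)
      : slots_(capacity), capacity_(capacity) {
    for (auto& slot : slots_) slot.store(nullptr);
  }

  // Producer side: a fixed number of atomic steps, no CAS retry loop.
  bool TryPush(T* item) {
    // Reserve capacity with a single fetch-and-add.
    if (count_.fetch_add(1) >= capacity_) {
      count_.fetch_sub(1);  // Queue is full: undo the reservation.
      return false;
    }
    // A second fetch-and-add hands this producer a unique slot index.
    std::uint64_t head = head_.fetch_add(1);
    slots_[head % capacity_].store(item);
    return true;
  }

  // Consumer side: must only ever be called from one thread.
  T* TryPop() {
    T* item = slots_[tail_].exchange(nullptr);
    // nullptr means the queue is empty, or a producer has reserved this
    // slot but not stored its item yet; the caller can simply try again.
    if (item == nullptr) return nullptr;
    tail_ = (tail_ + 1) % capacity_;
    count_.fetch_sub(1);  // Release the capacity only after clearing the slot.
    return item;
  }

 private:
  std::vector<std::atomic<T*>> slots_;
  const std::size_t capacity_;
  std::atomic<std::size_t> count_{0};
  std::atomic<std::uint64_t> head_{0};
  std::size_t tail_ = 0;  // Touched by the single consumer only.
};

// Pushes 1..per_producer from each producer thread and returns the sum
// of everything the single consumer popped.
std::uint64_t RunDemo(int producers, int per_producer) {
  FaaMpscQueue<int> queue(64);
  std::vector<int> values(per_producer);
  for (int i = 0; i < per_producer; ++i) values[i] = i + 1;

  std::vector<std::thread> threads;
  for (int p = 0; p < producers; ++p) {
    threads.emplace_back([&] {
      for (int i = 0; i < per_producer; ++i) {
        // Retrying on a full queue is this demo's policy, not the queue's.
        while (!queue.TryPush(&values[i])) std::this_thread::yield();
      }
    });
  }

  std::uint64_t sum = 0;
  long remaining = static_cast<long>(producers) * per_producer;
  while (remaining > 0) {
    if (int* v = queue.TryPop()) {
      sum += static_cast<std::uint64_t>(*v);
      --remaining;
    } else {
      std::this_thread::yield();
    }
  }
  for (auto& t : threads) t.join();
  return sum;
}
```

In this sketch a producer never loops on a failed atomic operation: it either reserves a slot with two fetch-and-adds or backs off immediately when the queue is full. A CAS-based design would instead retry its compare-exchange whenever another producer wins the race, which is where contention hurts.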