New circular buffer using FAA instead of CAS to optimize performance

**Describe the solution you'd like**

I noticed that the circular buffer used by the batch span processor is a lock-free MPSC (multiple-producer, single-consumer) queue based on compare-and-swap (CAS). Under heavy multithreaded contention, CAS scales worse than fetch-and-add (FAA), because a failed CAS has to be retried while an FAA always completes in a single step. Following https://github.com/dbittman/waitfree-mpsc-queue, the draft PR attempts to implement a wait-free MPSC queue using FAA. The existing benchmark indicates that this approach scales better:

```
Run on (48 X 2593.99 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x24)
  L1 Instruction 32 KiB (x24)
  L2 Unified 256 KiB (x24)
  L3 Unified 30720 KiB (x2)
Load Average: 7.85, 5.70, 4.48
----------------------------------------------------------------
Benchmark                      Time             CPU   Iterations
----------------------------------------------------------------
BM_BaselineBuffer/1     10178537 ns        51528 ns         1000
BM_BaselineBuffer/2      7408646 ns        69828 ns         1000
BM_BaselineBuffer/4      7684772 ns       127549 ns         1000
BM_BaselineBuffer/8      7222459 ns       278660 ns         1000
BM_BaselineBuffer/16     6716972 ns       603712 ns         1215
BM_LockFreeBuffer/1      3915343 ns        53125 ns         1000
BM_LockFreeBuffer/2      4798406 ns        70581 ns         1000
BM_LockFreeBuffer/4      4562709 ns       128493 ns         1000
BM_LockFreeBuffer/8      4935221 ns       289996 ns         1000
BM_LockFreeBuffer/16     5187913 ns       618856 ns         1081
BM_OptimizedBuffer/1     4256507 ns        49970 ns         1000
BM_OptimizedBuffer/2     3398719 ns        67712 ns         1000
BM_OptimizedBuffer/4     3204749 ns       127378 ns         1000
BM_OptimizedBuffer/8     3230722 ns       296507 ns         1000
BM_OptimizedBuffer/16    3859005 ns       769220 ns         1000
```
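
To make the idea concrete, here is a rough sketch of an FAA-based bounded MPSC queue in the style of the referenced waitfree-mpsc-queue project. It is not the code from the draft PR. The class name `FaaMpscQueue` and the methods `TryPush` / `TryPop` are hypothetical names chosen for illustration, and the memory orderings are my own assumption of what is sufficient.

```cpp
#include <atomic>
#include <cstddef>
#include <memory>

// Hypothetical illustration only; names and layout are not taken from the PR.
// Stores non-null pointers; nullptr marks an empty slot.
template <class T>
class FaaMpscQueue
{
public:
  explicit FaaMpscQueue(std::size_t capacity)
      : capacity_(capacity), slots_(new std::atomic<T *>[capacity])
  {
    for (std::size_t i = 0; i < capacity_; ++i)
    {
      slots_[i].store(nullptr, std::memory_order_relaxed);
    }
  }

  // May be called concurrently by any number of producer threads.
  // Returns false if the queue is full. No retry loop is needed.
  bool TryPush(T *item)
  {
    // First FAA: reserve room. If the queue was already full, undo and give up.
    if (count_.fetch_add(1, std::memory_order_acquire) >= capacity_)
    {
      count_.fetch_sub(1, std::memory_order_release);
      return false;
    }
    // Second FAA: claim a unique slot index. Unlike CAS, this cannot fail.
    // Assumes head_ never wraps around std::size_t during the queue's lifetime.
    std::size_t head = head_.fetch_add(1, std::memory_order_relaxed);
    slots_[head % capacity_].store(item, std::memory_order_release);
    return true;
  }

  // Must only be called from the single consumer thread.
  // Returns nullptr if the slot at the tail has not been filled (yet).
  T *TryPop()
  {
    T *item = slots_[tail_].exchange(nullptr, std::memory_order_acquire);
    if (item == nullptr)
    {
      // Either empty, or a producer claimed this slot but has not stored yet.
      return nullptr;
    }
    tail_ = (tail_ + 1) % capacity_;
    // Release the reservation only after the slot has been cleared.
    count_.fetch_sub(1, std::memory_order_release);
    return item;
  }

private:
  const std::size_t capacity_;
  std::unique_ptr<std::atomic<T *>[]> slots_;
  std::atomic<std::size_t> count_{0};  // reserved + filled slots
  std::atomic<std::size_t> head_{0};   // next slot index handed to a producer
  std::size_t tail_{0};                // touched by the consumer only
};
```

Each producer performs a fixed number of atomic operations per push, so no producer can be starved by others. The trade-off is that a failed push briefly inflates `count_`, so another producer may see the queue as full when a slot is about to become free.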

For more details, see the draft PR: https://github.com/open-telemetry/opentelemetry-cpp/pull/3644


**Describe alternatives you've considered**
Which alternative solutions or features have you considered?

**Additional context**
Add any other context about the feature request here.

<sub>**Tip**: [React](https://github.blog/news-insights/product-news/add-reactions-to-pull-requests-issues-and-comments/) with 👍 to help prioritize this issue. Please use comments to provide useful context, avoiding `+1` or `me too`, to help us triage it. Learn more [here](https://opentelemetry.io/community/end-user/issue-participation/).</sub>

