Add lock-free multicast ring buffer for ultra-low latency messaging #61
base: master
Co-authored-by: silahian <[email protected]>
Co-authored-by: silahian <[email protected]>
Co-authored-by: silahian <[email protected]>
@copilot I see that you created VisualHFT.Benchmarks and VisualHFT.Commons.Test, which is great. |
…ention Co-authored-by: silahian <[email protected]>
Renamed the projects to follow the TestingFramework naming convention:
Both now use |
@copilot Help me to understand the output of the benchmark class.
| Method | Mean | Error | StdDev | Gen0 | Gen1 | Gen2 | Allocated |
|---|---|---|---|---|---|---|---|
| 'Publish (no consumer)' | 14.84 ns | 0.461 ns | 0.071 ns | - | - | - | - |
| CreateSnapshot | 7,128.74 ns | 984.564 ns | 255.688 ns | 0.5264 | - | - | 3824 B |
| 'Publish + Read' | 20.30 ns | 0.997 ns | 0.154 ns | - | - | - | - |
| 'Full Producer Path' | 13,038.81 ns | 1,357.961 ns | 352.658 ns | 0.5341 | 0.2747 | 0.0153 | 3824 B |
| 'ToMutable (allocation)' | 5,378.22 ns | 402.030 ns | 104.406 ns | 1.4114 | 0.0229 | - | 10184 B |
// * Hints *
Outliers
MulticastRingBufferBenchmark.'Publish (no consumer)': IterationCount=5, LaunchCount=1, WarmupCount=3 -> 1 outlier was removed (17.00 ns)
MulticastRingBufferBenchmark.'Publish + Read': IterationCount=5, LaunchCount=1, WarmupCount=3 -> 1 outlier was removed (22.50 ns)
// * Legends *
Mean : Arithmetic mean of all measurements
Error : Half of 99.9% confidence interval
StdDev : Standard deviation of all measurements
Gen0 : GC Generation 0 collects per 1000 operations
Gen1 : GC Generation 1 collects per 1000 operations
Gen2 : GC Generation 2 collects per 1000 operations
Allocated : Allocated memory per single operation (managed only, inclusive, 1KB = 1024B)
1 ns : 1 Nanosecond (0.000000001 sec)
// * Diagnostic Output - MemoryDiagnoser *
// ***** BenchmarkRunner: End *****
Run time: 00:00:58 (58.59 sec), executed benchmarks: 5
Global total time: 00:01:12 (72.81 sec), executed benchmarks: 5
// * Artifacts cleanup *
Artifacts cleanup is finished
Running throughput benchmarks...
// Validating benchmarks:
// ***** BenchmarkRunner: Start *****
// ***** Found 2 benchmark(s) in total *****
// ***** Building 1 exe(s) in Parallel: Start *****
// start dotnet restore /p:UseSharedCompilation=false /p:BuildInParallel=false /m:1 /p:Deterministic=true /p:Optimize=true /p:IntermediateOutputPath="C:\MyFiles\Development\VisualHFT\VisualHFT.Commons.Benchmarks\bin\Release\net8.0-windows8.0\98084cc6-3ca9-481f-b1e2-74c78855987b\obj\Release\net8.0-Windows8.0/" /p:OutDir="C:\MyFiles\Development\VisualHFT\VisualHFT.Commons.Benchmarks\bin\Release\net8.0-windows8.0\98084cc6-3ca9-481f-b1e2-74c78855987b\bin\Release\net8.0-Windows8.0/" /p:OutputPath="C:\MyFiles\Development\VisualHFT\VisualHFT.Commons.Benchmarks\bin\Release\net8.0-windows8.0\98084cc6-3ca9-481f-b1e2-74c78855987b\bin\Release\net8.0-Windows8.0/" in C:\MyFiles\Development\VisualHFT\VisualHFT.Commons.Benchmarks\bin\Release\net8.0-windows8.0\98084cc6-3ca9-481f-b1e2-74c78855987b
// command took 1.77 sec and exited with 0
// start dotnet build -c Release --no-restore /p:UseSharedCompilation=false /p:BuildInParallel=false /m:1 /p:Deterministic=true /p:Optimize=true /p:IntermediateOutputPath="C:\MyFiles\Development\VisualHFT\VisualHFT.Commons.Benchmarks\bin\Release\net8.0-windows8.0\98084cc6-3ca9-481f-b1e2-74c78855987b\obj\Release\net8.0-Windows8.0/" /p:OutDir="C:\MyFiles\Development\VisualHFT\VisualHFT.Commons.Benchmarks\bin\Release\net8.0-windows8.0\98084cc6-3ca9-481f-b1e2-74c78855987b\bin\Release\net8.0-Windows8.0/" /p:OutputPath="C:\MyFiles\Development\VisualHFT\VisualHFT.Commons.Benchmarks\bin\Release\net8.0-windows8.0\98084cc6-3ca9-481f-b1e2-74c78855987b\bin\Release\net8.0-Windows8.0/" --output "C:\MyFiles\Development\VisualHFT\VisualHFT.Commons.Benchmarks\bin\Release\net8.0-windows8.0\98084cc6-3ca9-481f-b1e2-74c78855987b\bin\Release\net8.0-Windows8.0/" in C:\MyFiles\Development\VisualHFT\VisualHFT.Commons.Benchmarks\bin\Release\net8.0-windows8.0\98084cc6-3ca9-481f-b1e2-74c78855987b
// command took 10.39 sec and exited with 0
// ***** Done, took 00:00:12 (12.43 sec) *****
// Found 2 benchmarks:
// ThroughputBenchmark.'1M Publishes': Job-FCWKXR(IterationCount=3, LaunchCount=1, WarmupCount=1)
// ThroughputBenchmark.'1M Full Paths': Job-FCWKXR(IterationCount=3, LaunchCount=1, WarmupCount=1)
Setup power plan (GUID: 8c5e7fda-e8bf-4a96-9a85-a6e23a8c635c FriendlyName: High performance)
// **************************
// Benchmark: ThroughputBenchmark.'1M Publishes': Job-FCWKXR(IterationCount=3, LaunchCount=1, WarmupCount=1)
// *** Execute ***
// Launch: 1 / 1
// Execute: dotnet 98084cc6-3ca9-481f-b1e2-74c78855987b.dll --anonymousPipes 4648 3624 --benchmarkName VisualHFT.Benchmarks.ThroughputBenchmark.PublishOneMillion --job "IterationCount=3, LaunchCount=1, WarmupCount=1" --benchmarkId 0 in C:\MyFiles\Development\VisualHFT\VisualHFT.Commons.Benchmarks\bin\Release\net8.0-windows8.0\98084cc6-3ca9-481f-b1e2-74c78855987b\bin\Release\net8.0-Windows8.0
// BeforeAnythingElse
// Benchmark Process Environment Information:
// BenchmarkDotNet v0.14.0
// Runtime=.NET 8.0.22 (8.0.2225.52707), X64 RyuJIT AVX-512F+CD+BW+DQ+VL
// GC=Concurrent Workstation
// HardwareIntrinsics=AVX-512F+CD+BW+DQ+VL,AES,BMI1,BMI2,FMA,LZCNT,PCLMUL,POPCNT VectorSize=256
// Job: Job-SLOGNA(IterationCount=3, LaunchCount=1, WarmupCount=1)
OverheadJitting 1: 1 op, 246400.00 ns, 246.4000 us/op
WorkloadJitting 1: 1 op, 15135200.00 ns, 15.1352 ms/op
OverheadJitting 2: 16 op, 461700.00 ns, 28.8563 us/op
WorkloadJitting 2: 16 op, 230193100.00 ns, 14.3871 ms/op
WorkloadPilot 1: 16 op, 231672000.00 ns, 14.4795 ms/op
WorkloadPilot 2: 32 op, 458769500.00 ns, 14.3365 ms/op
WorkloadPilot 3: 64 op, 916493600.00 ns, 14.3202 ms/op
OverheadWarmup 1: 64 op, 3200.00 ns, 50.0000 ns/op
OverheadWarmup 2: 64 op, 1800.00 ns, 28.1250 ns/op
OverheadWarmup 3: 64 op, 1700.00 ns, 26.5625 ns/op
OverheadWarmup 4: 64 op, 1300.00 ns, 20.3125 ns/op
OverheadWarmup 5: 64 op, 1100.00 ns, 17.1875 ns/op
OverheadWarmup 6: 64 op, 2300.00 ns, 35.9375 ns/op
OverheadWarmup 7: 64 op, 1500.00 ns, 23.4375 ns/op
OverheadWarmup 8: 64 op, 1800.00 ns, 28.1250 ns/op
OverheadWarmup 9: 64 op, 1000.00 ns, 15.6250 ns/op
OverheadActual 1: 64 op, 1200.00 ns, 18.7500 ns/op
OverheadActual 2: 64 op, 1400.00 ns, 21.8750 ns/op
OverheadActual 3: 64 op, 1200.00 ns, 18.7500 ns/op
OverheadActual 4: 64 op, 1500.00 ns, 23.4375 ns/op
OverheadActual 5: 64 op, 1000.00 ns, 15.6250 ns/op
OverheadActual 6: 64 op, 1600.00 ns, 25.0000 ns/op
OverheadActual 7: 64 op, 1700.00 ns, 26.5625 ns/op
OverheadActual 8: 64 op, 1800.00 ns, 28.1250 ns/op
OverheadActual 9: 64 op, 1700.00 ns, 26.5625 ns/op
OverheadActual 10: 64 op, 1100.00 ns, 17.1875 ns/op
OverheadActual 11: 64 op, 1600.00 ns, 25.0000 ns/op
OverheadActual 12: 64 op, 1400.00 ns, 21.8750 ns/op
OverheadActual 13: 64 op, 1300.00 ns, 20.3125 ns/op
OverheadActual 14: 64 op, 1100.00 ns, 17.1875 ns/op
OverheadActual 15: 64 op, 1500.00 ns, 23.4375 ns/op
OverheadActual 16: 64 op, 1300.00 ns, 20.3125 ns/op
OverheadActual 17: 64 op, 4600.00 ns, 71.8750 ns/op
OverheadActual 18: 64 op, 2500.00 ns, 39.0625 ns/op
OverheadActual 19: 64 op, 1200.00 ns, 18.7500 ns/op
OverheadActual 20: 64 op, 1200.00 ns, 18.7500 ns/op
WorkloadWarmup 1: 64 op, 921101300.00 ns, 14.3922 ms/op
// BeforeActualRun
WorkloadActual 1: 64 op, 920911700.00 ns, 14.3892 ms/op
WorkloadActual 2: 64 op, 936855000.00 ns, 14.6384 ms/op
WorkloadActual 3: 64 op, 929671500.00 ns, 14.5261 ms/op
// AfterActualRun
WorkloadResult 1: 64 op, 920910300.00 ns, 14.3892 ms/op
WorkloadResult 2: 64 op, 936853600.00 ns, 14.6383 ms/op
WorkloadResult 3: 64 op, 929670100.00 ns, 14.5261 ms/op
// GC: 0 0 0 400 64
// Threading: 0 0 64
// AfterAll
// Benchmark Process 162352 has exited with code 0.
Mean = 14.518 ms, StdErr = 0.072 ms (0.50%), N = 3, StdDev = 0.125 ms
Min = 14.389 ms, Q1 = 14.458 ms, Median = 14.526 ms, Q3 = 14.582 ms, Max = 14.638 ms
IQR = 0.125 ms, LowerFence = 14.271 ms, UpperFence = 14.769 ms
ConfidenceInterval = [12.242 ms; 16.794 ms] (CI 99.9%), Margin = 2.276 ms (15.68% of Mean)
Skewness = -0.07, Kurtosis = 0.67, MValue = 2
// ** Remained 1 (50.0%) benchmark(s) to run. Estimated finish 2025-12-01 22:30 (0h 0m from now) **
Setup power plan (GUID: 8c5e7fda-e8bf-4a96-9a85-a6e23a8c635c FriendlyName: High performance)
// **************************
// Benchmark: ThroughputBenchmark.'1M Full Paths': Job-FCWKXR(IterationCount=3, LaunchCount=1, WarmupCount=1)
// *** Execute ***
// Launch: 1 / 1
// Execute: dotnet 98084cc6-3ca9-481f-b1e2-74c78855987b.dll --anonymousPipes 5376 1820 --benchmarkName VisualHFT.Benchmarks.ThroughputBenchmark.FullPathOneMillion --job "IterationCount=3, LaunchCount=1, WarmupCount=1" --benchmarkId 1 in C:\MyFiles\Development\VisualHFT\VisualHFT.Commons.Benchmarks\bin\Release\net8.0-windows8.0\98084cc6-3ca9-481f-b1e2-74c78855987b\bin\Release\net8.0-Windows8.0
// BeforeAnythingElse
// Benchmark Process Environment Information:
// BenchmarkDotNet v0.14.0
// Runtime=.NET 8.0.22 (8.0.2225.52707), X64 RyuJIT AVX-512F+CD+BW+DQ+VL
// GC=Concurrent Workstation
// HardwareIntrinsics=AVX-512F+CD+BW+DQ+VL,AES,BMI1,BMI2,FMA,LZCNT,PCLMUL,POPCNT VectorSize=256
// Job: Job-ZNDOJS(IterationCount=3, LaunchCount=1, WarmupCount=1)
OverheadJitting 1: 1 op, 230200.00 ns, 230.2000 us/op
WorkloadJitting 1: 1 op, 6883376200.00 ns, 6.8834 s/op
OverheadJitting 2: 1 op, 1000.00 ns, 1.0000 us/op
WorkloadJitting 2: 1 op, 6999636900.00 ns, 6.9996 s/op
WorkloadWarmup 1: 1 op, 6717387100.00 ns, 6.7174 s/op
// BeforeActualRun
WorkloadActual 1: 1 op, 6740579900.00 ns, 6.7406 s/op
WorkloadActual 2: 1 op, 6632871700.00 ns, 6.6329 s/op
WorkloadActual 3: 1 op, 6565654400.00 ns, 6.5657 s/op
// AfterActualRun
WorkloadResult 1: 1 op, 6740579900.00 ns, 6.7406 s/op
WorkloadResult 2: 1 op, 6632871700.00 ns, 6.6329 s/op
WorkloadResult 3: 1 op, 6565654400.00 ns, 6.5657 s/op
// GC: 297 149 1 2128000776 1
// Threading: 0 0 1
// AfterAll
// Benchmark Process 155444 has exited with code 0.
Mean = 6.646 s, StdErr = 0.051 s (0.77%), N = 3, StdDev = 0.088 s
Min = 6.566 s, Q1 = 6.599 s, Median = 6.633 s, Q3 = 6.687 s, Max = 6.741 s
IQR = 0.087 s, LowerFence = 6.468 s, UpperFence = 6.818 s
ConfidenceInterval = [5.037 s; 8.256 s] (CI 99.9%), Margin = 1.610 s (24.22% of Mean)
Skewness = 0.15, Kurtosis = 0.67, MValue = 2
// ** Remained 0 (0.0%) benchmark(s) to run. Estimated finish 2025-12-01 22:31 (0h 0m from now) **
Successfully reverted power plan (GUID: 8c5e7fda-e8bf-4a96-9a85-a6e23a8c635c FriendlyName: High performance)
// ***** BenchmarkRunner: Finish *****
// * Export *
BenchmarkDotNet.Artifacts\results\VisualHFT.Benchmarks.ThroughputBenchmark-report.csv
BenchmarkDotNet.Artifacts\results\VisualHFT.Benchmarks.ThroughputBenchmark-report-github.md
BenchmarkDotNet.Artifacts\results\VisualHFT.Benchmarks.ThroughputBenchmark-report.html
// * Detailed results *
ThroughputBenchmark.'1M Publishes': Job-FCWKXR(IterationCount=3, LaunchCount=1, WarmupCount=1)
Runtime = .NET 8.0.22 (8.0.2225.52707), X64 RyuJIT AVX-512F+CD+BW+DQ+VL; GC = Concurrent Workstation
Mean = 14.518 ms, StdErr = 0.072 ms (0.50%), N = 3, StdDev = 0.125 ms
Min = 14.389 ms, Q1 = 14.458 ms, Median = 14.526 ms, Q3 = 14.582 ms, Max = 14.638 ms
IQR = 0.125 ms, LowerFence = 14.271 ms, UpperFence = 14.769 ms
ConfidenceInterval = [12.242 ms; 16.794 ms] (CI 99.9%), Margin = 2.276 ms (15.68% of Mean)
Skewness = -0.07, Kurtosis = 0.67, MValue = 2
-------------------- Histogram --------------------
[14.276 ms ; 14.728 ms) | @@@
ThroughputBenchmark.'1M Full Paths': Job-FCWKXR(IterationCount=3, LaunchCount=1, WarmupCount=1)
Runtime = .NET 8.0.22 (8.0.2225.52707), X64 RyuJIT AVX-512F+CD+BW+DQ+VL; GC = Concurrent Workstation
Mean = 6.646 s, StdErr = 0.051 s (0.77%), N = 3, StdDev = 0.088 s
Min = 6.566 s, Q1 = 6.599 s, Median = 6.633 s, Q3 = 6.687 s, Max = 6.741 s
IQR = 0.087 s, LowerFence = 6.468 s, UpperFence = 6.818 s
ConfidenceInterval = [5.037 s; 8.256 s] (CI 99.9%), Margin = 1.610 s (24.22% of Mean)
Skewness = 0.15, Kurtosis = 0.67, MValue = 2
-------------------- Histogram --------------------
[6.519 s ; 6.680 s) | @@
[6.680 s ; 6.821 s) | @
// * Summary *
BenchmarkDotNet v0.14.0, Windows 11 (10.0.26220.7271)
Intel Core i9-7900X CPU 3.30GHz (Kaby Lake), 1 CPU, 20 logical and 10 physical cores
.NET SDK 10.0.100
[Host] : .NET 8.0.22 (8.0.2225.52707), X64 RyuJIT AVX-512F+CD+BW+DQ+VL
Job-FCWKXR : .NET 8.0.22 (8.0.2225.52707), X64 RyuJIT AVX-512F+CD+BW+DQ+VL
IterationCount=3 LaunchCount=1 WarmupCount=1
| Method | Mean | Error | StdDev | Gen0 | Gen1 | Gen2 | Allocated |
|---|---|---|---|---|---|---|---|
| '1M Publishes' | 14.52 ms | 2.276 ms | 0.125 ms | - | - | - | 6 B |
| '1M Full Paths' | 6,646.37 ms | 1,609.832 ms | 88.240 ms | 297000.0000 | 149000.0000 | 1000.0000 | 2128000776 B |
// * Legends *
Mean : Arithmetic mean of all measurements
Error : Half of 99.9% confidence interval
StdDev : Standard deviation of all measurements
Gen0 : GC Generation 0 collects per 1000 operations
Gen1 : GC Generation 1 collects per 1000 operations
Gen2 : GC Generation 2 collects per 1000 operations
Allocated : Allocated memory per single operation (managed only, inclusive, 1KB = 1024B)
1 ms : 1 Millisecond (0.001 sec)
// * Diagnostic Output - MemoryDiagnoser *
// ***** BenchmarkRunner: End *****
Run time: 00:01:16 (76.39 sec), executed benchmarks: 2
Global total time: 00:01:28 (88.89 sec), executed benchmarks: 2
// * Artifacts cleanup *
Artifacts cleanup is finished
C:\MyFiles\Development\VisualHFT\VisualHFT.Commons.Benchmarks\bin\Release\net8.0-windows8.0\VisualHFT.Commons.Benchmarks.exe (process 157652) exited with code 0 (0x0).
Press any key to close this window . . .
The output you're seeing is BenchmarkDotNet's build phase - it compiles an optimized version of the benchmark code before running. After the build completes, you'll see a results table like this:
Key columns:
Benchmarks explained:
The benchmarks run twice: first |
|
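As a worked check on how the Mean, StdDev and Error figures in the pasted output relate, the sketch below recomputes them for '1M Publishes' from its three `WorkloadResult` lines. It is written in Python for brevity. It assumes the Error column is a Student's t interval at 99.9% confidence; the critical value for 2 degrees of freedom is hard-coded from a standard t-table, and the names `summarize` and `T_CRIT_999_DF2` are illustrative only.

```python
import math
import statistics

# Per-operation times (ms/op) copied from the three WorkloadResult lines
# of ThroughputBenchmark.'1M Publishes'.
samples_ms = [14.3892, 14.6383, 14.5261]

# Two-sided 99.9% Student's t critical value for N - 1 = 2 degrees of freedom.
# Assumption: the report's "Error" is a t-based confidence-interval half-width.
T_CRIT_999_DF2 = 31.599


def summarize(samples, t_crit):
    """Recompute mean, sample std dev, std error and CI half-width."""
    n = len(samples)
    mean = statistics.fmean(samples)
    std_dev = statistics.stdev(samples)   # sample standard deviation (n - 1)
    std_err = std_dev / math.sqrt(n)
    margin = t_crit * std_err             # corresponds to the "Error" column
    return {
        "mean": mean,
        "std_dev": std_dev,
        "std_err": std_err,
        "margin": margin,
        "ci": (mean - margin, mean + margin),
        "margin_pct": 100.0 * margin / mean,
    }


result = summarize(samples_ms, T_CRIT_999_DF2)
for name, value in result.items():
    print(name, value)
```

With only three iterations the t critical value is very large, which is why the reported margin is about 15% of the mean even though the three measurements differ by under 2%. Raising `IterationCount` would be the usual way to tighten it.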
@copilot Can you summarize for me the key takeaways of these results? |
I'd be happy to summarize the results! However, the output you shared earlier was just the build phase. Could you share the final results table that appears after the benchmarks complete? It should look something like:
Once you share the actual numbers, I can provide insights on:
@copilot Here are the results:
VisualHFT Multicast Ring Buffer Benchmarks
Running latency benchmarks...
Setup power plan (GUID: 8c5e7fda-e8bf-4a96-9a85-a6e23a8c635c FriendlyName: High performance)
// Benchmark Process Environment Information:
OverheadJitting 1: 1 op, 244500.00 ns, 244.5000 us/op
OverheadJitting 2: 16 op, 351200.00 ns, 21.9500 us/op
WorkloadPilot 1: 16 op, 1300.00 ns, 81.2500 ns/op
OverheadWarmup 1: 33554432 op, 72606500.00 ns, 2.1638 ns/op
OverheadActual 1: 33554432 op, 55535200.00 ns, 1.6551 ns/op
WorkloadWarmup 1: 33554432 op, 562743000.00 ns, 16.7710 ns/op
// BeforeActualRun
// AfterActualRun
// AfterAll
Mean = 14.840 ns, StdErr = 0.036 ns (0.24%), N = 4, StdDev = 0.071 ns
// ** Remained 4 (80.0%) benchmark(s) to run. Estimated finish 2025-12-01 22:29 (0h 0m from now) **
// Benchmark Process Environment Information:
OverheadJitting 1: 1 op, 265900.00 ns, 265.9000 us/op
OverheadJitting 2: 16 op, 517000.00 ns, 32.3125 us/op
WorkloadPilot 1: 16 op, 310400.00 ns, 19.4000 us/op
OverheadWarmup 1: 131072 op, 595000.00 ns, 4.5395 ns/op
OverheadActual 1: 131072 op, 490100.00 ns, 3.7392 ns/op
WorkloadWarmup 1: 131072 op, 848231300.00 ns, 6.4715 us/op
// BeforeActualRun
// AfterActualRun
// AfterAll
Mean = 7.129 us, StdErr = 0.114 us (1.60%), N = 5, StdDev = 0.256 us
// ** Remained 3 (60.0%) benchmark(s) to run. Estimated finish 2025-12-01 22:30 (0h 0m from now) **
// Benchmark Process Environment Information:
OverheadJitting 1: 1 op, 319300.00 ns, 319.3000 us/op
OverheadJitting 2: 16 op, 390300.00 ns, 24.3938 us/op
WorkloadPilot 1: 16 op, 2100.00 ns, 131.2500 ns/op
OverheadWarmup 1: 33554432 op, 111859600.00 ns, 3.3337 ns/op
OverheadActual 1: 33554432 op, 54837300.00 ns, 1.6343 ns/op
WorkloadWarmup 1: 33554432 op, 741315500.00 ns, 22.0929 ns/op
// BeforeActualRun
// AfterActualRun
// AfterAll
Mean = 20.302 ns, StdErr = 0.077 ns (0.38%), N = 4, StdDev = 0.154 ns
// ** Remained 2 (40.0%) benchmark(s) to run. Estimated finish 2025-12-01 22:30 (0h 0m from now) **
// Benchmark Process Environment Information:
OverheadJitting 1: 1 op, 262000.00 ns, 262.0000 us/op
OverheadJitting 2: 16 op, 413600.00 ns, 25.8500 us/op
WorkloadPilot 1: 16 op, 321800.00 ns, 20.1125 us/op
OverheadWarmup 1: 65536 op, 155700.00 ns, 2.3758 ns/op
OverheadActual 1: 65536 op, 108600.00 ns, 1.6571 ns/op
WorkloadWarmup 1: 65536 op, 876358700.00 ns, 13.3722 us/op
// BeforeActualRun
// AfterActualRun
// AfterAll
Mean = 13.039 us, StdErr = 0.158 us (1.21%), N = 5, StdDev = 0.353 us
// ** Remained 1 (20.0%) benchmark(s) to run. Estimated finish 2025-12-01 22:30 (0h 0m from now) **
// Benchmark Process Environment Information:
OverheadJitting 1: 1 op, 332300.00 ns, 332.3000 us/op
OverheadJitting 2: 16 op, 496300.00 ns, 31.0188 us/op
WorkloadPilot 1: 16 op, 216300.00 ns, 13.5188 us/op
OverheadWarmup 1: 131072 op, 489700.00 ns, 3.7361 ns/op
OverheadActual 1: 131072 op, 471000.00 ns, 3.5934 ns/op
WorkloadWarmup 1: 131072 op, 636290500.00 ns, 4.8545 us/op
// BeforeActualRun
// AfterActualRun
// AfterAll
Mean = 5.378 us, StdErr = 0.047 us (0.87%), N = 5, StdDev = 0.104 us
// ** Remained 0 (0.0%) benchmark(s) to run. Estimated finish 2025-12-01 22:30 (0h 0m from now) **
// * Export *
// * Detailed results *
| Method | Mean | Error | StdDev | Gen0 | Gen1 | Gen2 | Allocated |
|---|---|---|---|---|---|---|---|
| 'Publish (no consumer)' | 14.84 ns | 0.461 ns | 0.071 ns | - | - | - | - |
| CreateSnapshot | 7,128.74 ns | 984.564 ns | 255.688 ns | 0.5264 | - | - | 3824 B |
| 'Publish + Read' | 20.30 ns | 0.997 ns | 0.154 ns | - | - | - | - |
| 'Full Producer Path' | 13,038.81 ns | 1,357.961 ns | 352.658 ns | 0.5341 | 0.2747 | 0.0153 | 3824 B |
| 'ToMutable (allocation)' | 5,378.22 ns | 402.030 ns | 104.406 ns | 1.4114 | 0.0229 | - | 10184 B |
// * Hints *
Outliers
MulticastRingBufferBenchmark.'Publish (no consumer)': IterationCount=5, LaunchCount=1, WarmupCount=3 -> 1 outlier was removed (17.00 ns)
MulticastRingBufferBenchmark.'Publish + Read': IterationCount=5, LaunchCount=1, WarmupCount=3 -> 1 outlier was removed (22.50 ns)
// * Legends *
Mean : Arithmetic mean of all measurements
Error : Half of 99.9% confidence interval
StdDev : Standard deviation of all measurements
Gen0 : GC Generation 0 collects per 1000 operations
Gen1 : GC Generation 1 collects per 1000 operations
Gen2 : GC Generation 2 collects per 1000 operations
Allocated : Allocated memory per single operation (managed only, inclusive, 1KB = 1024B)
1 ns : 1 Nanosecond (0.000000001 sec)
// * Diagnostic Output - MemoryDiagnoser *
// ***** BenchmarkRunner: End *****
Run time: 00:00:58 (58.59 sec), executed benchmarks: 5
Global total time: 00:01:12 (72.81 sec), executed benchmarks: 5
// * Artifacts cleanup *
Artifacts cleanup is finished
Running throughput benchmarks...
// Validating benchmarks:
// ***** BenchmarkRunner: Start *****
// ***** Found 2 benchmark(s) in total *****
// ***** Building 1 exe(s) in Parallel: Start *****
// start dotnet restore /p:UseSharedCompilation=false /p:BuildInParallel=false /m:1 /p:Deterministic=true /p:Optimize=true /p:IntermediateOutputPath="C:\MyFiles\Development\VisualHFT\VisualHFT.Commons.Benchmarks\bin\Release\net8.0-windows8.0\98084cc6-3ca9-481f-b1e2-74c78855987b\obj\Release\net8.0-Windows8.0/" /p:OutDir="C:\MyFiles\Development\VisualHFT\VisualHFT.Commons.Benchmarks\bin\Release\net8.0-windows8.0\98084cc6-3ca9-481f-b1e2-74c78855987b\bin\Release\net8.0-Windows8.0/" /p:OutputPath="C:\MyFiles\Development\VisualHFT\VisualHFT.Commons.Benchmarks\bin\Release\net8.0-windows8.0\98084cc6-3ca9-481f-b1e2-74c78855987b\bin\Release\net8.0-Windows8.0/" in C:\MyFiles\Development\VisualHFT\VisualHFT.Commons.Benchmarks\bin\Release\net8.0-windows8.0\98084cc6-3ca9-481f-b1e2-74c78855987b
// command took 1.77 sec and exited with 0
// start dotnet build -c Release --no-restore /p:UseSharedCompilation=false /p:BuildInParallel=false /m:1 /p:Deterministic=true /p:Optimize=true /p:IntermediateOutputPath="C:\MyFiles\Development\VisualHFT\VisualHFT.Commons.Benchmarks\bin\Release\net8.0-windows8.0\98084cc6-3ca9-481f-b1e2-74c78855987b\obj\Release\net8.0-Windows8.0/" /p:OutDir="C:\MyFiles\Development\VisualHFT\VisualHFT.Commons.Benchmarks\bin\Release\net8.0-windows8.0\98084cc6-3ca9-481f-b1e2-74c78855987b\bin\Release\net8.0-Windows8.0/" /p:OutputPath="C:\MyFiles\Development\VisualHFT\VisualHFT.Commons.Benchmarks\bin\Release\net8.0-windows8.0\98084cc6-3ca9-481f-b1e2-74c78855987b\bin\Release\net8.0-Windows8.0/" --output "C:\MyFiles\Development\VisualHFT\VisualHFT.Commons.Benchmarks\bin\Release\net8.0-windows8.0\98084cc6-3ca9-481f-b1e2-74c78855987b\bin\Release\net8.0-Windows8.0/" in C:\MyFiles\Development\VisualHFT\VisualHFT.Commons.Benchmarks\bin\Release\net8.0-windows8.0\98084cc6-3ca9-481f-b1e2-74c78855987b
// command took 10.39 sec and exited with 0
// ***** Done, took 00:00:12 (12.43 sec) *****
// Found 2 benchmarks:
// ThroughputBenchmark.'1M Publishes': Job-FCWKXR(IterationCount=3, LaunchCount=1, WarmupCount=1)
// ThroughputBenchmark.'1M Full Paths': Job-FCWKXR(IterationCount=3, LaunchCount=1, WarmupCount=1)
Setup power plan (GUID: 8c5e7fda-e8bf-4a96-9a85-a6e23a8c635c FriendlyName: High performance)
// **************************
// Benchmark: ThroughputBenchmark.'1M Publishes': Job-FCWKXR(IterationCount=3, LaunchCount=1, WarmupCount=1)
// *** Execute ***
// Launch: 1 / 1
// Execute: dotnet 98084cc6-3ca9-481f-b1e2-74c78855987b.dll --anonymousPipes 4648 3624 --benchmarkName VisualHFT.Benchmarks.ThroughputBenchmark.PublishOneMillion --job "IterationCount=3, LaunchCount=1, WarmupCount=1" --benchmarkId 0 in C:\MyFiles\Development\VisualHFT\VisualHFT.Commons.Benchmarks\bin\Release\net8.0-windows8.0\98084cc6-3ca9-481f-b1e2-74c78855987b\bin\Release\net8.0-Windows8.0
// BeforeAnythingElse
// Benchmark Process Environment Information:
// BenchmarkDotNet v0.14.0
// Runtime=.NET 8.0.22 (8.0.2225.52707), X64 RyuJIT AVX-512F+CD+BW+DQ+VL
// GC=Concurrent Workstation
// HardwareIntrinsics=AVX-512F+CD+BW+DQ+VL,AES,BMI1,BMI2,FMA,LZCNT,PCLMUL,POPCNT VectorSize=256
// Job: Job-SLOGNA(IterationCount=3, LaunchCount=1, WarmupCount=1)
OverheadJitting 1: 1 op, 246400.00 ns, 246.4000 us/op
WorkloadJitting 1: 1 op, 15135200.00 ns, 15.1352 ms/op
OverheadJitting 2: 16 op, 461700.00 ns, 28.8563 us/op
WorkloadJitting 2: 16 op, 230193100.00 ns, 14.3871 ms/op
WorkloadPilot 1: 16 op, 231672000.00 ns, 14.4795 ms/op
WorkloadPilot 2: 32 op, 458769500.00 ns, 14.3365 ms/op
WorkloadPilot 3: 64 op, 916493600.00 ns, 14.3202 ms/op
OverheadWarmup 1: 64 op, 3200.00 ns, 50.0000 ns/op
OverheadWarmup 2: 64 op, 1800.00 ns, 28.1250 ns/op
OverheadWarmup 3: 64 op, 1700.00 ns, 26.5625 ns/op
OverheadWarmup 4: 64 op, 1300.00 ns, 20.3125 ns/op
OverheadWarmup 5: 64 op, 1100.00 ns, 17.1875 ns/op
OverheadWarmup 6: 64 op, 2300.00 ns, 35.9375 ns/op
OverheadWarmup 7: 64 op, 1500.00 ns, 23.4375 ns/op
OverheadWarmup 8: 64 op, 1800.00 ns, 28.1250 ns/op
OverheadWarmup 9: 64 op, 1000.00 ns, 15.6250 ns/op
OverheadActual 1: 64 op, 1200.00 ns, 18.7500 ns/op
OverheadActual 2: 64 op, 1400.00 ns, 21.8750 ns/op
OverheadActual 3: 64 op, 1200.00 ns, 18.7500 ns/op
OverheadActual 4: 64 op, 1500.00 ns, 23.4375 ns/op
OverheadActual 5: 64 op, 1000.00 ns, 15.6250 ns/op
OverheadActual 6: 64 op, 1600.00 ns, 25.0000 ns/op
OverheadActual 7: 64 op, 1700.00 ns, 26.5625 ns/op
OverheadActual 8: 64 op, 1800.00 ns, 28.1250 ns/op
OverheadActual 9: 64 op, 1700.00 ns, 26.5625 ns/op
OverheadActual 10: 64 op, 1100.00 ns, 17.1875 ns/op
OverheadActual 11: 64 op, 1600.00 ns, 25.0000 ns/op
OverheadActual 12: 64 op, 1400.00 ns, 21.8750 ns/op
OverheadActual 13: 64 op, 1300.00 ns, 20.3125 ns/op
OverheadActual 14: 64 op, 1100.00 ns, 17.1875 ns/op
OverheadActual 15: 64 op, 1500.00 ns, 23.4375 ns/op
OverheadActual 16: 64 op, 1300.00 ns, 20.3125 ns/op
OverheadActual 17: 64 op, 4600.00 ns, 71.8750 ns/op
OverheadActual 18: 64 op, 2500.00 ns, 39.0625 ns/op
OverheadActual 19: 64 op, 1200.00 ns, 18.7500 ns/op
OverheadActual 20: 64 op, 1200.00 ns, 18.7500 ns/op
WorkloadWarmup 1: 64 op, 921101300.00 ns, 14.3922 ms/op
// BeforeActualRun
WorkloadActual 1: 64 op, 920911700.00 ns, 14.3892 ms/op
WorkloadActual 2: 64 op, 936855000.00 ns, 14.6384 ms/op
WorkloadActual 3: 64 op, 929671500.00 ns, 14.5261 ms/op
// AfterActualRun
WorkloadResult 1: 64 op, 920910300.00 ns, 14.3892 ms/op
WorkloadResult 2: 64 op, 936853600.00 ns, 14.6383 ms/op
WorkloadResult 3: 64 op, 929670100.00 ns, 14.5261 ms/op
// GC: 0 0 0 400 64
// Threading: 0 0 64
// AfterAll
// Benchmark Process 162352 has exited with code 0.
Mean = 14.518 ms, StdErr = 0.072 ms (0.50%), N = 3, StdDev = 0.125 ms
Min = 14.389 ms, Q1 = 14.458 ms, Median = 14.526 ms, Q3 = 14.582 ms, Max = 14.638 ms
IQR = 0.125 ms, LowerFence = 14.271 ms, UpperFence = 14.769 ms
ConfidenceInterval = [12.242 ms; 16.794 ms] (CI 99.9%), Margin = 2.276 ms (15.68% of Mean)
Skewness = -0.07, Kurtosis = 0.67, MValue = 2
// ** Remained 1 (50.0%) benchmark(s) to run. Estimated finish 2025-12-01 22:30 (0h 0m from now) **
Setup power plan (GUID: 8c5e7fda-e8bf-4a96-9a85-a6e23a8c635c FriendlyName: High performance)
// **************************
// Benchmark: ThroughputBenchmark.'1M Full Paths': Job-FCWKXR(IterationCount=3, LaunchCount=1, WarmupCount=1)
// *** Execute ***
// Launch: 1 / 1
// Execute: dotnet 98084cc6-3ca9-481f-b1e2-74c78855987b.dll --anonymousPipes 5376 1820 --benchmarkName VisualHFT.Benchmarks.ThroughputBenchmark.FullPathOneMillion --job "IterationCount=3, LaunchCount=1, WarmupCount=1" --benchmarkId 1 in C:\MyFiles\Development\VisualHFT\VisualHFT.Commons.Benchmarks\bin\Release\net8.0-windows8.0\98084cc6-3ca9-481f-b1e2-74c78855987b\bin\Release\net8.0-Windows8.0
// BeforeAnythingElse
// Benchmark Process Environment Information:
// BenchmarkDotNet v0.14.0
// Runtime=.NET 8.0.22 (8.0.2225.52707), X64 RyuJIT AVX-512F+CD+BW+DQ+VL
// GC=Concurrent Workstation
// HardwareIntrinsics=AVX-512F+CD+BW+DQ+VL,AES,BMI1,BMI2,FMA,LZCNT,PCLMUL,POPCNT VectorSize=256
// Job: Job-ZNDOJS(IterationCount=3, LaunchCount=1, WarmupCount=1)
OverheadJitting 1: 1 op, 230200.00 ns, 230.2000 us/op
WorkloadJitting 1: 1 op, 6883376200.00 ns, 6.8834 s/op
OverheadJitting 2: 1 op, 1000.00 ns, 1.0000 us/op
WorkloadJitting 2: 1 op, 6999636900.00 ns, 6.9996 s/op
WorkloadWarmup 1: 1 op, 6717387100.00 ns, 6.7174 s/op
// BeforeActualRun
WorkloadActual 1: 1 op, 6740579900.00 ns, 6.7406 s/op
WorkloadActual 2: 1 op, 6632871700.00 ns, 6.6329 s/op
WorkloadActual 3: 1 op, 6565654400.00 ns, 6.5657 s/op
// AfterActualRun
WorkloadResult 1: 1 op, 6740579900.00 ns, 6.7406 s/op
WorkloadResult 2: 1 op, 6632871700.00 ns, 6.6329 s/op
WorkloadResult 3: 1 op, 6565654400.00 ns, 6.5657 s/op
// GC: 297 149 1 2128000776 1
// Threading: 0 0 1
// AfterAll
// Benchmark Process 155444 has exited with code 0.
Mean = 6.646 s, StdErr = 0.051 s (0.77%), N = 3, StdDev = 0.088 s
Min = 6.566 s, Q1 = 6.599 s, Median = 6.633 s, Q3 = 6.687 s, Max = 6.741 s
IQR = 0.087 s, LowerFence = 6.468 s, UpperFence = 6.818 s
ConfidenceInterval = [5.037 s; 8.256 s] (CI 99.9%), Margin = 1.610 s (24.22% of Mean)
Skewness = 0.15, Kurtosis = 0.67, MValue = 2
// ** Remained 0 (0.0%) benchmark(s) to run. Estimated finish 2025-12-01 22:31 (0h 0m from now) **
Successfully reverted power plan (GUID: 8c5e7fda-e8bf-4a96-9a85-a6e23a8c635c FriendlyName: High performance)
// ***** BenchmarkRunner: Finish *****
// * Export *
BenchmarkDotNet.Artifacts\results\VisualHFT.Benchmarks.ThroughputBenchmark-report.csv
BenchmarkDotNet.Artifacts\results\VisualHFT.Benchmarks.ThroughputBenchmark-report-github.md
BenchmarkDotNet.Artifacts\results\VisualHFT.Benchmarks.ThroughputBenchmark-report.html
// * Detailed results *
ThroughputBenchmark.'1M Publishes': Job-FCWKXR(IterationCount=3, LaunchCount=1, WarmupCount=1)
Runtime = .NET 8.0.22 (8.0.2225.52707), X64 RyuJIT AVX-512F+CD+BW+DQ+VL; GC = Concurrent Workstation
Mean = 14.518 ms, StdErr = 0.072 ms (0.50%), N = 3, StdDev = 0.125 ms
Min = 14.389 ms, Q1 = 14.458 ms, Median = 14.526 ms, Q3 = 14.582 ms, Max = 14.638 ms
IQR = 0.125 ms, LowerFence = 14.271 ms, UpperFence = 14.769 ms
ConfidenceInterval = [12.242 ms; 16.794 ms] (CI 99.9%), Margin = 2.276 ms (15.68% of Mean)
Skewness = -0.07, Kurtosis = 0.67, MValue = 2
-------------------- Histogram --------------------
[14.276 ms ; 14.728 ms) | @@@
ThroughputBenchmark.'1M Full Paths': Job-FCWKXR(IterationCount=3, LaunchCount=1, WarmupCount=1)
Runtime = .NET 8.0.22 (8.0.2225.52707), X64 RyuJIT AVX-512F+CD+BW+DQ+VL; GC = Concurrent Workstation
Mean = 6.646 s, StdErr = 0.051 s (0.77%), N = 3, StdDev = 0.088 s
Min = 6.566 s, Q1 = 6.599 s, Median = 6.633 s, Q3 = 6.687 s, Max = 6.741 s
IQR = 0.087 s, LowerFence = 6.468 s, UpperFence = 6.818 s
ConfidenceInterval = [5.037 s; 8.256 s] (CI 99.9%), Margin = 1.610 s (24.22% of Mean)
Skewness = 0.15, Kurtosis = 0.67, MValue = 2
-------------------- Histogram --------------------
[6.519 s ; 6.680 s) | @@
[6.680 s ; 6.821 s) | @
// * Summary *
BenchmarkDotNet v0.14.0, Windows 11 (10.0.26220.7271)
Intel Core i9-7900X CPU 3.30GHz (Kaby Lake), 1 CPU, 20 logical and 10 physical cores
.NET SDK 10.0.100
[Host] : .NET 8.0.22 (8.0.2225.52707), X64 RyuJIT AVX-512F+CD+BW+DQ+VL
Job-FCWKXR : .NET 8.0.22 (8.0.2225.52707), X64 RyuJIT AVX-512F+CD+BW+DQ+VL
IterationCount=3 LaunchCount=1 WarmupCount=1
| Method | Mean | Error | StdDev | Gen0 | Gen1 | Gen2 | Allocated |
|---|---|---|---|---|---|---|---|
| '1M Publishes' | 14.52 ms | 2.276 ms | 0.125 ms | - | - | - | 6 B |
| '1M Full Paths' | 6,646.37 ms | 1,609.832 ms | 88.240 ms | 297000.0000 | 149000.0000 | 1000.0000 | 2128000776 B |
// * Legends *
Mean : Arithmetic mean of all measurements
Error : Half of 99.9% confidence interval
StdDev : Standard deviation of all measurements
Gen0 : GC Generation 0 collects per 1000 operations
Gen1 : GC Generation 1 collects per 1000 operations
Gen2 : GC Generation 2 collects per 1000 operations
Allocated : Allocated memory per single operation (managed only, inclusive, 1KB = 1024B)
1 ms : 1 Millisecond (0.001 sec)
// * Diagnostic Output - MemoryDiagnoser *
// ***** BenchmarkRunner: End *****
Run time: 00:01:16 (76.39 sec), executed benchmarks: 2
Global total time: 00:01:28 (88.89 sec), executed benchmarks: 2
// * Artifacts cleanup *
Artifacts cleanup is finished
The output appears to be cut off - I'm still only seeing the build phase. The actual benchmark results table should appear at the end after all the build output, looking like:
Could you scroll down to find the results table, or let the benchmark run to completion? The build phase can take a few minutes. Once complete, you'll see a summary table with Mean, Error, StdDev, and Allocated columns showing the actual performance numbers.
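The Allocated and Gen0/Gen1/Gen2 columns can be cross-checked against the `// GC:` lines in the pasted log. The sketch below, in Python for brevity, assumes those lines list Gen0, Gen1 and Gen2 collection counts, then allocated bytes, then total operations, in that order. That field order is an assumption inferred from the numbers in this log (it reproduces the summary table), and `parse_gc_line` is a hypothetical helper, not part of BenchmarkDotNet.

```python
def parse_gc_line(line):
    """Parse a '// GC: gen0 gen1 gen2 allocated_bytes total_ops' log line.

    The field order is an assumption inferred from the log in this thread.
    """
    prefix = "// GC:"
    if not line.startswith(prefix):
        raise ValueError("not a GC line: " + line)
    gen0, gen1, gen2, allocated, ops = (int(x) for x in line[len(prefix):].split())
    return {
        # The summary table reports collections per 1000 operations.
        "gen0_per_1000_ops": gen0 * 1000 / ops,
        "gen1_per_1000_ops": gen1 * 1000 / ops,
        "gen2_per_1000_ops": gen2 * 1000 / ops,
        # The summary table reports allocated bytes per single operation.
        "allocated_per_op": allocated / ops,
    }


# Both lines are copied verbatim from the throughput run above.
publishes_gc = parse_gc_line("// GC: 0 0 0 400 64")
full_paths_gc = parse_gc_line("// GC: 297 149 1 2128000776 1")

print(publishes_gc)
print(full_paths_gc)
```

Under that reading, '1M Publishes' allocates 400 bytes across 64 invocations (6.25 bytes each, shown as 6 B), while a single '1M Full Paths' invocation triggers 297 Gen0 collections, which the table scales to 297000 per 1000 operations.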
|
@copilot BenchmarkDotNet v0.14.0, Windows 11 (10.0.26220.7271) IterationCount=5 LaunchCount=1 WarmupCount=3
BenchmarkDotNet v0.14.0, Windows 11 (10.0.26220.7271) IterationCount=3 LaunchCount=1 WarmupCount=1
Key Takeaways from Your Benchmark Results: ✅ Excellent Results - Ring Buffer Core Operations:
Summary: Recommendation: Studies using the new |
|
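For reference, the throughput table can be reduced to per-message figures with simple arithmetic. The sketch below is in Python for brevity. It assumes each benchmark invocation processes exactly 1,000,000 messages, which is inferred from the benchmark names ('1M Publishes', '1M Full Paths') and is not confirmed by the log itself; `per_message` is an illustrative helper.

```python
MESSAGES = 1_000_000  # assumption: one invocation handles one million messages


def per_message(mean_ms, allocated_bytes, messages=MESSAGES):
    """Convert a per-invocation mean and allocation into per-message figures."""
    seconds = mean_ms / 1000.0
    return {
        "ns_per_message": mean_ms * 1_000_000 / messages,
        "messages_per_second": messages / seconds,
        "bytes_per_message": allocated_bytes / messages,
    }


# Mean and Allocated values copied from the throughput summary table.
publishes = per_message(14.52, 6)
full_paths = per_message(6646.37, 2_128_000_776)

print(publishes)
print(full_paths)
```

Under that assumption, the bulk publish cost of about 14.5 ns per message lines up with the 14.84 ns measured for a single 'Publish (no consumer)' call, and the full path allocates roughly 2.1 KB per message, which is where the Gen0/Gen1/Gen2 activity comes from.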
@copilot in these benchmarks we are testing the new implementations... |
…s ring buffer Co-authored-by: silahian <[email protected]>
Added comparison benchmarks in commit 00ef1a0. The benchmarks now include: Latency Comparison:
Throughput Comparison (1M messages):
Run with
|
|
@copilot BenchmarkDotNet v0.14.0, Windows 11 (10.0.26220.7271) IterationCount=5 LaunchCount=1 WarmupCount=3
BenchmarkDotNet v0.14.0, Windows 11 (10.0.26220.7271) IterationCount=3 LaunchCount=1 WarmupCount=1
BenchmarkDotNet v0.14.0, Windows 11 (10.0.26220.7271) IterationCount=5 LaunchCount=1 WarmupCount=3
BenchmarkDotNet v0.14.0, Windows 11 (10.0.26220.7271) IterationCount=3 LaunchCount=1 WarmupCount=1
|
OLD vs NEW Comparison Summary: Latency (Single Operation):
Throughput (1M Messages):
Key Insights:
Bottom line: The ring buffer delivers ~20% raw performance gain, but the bigger win is true decoupling - slow consumers can't block the producer or other consumers. |
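The decoupling claim above can be illustrated with a minimal single-producer, multi-consumer sketch. This is not the PR's `MulticastRingBuffer<T>`; the names `SketchRingBuffer`, `Cursor`, and `TryRead` are hypothetical, and the design (overwrite-on-wrap with a per-consumer loss count) is an assumption chosen to show why a slow consumer cannot stall the producer.

```csharp
using System;
using System.Threading;

// Hypothetical sketch; SketchRingBuffer and Cursor are illustrative names,
// not the PR's MulticastRingBuffer<T> API.
public sealed class SketchRingBuffer<T> where T : class
{
    private readonly T[] _slots;
    private readonly int _mask;
    private long _published = -1; // sequence number of the newest published item

    public SketchRingBuffer(int capacity)
    {
        if (capacity < 2 || (capacity & (capacity - 1)) != 0)
            throw new ArgumentException("Capacity must be a power of two >= 2.", nameof(capacity));
        _slots = new T[capacity];
        _mask = capacity - 1;
    }

    // Single producer only: writes the slot, then releases the sequence.
    // It never inspects consumer positions, so a slow consumer cannot stall it.
    public long Publish(T item)
    {
        long next = _published + 1;
        Volatile.Write(ref _slots[(int)(next & _mask)], item);
        Volatile.Write(ref _published, next);
        return next;
    }

    public Cursor CreateCursor() => new Cursor(this);

    // Each consumer thread owns one cursor; cursors share no mutable state.
    public sealed class Cursor
    {
        private readonly SketchRingBuffer<T> _owner;
        private long _next; // next sequence this consumer wants to read

        internal Cursor(SketchRingBuffer<T> owner) { _owner = owner; }

        // Returns false when nothing new is available. 'lost' counts items
        // that were overwritten before this consumer reached them.
        public bool TryRead(out T item, out long lost)
        {
            lost = 0;
            while (true)
            {
                long published = Volatile.Read(ref _owner._published);
                if (_next > published) { item = null; return false; }

                // One slot is treated as "possibly being written right now".
                long oldestSafe = published - _owner._mask + 1;
                if (_next < oldestSafe) { lost += oldestSafe - _next; _next = oldestSafe; }

                T candidate = Volatile.Read(ref _owner._slots[(int)(_next & _owner._mask)]);

                // Re-check: if the producer has not lapped this cursor, the read was valid.
                long after = Volatile.Read(ref _owner._published);
                if (_next >= after - _owner._mask + 1) { item = candidate; _next++; return true; }
                // Otherwise the slot may have been overwritten mid-read; retry.
            }
        }
    }
}
```

In this sketch a consumer that falls more than a buffer's length behind skips ahead and reports the gap through `lost` instead of applying back-pressure. Whether dropping, blocking, or signalling is the right policy for slow consumers is a design choice the actual implementation may make differently.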
|
@copilot I made some changes to the MulticastRingBuffer and ImmutableOrderBook classes to improve performance by using object pools. BenchmarkDotNet v0.14.0, Windows 11 (10.0.26220.7271) IterationCount=5 LaunchCount=1 WarmupCount=3
// * Hints * BenchmarkDotNet v0.14.0, Windows 11 (10.0.26220.7271) IterationCount=3 LaunchCount=1 WarmupCount=1
BenchmarkDotNet v0.14.0, Windows 11 (10.0.26220.7271) IterationCount=5 LaunchCount=1 WarmupCount=3
// * Hints * BenchmarkDotNet v0.14.0, Windows 11 (10.0.26220.7271) IterationCount=3 LaunchCount=1 WarmupCount=1
|
Analysis of Your Object Pool Improvements: ✅ Memory Allocation - Significant Improvement:
Your object pooling is clearly working - the
The core publish/read operations are slower. This could be due to:
Net Assessment:
Could you share the throughput benchmark results (1M Publishes, 1M Full Paths) to see if the reduced allocations improved sustained performance? |
|
@copilot After the adjustments, these are the new results: BenchmarkDotNet v0.14.0, Windows 11 (10.0.26220.7271) IterationCount=5 LaunchCount=1 WarmupCount=3
// * Hints * BenchmarkDotNet v0.14.0, Windows 11 (10.0.26220.7271) IterationCount=3 LaunchCount=1 WarmupCount=1
BenchmarkDotNet v0.14.0, Windows 11 (10.0.26220.7271) IterationCount=5 LaunchCount=1 WarmupCount=3
// * Hints * BenchmarkDotNet v0.14.0, Windows 11 (10.0.26220.7271) IterationCount=3 LaunchCount=1 WarmupCount=1
|
Analysis of Latest Results:
All three NEW operations now allocate 1536 B and take ~8-9 μs. Previously:
This suggests the adjustments may have inadvertently moved snapshot creation into the hot path for all publish operations. The ring buffer's core advantage (lock-free, zero-allocation publish) appears to be lost. Expected behavior:
Recommendation: Review the recent changes to ensure |
|
@copilot I'm mostly worried about the 'OLD: Lock + Dispatch' vs the new way. IterationCount=5 LaunchCount=1 WarmupCount=3
// * Hints * BenchmarkDotNet v0.14.0, Windows 11 (10.0.26220.7271) IterationCount=3 LaunchCount=1 WarmupCount=1
BenchmarkDotNet v0.14.0, Windows 11 (10.0.26220.7271) IterationCount=5 LaunchCount=1 WarmupCount=3
// * Hints * BenchmarkDotNet v0.14.0, Windows 11 (10.0.26220.7271) IterationCount=3 LaunchCount=1 WarmupCount=1
|
Current State - Significant Regression:
The new implementation is currently ~563x slower than the old lock-based approach. This is because the changes moved
Root cause: In my original implementation,
What should happen:
To fix: Review the |
Core Messaging Components
- VisualHFT.Commons/Messaging/ directory
- PaddedLong.cs - Cache-line aligned long for false sharing prevention
- ConsumerMetrics.cs - Consumer statistics and lag monitoring
- MulticastRingBuffer.cs - Lock-free SPMC ring buffer
- ImmutableOrderBook.cs - Zero-copy immutable wrapper for OrderBook
- ImmutableTrade.cs - Lightweight immutable Trade wrapper

Object Pools
- VisualHFT.Commons/Helpers/ObjectPools/ directory
- OrderBookLevelArrayPool.cs - Pool for OrderBookLevel arrays
- GenericArrayPool.cs - Reusable generic array pool

Update Existing Helpers
- HelperOrderBook.cs - Replace synchronous dispatch with multicast ring buffer
- HelperTrade.cs - Replace synchronous dispatch with multicast ring buffer

Testing
- VisualHFT.Commons.TestingFramework/ project (renamed from VisualHFT.Commons.Tests)
- MulticastRingBufferTests.cs unit tests (66 tests passed)
- ImmutableOrderBookTests.cs unit tests
- HelperOrderBookIntegrationTests.cs integration tests

Benchmarks
- VisualHFT.Commons.Benchmarks/ project (renamed from VisualHFT.Benchmarks)
- HelperOrderBookBenchmark.cs using BenchmarkDotNet

Documentation
- docs/MulticastRingBuffer-Architecture.md
- README.md with performance improvements

Code Review & Security
Summary
This PR implements an ultra-high-performance multicast ring buffer architecture that achieves:
Original prompt
🎯 Objective
Replace the current synchronous, blocking data bus architecture with an ultra-low latency, lock-free multicast ring buffer that achieves:
📋 Requirements
1. Core Components to Create
A. VisualHFT.Commons/Messaging/MulticastRingBuffer.cs
B. VisualHFT.Commons/Messaging/ImmutableOrderBook.cs
C. VisualHFT.Commons/Messaging/ImmutableTrade.cs
D. VisualHFT.Commons/Helpers/ObjectPools/
- OrderBookLevelArrayPool.cs - Pool for OrderBookLevel arrays
- GenericArrayPool.cs - Reusable generic array pool
2. Update Existing Components
A. VisualHFT.Commons/Helpers/HelperOrderBook.cs
Changes:
- Replace List<Action<OrderBook>> with MulticastRingBuffer<ImmutableOrderBook>
- Subscribe(Action<OrderBook>) - Legacy API (backward compatible, allocates mutable copy)
- Subscribe(Action<ImmutableOrderBook>) - New API (zero-copy, optimal performance)
- UpdateData(OrderBook), Unsubscribe()
B. VisualHFT.Commons/Helpers/HelperTrade.cs
Changes:
- Replace List<Action<Trade>> with MulticastRingBuffer<ImmutableTrade>
3. Architecture Details
Data Flow:
Key Design Principles:
4. Implementation Requirements
MulticastRingBuffer.cs:
ImmutableOrderBook.cs:
Updated HelperOrderBook.cs: