Commit 0021534

Copilot and T-Gro committed
Add DList and ImmutableArray implementations to QueueList benchmarks - V5 DList is 4.1x faster!
Added 3 new implementations:
- V5: DList with lazy materialized list (cached iteration)
- V6: DList with native iteration (no caching)
- V7: ImmutableArray-backed

Results for Combined Scenario (5000 appends):
- V5 (DList Cached): 4.794ms (4.1x faster, 1.6x memory) - BEST CHOICE
- V6 (DList Native): 4.864ms (4.0x faster, 1.46x memory)
- V7 (ImmutableArray): 4.805ms (4.1x faster, 8.0x memory)
- V3 (Array): 4.748ms (4.1x faster, 8.1x memory)

V5 DList with cached iteration provides best speed/memory trade-off.

Co-authored-by: T-Gro <[email protected]>
1 parent 755ef00 commit 0021534

2 files changed: +329 -15 lines changed

investigation/QUEUELIST_BENCHMARK_RESULTS.md

Lines changed: 39 additions & 15 deletions
@@ -2,13 +2,16 @@
 
 ## Overview
 
-Created comprehensive BenchmarkDotNet benchmarks for QueueList to simulate the 5000-element append scenario as used in CheckDeclarations. Tested 5 implementations:
+Created comprehensive BenchmarkDotNet benchmarks for QueueList to simulate the 5000-element append scenario as used in CheckDeclarations. Tested 8 implementations:
 
 - **Original**: Current baseline implementation
 - **V1**: AppendOptimized (current commit's optimization)
 - **V2**: Optimized for single-element appends
 - **V3**: Array-backed with preallocation
 - **V4**: ResizeArray-backed
+- **V5**: DList with lazy materialized list (cached iteration)
+- **V6**: DList with native iteration (no caching)
+- **V7**: ImmutableArray-backed
 
 ## Key Findings

@@ -28,15 +31,22 @@ Created comprehensive BenchmarkDotNet benchmarks for QueueList to simulate the 5
 
 This is closest to real CheckDeclarations usage:
 
-| Implementation | Mean (ms) | Ratio | Allocated |
-|----------------|-----------|-------|--------------|
-| V3 (Array) | 4.718 | 0.24 | 50.81 MB |
-| V4 (ResizeArray) | 13.911 | 0.70 | 150.50 MB |
-| V1 (Current) | 19.560 | 0.98 | 1.84 MB |
-| V2 (Optimized) | 19.708 | 0.99 | 1.84 MB |
-| Original | 19.891 | 1.00 | N/A |
-
-**Key Insight**: V1/V2 perform nearly identically (~1% difference, within margin of error). Array-based V3 is **4.2x faster** but allocates **27x more memory**.
+| Implementation | Mean (ms) | Ratio | Allocated | Alloc Ratio |
+|----------------|-----------|-------|-----------|-------------|
+| V3 (Array) | 4.748 | 0.24 | 48.46 MB | 8.14 |
+| **V5 (DList Cached)** | **4.794** | **0.24** | **9.61 MB** | **1.61** |
+| V7 (ImmutableArray) | 4.805 | 0.24 | 47.93 MB | 8.05 |
+| V6 (DList Native) | 4.864 | 0.25 | 8.69 MB | 1.46 |
+| V4 (ResizeArray) | 14.498 | 0.74 | 143.53 MB | 24.10 |
+| V1 (Current) | 19.490 | 0.99 | 1.75 MB | 0.29 |
+| V2 (Optimized) | 19.518 | 0.99 | 1.75 MB | 0.29 |
+| Original | 19.702 | 1.00 | 5.96 MB | 1.00 |
+
+**Key Insights**:
+- **V5 (DList with lazy cached list) is the WINNER**: **4.1x faster** than baseline with only **1.6x more memory** (best speed/memory trade-off)
+- V6 (DList native) is slightly slower but uses even less memory (1.46x)
+- V3/V7 (array-based) are equally fast but use 8x more memory
+- V1/V2 perform nearly identically (~1% difference, within margin of error)
 
 ## Analysis
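
The "4.1x faster" and "1.6x memory" figures in the hunk above follow from the table: the speedup is the baseline mean divided by the variant's mean, and the allocation ratio is the variant's allocation divided by the baseline's. A minimal Python sketch of that arithmetic, using the rounded figures from the new table (the dictionary and function names are illustrative and not part of the benchmark project):

```python
# Figures copied from the updated Combined Scenario table: (mean ms, allocated MB).
results = {
    "Original": (19.702, 5.96),
    "V3 (Array)": (4.748, 48.46),
    "V5 (DList Cached)": (4.794, 9.61),
    "V6 (DList Native)": (4.864, 8.69),
    "V7 (ImmutableArray)": (4.805, 47.93),
}


def relative_to_baseline(name, baseline="Original"):
    """Return (speedup, allocation ratio) of `name` against the baseline row."""
    mean, alloc = results[name]
    base_mean, base_alloc = results[baseline]
    return base_mean / mean, alloc / base_alloc


for name in results:
    speedup, alloc_ratio = relative_to_baseline(name)
    print(f"{name}: {speedup:.2f}x faster, {alloc_ratio:.2f}x memory")
```

Because the inputs are already rounded, the recomputed allocation ratios for V3 and V7 come out near 8.13 and 8.04, marginally below the 8.14 and 8.05 shown in the Alloc Ratio column.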

@@ -46,18 +56,32 @@ This is closest to real CheckDeclarations usage:
 2. **AppendOptimized overhead**: Creating intermediate merged lists has cost without benefit for single-element case
 3. **No structural sharing**: Each operation creates new objects, so optimization can't amortize
 
-### Why V3 (Array) is Fastest
+### Why V5 (DList with Caching) is Best
+
+1. **O(1) append**: DList composition is constant time
+2. **Lazy materialization**: List is only computed when needed for iteration
+3. **Balanced trade-off**: 4.1x speedup with only 1.6x memory overhead
+4. **Good for append-heavy + periodic iteration**: Perfect fit for the CheckDeclarations pattern
+
+### Why V6 (DList Native) is Also Good
+
+1. **Even less memory**: 1.46x allocation overhead
+2. **Still very fast**: 4.0x speedup over baseline
+3. **Trade-off**: Slightly slower iteration (materializes on every access)
+
+### Why V3/V7 (Array/ImmutableArray) Are Fast But Costly
 
 1. **Contiguous memory**: Better cache locality
 2. **Direct indexing**: No list traversal overhead
 3. **Simple iteration**: Array enumeration is highly optimized
-4. **Trade-off**: 27-38x more memory allocation
+4. **Trade-off**: 8x more memory allocation
 
 ### Recommendations
 
 1. **For this PR**: The AppendOptimized/caching changes don't help and should be reverted
-2. **Future work**: Consider array-backed implementation if willing to accept higher memory usage
-3. **Real solution**: Architectural change to avoid O(n²) iterations in CombineCcuContentFragments
+2. **Best alternative**: **V5 (DList with lazy cached list)** - 4.1x faster with only 1.6x memory overhead
+3. **Memory-conscious alternative**: V6 (DList native) - 4.0x faster with only 1.46x memory overhead
+4. **Future work**: Consider implementing DList-based QueueList for real performance gains
 
 ## Benchmark Categories

@@ -68,4 +92,4 @@ The benchmark includes 5 categories:
 4. **Combined**: Realistic scenario with periodic operations
 5. **AppendQueueList**: Appending QueueList objects (not single elements)
 
-All results confirm: **Current optimizations (V1/V2) provide no measurable benefit** over the baseline for the actual usage pattern.
+All results confirm: **Current optimizations (V1/V2) provide no measurable benefit** over the baseline for the actual usage pattern. **DList-based implementations (V5/V6) show real performance gains** with acceptable memory overhead.
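
The diff describes V5 as a DList with O(1) append and a lazily materialized, cached list, and V6 as the same structure iterated natively without caching. The benchmarked F# code is not shown on this page, so the following is only a rough Python sketch of the idea under stated assumptions: the class name `DList` and the methods `append_one`, `iter_native`, and `to_list` are hypothetical, and the append-tree representation is one common way to get constant-time append, not necessarily the one used in the commit.

```python
class DList:
    """Immutable sequence with O(1) append (illustrative sketch only)."""

    __slots__ = ("_left", "_right", "_item", "_length", "_cache")

    def __init__(self, left=None, right=None, item=None, length=0):
        self._left, self._right, self._item = left, right, item
        self._length = length
        self._cache = None  # filled on first to_list() call (V5-style caching)

    @staticmethod
    def empty():
        return DList()

    @staticmethod
    def singleton(item):
        return DList(item=item, length=1)

    def append(self, other):
        """O(1): allocate one join node that shares both operands."""
        if self._length == 0:
            return other
        if other._length == 0:
            return self
        return DList(left=self, right=other, length=self._length + other._length)

    def append_one(self, item):
        return self.append(DList.singleton(item))

    def iter_native(self):
        """V6-style: walk the tree on every call, nothing is cached.

        Uses an explicit stack so deep left-nested chains do not hit
        Python's recursion limit.
        """
        stack = [self]
        while stack:
            node = stack.pop()
            if node._left is not None:  # join node: visit left before right
                stack.append(node._right)
                stack.append(node._left)
            elif node._length == 1:  # leaf holding a single item
                yield node._item

    def to_list(self):
        """V5-style: materialize once, then reuse the cached list."""
        if self._cache is None:
            self._cache = list(self.iter_native())
        return self._cache

    def __iter__(self):
        return iter(self.to_list())

    def __len__(self):
        return self._length


# Append-heavy usage with an occasional iteration, as in the Combined scenario.
xs = DList.empty()
for i in range(5000):
    xs = xs.append_one(i)
print(len(xs), sum(xs))
```

In this sketch the cache trades extra memory for cheaper repeated iteration, which mirrors the direction of the V5 versus V6 allocation figures reported above; the absolute numbers would of course differ from the F# benchmarks.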
