| 1 | +# QueueList Benchmark Results Summary |
| 2 | + |
| 3 | +## Overview |
| 4 | + |
| 5 | +These BenchmarkDotNet benchmarks simulate the 5000-element append scenario that QueueList sees in CheckDeclarations. Five implementations were tested: |
| 6 | + |
| 7 | +- **Original**: Baseline implementation (the reference for all ratios) |
| 8 | +- **V1**: AppendOptimized (current commit's optimization) |
| 9 | +- **V2**: Optimized for single-element appends |
| 10 | +- **V3**: Array-backed with preallocation |
| 11 | +- **V4**: ResizeArray-backed |
| 12 | + |
| 13 | +## Key Findings |
| 14 | + |
| 15 | +### AppendOne Performance (5000 sequential appends) |
| 16 | + |
| 17 | +| Implementation | Mean (ms) | Ratio | Allocated | Alloc Ratio | |
| 18 | +|----------------|-----------|-------|-----------|-------------| |
| 19 | +| V3 (Array) | 3.765 | 0.21 | 47.97 MB | 38.37 | |
| 20 | +| V4 (ResizeArray) | 12.746 | 0.73 | 143.53 MB | 114.80 | |
| 21 | +| V2 (Optimized) | 17.473 | 0.99 | 1.25 MB | 1.00 | |
| 22 | +| V1 (Current) | 17.541 | 1.00 | 1.25 MB | 1.00 | |
| 23 | +| Original | 17.576 | 1.00 | 1.25 MB | 1.00 | |
| 24 | + |
| 25 | +**Key Insight**: V1/V2 (list-based) are within 1% of Original for AppendOne operations, as expected. V3 (array) is **4.7x faster** but allocates 38x more memory. V4 (ResizeArray) is about 1.4x faster than Original but more than 3x slower than V3, and it allocates the most (115x), likely due to frequent internal copying. |
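As a quick check on the headline numbers, the speedup and allocation ratios can be recomputed from the table. This is a minimal sketch: the figures are copied from the AppendOne table, while the `results` dictionary and the `relative` helper are illustrative names and not part of the benchmark code.

```python
# Mean time (ms) and allocation (MB), copied from the AppendOne table.
results = {
    "V3 (Array)": (3.765, 47.97),
    "V4 (ResizeArray)": (12.746, 143.53),
    "V2 (Optimized)": (17.473, 1.25),
    "V1 (Current)": (17.541, 1.25),
    "Original": (17.576, 1.25),
}
base_ms, base_mb = results["Original"]


def relative(name):
    """Return (speedup over Original, allocation relative to Original)."""
    ms, mb = results[name]
    return base_ms / ms, mb / base_mb


for name in results:
    speedup, alloc = relative(name)
    print(f"{name:18s} speedup {speedup:5.2f}x  alloc {alloc:7.2f}x")
```

Small differences from the table's Alloc Ratio column are expected, because the table values are rounded to two decimals before division here.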
| 26 | + |
| 27 | +### Combined Scenario (append + iteration + foldBack every 100 items) |
| 28 | + |
| 29 | +This is closest to real CheckDeclarations usage: |
| 30 | + |
| 31 | +| Implementation | Mean (ms) | Ratio | Allocated | |
| 32 | +|----------------|-----------|-------|--------------| |
| 33 | +| V3 (Array) | 4.718 | 0.24 | 50.81 MB | |
| 34 | +| V4 (ResizeArray) | 13.911 | 0.70 | 150.50 MB | |
| 35 | +| V1 (Current) | 19.560 | 0.98 | 1.84 MB | |
| 36 | +| V2 (Optimized) | 19.708 | 0.99 | 1.84 MB | |
| 37 | +| Original | 19.891 | 1.00 | N/A | |
| 38 | + |
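| 39 | +**Key Insight**: V1 and V2 are within ~1% of each other and within ~2% of Original, which is inside the margin of error. Array-based V3 is **4.2x faster** but allocates roughly **27.6x more memory** than V1/V2 (50.81 MB vs 1.84 MB; Original's allocation was not recorded). |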
| 40 | + |
| 41 | +## Analysis |
| 42 | + |
| 43 | +### Why V1 (AppendOptimized) Didn't Help |
| 44 | + |
| 45 | +1. **AppendOne dominates**: The real workload uses `AppendOne` for single elements, not `Append`, which takes a whole QueueList |
| 46 | +2. **AppendOptimized overhead**: Creating intermediate merged lists has a cost and brings no benefit in the single-element case |
| 47 | +3. **No structural sharing**: Each operation creates new objects, so the cost of the optimization cannot be amortized across operations |
| 48 | + |
| 49 | +### Why V3 (Array) is Fastest |
| 50 | + |
| 51 | +1. **Contiguous memory**: Better cache locality |
| 52 | +2. **Direct indexing**: No list traversal overhead |
| 53 | +3. **Simple iteration**: Array enumeration is highly optimized |
| 54 | +4. **Trade-off**: 27-38x more memory allocation |
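The source does not show V3's code, so the following is only a toy model of why an immutable array-backed queue can allocate far more than a list-backed one. It assumes every append copies the whole backing store, which may not match V3's actual preallocation strategy. `CopyArrayQueue`, `ConsQueue`, and `count_allocations` are hypothetical names introduced for illustration.

```python
class CopyArrayQueue:
    """Immutable queue; each append builds a new tuple with all elements."""

    slots_allocated = 0  # total element slots allocated across all appends

    def __init__(self, items=()):
        self.items = items

    def append_one(self, x):
        new_items = self.items + (x,)  # copies every existing element
        CopyArrayQueue.slots_allocated += len(new_items)
        return CopyArrayQueue(new_items)


class ConsQueue:
    """Immutable queue kept as a reversed cons list; each append adds one node."""

    nodes_allocated = 0  # total list nodes allocated across all appends

    def __init__(self, rev=None, count=0):
        self.rev = rev
        self.count = count

    def append_one(self, x):
        ConsQueue.nodes_allocated += 1
        return ConsQueue((x, self.rev), self.count + 1)  # shares the old tail


def count_allocations(n):
    """Append n items to each queue and return (array slots, list nodes)."""
    CopyArrayQueue.slots_allocated = 0
    ConsQueue.nodes_allocated = 0
    a, c = CopyArrayQueue(), ConsQueue()
    for i in range(n):
        a = a.append_one(i)
        c = c.append_one(i)
    return CopyArrayQueue.slots_allocated, ConsQueue.nodes_allocated


print(count_allocations(5000))
```

Under this assumption the array variant allocates a number of slots that grows quadratically with the number of appends, while the list variant allocates one node per append. The model says nothing about speed, where contiguous storage has the advantages listed above.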
| 55 | + |
| 56 | +### Recommendations |
| 57 | + |
| 58 | +1. **For this PR**: The AppendOptimized/caching changes don't help and should be reverted |
| 59 | +2. **Future work**: Consider an array-backed implementation if the higher memory usage is acceptable |
| 60 | +3. **Real solution**: Architectural change to avoid O(n²) iterations in CombineCcuContentFragments |
| 61 | + |
| 62 | +## Benchmark Categories |
| 63 | + |
| 64 | +The benchmark includes 5 categories: |
| 65 | +1. **AppendOne**: Just 5000 sequential appends |
| 66 | +2. **AppendWithIteration**: Append + full iteration each time |
| 67 | +3. **AppendWithFoldBack**: Append + foldBack each time |
| 68 | +4. **Combined**: Realistic scenario with appends plus iteration and foldBack every 100 items |
| 69 | +5. **AppendQueueList**: Appending QueueList objects (not single elements) |
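The categories above differ mainly in how often the whole collection is traversed. A rough way to see the gap is to count element visits. This is an illustrative calculation only: `visits` is a hypothetical helper, and it does not model the benchmark's or the compiler's actual code.

```python
def visits(n, every):
    """Elements visited when the whole collection is re-iterated
    after every `every`-th append, for n appends in total."""
    return sum(size for size in range(1, n + 1) if size % every == 0)


print(visits(5000, 1))    # full iteration after every append
print(visits(5000, 100))  # full iteration after every 100th append
```

Iterating after every append visits n(n+1)/2 elements in total, which is the quadratic growth that the recommendation about CombineCcuContentFragments refers to. Iterating every 100th append reduces the count by roughly a factor of 100 but remains quadratic in n.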
| 70 | + |
| 71 | +All results confirm: **Current optimizations (V1/V2) provide no measurable benefit** over the baseline for the actual usage pattern. |