dotnet
diff --git a/‎investigation/QUEUELIST_BENCHMARK_RESULTS.md‎
Lines changed: 39 additions & 15 deletions b/‎investigation/QUEUELIST_BENCHMARK_RESULTS.md‎
Lines changed: 39 additions & 15 deletions
@@ -2,13 +2,16 @@
 
 ## Overview
 
-Created comprehensive BenchmarkDotNet benchmarks for QueueList to simulate the 5000-element append scenario as used in CheckDeclarations. Tested 5 implementations:
+Created comprehensive BenchmarkDotNet benchmarks for QueueList to simulate the 5000-element append scenario as used in CheckDeclarations. Tested 8 implementations:
 
 - **Original**: Current baseline implementation
 - **V1**: AppendOptimized (current commit's optimization)
 - **V2**: Optimized for single-element appends  
 - **V3**: Array-backed with preallocation
 - **V4**: ResizeArray-backed
+- **V5**: DList with lazy materialized list (cached iteration)
+- **V6**: DList with native iteration (no caching)
+- **V7**: ImmutableArray-backed
 
 ## Key Findings
 
@@ -28,15 +31,22 @@ Created comprehensive BenchmarkDotNet benchmarks for QueueList to simulate the 5
 
 This is closest to real CheckDeclarations usage:
 
-| Implementation | Mean (ms) | Ratio | Allocated    |
-|----------------|-----------|-------|--------------|
-| V3 (Array)     | 4.718     | 0.24  | 50.81 MB     |
-| V4 (ResizeArray) | 13.911  | 0.70  | 150.50 MB    |
-| V1 (Current)   | 19.560    | 0.98  | 1.84 MB      |
-| V2 (Optimized) | 19.708    | 0.99  | 1.84 MB      |
-| Original       | 19.891    | 1.00  | N/A          |
-
-**Key Insight**: V1/V2 perform nearly identically (~1% difference, within margin of error). Array-based V3 is **4.2x faster** but allocates **27x more memory**.
+| Implementation | Mean (ms) | Ratio | Allocated | Alloc Ratio |
+|----------------|-----------|-------|-----------|-------------|
+| V3 (Array)     | 4.748     | 0.24  | 48.46 MB  | 8.14        |
+| **V5 (DList Cached)** | **4.794** | **0.24** | **9.61 MB** | **1.61** |
+| V7 (ImmutableArray) | 4.805 | 0.24  | 47.93 MB  | 8.05        |
+| V6 (DList Native) | 4.864  | 0.25  | 8.69 MB   | 1.46        |
+| V4 (ResizeArray) | 14.498  | 0.74  | 143.53 MB | 24.10       |
+| V1 (Current)   | 19.490    | 0.99  | 1.75 MB   | 0.29        |
+| V2 (Optimized) | 19.518    | 0.99  | 1.75 MB   | 0.29        |
+| Original       | 19.702    | 1.00  | 5.96 MB   | 1.00        |
+
+**Key Insights**: 
+- **V5 (DList with lazy cached list) is the WINNER**: **4.1x faster** than baseline with only **1.6x more memory** (best speed/memory trade-off)
+- V6 (DList native) is slightly slower but uses even less memory (1.46x)
+- V3/V7 (array-based) are equally fast but use 8x more memory
+- V1/V2 perform nearly identically (~1% difference, within margin of error)
 
 ## Analysis
 
@@ -46,18 +56,32 @@ This is closest to real CheckDeclarations usage:
 2. **AppendOptimized overhead**: Creating intermediate merged lists has cost without benefit for single-element case
 3. **No structural sharing**: Each operation creates new objects, so optimization can't amortize
 
-### Why V3 (Array) is Fastest
+### Why V5 (DList with Caching) is Best
+
+1. **O(1) append**: DList composition is constant time
+2. **Lazy materialization**: List is only computed when needed for iteration
+3. **Balanced trade-off**: 4.1x speedup with only 1.6x memory overhead
+4. **Good for append-heavy + periodic iteration**: Perfect fit for the CheckDeclarations pattern
+
+### Why V6 (DList Native) is Also Good
+
+1. **Even less memory**: 1.46x allocation overhead
+2. **Still very fast**: 4.0x speedup over baseline
+3. **Trade-off**: Slightly slower iteration (materializes on every access)
+
+### Why V3/V7 (Array/ImmutableArray) Are Fast But Costly
 
 1. **Contiguous memory**: Better cache locality
 2. **Direct indexing**: No list traversal overhead
 3. **Simple iteration**: Array enumeration is highly optimized
-4. **Trade-off**: 27-38x more memory allocation
+4. **Trade-off**: 8x more memory allocation
 
 ### Recommendations
 
 1. **For this PR**: The AppendOptimized/caching changes don't help and should be reverted
-2. **Future work**: Consider array-backed implementation if willing to accept higher memory usage
-3. **Real solution**: Architectural change to avoid O(n²) iterations in CombineCcuContentFragments
+2. **Best alternative**: **V5 (DList with lazy cached list)** - 4.1x faster with only 1.6x memory overhead
+3. **Memory-conscious alternative**: V6 (DList native) - 4.0x faster with only 1.46x memory overhead
+4. **Future work**: Consider implementing DList-based QueueList for real performance gains
 
 ## Benchmark Categories
 
@@ -68,4 +92,4 @@ The benchmark includes 5 categories:
 4. **Combined**: Realistic scenario with periodic operations
 5. **AppendQueueList**: Appending QueueList objects (not single elements)
 
-All results confirm: **Current optimizations (V1/V2) provide no measurable benefit** over the baseline for the actual usage pattern.
+All results confirm: **Current optimizations (V1/V2) provide no measurable benefit** over the baseline for the actual usage pattern. **DList-based implementations (V5/V6) show real performance gains** with acceptable memory overhead.