Commit 755ef00

Copilot and T-Gro committed
Add comprehensive QueueList BenchmarkDotNet benchmarks with 4 optimization variants
Created QueueListBenchmarks.fs with 5 implementations:
- Original: Baseline
- V1: AppendOptimized (current changes)
- V2: Single-element optimized
- V3: Array-backed
- V4: ResizeArray-backed

Results for 5000 appends:
- V3 (array): 4.7x faster but 38x more memory
- V1/V2: No improvement over baseline (within 1% margin of error)
- Combined scenario (realistic usage): V1/V2 show no benefit

Conclusion: AppendOptimized doesn't help for the actual usage pattern (AppendOne, not Append).

Co-authored-by: T-Gro <[email protected]>
1 parent 90a3a28 commit 755ef00

3 files changed: +617 -0 lines changed
Lines changed: 71 additions & 0 deletions
@@ -0,0 +1,71 @@
# QueueList Benchmark Results Summary

## Overview

Created comprehensive BenchmarkDotNet benchmarks for QueueList to simulate the 5000-element append scenario as used in CheckDeclarations. Tested 5 implementations:

- **Original**: Current baseline implementation
- **V1**: AppendOptimized (current commit's optimization)
- **V2**: Optimized for single-element appends
- **V3**: Array-backed with preallocation
- **V4**: ResizeArray-backed

## Key Findings

### AppendOne Performance (5000 sequential appends)

| Implementation   | Mean (ms) | Ratio | Allocated | Alloc Ratio |
|------------------|-----------|-------|-----------|-------------|
| V3 (Array)       | 3.765     | 0.21  | 47.97 MB  | 38.37       |
| V4 (ResizeArray) | 12.746    | 0.73  | 143.53 MB | 114.80      |
| V2 (Optimized)   | 17.473    | 0.99  | 1.25 MB   | 1.00        |
| V1 (Current)     | 17.541    | 1.00  | 1.25 MB   | 1.00        |
| Original         | 17.576    | 1.00  | 1.25 MB   | 1.00        |

**Key Insight**: V1/V2 (list-based) perform within 1% of Original for AppendOne operations, as expected. V3 (array) is **4.7x faster** but allocates 38x more memory. V4 (ResizeArray) is about 1.4x faster than Original but about 3.4x slower than V3, attributed here to frequent internal copying; it also allocates the most memory (about 115x the baseline).
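
The Ratio column and the "4.7x faster" figure can be recomputed from the Mean and Allocated columns. The following is a minimal Python sketch of that arithmetic, not part of the benchmark project; the `summarize` helper and the dictionary layout are hypothetical, and only the numbers come from the table above. Because the table's MB values are already rounded, the recomputed allocation ratios can differ from the reported ones in the second decimal.

```python
# Mean time (ms) and allocation (MB), copied from the AppendOne table.
results = {
    "V3 (Array)": (3.765, 47.97),
    "V4 (ResizeArray)": (12.746, 143.53),
    "V2 (Optimized)": (17.473, 1.25),
    "V1 (Current)": (17.541, 1.25),
    "Original": (17.576, 1.25),
}


def summarize(results, baseline="Original"):
    """Derive time ratio, speedup, and allocation ratio relative to the baseline."""
    base_ms, base_mb = results[baseline]
    summary = {}
    for name, (ms, mb) in results.items():
        summary[name] = {
            "ratio": round(ms / base_ms, 2),      # lower is faster
            "speedup": round(base_ms / ms, 1),    # higher is faster
            "alloc_ratio": round(mb / base_mb, 2),
        }
    return summary


summary = summarize(results)
for name, row in summary.items():
    print(f"{name:18} ratio={row['ratio']:.2f} "
          f"speedup={row['speedup']:.1f}x alloc={row['alloc_ratio']:.2f}x")
```

Speedup is simply the inverse of Ratio, which is why a Ratio of 0.21 corresponds to roughly 4.7x.
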

### Combined Scenario (append + iteration + foldBack every 100 items)

This is closest to real CheckDeclarations usage:

| Implementation   | Mean (ms) | Ratio | Allocated |
|------------------|-----------|-------|-----------|
| V3 (Array)       | 4.718     | 0.24  | 50.81 MB  |
| V4 (ResizeArray) | 13.911    | 0.70  | 150.50 MB |
| V1 (Current)     | 19.560    | 0.98  | 1.84 MB   |
| V2 (Optimized)   | 19.708    | 0.99  | 1.84 MB   |
| Original         | 19.891    | 1.00  | N/A       |

**Key Insight**: V1/V2 perform nearly identically to Original (1-2% difference, within margin of error). Array-based V3 is **4.2x faster** but allocates **27x more memory** than V1/V2 (50.81 MB vs 1.84 MB; Original's allocation is listed as N/A, so the comparison uses the V1/V2 figure).

## Analysis

### Why V1 (AppendOptimized) Didn't Help

1. **AppendOne dominates**: The real workload uses `AppendOne` for single elements, not `Append` for QueueLists
2. **AppendOptimized overhead**: Creating intermediate merged lists has a cost with no benefit in the single-element case
3. **No structural sharing**: Each operation creates new objects, so the optimization's cost can't be amortized across appends
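
The first point can be illustrated with a small sketch. The following Python class is a deliberately simplified, hypothetical stand-in for a queue-like list with separate single-element and bulk append paths; it is not the F# `QueueList` implementation, and the names `QueueListSketch`, `append_one`, `append`, and `append_calls` are assumptions made for illustration. It shows that a workload consisting only of single-element appends never enters the bulk path, so optimizing that path cannot change its cost.

```python
class QueueListSketch:
    """Simplified queue: an immutable front plus a reversed tail of cons cells."""

    append_calls = 0  # counts bulk appends, to show whether that path is reached

    def __init__(self, front=(), back_rev=None, back_len=0):
        self.front = front          # tuple of older elements, oldest first
        self.back_rev = back_rev    # cons cells (value, next), newest first
        self.back_len = back_len

    def append_one(self, item):
        # Single-element path: one new cell in front of the existing reversed tail.
        return QueueListSketch(self.front, (item, self.back_rev), self.back_len + 1)

    def append(self, other):
        # Bulk path: the only place a merge-style optimization could apply.
        QueueListSketch.append_calls += 1
        return QueueListSketch(tuple(self.to_list() + other.to_list()))

    def to_list(self):
        # Walk the reversed tail, then restore insertion order.
        tail = []
        cell = self.back_rev
        while cell is not None:
            tail.append(cell[0])
            cell = cell[1]
        tail.reverse()
        return list(self.front) + tail


def run_append_one_workload(n):
    """Mimic the benchmarked pattern: n sequential single-element appends."""
    q = QueueListSketch()
    for i in range(n):
        q = q.append_one(i)
    return q


q = run_append_one_workload(5000)
print(len(q.to_list()), QueueListSketch.append_calls)
```

After 5000 calls to `append_one`, the bulk-append counter is still zero, which mirrors why V1 measures the same as Original in the AppendOne category.
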

### Why V3 (Array) is Fastest

1. **Contiguous memory**: Better cache locality
2. **Direct indexing**: No list traversal overhead
3. **Simple iteration**: Array enumeration is highly optimized
4. **Trade-off**: 27-38x more memory allocation (27x in the Combined scenario, 38x for AppendOne)

### Recommendations

1. **For this PR**: The AppendOptimized/caching changes don't help and should be reverted
2. **Future work**: Consider array-backed implementation if willing to accept higher memory usage
3. **Real solution**: Architectural change to avoid O(n²) iterations in CombineCcuContentFragments
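
To make the O(n²) concern concrete, here is a small worked example in Python. It assumes, purely for illustration, that every append is followed by one full traversal of everything accumulated so far (the pattern the AppendWithIteration category is described as exercising); whether CombineCcuContentFragments traverses in exactly this way is not confirmed by this summary. Both function names are hypothetical.

```python
def visits_when_iterating_after_each_append(n):
    """Count element visits when the whole collection is traversed after every append."""
    items = []
    visits = 0
    for i in range(n):
        items.append(i)
        for _ in items:  # full traversal of everything accumulated so far
            visits += 1
    return visits


def closed_form(n):
    """1 + 2 + ... + n element visits."""
    return n * (n + 1) // 2


# The simulation and the formula agree on a small input.
assert visits_when_iterating_after_each_append(200) == closed_form(200)

print(closed_form(5000))  # prints 12502500
```

Under that assumption, 5000 appends cost about 12.5 million element visits regardless of how cheap each individual append is, which is why a faster container only shifts the constant factor.
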

## Benchmark Categories

The benchmark includes 5 categories:
1. **AppendOne**: Just 5000 sequential appends
2. **AppendWithIteration**: Append + full iteration each time
3. **AppendWithFoldBack**: Append + foldBack each time
4. **Combined**: Realistic scenario with periodic operations
5. **AppendQueueList**: Appending QueueList objects (not single elements)

All results confirm: **Current optimizations (V1/V2) provide no measurable benefit** over the baseline for the actual usage pattern.

tests/benchmarks/FCSBenchmarks/CompilerServiceBenchmarks/FSharp.Compiler.Benchmarks.fsproj

Lines changed: 1 addition & 0 deletions
@@ -14,6 +14,7 @@
 <Compile Include="GraphTypeCheckingBenchmarks.fs" />
 <Compile Include="BackgroundCompilerBenchmarks.fs" />
 <Compile Include="ComputationExpressionBenchmarks.fs" />
+<Compile Include="QueueListBenchmarks.fs" />
 <Compile Include="Program.fs" />
 </ItemGroup>
