Summary
The MAUI layout engine (Grid, Flex, Stack) allocates significant temporary state on every measure+arrange pass: arrays, dictionaries, struct wrappers, and event args. Through a benchmark-driven PoC on branch dev/simonrozsival/layout-perf-research, we identified and prototyped optimizations that eliminate all managed allocations in the Grid and Flex Core layout paths and reduce mean layout time by 13–37% in most scenarios (Flex Core with 12 children is roughly unchanged).
This issue documents the findings, benchmarks, and proposed changes for review and potential productization.
Benchmark Results
All benchmarks run on Apple M1 Max, .NET 10.0.1, Release, ShortRun (3 iterations). Allocation numbers are deterministic; timing has ±10–30% CI due to ShortRun.
LayoutAllocBenchmarker (new benchmark, lightweight fake objects, 50× Measure+Arrange loops)
This benchmark uses fake IView/IGridLayout/IStackLayout implementations (no NSubstitute) to measure true layout engine allocations without mock infrastructure noise.
| Scenario | Baseline Mean | Optimized Mean | Δ Time | Baseline Alloc | Optimized Alloc | Δ Alloc |
|---|---|---|---|---|---|---|
| Grid 12ch NoSpan | 58.29 µs | 43.00 µs | −26% | 87.11 KB | 0 B | −100% |
| Grid 12ch Span | 71.05 µs | 49.28 µs | −31% | 125.39 KB | 0 B | −100% |
| Grid 60ch NoSpan | 613.53 µs | 535.68 µs | −13% | 307.42 KB | 0 B | −100% |
| Grid 60ch Span | 660.17 µs | 570.35 µs | −14% | 457.42 KB | 0 B | −100% |
| Flex Core 12ch | 35.41 µs | 35.89 µs | ~0% | 10.94 KB | 0 B | −100% |
| Flex Core 60ch | 251.60 µs | 211.82 µs | −16% | 40.63 KB | 0 B | −100% |
| VStack 12ch | 2.66 µs | 1.68 µs | −37% | 0 B | 0 B | — |
| HStack 12ch | 2.42 µs | 1.81 µs | −25% | 0 B | 0 B | — |
| VStack 60ch | 10.43 µs | 8.25 µs | −21% | 0 B | 0 B | — |
| HStack 60ch | 11.29 µs | 9.31 µs | −18% | 0 B | 0 B | — |
LayoutHotPathBenchmarker (new benchmark, real Controls objects for Flex, 50× loops)
This benchmark uses real FlexLayout + Border children to measure end-to-end allocations including the Controls layer.
| Scenario | Baseline Mean | Optimized Mean | Δ Time | Baseline Alloc | Optimized Alloc | Δ Alloc |
|---|---|---|---|---|---|---|
| Flex Wrap 12ch | 267.8 µs | 235.7 µs | −12% | 85.94 KB | 37.50 KB | −56% |
| Flex NoWrap 12ch | 279.0 µs | 230.2 µs | −17% | 85.94 KB | 37.50 KB | −56% |
| Flex Wrap 60ch | 2,331.7 µs | 1,683.9 µs | −28% | 408.60 KB | 187.50 KB | −54% |
| Flex NoWrap 60ch | 2,010.6 µs | 1,625.6 µs | −19% | 404.69 KB | 187.50 KB | −54% |
Why NSubstitute-based benchmarks are misleading
The existing GridLayoutManagerBenchMarker uses NSubstitute mocks. After our struct conversions, each _grid[n] indexer call on a mocked IGridLayout allocates NSubstitute tracking objects. This caused the mock-based benchmark to show +60–160% regression — a pure measurement artifact. The LayoutAllocBenchmarker with lightweight fake objects confirms the optimizations achieve −13% to −31% time improvement and zero allocations.
Optimizations Implemented (PoC)
1. Grid: Struct Conversions (GridLayoutManager.cs)
Cell class → struct — Each Grid cell was a heap-allocated class. Converting to struct eliminates N object allocations per measure. Required ref parameters on mutation methods and ref var cell = ref _cells[n] for array element access.
Definition class → struct — Row/column definitions were heap-allocated classes. Converting to struct required:
- All mutating foreach loops → indexed for loops (the foreach iteration variable is readonly for structs)
- Fix copy-mutation bug in EnsureSizeLimit: var def = defs[n]; def.Size = ... → defs[n].Size = ...
- Added readonly modifier to non-mutating properties so the compiler can avoid defensive copies
GridStructure class → struct — Eliminates ~200B object header per Measure call. Stored as GridStructure _gridStructure; bool _hasGridStructure; (can't use nullable struct because .Value copies).
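The copy-mutation pitfall described above can be sketched as follows. This is a minimal illustration, not the PoC code: DemoDefinition, its fields, and both method names are hypothetical stand-ins for the real Definition type and EnsureSizeLimit.

```csharp
using System;

// Hypothetical stand-in for a row/column definition; not the MAUI type.
public struct DemoDefinition
{
    public double Size;
    public double MinimumSize;

    // readonly member: reading it through a readonly reference needs no defensive copy.
    public readonly bool IsBelowMinimum => Size < MinimumSize;
}

public static class StructMutationDemo
{
    // Buggy pattern: 'def' is a copy of the element, so the array is never updated.
    public static void EnsureSizeLimitBuggy(DemoDefinition[] defs, int count)
    {
        for (int n = 0; n < count; n++)
        {
            var def = defs[n];
            if (def.IsBelowMinimum)
                def.Size = def.MinimumSize; // mutates the local copy only
        }
    }

    // Fixed pattern: a ref local aliases the array element, so the write is kept.
    public static void EnsureSizeLimitFixed(DemoDefinition[] defs, int count)
    {
        for (int n = 0; n < count; n++)
        {
            ref DemoDefinition def = ref defs[n];
            if (def.IsBelowMinimum)
                def.Size = def.MinimumSize;
        }
    }
}
```

With a class the two methods behave identically; only after the struct conversion does the first one silently lose its writes, which is why such a change needs a careful audit of every mutation site.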
2. Grid: ArrayPool for All Arrays (GridLayoutManager.cs)
Replaced new IView[], new Cell[], and new Definition[] with ArrayPool<T>.Shared.Rent(n):
- Rented arrays may be larger than requested, so _childCount, _rowCount, and _columnCount tracking fields were added
- All loops using .Length changed to use count fields
- ReturnArrays() called at start of Measure before creating new GridStructure
- IView[] cleared on return (clearArray: true) to avoid holding references; Cell[] and Definition[] are structs and don't need clearing
- All static methods taking Definition[] defs needed an int defsCount parameter added: ResolveStars, MinimizeStars, ExpandStarDefinitions, EnsureSizeLimit, ComputeStarSizeForTarget, ExpandStars, SumDefinitions, AnyAuto
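A minimal sketch of the rent/track/return pattern listed above, assuming a hypothetical PooledChildren holder (the real code keeps these fields on GridLayoutManager and GridStructure). It shows why a separate count field is needed and where clearArray: true matters.

```csharp
using System;
using System.Buffers;

// Hypothetical holder that mimics the pattern; not MAUI code.
public sealed class PooledChildren
{
    string[] _items = Array.Empty<string>(); // rented array, may be longer than _count
    bool _rented;
    int _count;                              // logical length; loops must use this, not _items.Length

    public int Count => _count;

    public void Fill(int count)
    {
        Return(); // give back the previous pass's array before renting again
        _items = ArrayPool<string>.Shared.Rent(count);
        _rented = true;
        _count = count;
        for (int i = 0; i < _count; i++)
            _items[i] = "child" + i;
    }

    public int TotalLength()
    {
        int total = 0;
        for (int i = 0; i < _count; i++) // bounded by the count field
            total += _items[i].Length;
        return total;
    }

    public void Return()
    {
        if (!_rented)
            return;
        // clearArray: true drops the references so the pool does not keep objects alive.
        ArrayPool<string>.Shared.Return(_items, clearArray: true);
        _items = Array.Empty<string>();
        _rented = false;
        _count = 0;
    }
}
```

Iterating up to _items.Length instead of _count would read stale or null slots left over from the pool, which is the reason every helper in the list above gained a count parameter.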
3. Grid: Dictionary Reuse (GridLayoutManager.cs)
Moved Dictionary<SpanKey, double> from GridStructure (created per-measure) to a _spansDictionary field on GridLayoutManager:
- Passed into GridStructure via constructor
- .Clear() instead of new each pass
- TrackSpan's lazy init _spans ??= new() still works: if no spans are needed, no Dictionary is created
- Result: even Grid scenarios with spans now achieve 0 B allocations
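The reuse idea can be sketched like this. DemoSpanTracker, its int key, and the DictionariesCreated counter are hypothetical and exist only to make the behavior observable; the PoC uses Dictionary<SpanKey, double> owned by GridLayoutManager.

```csharp
#nullable enable
using System.Collections.Generic;

// Hypothetical owner of a dictionary that survives across layout passes.
public sealed class DemoSpanTracker
{
    Dictionary<int, double>? _spans; // created lazily on first span, then reused

    public int DictionariesCreated { get; private set; }
    public int Count => _spans?.Count ?? 0;

    // Called at the start of each pass: drops entries but keeps the allocated buckets.
    public void BeginPass() => _spans?.Clear();

    public void TrackSpan(int key, double size)
    {
        if (_spans is null)
        {
            _spans = new Dictionary<int, double>();
            DictionariesCreated++;
        }
        _spans[key] = size;
    }
}
```

After the first pass that needs spans, later passes add entries into already allocated storage, so steady-state passes allocate nothing as long as the number of distinct keys does not grow.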
4. Grid: SpanKey IEquatable + HashCode (GridLayoutManager.cs)
Added IEquatable<SpanKey> implementation and HashCode.Combine for better Dictionary performance. Added #if NETSTANDARD fallback with manual hash computation.
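As an illustration of this change, here is a sketch of a key struct with typed equality and a conditional hash. The field layout (Start, Length, IsColumn) and the fallback constants are assumptions; the issue does not show the real SpanKey members.

```csharp
#nullable enable
using System;

// Hypothetical key layout; the real SpanKey fields are not shown in this issue.
public readonly struct DemoSpanKey : IEquatable<DemoSpanKey>
{
    public DemoSpanKey(int start, int length, bool isColumn)
    {
        Start = start;
        Length = length;
        IsColumn = isColumn;
    }

    public int Start { get; }
    public int Length { get; }
    public bool IsColumn { get; }

    // Typed Equals lets the default comparer compare keys without boxing them.
    public bool Equals(DemoSpanKey other) =>
        Start == other.Start && Length == other.Length && IsColumn == other.IsColumn;

    public override bool Equals(object? obj) => obj is DemoSpanKey other && Equals(other);

    public override int GetHashCode()
    {
#if NETSTANDARD
        // Manual fallback for targets without HashCode (illustrative constants).
        unchecked
        {
            int hash = 17;
            hash = hash * 31 + Start;
            hash = hash * 31 + Length;
            hash = hash * 31 + (IsColumn ? 1 : 0);
            return hash;
        }
#else
        return HashCode.Combine(Start, Length, IsColumn);
#endif
    }
}
```

Without IEquatable<T>, a struct key falls back to ValueType.Equals(object), which boxes on every comparison and may use reflection, so the interface matters for both allocation and speed.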
5. Grid: foreach → for in ArrangeChildren (GridLayoutManager.cs)
With real objects (not NSubstitute mocks), converting foreach to indexed for loops eliminates enumerator boxing allocation. Confirmed safe via LayoutAllocBenchmarker.
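The difference between the two loop shapes can be sketched as below, assuming the children are reached through an interface such as IReadOnlyList<T> (LoopDemo and both method names are hypothetical). Whether the foreach version actually allocates depends on the concrete collection and runtime, so treat this as an illustration of the pattern rather than a measurement.

```csharp
using System.Collections.Generic;

public static class LoopDemo
{
    // foreach over an interface-typed list calls IEnumerable<T>.GetEnumerator().
    // For List<T> this typically boxes its struct enumerator onto the heap.
    public static double SumForeach(IReadOnlyList<double> items)
    {
        double total = 0;
        foreach (double item in items)
            total += item;
        return total;
    }

    // Indexed loop: no enumerator object, and Count is read once.
    public static double SumIndexed(IReadOnlyList<double> items)
    {
        double total = 0;
        int count = items.Count;
        for (int i = 0; i < count; i++)
            total += items[i];
        return total;
    }
}
```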
6. Flex: SelfSizing float[] Elimination (Flex.cs)
The SelfSizing callback allocated float[] size = {w, h} on every call. Replaced with two local float variables passed by ref. −42% to −48% Flex allocation for this change alone.
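A sketch of the before/after shape of this change. The delegate signatures and method names here are assumptions made for illustration; the actual SelfSizing delegate in Flex.cs may differ.

```csharp
public static class SelfSizingDemo
{
    // Before (assumed shape): the callback receives a freshly allocated 2-element array.
    public delegate void SelfSizingArray(float[] size);

    // After (assumed shape): width and height travel by ref, so nothing is allocated.
    public delegate void SelfSizingRef(ref float width, ref float height);

    public static float AreaWithArray(SelfSizingArray callback, float w, float h)
    {
        float[] size = { w, h }; // heap allocation on every call
        callback(size);
        return size[0] * size[1];
    }

    public static float AreaWithRef(SelfSizingRef callback, float w, float h)
    {
        callback(ref w, ref h);  // the callback writes straight into the locals
        return w * h;
    }
}
```

Both variants give the callback the same ability to adjust the proposed size; only the transport changes, which is why the saving scales with the number of self-sizing items per pass.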
7. Flex: InlineArray(4) FrameBuffer (Flex.cs)
Replaced float[] Frame (4-element heap array per Flex.Item) with [InlineArray(4)] struct FrameBuffer:
- Conditional on #if NET8_0_OR_GREATER (not available on netstandard2.0)
- Fallback: public float[] Frame { get; } = new float[4] on netstandard
- Changed frame index fields from uint to int (InlineArray indexer requirement)
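A minimal sketch of the conditional layout described above. DemoFrameBuffer, DemoItem, and the GetFrame/SetFrame accessors are hypothetical names; the PoC's FrameBuffer and Flex.Item expose their own members. The inline-array path requires C# 12 and .NET 8 or later.

```csharp
using System.Runtime.CompilerServices;

#if NET8_0_OR_GREATER
// Four floats stored inline in the containing object; no separate heap array.
[InlineArray(4)]
public struct DemoFrameBuffer
{
    private float _element0; // the runtime repeats this field 4 times
}
#endif

// Hypothetical item type; names are illustrative only.
public sealed class DemoItem
{
#if NET8_0_OR_GREATER
    DemoFrameBuffer _frame;
    public float GetFrame(int index) => _frame[index];
    public void SetFrame(int index, float value) => _frame[index] = value;
#else
    // Fallback: one extra 4-element array allocation per item.
    public float[] Frame { get; } = new float[4];
    public float GetFrame(int index) => Frame[index];
    public void SetFrame(int index, float value) => Frame[index] = value;
#endif
}
```

Hiding the storage behind int-indexed accessors keeps call sites identical on both targets, which matches the uint to int index change noted in the list.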
8. Flex: ArrayPool for ordered_indices and lines (Flex.cs)
- ordered_indices: ArrayPool<int>.Shared.Rent(item.Count) in flex_layout.init, returned in cleanup()
- lines: ArrayPool<flex_layout_line>.Shared.Rent(newCapacity) with manual Copy+Return for growth, returned in cleanup()
- Changed lines array growth from Array.Resize(+1) to doubling strategy
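The growth strategy for the lines array might look roughly like this. PooledLines is a hypothetical container over int values, and the initial capacity of 4 is an assumption; the PoC stores flex_layout_line structs inside flex_layout.

```csharp
using System;
using System.Buffers;

// Hypothetical growable buffer backed by ArrayPool; not the Flex.cs code.
public sealed class PooledLines
{
    int[] _lines = Array.Empty<int>();
    int _count;

    public int Count => _count;
    public int Capacity => _lines.Length;

    public int this[int index] =>
        (uint)index < (uint)_count ? _lines[index] : throw new ArgumentOutOfRangeException(nameof(index));

    public void Add(int value)
    {
        if (_count == _lines.Length)
            Grow();
        _lines[_count++] = value;
    }

    void Grow()
    {
        // Doubling keeps the number of rent/copy/return cycles logarithmic in the item count.
        int newCapacity = _lines.Length == 0 ? 4 : _lines.Length * 2;
        int[] bigger = ArrayPool<int>.Shared.Rent(newCapacity);
        Array.Copy(_lines, bigger, _count);   // manual copy of the live elements
        if (_lines.Length > 0)
            ArrayPool<int>.Shared.Return(_lines);
        _lines = bigger;
    }

    public void Cleanup()
    {
        if (_lines.Length > 0)
            ArrayPool<int>.Shared.Return(_lines);
        _lines = Array.Empty<int>();
        _count = 0;
    }
}
```

Growing by one element, as Array.Resize(+1) did, copies the whole array on every added line; doubling reduces that to a handful of copies even for layouts that wrap many times.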
9. Stack: Cached Count/Spacing, Indexed Loops
Cached ILayout.Count and spacing in local variables, converted foreach to indexed for loops. Also cached childCount in StackLayoutManager.UsesExpansion. Stack was already allocation-free with real objects, but these changes improve throughput by 18–37%.
10. InvalidationEventArgs: Static Cached Instances
Added InvalidationEventArgs.GetCached(InvalidationTrigger) with static singletons. Replaced new InvalidationEventArgs(trigger) in VisualElement, Page, and legacy Layout. Eliminates per-dispatch allocations.
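The caching approach can be sketched as follows. DemoTrigger and DemoInvalidationEventArgs are hypothetical simplifications with only three trigger values; the real InvalidationTrigger enum and InvalidationEventArgs class have more members.

```csharp
using System;

// Hypothetical subset of trigger values, for illustration only.
public enum DemoTrigger { Undefined, MeasureChanged, SizeRequestChanged }

public sealed class DemoInvalidationEventArgs : EventArgs
{
    // One shared instance per known trigger; safe to share because the type is immutable.
    static readonly DemoInvalidationEventArgs s_undefined = new(DemoTrigger.Undefined);
    static readonly DemoInvalidationEventArgs s_measureChanged = new(DemoTrigger.MeasureChanged);
    static readonly DemoInvalidationEventArgs s_sizeRequestChanged = new(DemoTrigger.SizeRequestChanged);

    public DemoInvalidationEventArgs(DemoTrigger trigger) => Trigger = trigger;

    public DemoTrigger Trigger { get; }

    public static DemoInvalidationEventArgs GetCached(DemoTrigger trigger) => trigger switch
    {
        DemoTrigger.Undefined => s_undefined,
        DemoTrigger.MeasureChanged => s_measureChanged,
        DemoTrigger.SizeRequestChanged => s_sizeRequestChanged,
        _ => new DemoInvalidationEventArgs(trigger), // unknown value: fall back to allocating
    };
}
```

Sharing instances is only sound if no handler mutates or stores per-dispatch state on the event args, so the cached type should expose read-only members.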
Remaining Allocation: BindableProperty Boxing
The remaining Flex Controls-layer allocations (37.5 KB for 12ch = 64 B per child per pass) trace to VisualElement.ArrangeOverride → UpdateBoundsComponents, which sets X, Y, Width, Height via BindableObject.SetValue. Each SetValue call on a double property boxes the value. This affects all layout types, not just Flex.
This will be resolved by generic BindableProperty<T> (#34080), which enables typed property access without boxing.
Files Changed (14 files, +1328/−201 lines)
Core layout engine:
- src/Core/src/Layouts/GridLayoutManager.cs: Struct conversions, ArrayPool, Dictionary reuse, foreach→for
- src/Core/src/Layouts/Flex.cs: SelfSizing elimination, InlineArray, ArrayPool
- src/Core/src/Layouts/FlexLayoutManager.cs: foreach→for
- src/Core/src/Layouts/VerticalStackLayoutManager.cs: Cached Count/Spacing, indexed loops
- src/Core/src/Layouts/HorizontalStackLayoutManager.cs: Cached Count, indexed loops
Controls layer:
- src/Controls/src/Core/InvalidationEventArgs.cs: Static cached instances
- src/Controls/src/Core/VisualElement/VisualElement.cs: Use GetCached
- src/Controls/src/Core/Page/Page.cs: Use GetCached
- src/Controls/src/Core/LegacyLayouts/Layout.cs: Use GetCached
- src/Controls/src/Core/Layout/StackLayoutManager.cs: Cached childCount
New benchmarks:
- src/Core/tests/Benchmarks/Benchmarks/LayoutAllocBenchmarker.cs: Lightweight fakes, true allocation measurement
- src/Core/tests/Benchmarks/Benchmarks/LayoutHotPathBenchmarker.cs: Hot-path with Controls objects
- src/Core/tests/Benchmarks/Benchmarks/InvalidationBenchmarker.cs: Invalidation dispatch
Test Status
All 441 existing tests pass (394 Core layout + 47 Controls layout).
PoC Branch
dev/simonrozsival/layout-perf-research — 6 commits, ready for review.