
[Perf] Optimize allocations in the layout engine #34154

@simonrozsival

Summary

The MAUI layout engine (Grid, Flex, Stack) allocates significant temporary state on every measure+arrange pass: arrays, dictionaries, struct wrappers, and event args. Through a benchmark-driven PoC on branch dev/simonrozsival/layout-perf-research, we identified and prototyped optimizations that eliminate all managed allocations in the Grid and Flex Core layout paths and cut mean time per pass by 13–37% in every LayoutAllocBenchmarker scenario except Flex Core with 12 children, which is unchanged.

This issue documents the findings, benchmarks, and proposed changes for review and potential productization.

Benchmark Results

All benchmarks run on Apple M1 Max, .NET 10.0.1, Release, ShortRun (3 iterations). Allocation numbers are deterministic; timing has ±10–30% CI due to ShortRun.

LayoutAllocBenchmarker (new benchmark, lightweight fake objects, 50× Measure+Arrange loops)

This benchmark uses fake IView/IGridLayout/IStackLayout implementations (no NSubstitute) to measure true layout engine allocations without mock infrastructure noise.

| Scenario | Baseline Mean | Optimized Mean | Δ Time | Baseline Alloc | Optimized Alloc | Δ Alloc |
|---|---|---|---|---|---|---|
| Grid 12ch NoSpan | 58.29 µs | 43.00 µs | −26% | 87.11 KB | 0 B | −100% |
| Grid 12ch Span | 71.05 µs | 49.28 µs | −31% | 125.39 KB | 0 B | −100% |
| Grid 60ch NoSpan | 613.53 µs | 535.68 µs | −13% | 307.42 KB | 0 B | −100% |
| Grid 60ch Span | 660.17 µs | 570.35 µs | −14% | 457.42 KB | 0 B | −100% |
| Flex Core 12ch | 35.41 µs | 35.89 µs | ~0% | 10.94 KB | 0 B | −100% |
| Flex Core 60ch | 251.60 µs | 211.82 µs | −16% | 40.63 KB | 0 B | −100% |
| VStack 12ch | 2.66 µs | 1.68 µs | −37% | 0 B | 0 B | |
| HStack 12ch | 2.42 µs | 1.81 µs | −25% | 0 B | 0 B | |
| VStack 60ch | 10.43 µs | 8.25 µs | −21% | 0 B | 0 B | |
| HStack 60ch | 11.29 µs | 9.31 µs | −18% | 0 B | 0 B | |

LayoutHotPathBenchmarker (new benchmark, real Controls objects for Flex, 50× loops)

This benchmark uses real FlexLayout + Border children to measure end-to-end allocations including the Controls layer.

| Scenario | Baseline Mean | Optimized Mean | Δ Time | Baseline Alloc | Optimized Alloc | Δ Alloc |
|---|---|---|---|---|---|---|
| Flex Wrap 12ch | 267.8 µs | 235.7 µs | −12% | 85.94 KB | 37.50 KB | −56% |
| Flex NoWrap 12ch | 279.0 µs | 230.2 µs | −17% | 85.94 KB | 37.50 KB | −56% |
| Flex Wrap 60ch | 2,331.7 µs | 1,683.9 µs | −28% | 408.60 KB | 187.50 KB | −54% |
| Flex NoWrap 60ch | 2,010.6 µs | 1,625.6 µs | −19% | 404.69 KB | 187.50 KB | −54% |

Why NSubstitute-based benchmarks are misleading

The existing GridLayoutManagerBenchMarker uses NSubstitute mocks. After our struct conversions, each _grid[n] indexer call on a mocked IGridLayout allocates NSubstitute tracking objects. This caused the mock-based benchmark to show +60–160% regression — a pure measurement artifact. The LayoutAllocBenchmarker with lightweight fake objects confirms the optimizations achieve −13% to −31% time improvement and zero allocations.

Optimizations Implemented (PoC)

1. Grid: Struct Conversions (GridLayoutManager.cs)

Cell class → struct: Each Grid cell was a heap-allocated class. Converting it to a struct eliminates N object allocations per measure. This required ref parameters on mutation methods and ref var cell = ref _cells[n] for array element access.

Definition class → struct: Row/column definitions were heap-allocated classes. Converting them to structs required:

  • All mutating foreach loops → indexed for loops (the foreach iteration variable is a read-only copy, so struct members cannot be assigned through it)
  • Fixing a copy-mutation bug in EnsureSizeLimit: var def = defs[n]; def.Size = ... → defs[n].Size = ...
  • Adding the readonly modifier to non-mutating properties to help the compiler avoid defensive copies

GridStructure class → struct: Eliminates a ~200 B heap object per Measure call. Stored as GridStructure _gridStructure; bool _hasGridStructure; (a nullable struct can't be used because .Value returns a copy).
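The copy-mutation pitfall described above can be sketched as follows. This is a minimal illustration, not the PoC code: the `Definition` struct here is a simplified stand-in, and `IsBelowMinimum`, `EnsureMinimumBuggy`, and `EnsureMinimumFixed` are hypothetical names chosen for the example.

```csharp
using System;

// Simplified, hypothetical stand-in for the Definition struct.
public struct Definition
{
    public double Size;
    public double MinimumSize;

    // A readonly member promises not to mutate the struct, so the compiler
    // does not need a defensive copy when it is called via a readonly reference.
    public readonly bool IsBelowMinimum => Size < MinimumSize;
}

public static class StructMutationDemo
{
    // Buggy: 'def' is a copy of the array element, so the write is lost.
    public static void EnsureMinimumBuggy(Definition[] defs, int count)
    {
        for (int n = 0; n < count; n++)
        {
            var def = defs[n];
            if (def.IsBelowMinimum)
                def.Size = def.MinimumSize; // mutates the local copy only
        }
    }

    // Fixed: a ref local aliases the array element, so the write persists.
    public static void EnsureMinimumFixed(Definition[] defs, int count)
    {
        for (int n = 0; n < count; n++)
        {
            ref Definition def = ref defs[n];
            if (def.IsBelowMinimum)
                def.Size = def.MinimumSize;
        }
    }
}
```

With a class, both versions behave identically, which is why this kind of bug only surfaces after the class → struct conversion.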

2. Grid: ArrayPool for All Arrays (GridLayoutManager.cs)

Replaced new IView[], new Cell[], and new Definition[] with ArrayPool<T>.Shared.Rent(n):

  • Rented arrays may be larger than requested — added _childCount, _rowCount, _columnCount tracking fields
  • All loops using .Length changed to use count fields
  • ReturnArrays() called at start of Measure before creating new GridStructure
  • IView[] cleared on return (clearArray: true) to avoid holding references; Cell[] and Definition[] hold struct elements and don't need clearing
  • All static methods taking Definition[] defs needed an int defsCount parameter added: ResolveStars, MinimizeStars, ExpandStarDefinitions, EnsureSizeLimit, ComputeStarSizeForTarget, ExpandStars, SumDefinitions, AnyAuto
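The rent / track-count / return pattern in the list above might look roughly like this. It is a sketch under assumptions: `PooledMeasureState` and its members are hypothetical, `double[]` stands in for `Cell[]`/`Definition[]`, and `object[]` stands in for `IView[]`.

```csharp
using System.Buffers;

// Hypothetical holder illustrating rent / track count / return.
public sealed class PooledMeasureState
{
    double[] _sizes;   // stand-in for Cell[] or Definition[]
    object[] _views;   // stand-in for IView[]
    int _childCount;   // logical length; rented arrays may be longer

    public int ChildCount => _childCount;
    public double SizeAt(int n) => _sizes[n];

    public void Measure(object[] children)
    {
        ReturnArrays(); // release the previous pass before renting again

        _childCount = children.Length;
        _sizes = ArrayPool<double>.Shared.Rent(_childCount);
        _views = ArrayPool<object>.Shared.Rent(_childCount);

        // Loop to the tracked count, never to _sizes.Length.
        for (int n = 0; n < _childCount; n++)
        {
            _views[n] = children[n];
            _sizes[n] = n * 10.0;
        }
    }

    public void ReturnArrays()
    {
        if (_sizes != null)
        {
            // Value-type elements hold no references, so no clearing is needed.
            ArrayPool<double>.Shared.Return(_sizes);
            _sizes = null;
        }
        if (_views != null)
        {
            // Clear reference arrays so the pool does not keep objects alive.
            ArrayPool<object>.Shared.Return(_views, clearArray: true);
            _views = null;
        }
        _childCount = 0;
    }
}
```

Because `Rent(n)` only guarantees a length of at least `n`, any helper that receives the array also needs the logical count, which is why the static methods gained a `defsCount` parameter.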

3. Grid: Dictionary Reuse (GridLayoutManager.cs)

Moved Dictionary<SpanKey, double> from GridStructure (created per-measure) to a _spansDictionary field on GridLayoutManager:

  • Passed into GridStructure via constructor
  • .Clear() instead of new each pass
  • TrackSpan's lazy init _spans ??= new() still works — if no spans needed, no Dictionary created
  • Result: even Grid scenarios with spans now achieve 0 B allocations
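A minimal sketch of the reuse-and-clear idea, assuming a tuple key in place of `SpanKey` and a hypothetical `SpanTracker` owner in place of `GridLayoutManager`. The "keep the larger size" rule in `TrackSpan` is an assumption made for the example:

```csharp
#nullable enable
using System.Collections.Generic;

// Hypothetical long-lived owner; the dictionary outlives each measure pass.
public sealed class SpanTracker
{
    Dictionary<(int Start, int Length), double>? _spans;

    public int Count => _spans?.Count ?? 0;
    public bool HasDictionary => _spans != null;

    // Called at the start of each pass: empties the dictionary but keeps
    // its internal storage, so steady-state passes do not allocate.
    public void BeginMeasure() => _spans?.Clear();

    public void TrackSpan(int start, int length, double size)
    {
        // Lazy init: layouts without spans never create the dictionary.
        _spans ??= new Dictionary<(int Start, int Length), double>();

        var key = (start, length);
        if (!_spans.TryGetValue(key, out double current) || size > current)
            _spans[key] = size;
    }
}
```

`Dictionary.Clear()` retains the allocated buckets, so only the first pass that encounters a span pays for the allocation.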

4. Grid: SpanKey IEquatable + HashCode (GridLayoutManager.cs)

Added IEquatable<SpanKey> implementation and HashCode.Combine for better Dictionary performance. Added #if NETSTANDARD fallback with manual hash computation.
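As a rough illustration, a key type with both pieces could look like the following. The fields (`Start`, `Length`, `IsColumn`) and the manual hash constants are assumptions for this sketch, not necessarily what the PoC uses.

```csharp
#nullable enable
using System;

// Sketch of a dictionary key with typed equality and a combined hash.
public readonly struct SpanKey : IEquatable<SpanKey>
{
    public SpanKey(int start, int length, bool isColumn)
    {
        Start = start;
        Length = length;
        IsColumn = isColumn;
    }

    public int Start { get; }
    public int Length { get; }
    public bool IsColumn { get; }

    // Typed Equals lets Dictionary compare keys without boxing.
    public bool Equals(SpanKey other) =>
        Start == other.Start && Length == other.Length && IsColumn == other.IsColumn;

    public override bool Equals(object? obj) => obj is SpanKey other && Equals(other);

    public override int GetHashCode()
    {
#if NETSTANDARD
        // Manual fallback where System.HashCode may not be available.
        unchecked
        {
            int hash = 17;
            hash = hash * 31 + Start;
            hash = hash * 31 + Length;
            hash = hash * 31 + (IsColumn ? 1 : 0);
            return hash;
        }
#else
        return HashCode.Combine(Start, Length, IsColumn);
#endif
    }
}
```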

5. Grid: foreach → for in ArrangeChildren (GridLayoutManager.cs)

With real objects (not NSubstitute mocks), converting foreach to indexed for loops eliminates enumerator boxing allocation. Confirmed safe via LayoutAllocBenchmarker.
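The difference between the two loop shapes can be sketched on a plain `IReadOnlyList<double>`. `LoopDemo` and its methods are hypothetical; the real code iterates layout children.

```csharp
using System.Collections.Generic;

public static class LoopDemo
{
    // foreach over an interface calls GetEnumerator() through the interface,
    // which can allocate an enumerator object on each loop.
    public static double SumForeach(IReadOnlyList<double> items)
    {
        double total = 0;
        foreach (double item in items)
            total += item;
        return total;
    }

    // An indexed loop only uses Count and the indexer, so no enumerator
    // is created.
    public static double SumIndexed(IReadOnlyList<double> items)
    {
        double total = 0;
        int count = items.Count; // read once instead of on every iteration
        for (int n = 0; n < count; n++)
            total += items[n];
        return total;
    }
}
```

Both methods return the same result; only the allocation behavior differs, and the exact cost depends on the runtime and the concrete collection type.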

6. Flex: SelfSizing float[] Elimination (Flex.cs)

The SelfSizing callback allocated float[] size = {w, h} on every call. Replaced with two local float variables passed by ref. This change alone reduces Flex allocations by 42–48%.
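A minimal sketch of the by-ref callback shape. The delegate signature and the `SelfSizingDemo` helper are assumptions for illustration; the actual Flex.cs delegate may take additional parameters.

```csharp
// Hypothetical delegate: the callback adjusts width and height in place.
public delegate void SelfSizing(ref float width, ref float height);

public static class SelfSizingDemo
{
    // The two floats live on the stack, so invoking the callback
    // does not allocate a temporary float[2].
    public static (float Width, float Height) Measure(SelfSizing selfSizing, float width, float height)
    {
        if (selfSizing != null)
            selfSizing(ref width, ref height);
        return (width, height);
    }
}
```

A caller would pass a lambda with explicit ref parameters, for example `(ref float w, ref float h) => { w = 10f; h = 20f; }`.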

7. Flex: InlineArray(4) FrameBuffer (Flex.cs)

Replaced float[] Frame (4-element heap array per Flex.Item) with [InlineArray(4)] struct FrameBuffer:

  • Conditional on #if NET8_0_OR_GREATER (not available on netstandard2.0)
  • Fallback: public float[] Frame { get; } = new float[4] on netstandard
  • Changed frame index fields from uint to int (InlineArray indexer requirement)
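The conditional layout described above could be sketched like this. `FlexItemSketch`, `GetFrame`, and `SetFrame` are hypothetical names; only `FrameBuffer`, `[InlineArray(4)]`, and the `float[4]` fallback come from the description.

```csharp
using System.Runtime.CompilerServices;

#if NET8_0_OR_GREATER
// Four floats stored inline in the containing object, with no separate array.
[InlineArray(4)]
public struct FrameBuffer
{
    private float _element0; // the runtime repeats this field 4 times
}
#endif

// Hypothetical stand-in for Flex.Item.
public sealed class FlexItemSketch
{
#if NET8_0_OR_GREATER
    FrameBuffer _frame;

    // Inline arrays are indexed with int.
    public float GetFrame(int index) => _frame[index];
    public void SetFrame(int index, float value) => _frame[index] = value;
#else
    // Fallback for targets without InlineArray: one heap array per item.
    public float[] Frame { get; } = new float[4];

    public float GetFrame(int index) => Frame[index];
    public void SetFrame(int index, float value) => Frame[index] = value;
#endif
}
```

On .NET 8 and later the four floats become part of the item's own memory, so creating an item costs one allocation instead of two.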

8. Flex: ArrayPool for ordered_indices and lines (Flex.cs)

  • ordered_indices: ArrayPool<int>.Shared.Rent(item.Count) in flex_layout.init, returned in cleanup()
  • lines: ArrayPool<flex_layout_line>.Shared.Rent(newCapacity) with manual Copy+Return for growth, returned in cleanup()
  • Changed lines array growth from Array.Resize(+1) to doubling strategy
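The growth strategy for the lines array might be sketched as below. `LineBuffer` is a hypothetical wrapper, `int` stands in for `flex_layout_line`, and the initial capacity of 4 is an arbitrary choice for the example.

```csharp
using System;
using System.Buffers;

// Hypothetical pooled, doubling buffer.
public sealed class LineBuffer
{
    int[] _lines;
    int _count;

    public int Count => _count;
    public int this[int index] => _lines[index];

    public void Add(int line)
    {
        if (_lines == null || _count == _lines.Length)
            Grow();
        _lines[_count++] = line;
    }

    void Grow()
    {
        // Doubling keeps the number of rent/copy/return cycles logarithmic
        // in the line count, rather than one per added line.
        int newCapacity = _lines == null ? 4 : _lines.Length * 2;
        int[] bigger = ArrayPool<int>.Shared.Rent(newCapacity);

        if (_lines != null)
        {
            Array.Copy(_lines, bigger, _count);
            ArrayPool<int>.Shared.Return(_lines);
        }
        _lines = bigger;
    }

    // Mirrors cleanup(): hand the array back once layout is finished.
    public void Cleanup()
    {
        if (_lines != null)
        {
            ArrayPool<int>.Shared.Return(_lines);
            _lines = null;
        }
        _count = 0;
    }
}
```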

9. Stack: Cached Count/Spacing, Indexed Loops

Cached ILayout.Count and spacing in local variables, converted foreach to indexed for loops. Also cached childCount in StackLayoutManager.UsesExpansion. Stack was already allocation-free with real objects, but these changes improve throughput by 18–37%.

10. InvalidationEventArgs: Static Cached Instances

Added InvalidationEventArgs.GetCached(InvalidationTrigger) with static singletons. Replaced new InvalidationEventArgs(trigger) in VisualElement, Page, and legacy Layout. Eliminates per-dispatch allocations.
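The caching idea can be sketched with simplified stand-in types. `DemoTrigger` and `DemoInvalidationEventArgs` are hypothetical, and the fallback for uncached values is an assumption of this sketch.

```csharp
using System;

// Simplified, hypothetical stand-in for InvalidationTrigger.
public enum DemoTrigger
{
    Undefined,
    MeasureChanged,
    SizeRequestChanged,
}

public sealed class DemoInvalidationEventArgs : EventArgs
{
    // Safe to share because instances are immutable.
    static readonly DemoInvalidationEventArgs s_measureChanged = new(DemoTrigger.MeasureChanged);
    static readonly DemoInvalidationEventArgs s_sizeRequestChanged = new(DemoTrigger.SizeRequestChanged);

    public DemoInvalidationEventArgs(DemoTrigger trigger) => Trigger = trigger;

    public DemoTrigger Trigger { get; }

    public static DemoInvalidationEventArgs GetCached(DemoTrigger trigger) => trigger switch
    {
        DemoTrigger.MeasureChanged => s_measureChanged,
        DemoTrigger.SizeRequestChanged => s_sizeRequestChanged,
        // Values without a cached instance fall back to allocating.
        _ => new DemoInvalidationEventArgs(trigger),
    };
}
```

Sharing instances only works while the event args stay read-only; a settable property would let one handler change what later handlers observe.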

Remaining Allocation: BindableProperty Boxing

The remaining Flex Controls-layer allocations (37.5 KB for 12ch = 64 B per child per pass) trace to VisualElement.ArrangeOverrideUpdateBoundsComponents, which sets X, Y, Width, Height via BindableObject.SetValue. Each SetValue call on a double property boxes the value. This affects all layout types, not just Flex.
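The per-child figure can be checked against both LayoutHotPathBenchmarker sizes, assuming 1 KB = 1024 B and 50 measure+arrange passes per benchmark invocation:

```latex
\frac{37.50 \times 1024\ \text{B}}{50\ \text{passes} \times 12\ \text{children}} = 64\ \text{B}
\qquad
\frac{187.50 \times 1024\ \text{B}}{50\ \text{passes} \times 60\ \text{children}} = 64\ \text{B}
```

The identical result for 12 and 60 children is consistent with a fixed cost per child per pass.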

This will be resolved by generic BindableProperty<T> (#34080), which enables typed property access without boxing.

Files Changed (14 files, +1328/−201 lines)

Core layout engine:

  • src/Core/src/Layouts/GridLayoutManager.cs — Struct conversions, ArrayPool, Dictionary reuse, foreach→for
  • src/Core/src/Layouts/Flex.cs — SelfSizing elimination, InlineArray, ArrayPool
  • src/Core/src/Layouts/FlexLayoutManager.cs — foreach→for
  • src/Core/src/Layouts/VerticalStackLayoutManager.cs — Cached Count/Spacing, indexed loops
  • src/Core/src/Layouts/HorizontalStackLayoutManager.cs — Cached Count, indexed loops

Controls layer:

  • src/Controls/src/Core/InvalidationEventArgs.cs — Static cached instances
  • src/Controls/src/Core/VisualElement/VisualElement.cs — Use GetCached
  • src/Controls/src/Core/Page/Page.cs — Use GetCached
  • src/Controls/src/Core/LegacyLayouts/Layout.cs — Use GetCached
  • src/Controls/src/Core/Layout/StackLayoutManager.cs — Cached childCount

New benchmarks:

  • src/Core/tests/Benchmarks/Benchmarks/LayoutAllocBenchmarker.cs — Lightweight fakes, true allocation measurement
  • src/Core/tests/Benchmarks/Benchmarks/LayoutHotPathBenchmarker.cs — Hot-path with Controls objects
  • src/Core/tests/Benchmarks/Benchmarks/InvalidationBenchmarker.cs — Invalidation dispatch

Test Status

All 441 existing tests pass (394 Core layout + 47 Controls layout).

PoC Branch

dev/simonrozsival/layout-perf-research — 6 commits, ready for review.

Labels: copilot, perf/general
