Motivation
With the XAML Source Generator work maturing, we now generate simpler, more efficient code from XAML — compiled
bindings, trimmable styles, and source-generated property access. But the generated code still calls into the same
runtime primitives: BindableObject.SetValue, SetterSpecificityList, TypedBinding.Apply, layout managers, handler
mappers. The overhead of those primitives sets a floor on what we can achieve through code generation alone.
This epic tracks a systematic, benchmark-driven investigation of the MAUI runtime primitives that sit on the hot path of every app: control construction, property storage, data binding propagation, layout measurement, handler connection, resource resolution, and data virtualization. The goal is to identify refactoring opportunities that reduce GC pressure and eliminate unnecessary overhead — without API or behavior changes.
Principles
- Benchmark-driven. Every change must be justified by a BenchmarkDotNet measurement showing a meaningful improvement. No speculative "this should be faster" changes.
- No API changes. All optimizations are internal refactorings. Public API surface, behavior, and semantics must remain identical.
- No added complexity without measured payoff. If an optimization makes the code harder to understand but only saves 8 bytes per app lifetime, it's not worth it. Simplicity is preserved whenever possible.
- Incremental and independently mergeable. Each PR should be self-contained and reviewable in isolation. Large investigations may produce multiple small PRs.
- Cross-platform validation. Allocation measurements are deterministic (BenchmarkDotNet Gen0/Allocated). Throughput improvements must be directionally consistent across platforms, even if absolute numbers differ.
Scope
In scope
The investigation covers the managed-code primitives in the Microsoft.Maui.Controls.Core and Microsoft.Maui.Core layers — everything that sits between developer-written code (or XAML-generated code) and the platform renderers:
| Area | What we're looking at |
|---|---|
| Control construction | Per-instance memory footprint of the BindableObject → Element → NavigableElement → VisualElement → View → Layout hierarchy. Eager vs. lazy initialization of internal dictionaries, lists, and delegates. |
| Property system | BindableObject.SetValue/GetValue hot path: SetterSpecificityList storage, event args allocation, value-type boxing, property change notification. |
| Data binding | TypedBinding setup and steady-state cost, BindingContext propagation through the visual tree, WeakReference allocation patterns. |
| Layout engine | GridLayoutManager, FlexLayoutManager, StackLayoutManager — per-cycle allocations (struct vs. class state, array pooling, dictionary reuse), invalidation cascades. |
| Tree propagation | OnDescendantAdded/OnDescendantRemoved event args, InvalidateMeasure event args, inherited property propagation. |
| Handler connection | PropertyMapper.UpdateProperties() during handler setup — redundant mapper calls for default-valued properties. Handler creation and caching patterns. |
| Resource & style resolution | ResourceDictionary lookup chains, MergedStyle recalculation, implicit style resolution — overhead that runs on every control that participates in styling. |
| CollectionView & virtualization | ItemsView data flow, MarshalingObservableCollection synchronization, template materialization, cell recycling overhead, DataTemplate selection. |
| Visual State Manager | State transition cost, style recalculation on state changes, trigger evaluation. |
| Platform dispatch | Dispatcher marshaling overhead, BatchCommit/BatchBegin patterns, handler ↔ platform view synchronization. |
| Internal utilities | Logging infrastructure, SetterSpecificityList, ElementEventArgs, and other supporting types on the hot path. |
Out of scope
- Platform-specific renderer internals (JNI overhead, ObjC message sends, WinRT interop) — these require platform-specific profiling tools
- XAML parsing/inflation pipeline — separate workstream (XAML Source Generator)
- Image decoding and caching — separate subsystem
- Blazor Hybrid WebView rendering — separate technology stack
- Animation frame timing — primarily a platform ticker concern
Investigation areas and status
1. Control construction and memory footprint
Question: How much memory does creating a single control allocate, and how much of it is wasted on data structures that are never used?
Findings: A new Label() allocates 2,928 bytes on main. Profiling the BindableObject → Label chain identified 34 data-structure candidates for optimization. When each candidate was benchmarked independently, a subset showed measurable wins, reducing new Label() to 2,256 bytes and app-scale startup (1000 controls) from 7.79 MB to 6.91 MB.
Key changes: lazy-init of rarely-used dictionaries (_triggerSpecificity, _measureCache, _dynamicResources), struct conversion of SetterSpecificityList, removal of eager Lazy<T> wrappers, and right-sized initial capacities.
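The lazy-init pattern mentioned above can be sketched roughly as follows. This is a minimal illustration, not the code from the linked PRs: `SketchElement`, its members, and the `Dictionary<string, object>` shape are hypothetical stand-ins, and only the field name `_dynamicResources` is borrowed from the description.

```csharp
#nullable enable
using System.Collections.Generic;

// Hypothetical stand-in for a control base class; not the MAUI implementation.
public class SketchElement
{
    // Stays null until the first write, so elements that never use
    // dynamic resources pay only for the reference field.
    private Dictionary<string, object>? _dynamicResources;

    // Exposed only so the sketch can show when the allocation happens.
    public bool IsStorageAllocated => _dynamicResources is not null;

    public void SetDynamicResource(string key, object value)
    {
        // Allocate the dictionary on first use only.
        (_dynamicResources ??= new Dictionary<string, object>())[key] = value;
    }

    public bool TryGetDynamicResource(string key, out object? value)
    {
        // Reads must tolerate the not-yet-allocated state.
        if (_dynamicResources is null)
        {
            value = null;
            return false;
        }
        return _dynamicResources.TryGetValue(key, out value);
    }
}
```

The trade-off is that every read path needs a null check, which is why the principle of measuring each candidate independently matters here.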
| Reference | Description | Status |
|---|---|---|
| #34149 | Tracking issue — optimize memory usage of control elements | Open |
| #34150 | PR — comprehensive memory footprint optimization (34 candidates evaluated) | Draft |
| #34131 | Issue — lazy-init _triggerSpecificity dictionary | Open |
| #34133 | PR — lazy-init _triggerSpecificity | Draft, review feedback from @MartyIX |
2. Property system: SetterSpecificityList
Question: Can SetterSpecificityList<T> be converted from a heap-allocated class to an inline struct?
Findings: Runtime diagnostics showed 99.3% of instances hold ≤2 entries. Converting to a struct with inline _top/_second fields and a lazy _rest overflow list eliminates the heap allocation for the common case.
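A rough sketch of the inline-slots-plus-overflow idea is shown below. It is an assumption-laden illustration rather than the implementation in #34089: the type name `InlineSpecificityList<T>` is hypothetical, specificity is simplified to an `int` key, and removal is omitted. Only the `_top`, `_second`, and `_rest` field names come from the description above.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical sketch: two inline slots cover the common case (≤2 entries);
// a list is allocated only when a third distinct key arrives.
public struct InlineSpecificityList<T>
{
    private int _count;
    private int _topKey;
    private int _secondKey;
    private T? _top;
    private T? _second;
    private List<(int Key, T Value)>? _rest; // lazy overflow storage

    public int Count => _count;
    public bool HasOverflow => _rest is not null;

    // Value with the highest specificity key.
    public T Top => _count > 0 ? _top! : throw new InvalidOperationException("The list is empty.");

    public void Set(int key, T value)
    {
        // Replace in place when the key already exists.
        if (_count >= 1 && key == _topKey) { _top = value; return; }
        if (_count >= 2 && key == _secondKey) { _second = value; return; }
        if (_rest is not null)
        {
            for (int i = 0; i < _rest.Count; i++)
            {
                if (_rest[i].Key == key) { _rest[i] = (key, value); return; }
            }
        }

        // New key: keep the two highest keys inline, push the rest to overflow.
        if (_count == 0)
        {
            _topKey = key; _top = value;
        }
        else if (key > _topKey)
        {
            if (_count >= 2) Overflow(_secondKey, _second!);
            _secondKey = _topKey; _second = _top;
            _topKey = key; _top = value;
        }
        else if (_count == 1 || key > _secondKey)
        {
            if (_count >= 2) Overflow(_secondKey, _second!);
            _secondKey = key; _second = value;
        }
        else
        {
            Overflow(key, value);
        }
        _count++;
    }

    private void Overflow(int key, T value) =>
        (_rest ??= new List<(int Key, T Value)>()).Add((key, value));
}
```

Because this is a mutable struct, it has to be stored in a field or local and mutated in place; copying it (for example, returning it from a property) would silently fork the state.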
| Reference | Description | Status |
|---|---|---|
| #34089 | PR — SetterSpecificityList<T> class→struct | Open, ready for review |
3. Property system: EventArgs caching
Question: How many EventArgs objects are allocated per property change, and can they be reused?
Findings: Every SetValue call allocates new PropertyChangedEventArgs and PropertyChangingEventArgs. Since BindableProperty.PropertyName is immutable, these can be cached on the property instance. Similarly, InvalidationEventArgs for the 7 known trigger values can be statically cached.
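The caching idea can be illustrated with a small sketch. `SketchProperty` is a hypothetical stand-in for BindableProperty, and the member names are assumptions; the point is only that event args carrying an immutable property name can be created once and shared across notifications.

```csharp
#nullable enable
using System.ComponentModel;

// Hypothetical stand-in for BindableProperty; not the MAUI type.
public sealed class SketchProperty
{
    private PropertyChangedEventArgs? _changedArgs;
    private PropertyChangingEventArgs? _changingArgs;

    public SketchProperty(string propertyName) => PropertyName = propertyName;

    // Immutable, which is what makes sharing the args instances safe.
    public string PropertyName { get; }

    // Created on first use, then reused for every later notification.
    public PropertyChangedEventArgs ChangedArgs =>
        _changedArgs ??= new PropertyChangedEventArgs(PropertyName);

    public PropertyChangingEventArgs ChangingArgs =>
        _changingArgs ??= new PropertyChangingEventArgs(PropertyName);
}
```

A raise site would then pass `property.ChangedArgs` instead of constructing a new instance per call. Under concurrent first access two instances may briefly exist, which is harmless here because the args carry no per-call state.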
| Reference | Description | Status |
|---|---|---|
| #34092 | Issue — cache PropertyChanged/ChangingEventArgs | Open |
| #34136 | PR — cache EventArgs on BindableProperty | Draft |
| #34094 | Issue — cache InvalidationEventArgs | Open |
| #34095 | PR — cache InvalidationEventArgs for known trigger values | Open, ready for review |
| #34093 | Issue — reuse ElementEventArgs in tree propagation | Open |
| #34134 | PR — reuse ElementEventArgs (O(depth) → O(1) allocs) | Open, ready for review |
4. Property system: Value-type boxing
Question: How much overhead comes from boxing value types through the object-based SetValue API?
Findings: Instrumentation shows 65% of all property sets store a value type (Boolean, Single, Color, Double, Thickness, etc.). Each set boxes the value into object.
Fixing this properly requires a generic BindableProperty<T> or a similar mechanism — which is an API change and outside the "no API changes" scope of this epic. Filed as a separate tracking issue for future work.
| Reference | Description | Status |
|---|---|---|
| #34080 | Issue — eliminate value-type boxing in SetValue | Open (deferred — requires API design) |
5. Data binding: TypedBinding performance
Question: Can the source-generated TypedBinding implementation be faster?
Findings: The old TypedBinding constructor eagerly allocated a Tuple[] array for property-change handlers. A new implementation uses lazy Func<IEnumerable> with compile-time INPC analysis to skip handler creation for types that don't implement INotifyPropertyChanged.
| Reference | Description | Status |
|---|---|---|
| #32382 | PR — improve TypedBinding performance (new constructor, INPC-aware handlers) | Open, .NET 11 Planning milestone |
6. Data binding: BindingContext propagation
Question: What allocates during SetInheritedBindingContext propagation?
Findings: Two sources identified: (1) new WeakReference(value) on every descendant — the existing WeakReference can be reused by updating .Target; (2) .ToArray() in ApplyBindings before iterating the properties dictionary — potentially unnecessary if no callback modifies the dictionary during iteration.
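The first source, WeakReference reuse, might look roughly like this. `SketchBindable` and its members are hypothetical, and the allocation counter exists only to make the effect visible. The sketch assumes no other code holds on to the WeakReference instance itself, since retargeting it would otherwise change what those holders observe.

```csharp
#nullable enable
using System;

// Hypothetical stand-in for a bindable object; not the MAUI implementation.
public class SketchBindable
{
    private WeakReference? _inheritedContext;

    // Diagnostic counter for the sketch only.
    public int WeakReferenceAllocations { get; private set; }

    public object? InheritedContext => _inheritedContext?.Target;

    public void SetInheritedContext(object? value)
    {
        if (_inheritedContext is null)
        {
            // First propagation: one allocation per element.
            _inheritedContext = new WeakReference(value);
            WeakReferenceAllocations++;
        }
        else
        {
            // Later propagations retarget the existing instance instead of allocating.
            _inheritedContext.Target = value;
        }
    }
}
```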
| Reference | Description | Status |
|---|---|---|
| #34129 | Issue — reduce allocations in BindingContext propagation | Open |
| #34135 | PR — WeakReference reuse + remove ToArray() | Draft |
7. Layout engine: Zero-allocation core
Question: Can the layout engine (Grid, Flex, Stack) run without any managed-heap allocations?
Findings: The core layout engines allocate temporary state (Cell/Definition classes, arrays, dictionaries) on every measure+arrange pass. Converting these to structs, using ArrayPool, reusing dictionaries across passes, and using InlineArray for Flex can eliminate core-engine allocations in steady state.
The remaining allocations in end-to-end benchmarks trace to BindableObject.SetValue boxing double values for X/Y/Width/Height — which connects back to the boxing issue (#34080).
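As an illustration of the ArrayPool technique named above, here is a minimal sketch of a measure step that rents its scratch buffer instead of allocating one per pass. `PooledMeasureSketch` and `MeasureStackHeight` are hypothetical and far simpler than the real layout managers; only `ArrayPool<T>.Shared.Rent` and `Return` are actual .NET APIs.

```csharp
using System;
using System.Buffers;

// Hypothetical sketch of a vertical stack measure; not the MAUI layout code.
public static class PooledMeasureSketch
{
    public static double MeasureStackHeight(ReadOnlySpan<double> childHeights, double spacing)
    {
        int n = childHeights.Length;
        if (n == 0)
            return 0;

        // Rent may return an array longer than requested; only the first n slots are used.
        double[] offsets = ArrayPool<double>.Shared.Rent(n);
        try
        {
            double y = 0;
            for (int i = 0; i < n; i++)
            {
                offsets[i] = y;                 // top edge of child i
                y += childHeights[i] + spacing; // advance past child and gap
            }
            // Total height ends at the bottom edge of the last child.
            return offsets[n - 1] + childHeights[n - 1];
        }
        finally
        {
            // Always return the buffer so later passes can reuse it.
            ArrayPool<double>.Shared.Return(offsets);
        }
    }
}
```

The pool itself may allocate when it is first used, so the benefit shows up in steady state, which matches how the benchmarks in this area measure repeated measure+arrange cycles.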
| Reference | Description | Status |
|---|---|---|
| #34154 | Issue — optimize layout engine allocations | Open |
| #34155 | PR — zero-alloc Grid/Flex/Stack (struct conversions, ArrayPool, InlineArray) | Draft (WIP), large |
8. Handler connection: Mapper call optimization
Question: How many mapper calls during handler setup are redundant?
Findings: For 500 controls, PropertyMapper.UpdateProperties() makes ~25k mapper calls — roughly 55% are for properties the developer never set. A prototype that skips default-valued properties showed measurable startup improvement on Android. However, community feedback (@albyrock87) raised valid breaking-change concerns: user-appended mappers and properties like Visibility that need mapping even at defaults.
Current conclusion: a per-mapper approach (as in #27259) is safer than a blanket skip at the PropertyMapper level. The data from this investigation can guide which additional mappers to optimize.
| Reference | Description | Status |
|---|---|---|
| #34088 | Issue — skip redundant mapper calls (includes prototype + Android benchmarks) | Open, active discussion |
9. Logging infrastructure
Question: Does the internal logging pattern waste allocations on cold paths?
Findings: The $"..." interpolated string is always allocated even when Application.Current is null and the ?. chain short-circuits. Using an [InterpolatedStringHandler] ref struct avoids the formatting and allocation when logging is disabled.
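The handler mechanism can be sketched as follows, assuming C# 10 or later. `SketchLogger` and `SketchWarnHandler` are hypothetical names and do not reflect the MauiLogger API from #34097; the `IsEnabled` flag stands in for whatever condition decides that logging is off. `InterpolatedStringHandlerAttribute` and `InterpolatedStringHandlerArgumentAttribute` are the real compiler-recognized attributes.

```csharp
#nullable enable
using System.Runtime.CompilerServices;
using System.Text;

// Hypothetical logger; the empty argument name passes the receiver to the handler.
public sealed class SketchLogger
{
    public bool IsEnabled { get; set; }
    public string? LastMessage { get; private set; }

    public void Warn([InterpolatedStringHandlerArgument("")] SketchWarnHandler message)
    {
        if (IsEnabled)
            LastMessage = message.GetFormattedText();
    }
}

[InterpolatedStringHandler]
public ref struct SketchWarnHandler
{
    private readonly StringBuilder? _builder;

    // The compiler calls this first; when shouldAppend is false it skips
    // every Append call, including evaluation of the interpolated expressions.
    public SketchWarnHandler(int literalLength, int formattedCount, SketchLogger logger, out bool shouldAppend)
    {
        shouldAppend = logger.IsEnabled;
        _builder = shouldAppend ? new StringBuilder(literalLength) : null;
    }

    public void AppendLiteral(string value) => _builder!.Append(value);

    public void AppendFormatted<T>(T value) => _builder!.Append(value?.ToString());

    public string GetFormattedText() => _builder?.ToString() ?? string.Empty;
}
```

With this shape, a call such as `logger.Warn($"Failed to load {path}")` builds no string and evaluates no interpolation holes while logging is disabled.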
| Reference | Description | Status |
|---|---|---|
| #34096 | Issue — simplify logging with InterpolatedStringHandler | Open |
| #34097 | PR — MauiLogger implementation (27 files migrated) | Open, .NET 11 Planning milestone |
10. Resource & style resolution
Question: How much overhead does style resolution and resource lookup add during control construction and page inflation?
Status: Not yet investigated.
ResourceDictionary lookups traverse the parent chain (Element.FindResource). MergedStyle merges implicit + explicit + class styles on every control. VisualStateManager re-evaluates triggers on state transitions. These are on the construction and page-load hot path but have not been profiled yet.
Areas to investigate:
- ResourceDictionary.TryGetValue lookup chain cost (depth × frequency)
- MergedStyle recalculation — when does it run, and can results be cached?
- Implicit style resolution — can we short-circuit for controls with no applicable implicit styles?
- VisualStateGroup allocation patterns during state transitions
11. CollectionView & virtualization data flow
Question: What is the managed-code overhead of CollectionView item materialization, scrolling, and data synchronization?
Status: Not yet investigated.
CollectionView is a frequently reported performance pain point (see #31541, #30814, #30704, #27514). The managed-code side involves MarshalingObservableCollection thread synchronization, DataTemplate selection and materialization, BindingContext propagation to recycled cells, and layout measurement of variable-height items.
Areas to investigate:
- MarshalingObservableCollection — does it add unnecessary allocation during scroll?
- DataTemplateSelector evaluation cost per item
- BindingContext propagation to recycled cells — is SetInheritedBindingContext called more than necessary?
- ItemsViewLayout measurement patterns — are items re-measured unnecessarily?
- Interaction with the layout engine optimizations from area 7
12. Visual State Manager & triggers
Question: What is the cost of VSM state transitions and trigger evaluation?
Status: Not yet investigated.
VisualStateManager.GoToState triggers style recalculation, which flows through SetValue and the property system. DataTrigger and EventTrigger evaluation adds to the per-property-change cost. These are particularly relevant for interactive scenarios (hover, pressed, focused) where state changes happen at interaction frequency.
Areas to investigate:
- VisualStateManager.GoToState — allocation profile and cascading SetValue calls
- TriggerBase condition evaluation — is there redundant work when multiple triggers watch the same property?
- MultiTrigger — combinatorial evaluation cost
13. Platform dispatch & batching
Question: Is there unnecessary overhead in the managed ↔ platform synchronization layer?
Status: Not yet investigated.
Every property change that reaches a handler results in a platform dispatch. On Android this means a JNI call; on iOS an ObjC message send. While the platform call itself is out of scope, the managed-side dispatch machinery (Dispatcher, handler UpdateValue, BatchBegin/BatchCommit patterns) may have optimization opportunities.
Areas to investigate:
- Handler UpdateValue — are there redundant managed-side checks before dispatching?
- BatchBegin/BatchCommit — do layout cycles batch platform updates effectively, or do they dispatch one-by-one?
- Dispatcher.Dispatch overhead for same-thread calls
- IElementHandler.Invoke call patterns during layout
14. Navigation & page lifecycle
Question: What is the managed-code overhead of page transitions?
Status: Not yet investigated.
Navigation involves page construction, handler connection, layout measurement, and tree propagation — all areas covered by other investigations. But the navigation system itself adds coordination overhead: NavigationPage stack management, Shell route resolution, modal presentation logic.
Areas to investigate:
- Page push/pop allocation profile
- Shell route resolution cost (URI parsing, route matching)
- NavigationPage animation setup overhead
- Page disposal and handler disconnection patterns
Prior work (XAML Source Generator foundation)
This performance investigation builds on the XAML Source Generator work that established the "generate simpler code" foundation:
| Reference | Description | Status |
|---|---|---|
| #21725 | Binding source generator | Merged |
| #25152 | Propagate x:DataType from parent scope to standalone bindings | Merged |
| #20567 | Use typed bindings internally (trimming) | Merged |
| #21281 | Improve warnings when binding cannot be compiled | Merged |
| #20058 | Fix trimming warnings for image source service provider | Merged |
| #25396 | Avoid NativeAOT trim warnings for compiled bindings | Closed |
| #33561 | Trimmable Styles (XSG) | Open |
| #33611 | Trimmable EventTrigger (XSG) | Open |
Preliminary measurements
The individual PRs contain detailed benchmark results. Below is a summary of measurements collected so far. These are micro-benchmark numbers — in real apps the effects compound but are diluted by other work (XAML inflation, platform interop, rendering).
| Area | Metric | Before | After | Δ Alloc | Δ Time |
|---|---|---|---|---|---|
| Control construction | new Label() | 2,928 B | 2,256 B | −23% | — |
| Control construction | App startup (1000 controls) | 7.79 MB | 6.91 MB | −11% | — |
| Layout — Grid 12ch | 50× measure+arrange alloc | 87.1 KB | 0 B | −100% | −26% |
| Layout — Grid 60ch | 50× measure+arrange alloc | 307.4 KB | 0 B | −100% | −13% |
| Layout — Flex Core 60ch | 50× layout alloc | 40.6 KB | 0 B | −100% | −16% |
| Layout — Stack 12ch | 50× measure+arrange | 0 B | 0 B | — | −37% |
| Layout — Flex e2e 60ch | 50× cycle alloc | 404.7 KB | 187.5 KB | −54% | −19% |
| Property system | SetterSpecificityList (2 entries) | 160 B | 64 B | −60% | 3.7× |
| Property system | InvalidationEventArgs (per fire) | 24 B | 0 B | −100% | 1.3× |
| Data binding | TypedBinding steady-state | 64 B | 48 B | −25% | 3× |
| Data binding | TypedBinding setup (child path) | 1.44 KB | 1.2 KB | −17% | 2× |
| Logging | Disabled path | 168 B | 0 B | −100% | — |
See the linked PRs for full BenchmarkDotNet output and methodology.
This epic will be updated as the investigation progresses.