Mobile devices have significantly different hardware constraints compared to desktop systems. In this section, we'll explore the key optimizations that are essential for good performance on mobile platforms.

[NOTE]
====
This chapter covers general mobile performance. For practices that arise specifically because the GPU uses tile-based rendering (TBR), see link:04_rendering_approaches.adoc[Rendering Approaches: Tile-Based Rendering].
====

=== Texture Optimizations
[NOTE]
};
----

[NOTE]
====
If you are targeting tile-based GPUs (TBR), bandwidth can be heavily affected by attachment load/store behavior and tile flushes. For concrete guidance, see the sections “Attachment Load/Store Operations on Tilers” and “Pipelining on Tilers: Subpass Dependencies and BY_REGION” in link:04_rendering_approaches.adoc[Rendering Approaches].
====

=== Draw Call Optimizations
Mobile GPUs are particularly sensitive to draw call overhead:

3. *Level of Detail (LOD)*: Implement LOD systems to reduce geometry complexity for distant objects.
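A distance-based LOD pick can be sketched as follows. This is a minimal sketch, assuming each mesh has a few precomputed LOD levels and per-mesh switch distances in world units; `Vec3`, `distanceSquared`, `selectLod`, and the example distances are hypothetical names and values, not part of the engine. Comparing squared distances avoids a square root per object.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical minimal vector type for this sketch.
struct Vec3 {
    float x, y, z;
};

// Squared Euclidean distance; avoids a square root for every object.
float distanceSquared(const Vec3& a, const Vec3& b) {
    const float dx = a.x - b.x;
    const float dy = a.y - b.y;
    const float dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}

// Returns the LOD index for an object: 0 is full detail and
// switchDistances.size() is the coarsest level. switchDistances must be
// sorted ascending; entry i is the distance at which level i + 1 takes over.
std::size_t selectLod(const Vec3& camera, const Vec3& object,
                      const std::vector<float>& switchDistances) {
    const float d2 = distanceSquared(camera, object);
    std::size_t lod = 0;
    while (lod < switchDistances.size() &&
           d2 >= switchDistances[lod] * switchDistances[lod]) {
        ++lod;
    }
    return lod;
}
```

With switch distances of 10, 25, and 60, an object 30 units from the camera selects LOD 2 and one 100 units away selects the coarsest level, 3. In practice you would usually add some hysteresis so objects sitting near a boundary don't flicker between levels.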

[NOTE]
====
On tile-based GPUs, reducing CPU overhead is important, but keeping work and data on-chip through proper pipelining and subpasses often yields larger gains. See link:04_rendering_approaches.adoc[Rendering Approaches], in particular “Pipelining on Tilers: Subpass Dependencies and BY_REGION” for barrier and subpass patterns, and “Attachment Load/Store Operations on Tilers” for loadOp/storeOp guidance that avoids external memory traffic.
====

=== Vendor-Specific Optimizations

Different mobile GPU vendors have specific architectures that may benefit from targeted optimizations.

* *Optimize for Tile Size*: Consider the tile size when designing your rendering algorithm. For example, if you know the tile size is 16x16, you might organize your data or algorithms to work efficiently with that size.
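As a worked example of reasoning about tile size, the arithmetic below computes how many tiles cover a framebuffer and which tile a given pixel falls into. Core Vulkan does not report the tile size, so the 16x16 value is an assumption carried over from the example above, and `TileGrid`, `divRoundUp`, `computeTileGrid`, and `tileIndexForPixel` are hypothetical helpers.

```cpp
#include <cstdint>

// Number of tiles along each axis (hypothetical helper type).
struct TileGrid {
    std::uint32_t tilesX;
    std::uint32_t tilesY;
};

// Ceiling division: a partially covered tile at the right or bottom edge
// still has to be processed.
std::uint32_t divRoundUp(std::uint32_t value, std::uint32_t divisor) {
    return (value + divisor - 1) / divisor;
}

// Tiles needed to cover a width x height framebuffer with square tiles.
TileGrid computeTileGrid(std::uint32_t width, std::uint32_t height,
                         std::uint32_t tileSize) {
    return TileGrid{divRoundUp(width, tileSize), divRoundUp(height, tileSize)};
}

// Row-major index of the tile that contains pixel (x, y).
std::uint32_t tileIndexForPixel(std::uint32_t x, std::uint32_t y,
                                const TileGrid& grid, std::uint32_t tileSize) {
    return (y / tileSize) * grid.tilesX + (x / tileSize);
}
```

For a 1920x1080 target with 16x16 tiles this gives a 120x68 grid, and the bottom row of tiles is only half covered because 1080 is not a multiple of 16. Sizing screen-space work blocks to match the assumed tile size is one way to keep each block's data within a single tile.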

===== Attachment Load/Store Operations on Tilers

On tile-based GPUs, correctly using loadOp and storeOp is one of the highest-impact optimizations:

- Clear attachments with loadOp = CLEAR and initialLayout = UNDEFINED when you don't need their previous contents. This avoids reading the old contents from external memory into the tile.
- Use storeOp = DONT_CARE for attachments whose results are not needed after the render pass (e.g., transient depth or intermediate color targets). This can prevent flushing the tile back to main memory.
- For the swapchain image (or any image you will sample or transfer from later), use storeOp = STORE and set finalLayout appropriately (e.g., PRESENT_SRC_KHR for the swapchain).
- For MSAA, resolve within the same render pass so the hardware can resolve from tile memory and only store the resolved image to external memory.

If you use dynamic rendering, the same rules apply through the loadOp and storeOp fields of vk::RenderingAttachmentInfo.

See the Vulkan Guide for background on render passes, subpasses, and tile-based GPUs.
====
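The load/store rules above can be condensed into a small decision helper. This is a sketch under stated assumptions: `LoadOp`, `StoreOp`, `AttachmentUsage`, `AttachmentOps`, and `chooseAttachmentOps` are hypothetical local types that mirror vk::AttachmentLoadOp and vk::AttachmentStoreOp so the logic compiles without Vulkan headers. In real code you would copy the results into vk::AttachmentDescription (loadOp, storeOp, initialLayout) or vk::RenderingAttachmentInfo (loadOp, storeOp).

```cpp
// Hypothetical stand-ins for vk::AttachmentLoadOp / vk::AttachmentStoreOp.
enum class LoadOp { Load, Clear, DontCare };
enum class StoreOp { Store, DontCare };

// How an attachment is used around a render pass (hypothetical description).
struct AttachmentUsage {
    bool needsPreviousContents; // reads data written before this pass
    bool needsClearValue;       // must start from a known clear color/depth
    bool usedAfterPass;         // sampled, copied, or presented later
    bool isMsaaResolveSource;   // multisampled image resolved inside the pass
};

struct AttachmentOps {
    LoadOp load;
    StoreOp store;
    bool initialLayoutCanBeUndefined; // true when old contents are discarded
};

AttachmentOps chooseAttachmentOps(const AttachmentUsage& usage) {
    AttachmentOps ops{};
    if (usage.needsPreviousContents) {
        ops.load = LoadOp::Load;     // the only case that reads external memory
    } else if (usage.needsClearValue) {
        ops.load = LoadOp::Clear;    // cleared in tile memory, no external read
    } else {
        ops.load = LoadOp::DontCare; // every pixel will be overwritten anyway
    }
    // When old contents are discarded, UNDEFINED is a valid initial layout.
    ops.initialLayoutCanBeUndefined = (ops.load != LoadOp::Load);

    // The multisampled source is resolved on-chip; only the resolve target
    // (described by its own AttachmentUsage) needs to be written back.
    const bool keep = usage.usedAfterPass && !usage.isMsaaResolveSource;
    ops.store = keep ? StoreOp::Store : StoreOp::DontCare;
    return ops;
}
```

For the swapchain image, for example, a usage of `{false, true, true, false}` yields CLEAR/STORE with an UNDEFINED initial layout, while a transient depth buffer (`usedAfterPass = false`) gets DONT_CARE as its store op.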

===== Pipelining on Tilers: Subpass Dependencies and BY_REGION

Tile-based GPUs benefit from fine-grained synchronization that keeps work and data on-chip:

- Prefer subpasses with input attachments to keep the producer and consumer within the same render pass, enabling tile-local reads.
- Use vk::DependencyFlagBits::eByRegion to scope hazards to the pixel regions actually written or read, avoiding unnecessary tile flushes.
- Avoid over-broad barriers (e.g., ALL_COMMANDS, MEMORY_READ/WRITE) that serialize the pipeline and may force external memory traffic. Use precise stage/access masks.
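To make the masks concrete, the sketch below describes a color-to-input-attachment dependency with precise stages, precise access types, and the by-region flag, plus a small check for the over-broad masks named in the last bullet. It uses hypothetical local flag constants named after the Vulkan ones (the bit values are arbitrary, not Vulkan's) so it compiles without Vulkan headers. In real code the same choices go into the srcSubpass, dstSubpass, srcStageMask, dstStageMask, srcAccessMask, dstAccessMask, and dependencyFlags fields of vk::SubpassDependency.

```cpp
#include <cstdint>

// Hypothetical stand-ins for Vulkan stage, access, and dependency flags.
// Names follow Vulkan; the bit values are arbitrary for this sketch.
namespace StageBit {
constexpr std::uint32_t ColorAttachmentOutput = 1u << 0;
constexpr std::uint32_t FragmentShader        = 1u << 1;
constexpr std::uint32_t AllCommands           = 1u << 2;
}
namespace AccessBit {
constexpr std::uint32_t ColorAttachmentWrite = 1u << 0;
constexpr std::uint32_t InputAttachmentRead  = 1u << 1;
constexpr std::uint32_t MemoryRead           = 1u << 2;
constexpr std::uint32_t MemoryWrite          = 1u << 3;
}
constexpr std::uint32_t kDependencyByRegion = 1u << 0;

// Mirrors the fields of vk::SubpassDependency.
struct SubpassDependency {
    std::uint32_t srcSubpass;
    std::uint32_t dstSubpass;
    std::uint32_t srcStageMask;
    std::uint32_t dstStageMask;
    std::uint32_t srcAccessMask;
    std::uint32_t dstAccessMask;
    std::uint32_t dependencyFlags;
};

// Subpass 0 writes color; subpass 1 reads it as an input attachment in the
// fragment shader. BY_REGION makes the dependency framebuffer-local, so an
// implementation can keep both subpasses' work for a region in tile memory.
SubpassDependency colorToInputAttachment() {
    return SubpassDependency{
        0, 1,
        StageBit::ColorAttachmentOutput, StageBit::FragmentShader,
        AccessBit::ColorAttachmentWrite, AccessBit::InputAttachmentRead,
        kDependencyByRegion};
}

// Flags the over-broad masks that the guidance above warns against.
bool isOverBroad(const SubpassDependency& d) {
    const std::uint32_t stages = d.srcStageMask | d.dstStageMask;
    const std::uint32_t accesses = d.srcAccessMask | d.dstAccessMask;
    return (stages & StageBit::AllCommands) != 0 ||
           (accesses & (AccessBit::MemoryRead | AccessBit::MemoryWrite)) != 0;
}
```

Widening dstStageMask to ALL_COMMANDS would still be a valid dependency, which is why a check like isOverBroad is better treated as a debug-time lint than as a correctness rule.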

Example: dependency from a color-writing subpass to a subpass that reads that color as an input attachment.

With Synchronization2 (vkCmdPipelineBarrier2 and friends), avoid ALL_COMMANDS and prefer the minimal set of stages and access masks that captures your hazard. Use render pass/subpass structure when possible; it's the most tiler-friendly way to express pipelining.
====

For further guidance, see the link:https://docs.vulkan.org/guide/latest/[Vulkan Guide] topics on tile-based GPUs, render passes, and synchronization.

===== Memory Management

To improve the efficiency of memory allocation in TBR architectures: