Commit 606be30

Add advanced topics documentation: Ray Query, Planar Reflections, Robustness2, Mipmaps and LOD, glTF Animation, and Push Constants.
1 parent 2176e06 commit 606be30

20 files changed, +1356 -0 lines changed
Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
1+
:pp: {plus}{plus}
2+
3+
= Advanced Topics (Simple Engine)
4+
5+
Welcome — this section collects short, conversational guides that explain what each feature is, why we use it, and how it’s implemented in the Simple Engine.
6+
7+
Start anywhere that matches your interest:
8+
9+
* xref:Planar_Reflections.adoc[Planar Reflections]
10+
* xref:Ray_Query_Rendering.adoc[Ray Query Rendering]
11+
* xref:Ray_Query_Reflections_and_Transparency.adoc[Ray Query Reflections and Transparency]
12+
* xref:Rendering_Pipeline_Overview.adoc[Rendering Pipeline Overview]
13+
* xref:Forward_ForwardPlus_Deferred.adoc[Forward, Forward+, Deferred]
14+
* xref:ForwardPlus_Rendering.adoc[Forward+ Rendering]
15+
* xref:Culling.adoc[Frustum Culling and Distance LOD]
16+
* xref:Mipmaps_and_LOD.adoc[Mipmaps and LOD]
17+
* xref:GLTF_Animation.adoc[glTF Animation & Transform Composition]
18+
* xref:Push_Constants_Per_Object.adoc[Push Constants (per‑object material)]
19+
* xref:Descriptor_Indexing_UpdateAfterBind.adoc[Descriptor Indexing & Stable Updates]
20+
* xref:Separate_Image_Sampler_Descriptors.adoc[Separate Image/Sampler]
21+
* xref:Synchronization_and_Streaming.adoc[Synchronization & Streaming]
22+
* xref:Synchronization_2_Frame_Pacing.adoc[Synchronization 2 & Frame Pacing]
23+
* xref:Robustness2.adoc[VK_EXT_robustness2]
24+
* xref:Dynamic_Rendering_Local_Read.adoc[Dynamic Rendering Local Read]
25+
* xref:Shader_Tile_Image.adoc[Shader Tile Image]
26+
27+
link:../index.html[Back to Building a Simple Engine]
Lines changed: 46 additions & 0 deletions
@@ -0,0 +1,46 @@
1+
= Frustum Culling and Distance‑based LOD
2+
3+
Culling is the simplest way to keep your GPU focused on what the camera can see. In this engine we keep it intentionally pragmatic: CPU frustum tests plus a tiny “distance/size LOD” that skips objects that would contribute only a handful of pixels.
4+
5+
* CPU frustum culling against per‑mesh AABBs
6+
* A tiny distance/size LOD that skips very small objects (projected size threshold)
7+
8+
== What we do
9+
10+
1. Extract the camera frustum planes from `proj * view` once per frame.
11+
2. For each mesh instance, transform its local AABB to world space and test against the planes.
12+
3. If enabled, estimate projected pixel size and skip objects below a threshold (separate thresholds for opaque vs transparent).
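
The first two steps above can be sketched in a few dozen lines. This is a minimal, self-contained illustration rather than the engine's actual helpers: the `Mat4`, `Plane`, and `Aabb` types and all function names are hypothetical, the matrix is assumed to be stored as `m[row][col]` with column vectors (`clip = m * v`), and clip-space depth is assumed to be in the Vulkan-style `[0, 1]` range.

```cpp
#include <array>
#include <cassert>
#include <cmath>

struct Vec3  { float x, y, z; };
struct Aabb  { Vec3 min, max; };
struct Plane { float a, b, c, d; };                // inside when a*x + b*y + c*z + d >= 0
using Mat4 = std::array<std::array<float, 4>, 4>;  // m[row][col], clip = m * (x, y, z, 1)

// Gribb/Hartmann-style extraction from a combined proj * view matrix.
// Planes are left unnormalized, which is fine for sign-only tests.
std::array<Plane, 6> extractFrustumPlanes(const Mat4& m) {
    auto rowCombo = [&m](int row, float sign) {
        return Plane{m[3][0] + sign * m[row][0], m[3][1] + sign * m[row][1],
                     m[3][2] + sign * m[row][2], m[3][3] + sign * m[row][3]};
    };
    return std::array<Plane, 6>{{
        rowCombo(0, 1.0f), rowCombo(0, -1.0f),      // w + x >= 0, w - x >= 0
        rowCombo(1, 1.0f), rowCombo(1, -1.0f),      // w + y >= 0, w - y >= 0
        Plane{m[2][0], m[2][1], m[2][2], m[2][3]},  // z >= 0 (near, [0, 1] depth)
        rowCombo(2, -1.0f)}};                       // w - z >= 0 (far)
}

// Transform a local-space AABB by an affine model matrix (center/extent form).
Aabb transformAabb(const Mat4& model, const Aabb& local) {
    assert(local.min.x <= local.max.x && local.min.y <= local.max.y &&
           local.min.z <= local.max.z);
    const float c[3] = {(local.min.x + local.max.x) * 0.5f,
                        (local.min.y + local.max.y) * 0.5f,
                        (local.min.z + local.max.z) * 0.5f};
    const float e[3] = {(local.max.x - local.min.x) * 0.5f,
                        (local.max.y - local.min.y) * 0.5f,
                        (local.max.z - local.min.z) * 0.5f};
    float wc[3], we[3];
    for (int r = 0; r < 3; ++r) {
        wc[r] = model[r][3];  // translation column
        we[r] = 0.0f;
        for (int k = 0; k < 3; ++k) {
            wc[r] += model[r][k] * c[k];
            we[r] += std::fabs(model[r][k]) * e[k];  // worst-case extent
        }
    }
    return Aabb{Vec3{wc[0] - we[0], wc[1] - we[1], wc[2] - we[2]},
                Vec3{wc[0] + we[0], wc[1] + we[1], wc[2] + we[2]}};
}

// Conservative test: a box is rejected only if it lies fully behind one plane.
bool isAabbVisible(const std::array<Plane, 6>& planes, const Aabb& box) {
    for (const Plane& p : planes) {
        // Corner furthest along the plane normal (the "positive vertex").
        const Vec3 v{p.a >= 0.0f ? box.max.x : box.min.x,
                     p.b >= 0.0f ? box.max.y : box.min.y,
                     p.c >= 0.0f ? box.max.z : box.min.z};
        if (p.a * v.x + p.b * v.y + p.c * v.z + p.d < 0.0f) return false;
    }
    return true;
}
```

The test is deliberately conservative: a box near a frustum corner can pass even though it is not visible, which costs a wasted draw but never drops a visible object.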
13+
14+
== Where to look in the code
15+
16+
* Plane extraction and AABB tests:
17+
** `renderer_rendering.cpp` (helpers near the top of the file)
18+
* Per-frame culling application:
19+
** `renderer_rendering.cpp` (the render list building and per-pass filtering)
20+
* UI controls:
21+
** ImGui panel in `renderer_rendering.cpp` — “Frustum culling”, “Distance LOD”, and per-pass thresholds
22+
23+
== Why it’s set up this way
24+
25+
* AABBs are cheap to transform and test; doing this on the CPU avoids sending obviously invisible draws.
26+
* A projected‑size cutoff is a practical alternative to a full LOD system for large scenes.
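
One way to estimate projected size is to treat the object as a bounding sphere and scale its radius by distance and field of view. The sketch below assumes a symmetric perspective projection; the function names, parameters, and thresholds are hypothetical and are not taken from the engine.

```cpp
#include <cassert>
#include <cmath>

// Approximate on-screen diameter, in pixels, of a bounding sphere.
// A sphere of radius r at distance d covers r / (d * tan(fovY / 2)) of the
// viewport height under a symmetric perspective projection.
float projectedDiameterPixels(float radius, float distance,
                              float fovYRadians, float viewportHeightPx) {
    assert(radius >= 0.0f && fovYRadians > 0.0f);
    // Camera inside or touching the sphere: treat it as covering the screen.
    if (distance <= radius) return viewportHeightPx;
    return (radius / (distance * std::tan(fovYRadians * 0.5f))) * viewportHeightPx;
}

// Separate thresholds for opaque and transparent objects, as described above.
bool shouldSkipForSize(float pixels, bool isTransparent,
                       float opaqueThresholdPx, float transparentThresholdPx) {
    return pixels < (isTransparent ? transparentThresholdPx : opaqueThresholdPx);
}
```

For example, a sphere of radius 1 at distance 10 with a 90 degree vertical field of view on a 1000 pixel tall viewport comes out at roughly 100 pixels, far above any sensible cutoff.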
27+
28+
== Tuning tips
29+
30+
* Start conservative (smaller thresholds), then increase until you can’t notice pop‑in while moving.
31+
* Transparent objects typically need a slightly higher threshold due to blending artifacts at tiny sizes.
32+
33+
== Future work ideas
34+
35+
If you want to push this further:
36+
37+
* Add per-material or per-layer culling rules (e.g., keep signage readable longer).
38+
* Add hierarchical culling (BVH of AABBs) for very large scenes.
39+
* Add GPU occlusion culling (HZB) once the pipeline grows beyond “readable sample” scale.
40+
* Replace the projected-size heuristic with real mesh LODs (or meshlets).
41+
42+
== What to read next
43+
44+
* `Rendering_Pipeline_Overview.adoc`
45+
* `ForwardPlus_Rendering.adoc`
46+
* `Ray_Query_Rendering.adoc`
Lines changed: 73 additions & 0 deletions
@@ -0,0 +1,73 @@
1+
= Descriptor Indexing and Stable Descriptor Updates
2+
3+
Vulkan descriptors are powerful, but they’re also one of the easiest places to accidentally violate “frame in flight” lifetime rules.
4+
5+
In this engine we use one simple rule:
6+
7+
*Only update descriptors at a known safe point.*
8+
9+
That rule keeps streaming stable, keeps validation clean, and (most importantly) keeps the code readable.
10+
11+
== The safe point
12+
13+
Each frame‑in‑flight has a fence. At the start of a new frame, we wait for that fence. Once it signals, the GPU is done with any work that referenced this frame’s descriptor sets. That’s the safe moment to update this frame’s sets.
14+
15+
Why it matters: updating a descriptor set while a pending command buffer still references it is invalid usage unless the affected bindings were created with update‑after‑bind flags and the rest of your pipeline is structured around that choice. The safe point pattern stays portable and clear.
16+
17+
== What we update
18+
19+
* Material textures that finished streaming.
20+
* The reflection texture binding (binding 10) for planar reflections.
21+
* Per‑frame buffers for Forward+ (tile headers/indices, lights SSBO) when resized.
22+
23+
In Ray Query mode we also refresh the large texture table (the fixed-size sampler array) so that newly streamed textures become visible without rebuilding the pipeline.
24+
25+
We refresh only the current frame’s sets at the safe point and leave other frames to update at their own turn. This prevents cross‑frame “flip‑flop” where a texture looks different on alternating frames.
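
The bookkeeping behind this rule can be modelled without any Vulkan calls. In the sketch below every name is hypothetical, a `uint64_t` stands in for a texture handle, and a plain vector stands in for each frame's descriptor set; in real code `applyAtSafePoint` is where the descriptor writes (for example via `vkUpdateDescriptorSets`) would be issued, after the fence wait.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct PendingWrite {
    uint32_t slot;           // binding/array element to overwrite
    uint64_t textureHandle;  // stand-in for an image view + sampler
};

class PerFrameDescriptorUpdater {
public:
    PerFrameDescriptorUpdater(std::size_t framesInFlight, std::size_t slotCount)
        : bound_(framesInFlight, std::vector<uint64_t>(slotCount, 0)),
          pending_(framesInFlight) {
        assert(framesInFlight > 0);
    }

    // Called by streaming code at any time. Records the write for every frame
    // but touches no descriptor set, so in-flight work is never disturbed.
    void queueWrite(uint32_t slot, uint64_t textureHandle) {
        for (auto& queue : pending_) queue.push_back({slot, textureHandle});
    }

    // Called once per frame, AFTER waiting on that frame's fence.
    void applyAtSafePoint(std::size_t frameIndex) {
        for (const PendingWrite& w : pending_[frameIndex]) {
            bound_[frameIndex][w.slot] = w.textureHandle;
        }
        pending_[frameIndex].clear();
    }

    uint64_t boundTexture(std::size_t frameIndex, uint32_t slot) const {
        return bound_[frameIndex][slot];
    }

private:
    std::vector<std::vector<uint64_t>> bound_;        // per-frame set contents
    std::vector<std::vector<PendingWrite>> pending_;  // per-frame deferred writes
};
```

Because each frame keeps its own pending queue, a texture that finishes streaming reaches every frame's set exactly once, at that frame's own safe point.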
26+
27+
== Descriptor Indexing: when to use it
28+
29+
Descriptor Indexing opens features such as variable‑sized arrays and update‑after‑bind. It’s powerful, but it shifts complexity to your synchronization and lifetime rules. In this sample we emphasize clarity:
30+
31+
* We keep descriptor layouts simple and stable.
32+
* We update at the safe point rather than while a command buffer might still be pending.
33+
34+
When we do use descriptor indexing features, it’s for one specific reason: large, non-uniformly indexed descriptor arrays (e.g., Ray Query’s texture table). In that case, correctness depends on:
35+
36+
* enabling the descriptor indexing feature bits required by the GPU
37+
* marking bindings with the correct binding flags (when supported)
38+
* never caching stale Vulkan image/sampler handles across async streaming
39+
40+
If your project needs truly dynamic descriptor arrays or frequent mid‑frame updates, Descriptor Indexing can be the right tool—just document the new invariants carefully.
41+
42+
== Practical tips
43+
44+
* Centralize descriptor updates; don’t scatter writes across the frame.
45+
* Use default textures for placeholders, then swap once—don’t bounce between real and default.
46+
* Prefer combined image samplers in teaching‑oriented code; split image/sampler only when you need the flexibility.
47+
48+
== Where to look in the code
49+
50+
* Frame safe point + per-frame descriptor refresh:
51+
** `renderer_rendering.cpp`
52+
* Descriptor set layouts (including update-after-bind flags when enabled):
53+
** `renderer_pipelines.cpp`
54+
* Device feature enable for descriptor indexing:
55+
** `renderer_core.cpp`
56+
* Streaming-safe Ray Query texture table rebuild:
57+
** `renderer_ray_query.cpp`
58+
59+
== Future work ideas
60+
61+
If you want to explore more advanced descriptor patterns:
62+
63+
* Move to variable descriptor counts for texture tables (when device support is good enough for your targets).
64+
* Use separate image/sampler descriptors to share samplers across many textures.
65+
* Add a “descriptor stress test” mode (development-only) that rapidly streams textures to validate lifetime rules.
66+
67+
== What to read next
68+
69+
* `Synchronization_and_Streaming.adoc`
70+
* `Separate_Image_Sampler_Descriptors.adoc`
71+
* `Ray_Query_Rendering.adoc`
72+
73+
This conservative approach avoids common pitfalls while keeping the code approachable.
Lines changed: 39 additions & 0 deletions
@@ -0,0 +1,39 @@
1+
= VK_KHR_dynamic_rendering_local_read — keeping color data in tile memory
2+
3+
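Dynamic Rendering lets you render without creating render pass and framebuffer objects, but on its own it has no equivalent of subpass input attachments. The `VK_KHR_dynamic_rendering_local_read` extension fills that gap: a fragment shader can read, at its own pixel location, attachment data written earlier in the same rendering instance, and tile‑based hardware can serve those reads from tile/local memory.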
4+
5+
== Why it matters
6+
7+
Some post‑lighting effects and resolve‑like steps read the color you just wrote. With this feature, drivers can service those reads from fast on‑chip memory instead of round‑tripping to VRAM.
8+
9+
== How we approach it
10+
11+
* We enable the feature if present and keep codepaths compatible when it isn’t.
12+
* We still end a rendering instance before doing layout transitions. The feature does not allow arbitrary barrier misuse — regular Synchronization 2 rules apply.
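
The "enable if present, stay compatible otherwise" decision can be sketched as a small pure function. The enum and function names are hypothetical; the sketch assumes you have already collected the device's extension names and queried the `dynamicRenderingLocalRead` feature bit, and it does not show the engine's actual enable path.

```cpp
#include <algorithm>
#include <string>
#include <vector>

enum class AttachmentReadPath {
    LocalRead,    // read the attachment inside the same rendering instance
    SeparatePass  // end rendering, transition, and sample in a later pass
};

// Picks the local-read path only when both the extension is advertised and
// the feature bit was reported as supported; otherwise falls back.
AttachmentReadPath chooseAttachmentReadPath(
        const std::vector<std::string>& deviceExtensions,
        bool localReadFeatureSupported) {
    const bool hasExtension =
        std::find(deviceExtensions.begin(), deviceExtensions.end(),
                  "VK_KHR_dynamic_rendering_local_read") != deviceExtensions.end();
    return (hasExtension && localReadFeatureSupported)
               ? AttachmentReadPath::LocalRead
               : AttachmentReadPath::SeparatePass;
}
```

Keeping the choice in one place means the rest of the renderer only ever branches on the resulting enum, not on raw extension strings.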
13+
14+
== Practical guidance
15+
16+
* Treat this as an optimization, not a new API surface.
17+
* Keep stage/access masks precise. In this sample we keep transitions outside active rendering for clarity.
18+
19+
== Where to look in the code
20+
21+
* Feature detection and enablement:
22+
** `renderer_core.cpp` (device feature enable path)
23+
* Dynamic rendering setup + barriers:
24+
** `renderer_rendering.cpp`
25+
** `renderer_pipelines.cpp`
26+
27+
== Future work ideas
28+
29+
If you want to demonstrate local-read more directly:
30+
31+
* Add a small “same-pass” effect that reads the current color attachment (e.g., a simple local contrast or edge highlight).
32+
* Add a debug HUD that prints whether the feature is enabled on the current device.
33+
* Compare performance with and without local-read on tile-based GPUs (mobile) using a fixed camera path.
34+
35+
== What to read next
36+
37+
* `Rendering_Pipeline_Overview.adoc`
38+
* `Synchronization_and_Streaming.adoc`
39+
* `Synchronization_2_Frame_Pacing.adoc`
Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
1+
= Forward+ Rendering in this Sample
2+
3+
Forward+ keeps the forward shading model you already know, but limits the per‑pixel light loop to only the lights that might affect that pixel. It does this by dividing the screen into tiles (and optionally Z‑slices) and building per‑tile light lists with a compute pass.
4+
5+
== What we do
6+
7+
* Depth pre‑pass (optional): populates depth so the compute stage can cull by Z more effectively.
8+
* Compute pass: assigns lights to tiles (and slices) and writes compact lists to SSBOs.
9+
* Main PBR pass: for each pixel, fetch the tile header and iterate only those lights.
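
To make the data layout concrete, here is a CPU reference sketch of the light assignment step. The sample does this work in a compute pass, so treat this only as an illustration of the idea: the struct and function names are hypothetical, lights are assumed to be already projected to a screen-space circle in pixels, and depth culling is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct ScreenLight { float x, y, radius; };     // pixel-space bounds (assumption)
struct TileHeader  { uint32_t offset, count; }; // range into lightIndices

struct TileLightLists {
    uint32_t tilesX = 0, tilesY = 0;
    std::vector<TileHeader> headers;      // one per tile, row-major
    std::vector<uint32_t> lightIndices;   // compact list shared by all tiles
};

TileLightLists buildTileLightLists(uint32_t width, uint32_t height, uint32_t tileSize,
                                   const std::vector<ScreenLight>& lights) {
    assert(tileSize > 0);
    TileLightLists out;
    out.tilesX = (width + tileSize - 1) / tileSize;   // round up partial tiles
    out.tilesY = (height + tileSize - 1) / tileSize;
    out.headers.reserve(static_cast<std::size_t>(out.tilesX) * out.tilesY);

    for (uint32_t ty = 0; ty < out.tilesY; ++ty) {
        for (uint32_t tx = 0; tx < out.tilesX; ++tx) {
            const float minX = static_cast<float>(tx * tileSize);
            const float minY = static_cast<float>(ty * tileSize);
            const float maxX = minX + static_cast<float>(tileSize);
            const float maxY = minY + static_cast<float>(tileSize);

            TileHeader header{static_cast<uint32_t>(out.lightIndices.size()), 0u};
            for (uint32_t i = 0; i < lights.size(); ++i) {
                // Circle vs rectangle: clamp the center to the tile and compare
                // the remaining distance against the light radius.
                const float cx = std::max(minX, std::min(lights[i].x, maxX));
                const float cy = std::max(minY, std::min(lights[i].y, maxY));
                const float dx = lights[i].x - cx;
                const float dy = lights[i].y - cy;
                if (dx * dx + dy * dy <= lights[i].radius * lights[i].radius) {
                    out.lightIndices.push_back(i);
                    ++header.count;
                }
            }
            out.headers.push_back(header);
        }
    }
    return out;
}
```

With 16×16 tiles, a 1920×1080 target yields 120 × 68 tiles, because the last row of tiles only partially covers the screen.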
10+
11+
== Where to look in the code
12+
13+
* Buffers, per-frame state, and descriptor bindings:
14+
** `renderer_compute.cpp` and `renderer_resources.cpp` (look for the `ForwardPlusPerFrame` data)
15+
* Compute dispatch and per-frame parameters:
16+
** `renderer_rendering.cpp`
17+
* Shader-side light list consumption:
18+
** `shaders/pbr.slang` (Forward+ light loop)
19+
20+
== Tips
21+
22+
* Tune tile size; 16×16 is a reasonable default for 1080p.
23+
* If you pre‑pass depth, set `depthWriteEnable = false` and `depthCompareOp` to `Equal` (`VK_COMPARE_OP_EQUAL`) in the subsequent opaque color pass.
24+
25+
== Future work ideas
26+
27+
If you want to take this beyond a compact sample:
28+
29+
* Upgrade from 2D tiles to clustered Forward+ (depth slicing and/or logarithmic Z).
30+
* Add a small light “budget” UI and debug visualizations (tile heatmap) behind a development build flag.
31+
* Add shadowing (start with a single directional light shadow map) and extend the tile data to include shadowed light indices.
32+
33+
== What to read next
34+
35+
* `Forward_ForwardPlus_Deferred.adoc`
36+
* `Rendering_Pipeline_Overview.adoc`
37+
* `Synchronization_2_Frame_Pacing.adoc`
Lines changed: 102 additions & 0 deletions
@@ -0,0 +1,102 @@
1+
= Forward, Forward+, and Deferred — choosing the right path
2+
3+
Vulkan lets you build many kinds of pipelines. In practice, most real‑time engines gravitate toward one of three shading architectures: Forward, Forward+, or Deferred.
4+
5+
This page explains what each one is, why this sample chooses Forward+, and where the relevant pieces live in the code.
6+
7+
== Forward rendering
8+
9+
Forward draws each object with its lighting in a single pass. It’s the most direct model: bind a material, bind lights (uniforms or textures/SSBOs), draw. It’s easy to reason about and integrates well with transparency and MSAA.
10+
11+
Pros:
12+
13+
* Simple and predictable.
14+
* Good with transparent objects and MSAA.
15+
* Great for small light counts or baked lighting.
16+
17+
Cons:
18+
19+
* Per‑pixel light loops can get expensive as the number of lights grows.
20+
* You evaluate lights even when most don’t affect the pixel.
21+
22+
== Forward+ (what we use for dynamic lights)
23+
24+
Forward+ partitions the screen into tiles and assigns lights to those tiles with a compute pass. The main pass then shades with only the lights relevant to the pixel’s tile. In this sample we use a lightweight Forward+ that focuses on emissive/simplified lights to keep the code approachable.
25+
26+
Pros:
27+
28+
* Scales to many local lights; you only evaluate lights that might affect the pixel.
29+
* Keeps forward’s strengths (transparency/MSAA friendliness).
30+
31+
Cons:
32+
33+
* Requires a pre‑pass or depth info and a compute dispatch to build the tile lists.
34+
* More moving parts than plain forward.
35+
36+
== Deferred shading (when to consider it)
37+
38+
Deferred writes material properties (G‑Buffer) in the first pass, then lights that buffer in a second pass. That turns lighting cost into “cost per lit pixel” and tends to excel with many lights, but it makes transparency and MSAA trickier.
39+
40+
Pros:
41+
42+
* Many dynamic lights at high performance.
43+
* Clear separation of material/write and light/evaluate.
44+
45+
Cons:
46+
47+
* Transparent objects must be handled separately (often with a forward pass).
48+
* MSAA is more complex; memory bandwidth can be high.
49+
50+
== What the sample uses (and why)
51+
52+
We use Forward+ for small, dynamic lights and a forward material path for everything else. That keeps the code compact while still letting you place many little lights around the scene. Transparency (glass) is shaded in a second forward pass so order and blending are correct.
53+
54+
If your project needs hundreds of shadowed lights and complex post‑lighting, explore a deferred path or a hybrid: deferred for opaque, forward for transparent.
55+
56+
== Implementation highlights in this codebase
57+
58+
* A small compute pass builds per‑tile light lists.
59+
* Per‑frame SSBOs hold tile headers/light indices; the main PBR pass reads those to loop only relevant lights.
60+
* Descriptor updates happen at the frame’s safe point so we don’t touch in‑use sets.
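
The consumption side of those per‑tile lists is a short loop. The sketch below mirrors in C{pp} what the shader-side light loop does conceptually; it is not the code in `shaders/pbr.slang`. The `TileHeader` and `PointLight` layouts and the `shadePixel` name are hypothetical, and "shading" is reduced to summing an intensity so the control flow stays visible.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct TileHeader { uint32_t offset, count; };  // range into the light index list
struct PointLight { float intensity; };         // heavily simplified light

// Looks up the tile that contains the pixel, then visits only that tile's lights.
float shadePixel(uint32_t px, uint32_t py, uint32_t tileSize, uint32_t tilesX,
                 const std::vector<TileHeader>& headers,
                 const std::vector<uint32_t>& lightIndices,
                 const std::vector<PointLight>& lights) {
    assert(tileSize > 0);
    const uint32_t tileIndex = (py / tileSize) * tilesX + (px / tileSize);
    const TileHeader header = headers.at(tileIndex);

    float result = 0.0f;
    for (uint32_t i = 0; i < header.count; ++i) {
        const uint32_t lightIndex = lightIndices.at(header.offset + i);
        result += lights.at(lightIndex).intensity;  // real code evaluates the BRDF here
    }
    return result;
}
```

The cost per pixel now depends on `header.count` for that tile rather than on the total number of lights in the scene.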
61+
62+
== Where to look in the code
63+
64+
* Forward/Forward+ render loop integration:
65+
** `renderer_rendering.cpp`
66+
* Pipeline + descriptor layout setup:
67+
** `renderer_pipelines.cpp`
68+
* Main PBR shader (reads per-tile light lists when Forward+ is enabled):
69+
** `shaders/pbr.slang`
70+
71+
NOTE: The tile/cluster build shader is wired in `renderer_pipelines.cpp`. Start there and follow which compute pipeline is created for the Forward+ light assignment pass.
72+
73+
== Choosing for your project
74+
75+
Use Forward if:
76+
77+
* Light count is low, transparency/MSAA are priorities, and you want the simplest pipeline.
78+
79+
Use Forward+ if:
80+
81+
* You want many local lights but still want forward’s strengths.
82+
83+
Use Deferred if:
84+
85+
* You need to scale to many dynamic lights with complex lighting, and you’re ready to solve transparency/MSAA separately.
86+
87+
There’s no one answer; pick the simplest that meets your needs. You can always grow the pipeline later.
88+
89+
== Future work ideas
90+
91+
If you want to expand the lighting system beyond “readable sample”:
92+
93+
* Add clustered Forward+ (3D clusters using depth slices) instead of 2D tiles.
94+
* Add shadows (start with a single directional shadow map, then add point/spot shadows).
95+
* Add a small deferred path for opaque only (keep transparent as forward).
96+
* Add ray query helpers for selective effects (reflection rays, shadow rays, or AO probes) without building a full RT pipeline.
97+
98+
== What to read next
99+
100+
* `Rendering_Pipeline_Overview.adoc`
101+
* `ForwardPlus_Rendering.adoc`
102+
* `Synchronization_2_Frame_Pacing.adoc`
