Commit f01fe05
meta: Add tiler optimization extension documentation.
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
1 parent 340b2f2 commit f01fe05
Lines changed: 257 additions & 0 deletions
# Tiler Optimization Extensions

To allow for optimal rendering performance on GPUs which support tiling, vkd3d-proton
provides some special interfaces to expose this style of rendering by keeping render targets on-chip,
allowing framebuffer fetch mechanisms which the default D3D12 APIs do not expose.

Applications which intend to run D3D12 over Proton on mobile hardware such as Adreno
are able to leverage this new interface with minimal changes to the renderer.
The expectation is that this interface will mostly be relevant for VR games,
which are more likely to cater to mobile concerns.

## Alternative to D3D12 RenderPass API

The default [D3D12 RenderPass API](https://microsoft.github.io/DirectX-Specs/d3d/RenderPasses.html)
supports extensions for tiler optimizations with APIs like
`D3D12_RENDER_PASS_ENDING_ACCESS_PRESERVE_LOCAL_SRV`.
However, this API is fundamentally incompatible with Vulkan, and it is also unimplementable by
virtually all mobile hardware, including mobile hardware that vkd3d-proton cares about.
This vkd3d-proton interface should be supported on a wide range of hardware.

It relies on [VK_KHR_dynamic_rendering_local_read](https://docs.vulkan.org/refpages/latest/refpages/source/VK_KHR_dynamic_rendering_local_read.html)
as well as [VK_KHR_unified_image_layouts](https://docs.vulkan.org/refpages/latest/refpages/source/VK_KHR_unified_image_layouts.html).

### Additional programmable blending support

Unlike D3D12's RenderPass API, this extended API can express programmable blending
where an attachment can be sampled from even when in `D3D12_RESOURCE_STATE_RENDER_TARGET`
or `D3D12_RESOURCE_STATE_DEPTH_WRITE` resource states.

### Shader reuse

The intent is that existing shaders can be reused.
For example, this shader could be used for programmable blending:

```
Texture2D<float4> RenderTarget : register(t5, space6);

float4 blend(float4 dst, float4 src)
{
    // Or whatever you want.
    return lerp(dst, src, src.a);
}

float4 main(float4 pos : SV_Position, float4 color : COLOR) : SV_Target
{
    float4 RT = RenderTarget.Load(int3(pos.xy, 0));
    return blend(RT, color);
}
```

New APIs allow `t5, space6` to be remapped to e.g. RTV #0 in the root signature.
Applications can also redirect normal Texture2D SRVs to the bound depth or stencil attachments.
This allows for typical deferred rendering scenarios where the G-buffer is read from on-chip memory instead
of textures.
Multi-sampled attachments are also supported.
This can be used for e.g. custom HDR resolves on-chip.

Layered rendering and view instancing are also supported.
However, in this case, a non-arrayed `Texture2D` or `Texture2DMS` is still used in the shader.
The implementation samples from the corresponding layer implicitly.

### `OMSetRenderTargets` support

This new interface is compatible with both "immediate" `OMSetRenderTargets` style rendering
and the more dedicated RenderPass APIs. However, to be as tiler friendly as possible,
it is recommended to use the RenderPass API to get the most out of this interface.

## New Device APIs

```
typedef struct D3D12_VK_INPUT_ATTACHMENT_MAPPING
{
    UINT RegisterSpace;
    UINT ShaderRegister;
} D3D12_VK_INPUT_ATTACHMENT_MAPPING;

typedef struct D3D12_VK_INPUT_ATTACHMENT_MAPPINGS
{
    UINT NumRenderTargets;
    D3D12_VK_INPUT_ATTACHMENT_MAPPING RenderTargets[D3D12_SIMULTANEOUS_RENDER_TARGET_COUNT];
    BOOL EnableDepth;
    BOOL EnableStencil;
    D3D12_VK_INPUT_ATTACHMENT_MAPPING Depth;
    D3D12_VK_INPUT_ATTACHMENT_MAPPING Stencil;
} D3D12_VK_INPUT_ATTACHMENT_MAPPINGS;

typedef enum D3D12_VK_TILER_OPTIMIZATION_TIER
{
    D3D12_VK_TILER_OPTIMIZATION_NOT_SUPPORTED = 0,
    D3D12_VK_TILER_OPTIMIZATION_TIER_1 = 1,
} D3D12_VK_TILER_OPTIMIZATION_TIER;

[
    uuid(b7798d22-9fce-434d-8eeb-c3cef1056125),
    object,
    local,
    pointer_default(unique)
]
interface ID3D12DeviceExt2 : ID3D12DeviceExt1
{
    D3D12_VK_TILER_OPTIMIZATION_TIER GetTilerOptimizationTier();
    HRESULT OptInToTilerOptimizations();
    UINT GetInputAttachmentDescriptorsCount();
    HRESULT CreateRootSignatureWithInputAttachments(
            UINT node_mask,
            const void *bytecode, SIZE_T bytecode_length,
            const D3D12_VK_INPUT_ATTACHMENT_MAPPINGS *mappings,
            REFIID riid, void **root_signature);
    void CreateInputAttachmentDescriptors(D3D12_CPU_DESCRIPTOR_HANDLE base_descriptor,
            UINT render_target_descriptor_count,
            const D3D12_CPU_DESCRIPTOR_HANDLE *render_target_descriptors,
            BOOL single_descriptor_handle,
            const D3D12_CPU_DESCRIPTOR_HANDLE *depth_descriptor,
            const D3D12_CPU_DESCRIPTOR_HANDLE *stencil_descriptor);
}
```

### `GetTilerOptimizationTier()`

This is a simple query to check if these APIs are supported by the device.
There is currently only one feature tier.
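
As a rough illustration of how the tier query and the opt-in call might be combined at startup, the sketch below checks the tier first and only opts in when the feature is reported. The `HRESULT` typedef, `succeeded()`, `try_enable_tiler_path()` and `FakeDevice` are hypothetical stand-ins so the example compiles without the vkd3d-proton headers; only the enum and the two method names come from the interface above. A real application would call through a COM pointer (`device->...`) obtained with `QueryInterface`.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for the Windows HRESULT type (assumption: negative means failure).
typedef std::int32_t HRESULT;

// Copied from the interface definition above.
enum D3D12_VK_TILER_OPTIMIZATION_TIER
{
    D3D12_VK_TILER_OPTIMIZATION_NOT_SUPPORTED = 0,
    D3D12_VK_TILER_OPTIMIZATION_TIER_1 = 1,
};

// Hypothetical helper mirroring the SUCCEEDED() macro.
inline bool succeeded(HRESULT hr) { return hr >= 0; }

// Hypothetical helper: works on any type exposing the two ID3D12DeviceExt2
// methods used here. Returns true only if the tiler path was enabled.
template <typename DeviceExt2>
bool try_enable_tiler_path(DeviceExt2 &device)
{
    if (device.GetTilerOptimizationTier() == D3D12_VK_TILER_OPTIMIZATION_NOT_SUPPORTED)
        return false; // Keep using the normal SRV path.
    return succeeded(device.OptInToTilerOptimizations());
}

// Fake device used only to exercise the helper; not part of vkd3d-proton.
struct FakeDevice
{
    explicit FakeDevice(D3D12_VK_TILER_OPTIMIZATION_TIER t) : tier(t), opt_in_calls(0) {}

    D3D12_VK_TILER_OPTIMIZATION_TIER GetTilerOptimizationTier() const { return tier; }
    HRESULT OptInToTilerOptimizations() { ++opt_in_calls; return 0; }

    D3D12_VK_TILER_OPTIMIZATION_TIER tier;
    int opt_in_calls;
};
```

The helper deliberately avoids opting in on unsupported devices, so the rest of the renderer can branch on a single boolean.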

### `HRESULT OptInToTilerOptimizations()`

This is intended to be called right after the ID3D12Device is created.
Setting this modifies the implementation in certain ways to make it compatible with
tiler optimizations without adding a lot of extra API churn.
This call is not thread-safe and should not be called concurrently with any other API command.

The differences are:

- If a resource is created with `ALLOW_RENDER_TARGET` or `ALLOW_DEPTH_STENCIL`, `VK_IMAGE_USAGE_INPUT_ATTACHMENT_BIT` is
  added automatically.
- When creating RTV views, `VK_IMAGE_USAGE_INPUT_ATTACHMENT_BIT` is added automatically.
- When creating DSV views, `VK_IMAGE_USAGE_INPUT_ATTACHMENT_BIT` is added automatically in some cases:
  - For DSV views with a single plane, the usage is added automatically.
  - For DSV views with both planes, the DSV is only input attachment enabled if
    there is exactly one plane which is marked read-only with
    `D3D12_DSV_FLAG_READ_ONLY_DEPTH` or `D3D12_DSV_FLAG_READ_ONLY_STENCIL`.
    The read-only aspect is compatible with input attachments.
    (This is somewhat awkward, but it removes a lot of extra API churn, and is very unlikely to come up in practice).
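
The DSV rule above can be restated as a small predicate. This is only a sketch of the rule as described in this document, not the implementation's code. The function name and the plane booleans are hypothetical; the two flag constants are assumed to match the values of `D3D12_DSV_FLAG_READ_ONLY_DEPTH` and `D3D12_DSV_FLAG_READ_ONLY_STENCIL` in `d3d12.h`.

```cpp
#include <cassert>
#include <cstdint>

// Assumed to match D3D12_DSV_FLAG_READ_ONLY_DEPTH / _STENCIL from d3d12.h.
constexpr std::uint32_t kDsvReadOnlyDepth = 0x1;
constexpr std::uint32_t kDsvReadOnlyStencil = 0x2;

// Hypothetical predicate: does a DSV view get input attachment usage
// after OptInToTilerOptimizations(), following the rules listed above?
bool dsv_gets_input_attachment_usage(bool has_depth_plane,
                                     bool has_stencil_plane,
                                     std::uint32_t dsv_flags)
{
    if (!has_depth_plane && !has_stencil_plane)
        return false; // Not a meaningful DSV.

    // Single-plane views always get the usage.
    if (has_depth_plane != has_stencil_plane)
        return true;

    // Both planes: exactly one of them must be marked read-only.
    const bool depth_read_only = (dsv_flags & kDsvReadOnlyDepth) != 0;
    const bool stencil_read_only = (dsv_flags & kDsvReadOnlyStencil) != 0;
    return depth_read_only != stencil_read_only;
}
```

Note that a view with both planes marked read-only, or neither, falls outside the rule and is treated here as not input attachment enabled.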

### `UINT GetInputAttachmentDescriptorsCount()`

To be able to read from on-chip memory, the application allocates special SRVs in the descriptor heap.
Rather than normal texture SRVs, Vulkan requires the use of `VK_DESCRIPTOR_TYPE_INPUT_ATTACHMENT`.
When writing descriptors to the heap, multiple descriptors are written together as a group in
a layout which is opaque to the application. The expectation is that 10 CBV_SRV_UAV descriptors
are consumed (8 RT + Depth + Stencil), but it may be different due to descriptor packing concerns.

For simplicity and practicality of the implementation, the number of descriptors is fixed at the upper bound.
New input attachment descriptors need only be allocated once per render pass.
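
Since each render pass needs one group of consecutive descriptors, an application could reserve a region of its CBV_SRV_UAV heap and index into it per pass. The sketch below shows only the arithmetic. `CpuHandle` is a stand-in for `D3D12_CPU_DESCRIPTOR_HANDLE`, and both helpers and their parameter names are hypothetical. In a real renderer `group_size` would come from `GetInputAttachmentDescriptorsCount()` and `increment` from `ID3D12Device::GetDescriptorHandleIncrementSize()`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Stand-in for D3D12_CPU_DESCRIPTOR_HANDLE, which also holds a single 'ptr'.
struct CpuHandle
{
    std::size_t ptr;
};

// Hypothetical helper: handle of the first descriptor in the group that
// belongs to render pass 'pass_index'. Groups are laid out back to back,
// starting at heap slot 'first_slot'.
CpuHandle input_attachment_group_handle(CpuHandle heap_start,
                                        std::uint32_t first_slot,
                                        std::uint32_t group_size,
                                        std::uint32_t increment,
                                        std::uint32_t pass_index)
{
    const std::size_t slot =
        std::size_t(first_slot) + std::size_t(pass_index) * group_size;
    return CpuHandle{heap_start.ptr + slot * increment};
}

// Hypothetical helper: heap slots to reserve for 'pass_count' render passes.
std::uint32_t input_attachment_slots_needed(std::uint32_t group_size,
                                            std::uint32_t pass_count)
{
    return group_size * pass_count;
}
```

The group size is queried rather than hard-coded to 10 because, as noted above, the count may differ between implementations.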

### `HRESULT CreateRootSignatureWithInputAttachments()`

This is equivalent to `CreateRootSignature()`, except that extra information can be added
for input attachments. `mappings` can be `NULL`, in which case the call is equivalent
to `CreateRootSignature()`. (It is not reasonable to modify the encoded RootSignature payload to
hack in support for this, so this was determined to be the most practical solution.)

Input attachment mappings only work for non-arrayed descriptors. I.e., shaders which access
the bound attachments through bindless means will not work with this interface
since the compiler needs to statically map resource variables to a render target index to
be able to take advantage of on-chip data.

Input attachment mappings can conflict with normal descriptor table bindings,
i.e. override existing descriptor table bindings in the root signature.
In this case, the input attachment mapping takes precedence.
This allows applications to keep using the normal SRV path on most implementations,
but selectively "opt-in" to the fast path when supported without having to modify the shader code.

When a `Texture2D` or `Texture2DMS` is mapped to an input attachment, that texture must only be used
with simple `::Load()` functions or equivalent. It cannot be used with a sampler object.
Misuse will lead to PSO creation failure.

The coordinate, except for the sample index, is ignored and replaced with the current pixel coordinate.
To make this transformation transparent, the pixel shader can sample from `int2(SV_Position.xy)`.

When mappings are used, the root signature must have at least 1 DWORD available
for the implementation to pass down additional data.
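
To make the mapping structure concrete, the sketch below fills in `D3D12_VK_INPUT_ATTACHMENT_MAPPINGS` for the programmable blending shader shown earlier, where `register(t5, space6)` should read from RTV #0. The struct layouts are copied from the interface definition above; the `UINT` and `BOOL` typedefs and the constant `D3D12_SIMULTANEOUS_RENDER_TARGET_COUNT = 8` are stand-ins for the Windows headers, and `make_blend_mappings()` is a hypothetical helper. It is assumed here that `NumRenderTargets` counts the leading entries of `RenderTargets` that are in use.

```cpp
#include <cassert>

// Stand-ins for the Windows / D3D12 header definitions.
typedef unsigned int UINT;
typedef int BOOL;
constexpr UINT D3D12_SIMULTANEOUS_RENDER_TARGET_COUNT = 8;

// Copied from the interface definition above.
typedef struct D3D12_VK_INPUT_ATTACHMENT_MAPPING
{
    UINT RegisterSpace;
    UINT ShaderRegister;
} D3D12_VK_INPUT_ATTACHMENT_MAPPING;

typedef struct D3D12_VK_INPUT_ATTACHMENT_MAPPINGS
{
    UINT NumRenderTargets;
    D3D12_VK_INPUT_ATTACHMENT_MAPPING RenderTargets[D3D12_SIMULTANEOUS_RENDER_TARGET_COUNT];
    BOOL EnableDepth;
    BOOL EnableStencil;
    D3D12_VK_INPUT_ATTACHMENT_MAPPING Depth;
    D3D12_VK_INPUT_ATTACHMENT_MAPPING Stencil;
} D3D12_VK_INPUT_ATTACHMENT_MAPPINGS;

// Hypothetical helper: map `register(t5, space6)` to render target 0.
D3D12_VK_INPUT_ATTACHMENT_MAPPINGS make_blend_mappings()
{
    D3D12_VK_INPUT_ATTACHMENT_MAPPINGS mappings = {}; // Zero everything first.
    mappings.NumRenderTargets = 1;                    // Assumption: entry 0 only.
    mappings.RenderTargets[0].RegisterSpace = 6;      // space6
    mappings.RenderTargets[0].ShaderRegister = 5;     // t5
    mappings.EnableDepth = 0;                         // No depth mapping.
    mappings.EnableStencil = 0;                       // No stencil mapping.
    return mappings;
}
```

A pointer to the result would then be passed as the `mappings` argument of `CreateRootSignatureWithInputAttachments()`, alongside the same serialized root signature blob that `CreateRootSignature()` would take.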

### `void CreateInputAttachmentDescriptors()`

Takes an equivalent of `OMSetRenderTargets` and writes the matching input attachment descriptors to the descriptor heap.
`GetInputAttachmentDescriptorsCount()` consecutive CBV_SRV_UAV descriptors are consumed, starting at `base_descriptor`.

The main difference is that depth and stencil descriptors are separate in this interface.

The RTV or DSV descriptors need not be the exact same ones passed to `OMSetRenderTargets()`,
but they must be equivalent except for any read-only DSV state.

TODO: Add an interface for RenderPass API desc as well.

NULL RTVs or DSVs are ignored, and the matching descriptor in the heap is not modified.
Using input attachments to sample from a NULL RTV or DSV is undefined behavior.
Just use normal SRVs instead.

### PSO considerations

An input attachment which intends to read from a render target must define that render target
in the PSO by using a sufficiently large `NumRenderTargets`.
If an SRV is mapped to render target `N`, and `N` is greater than or equal to `NumRenderTargets`,
the input attachment must not be read from.

The RTV format can be `DXGI_FORMAT_UNKNOWN` if the render target is only used as an input attachment
in the PSO.

Depth and stencil input attachments can be read even when `DSVFormat` is `DXGI_FORMAT_UNKNOWN`.
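
The `NumRenderTargets` rule is easy to get wrong when PSOs are generated from data, so a debug-time check may be useful. The predicate below is a hypothetical helper that simply restates the rule from this section; it is not part of the vkd3d-proton interface.

```cpp
#include <cassert>

// Hypothetical debug check: an SRV mapped to render target 'mapped_index'
// may only be read if the PSO declares more than 'mapped_index' render
// targets through NumRenderTargets.
bool mapped_render_target_is_readable(unsigned mapped_index,
                                      unsigned pso_num_render_targets)
{
    return mapped_index < pso_num_render_targets;
}
```

For example, a PSO with `NumRenderTargets = 1` may read the attachment mapped to render target 0, but not one mapped to render target 1.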

## New CommandList APIs

```
[
    uuid(9c228166-bf9e-464c-9078-ecf20a13271a),
    object,
    local,
    pointer_default(unique)
]
interface ID3D12GraphicsCommandListExt2 : ID3D12GraphicsCommandListExt1
{
    void InputAttachmentPixelBarrier();
    void SetRootSignatureInputAttachments(D3D12_GPU_DESCRIPTOR_HANDLE handle);
    void SetInputAttachmentFeedback(UINT render_target_concurrent_mask, BOOL depth_concurrent, BOOL stencil_concurrent);
}
```

### `void InputAttachmentPixelBarrier()`

While an image is in the `RENDER_TARGET` or `DEPTH_WRITE` resource state (or the equivalent in enhanced barriers),
it cannot be sampled from as an input attachment without performing a per-pixel barrier.
This can be called at any time, even inside a render pass.
Only render target writes before the pixel barrier are visible to input attachment reads after the barrier.

NOTE: Unlike D3D12, Vulkan supports this use case in the `VK_IMAGE_LAYOUT_GENERAL` image layout,
which is why this feature requires `VK_KHR_unified_image_layouts`.
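
The visibility rule can be pictured with a toy model that counts recorded draws. This is purely illustrative and says nothing about how the implementation works; `PixelBarrierModel` and its members are hypothetical names. The model tracks how many render target writes are guaranteed visible to a later input attachment read at the same pixel.

```cpp
#include <cassert>

// Toy model of the ordering rule above. Not part of vkd3d-proton.
struct PixelBarrierModel
{
    int writes_recorded = 0; // Draws that wrote the render target so far.
    int writes_visible = 0;  // Writes guaranteed visible to later reads.

    // A draw that writes to the render target.
    void record_render_target_write() { ++writes_recorded; }

    // Models InputAttachmentPixelBarrier(): every write recorded before
    // the barrier becomes visible to reads recorded after it.
    void record_pixel_barrier() { writes_visible = writes_recorded; }

    // Writes an input attachment read recorded now may rely on.
    int guaranteed_visible_writes() const { return writes_visible; }
};
```

In this model, two writes followed by a barrier yield two visible writes, and a third write recorded after the barrier is not counted until another barrier is recorded.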

### `void SetRootSignatureInputAttachments()`

Binds the descriptors for input attachments.
Unlike normal root parameters, this argument is never invalidated by binding new root signatures.
It can safely be called once per `OMSetRenderTargets()` and forgotten about.
The descriptor handle must point to the currently bound descriptor heap.

### `void SetInputAttachmentFeedback()`

Programmable blending use cases and G-buffer deferred rendering are similar, but have different data access patterns.

In programmable blending, there is concurrent access of the render target while sampling from it.
In typical G-buffer deferred rendering there is no such issue, since the data flow is clearly separated into a write phase followed by a read-only phase.

Even with appropriate barriers in place, there may be hazards when render targets are compressed, leading to garbage pixels
being read in the input attachment unless some care is taken.

IMPORTANT: Calling this will end the render pass internally, so this should not be called last-minute while inside a render pass.
For performance, set this state up front, alongside `OMSetRenderTargets()` or right before `BeginRenderPass()`.
If you don't know up front, just enable full feedback for the render pass.
It should be fine on most implementations anyway.
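
A small helper can assemble the `render_target_concurrent_mask` argument from the render targets a pass reads while writing. This document does not spell out the bit layout, so the sketch assumes that bit `N` of the mask corresponds to render target `N`, and that "full feedback" means all eight render target bits set together with both `BOOL` arguments set to `TRUE`. Both helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>

// Assumption: bit N of render_target_concurrent_mask refers to render target N.
// Hypothetical helper; indices outside the 8 render target slots are ignored.
std::uint32_t make_render_target_concurrent_mask(std::initializer_list<unsigned> rt_indices)
{
    std::uint32_t mask = 0;
    for (unsigned index : rt_indices)
    {
        if (index < 8)
            mask |= 1u << index;
    }
    return mask;
}

// Assumption: "full feedback" covers all 8 render targets.
constexpr std::uint32_t kFullRenderTargetFeedbackMask = 0xffu;
```

Under these assumptions, a programmable blending pass that reads and writes render target 0 would pass a mask of `0x1`, while a G-buffer lighting pass with no concurrent access would pass `0`.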
