@@ -14,7 +14,7 @@ params:
 ## Introduction

 Today HLSL (DXIL) validation enforces a fixed upper limit of 32 KB
-of group shared memory per thread group for Compute, and Amplification
+of group shared memory per thread group for Compute, Node, and Amplification
 Shaders with Mesh shaders being limited to 28 KB. Modern GPU architectures
 often expose substantially larger physically
 available shared memory, and practical algorithms (e.g. large tile / cluster
@@ -45,7 +45,7 @@ Introduce two core pieces:

 1. A runtime API query returning `MaxGroupSharedMemoryPerGroup` (in bytes).
    - This will return a value at minimum equal to the existing limits in SM 6.9
-     and prior i.e. 32k for CS and AS and 28k for Mesh Shaders.
+     and prior i.e. 32k for CS, NS and AS and 28k for Mesh Shaders.
    - There is no defined maximum value.
    - Values must be 4 byte aligned.
 2. A new optional entry-point attribute allowing a shader author to declare the
@@ -108,7 +108,7 @@ actual usage exceeds that.

 ## Detailed Design

-### Runtime Validation
+### Validation
 * If `GroupSharedLimit` is omitted, validation will fall back to the original
 32k limit (28k for MS). The error message will be updated to indicate that the
 limit may be raised with the caveat that hardware support must be checked.
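The fallback in the bullet above can be sketched in C++ as follows. This is an illustration, not DXC's implementation: `Stage`, `EffectiveGroupSharedLimit` and the constant names are hypothetical, and the only values taken from the proposal are the legacy limits (32 KB for compute, node and amplification, 28 KB for mesh).

```cpp
#include <cstdint>
#include <optional>

// Hypothetical stage enum for the stages the proposal covers.
enum class Stage { Compute, Node, Amplification, Mesh };

// Legacy (SM 6.9 and earlier) limits quoted in the proposal.
constexpr uint32_t kLegacyLimitCsNsAs = 32u * 1024;  // 32 KB
constexpr uint32_t kLegacyLimitMs = 28u * 1024;      // 28 KB

// Limit that static groupshared usage is validated against: the declared
// GroupSharedLimit when the attribute is present, else the legacy limit.
uint32_t EffectiveGroupSharedLimit(Stage stage,
                                   std::optional<uint32_t> declared) {
  const uint32_t legacy =
      stage == Stage::Mesh ? kLegacyLimitMs : kLegacyLimitCsNsAs;
  return declared.value_or(legacy);
}
```

Modeling the attribute as `std::optional` keeps "absent" distinct from any numeric value, so no sentinel such as 0 is needed.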
@@ -132,7 +132,7 @@ Rules:
 be a multiple of 4.
 * At most one `GroupSharedLimit` attribute per entry point; duplicates are an
 error.
-* Applies only to compute, mesh, amplification shaders.
+* Applies only to compute, node, mesh, amplification shaders.
 * The attribute does NOT itself reserve memory; it constrains static usage.
 i.e. the calculated shared memory usage of the shader must always be <= this
 value.
@@ -149,11 +149,11 @@ argument`.
 - `GroupSharedLimit attribute argument must be a multiple of 4`.
 - `Duplicate GroupSharedLimit attribute on entry point`.
 - `GroupSharedLimit attribute not allowed on this shader stage`
-  (non compute/mesh/amplification).
+  (non compute/node/mesh/amplification).
 - `groupshared static usage (<bytes>) exceeds declared GroupSharedLimit
 (<limit>)`.
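A sketch of how a front end might map the attribute rules onto the diagnostics listed above. Assumptions: C++, a hypothetical `Stage` enum (with `Pixel` and `Vertex` standing in for any disallowed stage), one vector entry per `GroupSharedLimit` attribute found on the entry point, and static usage already computed. The message strings are copied from the list; the argument check whose message is cut off at the start of this hunk is not modeled.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stage enum; Pixel and Vertex stand in for disallowed stages.
enum class Stage { Compute, Node, Amplification, Mesh, Pixel, Vertex };

// Returns the diagnostics the GroupSharedLimit rules would produce.
// attrArgs: one entry per GroupSharedLimit attribute on the entry point.
// staticUsageBytes: computed groupshared usage of the entry point.
std::vector<std::string> CheckGroupSharedLimitAttr(
    Stage stage, const std::vector<uint32_t>& attrArgs,
    uint32_t staticUsageBytes) {
  std::vector<std::string> diags;
  if (attrArgs.empty()) {
    return diags;  // No attribute: the legacy limit is checked elsewhere.
  }
  const bool stageAllowed = stage == Stage::Compute || stage == Stage::Node ||
                            stage == Stage::Amplification ||
                            stage == Stage::Mesh;
  if (!stageAllowed) {
    diags.push_back("GroupSharedLimit attribute not allowed on this shader stage");
  }
  if (attrArgs.size() > 1) {
    diags.push_back("Duplicate GroupSharedLimit attribute on entry point");
  }
  const uint32_t limit = attrArgs.front();  // Only the first value is checked.
  if (limit % 4 != 0) {
    diags.push_back("GroupSharedLimit attribute argument must be a multiple of 4");
  }
  if (staticUsageBytes > limit) {
    diags.push_back("groupshared static usage (" +
                    std::to_string(staticUsageBytes) +
                    ") exceeds declared GroupSharedLimit (" +
                    std::to_string(limit) + ")");
  }
  return diags;
}
```

Collecting every diagnostic instead of stopping at the first lets a single compile report, for example, both a misaligned argument and a usage overrun.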

-Validator / pipeline creation errors:
+Pipeline creation errors:
 - `groupshared static usage (<bytes>) exceeds device capacity (<capacity>)`.

 ### Interchange Format Additions
@@ -174,17 +174,16 @@ node storing the declared limit in bytes.

 The PSV0 metadata structure is extended to include:

-* **`GroupSharedLimit`**: A 32-bit unsigned integer field indicating the
-  shader-declared group shared memory limit in bytes.
-  - **Value = 0**: No `GroupSharedLimit` attribute was specified; runtime
-    validation should enforce the legacy limit (32 KB for CS/AS, 28 KB for MS).
-  - **Value > 0**: The shader explicitly declared a limit; runtime validation
-    must ensure that Static group shared usage ≤
-    `MaxGroupSharedMemoryPerGroup[CS/AS/MS]`
+* **`GroupSharedUsage`**: A 32-bit unsigned integer field indicating the
+  actual group shared memory usage in bytes.
+  - This value represents the computed static group shared memory usage of the
+    shader.
+  - Runtime validation must ensure that this usage value is ≤
+    `MaxGroupSharedMemoryPerGroup[CS/NS/AS/MS]`

 This metadata enables the runtime to:
-* Validate that the shader's declared limit is compatible with the device's
-  capabilities at pipeline creation time.
+* Validate that the shader's actual group shared memory usage is compatible
+  with the device's capabilities at pipeline creation time.
 * Provide clear error messages when device limits would be exceeded.
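A sketch of the pipeline-creation side, assuming the PSV0 `GroupSharedUsage` value is already available and that the runtime uses the CS/AS capacity field for compute, node and amplification shaders and the MS field for mesh shaders. `GroupSharedCaps` and both functions are hypothetical; the field names follow the illustrative D3D12 names in the Runtime Additions section, the floor values follow the minimums stated for the runtime query, and the message text follows the pipeline creation error listed earlier.

```cpp
#include <cstdint>
#include <optional>
#include <string>

enum class Stage { Compute, Node, Amplification, Mesh };

// Hypothetical mirror of the two illustratively named D3D12 caps fields.
struct GroupSharedCaps {
  uint32_t MaxGroupSharedMemoryPerGroupCSAS;  // Compute, node, amplification
  uint32_t MaxGroupSharedMemoryPerGroupMS;    // Mesh
};

// Reported values must be at least the legacy limits and 4-byte aligned.
bool CapsMeetFloor(const GroupSharedCaps& caps) {
  return caps.MaxGroupSharedMemoryPerGroupCSAS >= 32u * 1024 &&
         caps.MaxGroupSharedMemoryPerGroupCSAS % 4 == 0 &&
         caps.MaxGroupSharedMemoryPerGroupMS >= 28u * 1024 &&
         caps.MaxGroupSharedMemoryPerGroupMS % 4 == 0;
}

// Compares the PSV0 GroupSharedUsage value against the device capacity for
// the shader's stage. Returns an error message, or std::nullopt if it fits.
std::optional<std::string> CheckPipelineGroupShared(
    const GroupSharedCaps& caps, Stage stage, uint32_t groupSharedUsage) {
  const uint32_t capacity = stage == Stage::Mesh
                                ? caps.MaxGroupSharedMemoryPerGroupMS
                                : caps.MaxGroupSharedMemoryPerGroupCSAS;
  if (groupSharedUsage <= capacity) {
    return std::nullopt;
  }
  return "groupshared static usage (" + std::to_string(groupSharedUsage) +
         ") exceeds device capacity (" + std::to_string(capacity) + ")";
}
```

Because PSV0 carries the actual usage rather than the declared limit, this check needs only one comparison and no knowledge of whether the attribute was present.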

 ### Validation Changes
@@ -193,11 +192,10 @@ Validator must:
 * Sum byte sizes of all groupshared globals (respect alignment / padding like
 today).
 * Check attribute presence & argument correctness.
-* Ensure attribute appears only in compute/mesh/amplification and SM >= 6.10.
-* Emit / retain static usage metadata (existing) for runtime comparison against
-  device capability.
-* Populate the new PSV0 `GroupSharedLimit` field with the attribute value (or 0
-  if absent).
+* Ensure that `kDxilGroupSharedLimitTag` metadata appears only in
+  compute/node/mesh/amplification and SM >= 6.10.
+* Check that the sum of all groupshared usage is <= the declared limit if
+  present, otherwise the legacy 32k limit (28k for MS).
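One possible shape for the summing and limit-check steps above, reading the limit as the declared `GroupSharedLimit` when present and the legacy 32 KB / 28 KB limit otherwise, consistent with the fallback described under Validation. The layout rule (declaration order, each global rounded up to its alignment) is an assumption standing in for whatever padding the validator applies today; all names are hypothetical.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

enum class Stage { Compute, Node, Amplification, Mesh };

// Hypothetical description of one groupshared global variable.
struct GroupSharedGlobal {
  uint32_t sizeBytes;
  uint32_t alignBytes;  // Must be >= 1.
};

// Assumed layout: globals in declaration order, each placed at the next
// offset that is a multiple of its alignment. The real DXIL layout may differ.
uint64_t StaticGroupSharedUsage(const std::vector<GroupSharedGlobal>& globals) {
  uint64_t offset = 0;
  for (const GroupSharedGlobal& g : globals) {
    offset = (offset + g.alignBytes - 1) / g.alignBytes * g.alignBytes;
    offset += g.sizeBytes;
  }
  return offset;
}

// Declared GroupSharedLimit if present, otherwise the legacy per-stage limit.
bool WithinGroupSharedLimit(Stage stage,
                            const std::vector<GroupSharedGlobal>& globals,
                            std::optional<uint32_t> declaredLimit) {
  const uint32_t legacy = stage == Stage::Mesh ? 28u * 1024 : 32u * 1024;
  const uint64_t limit = declaredLimit.value_or(legacy);
  return StaticGroupSharedUsage(globals) <= limit;
}
```

Returning `uint64_t` from the sum keeps a pathological set of globals from wrapping around before the comparison.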

 ### Runtime Additions

@@ -206,7 +204,7 @@ if absent).
 Add a new feature query (illustrative naming):
 * D3D12: `D3D12_FEATURE_DATA_D3D12_OPTIONS_XX::MaxGroupSharedMemoryPerGroupCSAS`
   - Value declares the maximum group shared memory in bytes per thread group
-    for Compute and Amplification Shaders.
+    for Compute, Node and Amplification Shaders.
   - Must be >= 32,768 and 4 byte aligned
 * D3D12: `D3D12_FEATURE_DATA_D3D12_OPTIONS_XX::MaxGroupSharedMemoryPerGroupMS`
   - Value declares the maximum group shared memory in bytes per thread group
@@ -220,7 +218,7 @@ Add a new feature query (illustrative naming):
 ## Testing

 Testing matrix axes:
-* Stages: compute, mesh, amplification.
+* Stages: compute, node, mesh, amplification.
 * Capacities: 0 - 32/28 KB, 48 KB, 64 KB, 96 KB, 128 KB.
 * Attribute: absent vs present (below, equal, above static usage; above
 capacity).