
Commit d9ceba8

JoeCitizen and tex3d authored
Store Group Shared Bytes used and not limit into metadata (microsoft#764)
The runtime is interested in the actual bytes used, not the limit requested.

Co-authored-by: Tex Riddell <texr@microsoft.com>
1 parent ac93d1a commit d9ceba8

1 file changed: +20 -22 lines


proposals/0049-variable-groupshared-memory.md

Lines changed: 20 additions & 22 deletions
@@ -14,7 +14,7 @@ params:
 ## Introduction
 
 Today HLSL (DXIL) validation enforces a fixed upper limit of 32 KB
-of group shared memory per thread group for Compute, and Amplification
+of group shared memory per thread group for Compute, Node, and Amplification
 Shaders with Mesh shaders being limited to 28 KB. Modern GPU architectures
 often expose substantially larger physically
 available shared memory, and practical algorithms (e.g. large tile / cluster
@@ -45,7 +45,7 @@ Introduce two core pieces:
 
 1. A runtime API query returning `MaxGroupSharedMemoryPerGroup` (in bytes).
 - This will return a value at minimum equal to the existing limits in SM 6.9
-and prior i.e. 32k for CS and AS and 28k for Mesh Shaders.
+and prior i.e. 32k for CS, NS and AS and 28k for Mesh Shaders.
 - There is no defined maximum value.
 - Values must be 4 byte aligned.
 2. A new optional entry-point attribute allowing a shader author to declare the
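The query's lower bound depends on the shader stage. A minimal Python sketch of the three rules in this hunk (at least the SM 6.9 limit, 4-byte aligned, no upper bound); the `LEGACY_LIMIT_BYTES` table and `is_valid_reported_capacity` helper are hypothetical and not part of any D3D12 or DXC API:

```python
# Hypothetical table of the SM 6.9 (and earlier) limits that a reported
# MaxGroupSharedMemoryPerGroup value must not go below, in bytes.
LEGACY_LIMIT_BYTES = {
    "compute": 32 * 1024,
    "node": 32 * 1024,
    "amplification": 32 * 1024,
    "mesh": 28 * 1024,
}


def is_valid_reported_capacity(stage: str, value: int) -> bool:
    """Check a reported capacity against the proposal's rules for one stage."""
    floor = LEGACY_LIMIT_BYTES[stage]  # at least the existing limit for this stage
    # Must be 4-byte aligned; there is deliberately no upper bound.
    return value >= floor and value % 4 == 0
```

For example, `is_valid_reported_capacity("mesh", 28668)` is false because it falls below the 28 KB floor, while any aligned value of 32 KB or more is accepted for compute, node and amplification.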
@@ -108,7 +108,7 @@ actual usage exceeds that.
 
 ## Detailed Design
 
-### Runtime Validation
+### Validation
 * If `GroupSharedLimit` is omitted, validation will fall back to the original
 32k limit (28k for MS). The error message will be updated to indicate that the
 limit may be raised with the caveat that hardware support must be checked.
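The fallback rule selects which limit static usage is compared against. A minimal sketch, assuming hypothetical helpers `effective_limit` and `legacy_limit_error`; the error wording is illustrative only, since the proposal says just that the message should mention the limit can be raised subject to hardware support:

```python
from typing import Optional

# Legacy per-stage limits in bytes: 32k for compute/node/amplification, 28k for mesh.
LEGACY_LIMIT_BYTES = {"compute": 32 * 1024, "node": 32 * 1024,
                      "amplification": 32 * 1024, "mesh": 28 * 1024}


def effective_limit(stage: str, declared_limit: Optional[int]) -> int:
    """Limit that static groupshared usage is checked against for one entry point."""
    if declared_limit is None:
        # Attribute omitted: fall back to the original legacy limit.
        return LEGACY_LIMIT_BYTES[stage]
    return declared_limit


def legacy_limit_error(stage: str, usage: int) -> Optional[str]:
    """Hypothetical error for an entry point without GroupSharedLimit, or None if it fits."""
    limit = effective_limit(stage, None)
    if usage <= limit:
        return None
    return (f"groupshared usage ({usage}) exceeds the default limit ({limit}); "
            "it may be raised with GroupSharedLimit if the hardware supports it")
```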
@@ -132,7 +132,7 @@ Rules:
 be a multiple of 4.
 * At most one `GroupSharedLimit` attribute per entry point; duplicates are an
 error.
-* Applies only to compute, mesh, amplification shaders.
+* Applies only to compute, node, mesh, amplification shaders.
 * The attribute does NOT itself reserve memory; it constrains static usage.
 i.e. the calculated shared memory usage of the shader must always be <= this
 value.
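Each rule above is an independent check on one entry point. A minimal sketch, assuming the attribute arguments have already been collected into a list per entry point; `validate_group_shared_limit` and the stage strings are hypothetical, while the diagnostic texts reuse the messages listed in the proposal:

```python
from typing import List

ALLOWED_STAGES = {"compute", "node", "mesh", "amplification"}


def validate_group_shared_limit(stage: str, attr_args: List[int],
                                static_usage: int) -> List[str]:
    """Return diagnostics for the GroupSharedLimit attribute(s) on one entry point."""
    errors: List[str] = []
    if not attr_args:
        return errors  # Attribute absent: the legacy-limit check applies instead.
    if stage not in ALLOWED_STAGES:
        errors.append("GroupSharedLimit attribute not allowed on this shader stage")
    if len(attr_args) > 1:
        errors.append("Duplicate GroupSharedLimit attribute on entry point")
    limit = attr_args[0]
    if limit % 4 != 0:
        errors.append("GroupSharedLimit attribute argument must be a multiple of 4")
    # The attribute reserves nothing; it only bounds the computed static usage.
    if static_usage > limit:
        errors.append(f"groupshared static usage ({static_usage}) "
                      f"exceeds declared GroupSharedLimit ({limit})")
    return errors
```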
@@ -149,11 +149,11 @@ argument`.
 - `GroupSharedLimit attribute argument must be a multiple of 4`.
 - `Duplicate GroupSharedLimit attribute on entry point`.
 - `GroupSharedLimit attribute not allowed on this shader stage`
-(non compute/mesh/amplification).
+(non compute/node/mesh/amplification).
 - `groupshared static usage (<bytes>) exceeds declared GroupSharedLimit
 (<limit>)`.
 
-Validator / pipeline creation errors:
+Pipeline creation errors:
 - `groupshared static usage (<bytes>) exceeds device capacity (<capacity>)`.
 
 ### Interchange Format Additions
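The remaining pipeline creation error needs only two numbers: the shader's static usage and the device capacity for its stage. A minimal sketch; `check_pipeline_group_shared` is a hypothetical helper, not a D3D12 entry point:

```python
from typing import Optional


def check_pipeline_group_shared(usage_bytes: int, capacity_bytes: int) -> Optional[str]:
    """Return the pipeline creation error for oversubscribed groupshared memory, or None."""
    if usage_bytes <= capacity_bytes:
        return None  # Equal to capacity is still allowed.
    return (f"groupshared static usage ({usage_bytes}) "
            f"exceeds device capacity ({capacity_bytes})")
```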
@@ -174,17 +174,16 @@ node storing the declared limit in bytes.
 
 The PSV0 metadata structure is extended to include:
 
-* **`GroupSharedLimit`**: A 32-bit unsigned integer field indicating the
-shader-declared group shared memory limit in bytes.
-- **Value = 0**: No `GroupSharedLimit` attribute was specified; runtime
-validation should enforce the legacy limit (32 KB for CS/AS, 28 KB for MS).
-- **Value > 0**: The shader explicitly declared a limit; runtime validation
-must ensure that Static group shared usage ≤
-`MaxGroupSharedMemoryPerGroup[CS/AS/MS]`
+* **`GroupSharedUsage`**: A 32-bit unsigned integer field indicating the
+actual group shared memory usage in bytes.
+- This value represents the computed static group shared memory usage of the
+shader.
+- Runtime validation must ensure that this usage value ≤
+`MaxGroupSharedMemoryPerGroup[CS/NS/AS/MS]`
 
 This metadata enables the runtime to:
-* Validate that the shader's declared limit is compatible with the device's
-capabilities at pipeline creation time.
+* Validate that the shader's actual group shared memory usage is compatible
+with the device's capabilities at pipeline creation time.
 * Provide clear error messages when device limits would be exceeded.
 
 ### Validation Changes
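`GroupSharedUsage` is a single 32-bit unsigned field. A minimal sketch of writing and reading such a field with Python's `struct` module, assuming little-endian encoding and a hypothetical byte offset (`GROUP_SHARED_USAGE_OFFSET`); the field's real position inside PSV0 is not given in this change:

```python
import struct

UINT32_MAX = 0xFFFFFFFF
# Hypothetical offset of GroupSharedUsage inside a PSV0 blob (assumption).
GROUP_SHARED_USAGE_OFFSET = 0


def write_group_shared_usage(blob: bytearray, usage_bytes: int,
                             offset: int = GROUP_SHARED_USAGE_OFFSET) -> None:
    """Store the computed static groupshared usage as a little-endian uint32."""
    if not 0 <= usage_bytes <= UINT32_MAX:
        raise ValueError("GroupSharedUsage must fit in a 32-bit unsigned integer")
    struct.pack_into("<I", blob, offset, usage_bytes)


def read_group_shared_usage(blob: bytes, offset: int = GROUP_SHARED_USAGE_OFFSET) -> int:
    """Read the usage back, as a runtime would before comparing it to device capacity."""
    (usage_bytes,) = struct.unpack_from("<I", blob, offset)
    return usage_bytes
```

Storing the usage rather than the declared limit means the runtime compares the number that actually matters for the device, whether or not the shader declared a limit.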
@@ -193,11 +192,10 @@ Validator must:
 * Sum byte sizes of all groupshared globals (respect alignment / padding like
 today).
 * Check attribute presence & argument correctness.
-* Ensure attribute appears only in compute/mesh/amplification and SM >= 6.10.
-* Emit / retain static usage metadata (existing) for runtime comparison against
-device capability.
-* Populate the new PSV0 `GroupSharedLimit` field with the attribute value (or 0
-if absent).
+* Ensure that `kDxilGroupSharedLimitTag` metadata appears only in
+compute/node/mesh/amplification and SM >= 6.10.
+* Check that the sum of all groupshared usage is less than or equal to the
+specified limit (if present) OR the legacy 32k/28k limit (whichever is less).
 
 ### Runtime Additions
 
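Summing groupshared globals while respecting alignment means each variable starts at an offset rounded up to its alignment. A minimal sketch under that assumption; the `(size, alignment)` inputs, declaration-order layout and helper names are hypothetical simplifications, not the validator's actual layout code:

```python
from typing import Iterable, Tuple


def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment (a power of two)."""
    return (value + alignment - 1) & ~(alignment - 1)


def static_group_shared_usage(globals_: Iterable[Tuple[int, int]]) -> int:
    """Total bytes for (size, alignment) groupshared globals laid out in order."""
    offset = 0
    for size, alignment in globals_:
        offset = align_up(offset, alignment)  # insert padding before this variable
        offset += size
    return offset
```

For example, a 2-byte, a 16-byte (16-aligned) and a 4-byte variable occupy 36 bytes: 2 bytes, 14 bytes of padding, 16 bytes, then 4 bytes.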
@@ -206,7 +204,7 @@ if absent).
 Add a new feature query (illustrative naming):
 * D3D12: `D3D12_FEATURE_DATA_D3D12_OPTIONS_XX::MaxGroupSharedMemoryPerGroupCSAS`
 - Value declares the maximum group shared memory in bytes per thread group
-for Compute and Amplification Shaders.
+for Compute, Node and Amplification Shaders.
 - Must be >= 32,768 and 4 byte aligned
 * D3D12: `D3D12_FEATURE_DATA_D3D12_OPTIONS_XX::MaxGroupSharedMemoryPerGroupMS`
 - Value declares the maximum group shared memory in bytes per thread group
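With two capability fields, the runtime must pick the one that matches the entry point's stage: compute, node and amplification share the CSAS value, mesh uses the MS value. A minimal sketch, assuming a hypothetical `GroupSharedCaps` mirror of the illustrative fields and a hypothetical `capacity_for_stage` helper:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GroupSharedCaps:
    """Hypothetical mirror of the two illustrative D3D12 option fields."""
    max_per_group_cs_as: int  # MaxGroupSharedMemoryPerGroupCSAS: compute, node, amplification
    max_per_group_ms: int     # MaxGroupSharedMemoryPerGroupMS: mesh

    def __post_init__(self) -> None:
        # The CSAS value must be >= 32,768 and 4-byte aligned.
        if self.max_per_group_cs_as < 32768 or self.max_per_group_cs_as % 4:
            raise ValueError("MaxGroupSharedMemoryPerGroupCSAS must be >= 32768 and 4-byte aligned")


def capacity_for_stage(caps: GroupSharedCaps, stage: str) -> int:
    """Pick the device capacity that applies to an entry point of the given stage."""
    if stage in ("compute", "node", "amplification"):
        return caps.max_per_group_cs_as
    if stage == "mesh":
        return caps.max_per_group_ms
    raise ValueError(f"no groupshared capacity defined for stage {stage!r}")
```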
@@ -220,7 +218,7 @@ Add a new feature query (illustrative naming):
 ## Testing
 
 Testing matrix axes:
-* Stages: compute, mesh, amplification.
+* Stages: compute, node, mesh, amplification.
 * Capacities: 0 - 32/28 KB, 48 KB, 64 KB, 96 KB, 128 KB.
 * Attribute: absent vs present (below, equal, above static usage; above
 capacity).
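The three axes combine into a cross product of cases. A minimal sketch that enumerates them with `itertools.product`; the case labels and the `matrix_cases` helper are hypothetical, and the "0 - 32/28 KB" bucket is assumed here to mean a device reporting only the stage's legacy limit:

```python
import itertools

STAGES = ["compute", "node", "mesh", "amplification"]
# Capacity buckets in bytes; None stands for the stage's legacy 32 KB / 28 KB limit (assumption).
CAPACITIES = [None, 48 * 1024, 64 * 1024, 96 * 1024, 128 * 1024]
ATTRIBUTE_CASES = ["absent", "below_usage", "equal_usage", "above_usage", "above_capacity"]


def legacy_capacity(stage: str) -> int:
    return 28 * 1024 if stage == "mesh" else 32 * 1024


def matrix_cases():
    """Yield (stage, capacity_bytes, attribute_case) for every combination."""
    for stage, capacity, case in itertools.product(STAGES, CAPACITIES, ATTRIBUTE_CASES):
        yield stage, capacity if capacity is not None else legacy_capacity(stage), case
```

With four stages, five capacities and five attribute cases this yields 100 combinations.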
