Description
99 DML shaders are failing to validate after #163587 with the error:
error: Total Thread Group Shared Memory storage is 43688, exceeded 32768.
Validation failed.
All 99 DML shaders have names of the form QuantizedGemm* (e.g., QuantizedGemm_20480_16_0_uint4_packed32_float16_native_accum32_0).
Minimal reproducible test case
// compile args: -T cs_6_7 -E CSMain -enable-16bit-types -Fo output.dat
groupshared float16_t smem[10240];
[numthreads(1, 1, 1)]
void CSMain() {
    smem[0] = 0;
}
Comparing the DXIL output before and after the PR commit c87e0e8 shows that the only meaningful difference is the data layout (the other changed line is just the Clang version string):
1c1
< target datalayout = "e-m:e-p:32:32-i1:32-i8:8-i16:16-i32:32-i64:64-f16:16-f32:32-f64:64-n8:16:32:64"
---
> target datalayout = "e-m:e-p:32:32-i1:8-i8:8-i16:32-i32:32-i64:64-f16:32-f32:32-f64:64-n8:16:32:64"
270c270
< !1 = !{!"clang version 22.0.0git (git@github.com:Icohedron/llvm-project.git 72c6e4b230ddb5ca85361e145e177245319b271e)"}
---
> !1 = !{!"clang version 22.0.0git (git@github.com:Icohedron/llvm-project.git c87e0e8fe0ea14dcd84e835c0f7b02c5b0edca70)"}
Compiling the same DML shader with DXC produces a shader with the data layout
target datalayout = "e-m:e-p:32:32-i1:32-i8:8-i16:16-i32:32-i64:64-f16:16-f32:32-f64:64-n8:16:32:64"
which matches the data layout that Clang emitted before the PR.
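A rough sketch of why the f16:16 to f16:32 change could push the minimal test case over the limit. It assumes the validator sizes groupshared arrays the way LLVM computes allocation size, i.e. each element's store size rounded up to its ABI alignment; that assumption is not confirmed in this issue. The helper names abi_align_bytes and array_alloc_bytes are hypothetical and exist only for this illustration.

```python
# Limit quoted in the validation error message.
TGSM_LIMIT_BYTES = 32768

# Data layout strings copied from the diff above.
OLD_LAYOUT = "e-m:e-p:32:32-i1:32-i8:8-i16:16-i32:32-i64:64-f16:16-f32:32-f64:64-n8:16:32:64"
NEW_LAYOUT = "e-m:e-p:32:32-i1:8-i8:8-i16:32-i32:32-i64:64-f16:32-f32:32-f64:64-n8:16:32:64"


def abi_align_bytes(layout, type_key):
    """Return the ABI alignment in bytes for a spec such as 'f16' (hypothetical helper)."""
    for spec in layout.split("-"):
        parts = spec.split(":")
        if parts[0] == type_key and len(parts) >= 2:
            return int(parts[1]) // 8  # the layout string gives alignment in bits
    raise KeyError(type_key)


def array_alloc_bytes(layout, type_key, store_bytes, count):
    """Array size assuming each element is padded up to its ABI alignment."""
    align = abi_align_bytes(layout, type_key)
    stride = -(-store_bytes // align) * align  # round store size up to the alignment
    return stride * count


# groupshared float16_t smem[10240]; a half occupies 2 bytes of storage.
old_size = array_alloc_bytes(OLD_LAYOUT, "f16", 2, 10240)
new_size = array_alloc_bytes(NEW_LAYOUT, "f16", 2, 10240)

print(old_size, old_size <= TGSM_LIMIT_BYTES)  # 20480 True
print(new_size, new_size <= TGSM_LIMIT_BYTES)  # 40960 False
```

Under this assumption the test array doubles from 20480 to 40960 bytes, which exceeds the 32768-byte limit. The 43688 bytes reported for the DML shaders would then come from their own mix of groupshared declarations, which this sketch does not model.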