Commit 9d65e08

Change compile_kernel to use threads_per_warp specified in metadata (#4814)
Intel Triton selects a different `threads_per_warp` depending on the kernel and stores the selected value in metadata. This PR changes `compile_kernel` to use the `threads_per_warp` stored in metadata.

This PR fixes the following error with `igc-19724`:

```
terminate called after throwing an instance of 'sycl::_V1::exception'
  what():  The specified local size {1, 1, 32} doesn't match the required work-group size specified in the program source {1, 1, 16}
```

CI with `igc-19724` + this change: https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/16662411889
1 parent 01457fe commit 9d65e08


1 file changed: +1 -0 lines changed


python/triton/tools/compile.py

Lines changed: 1 addition & 0 deletions
@@ -153,6 +153,7 @@ def constexpr(s):
     }
     options = backend.parse_options(kwargs)
     ccinfo = triton.compile(src, target=target, options=options.__dict__)
+    args.threads_per_warp = ccinfo.metadata.threads_per_warp
 
     if getattr(ccinfo.metadata, "global_scratch_size", 0) > 0:
         raise RuntimeError("AOT compiling kernels with global scratch requirements is not yet implemented")
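
The effect of the added line can be illustrated with a minimal, self-contained sketch. It does not call Triton: `args` and `metadata` are stand-in objects, and the helpers `resolve_threads_per_warp` and `local_size` are hypothetical names introduced only for illustration. The sketch also assumes that the launch local size is `num_warps * threads_per_warp` with `num_warps = 1`, which is one way to arrive at the `{1, 1, 32}` versus `{1, 1, 16}` values in the error message; the real launcher may compute it differently.

```python
from types import SimpleNamespace


def resolve_threads_per_warp(args, metadata):
    """Overwrite the requested warp size with the one recorded in metadata.

    Hypothetical helper mirroring the single line added by this commit.
    """
    args.threads_per_warp = metadata.threads_per_warp
    return args.threads_per_warp


def local_size(num_warps, threads_per_warp):
    """Assumed launch geometry: one row of num_warps * threads_per_warp work-items."""
    return (1, 1, num_warps * threads_per_warp)


# Stand-ins: the command line asked for 32, the compiler picked 16 for this kernel.
args = SimpleNamespace(num_warps=1, threads_per_warp=32)
metadata = SimpleNamespace(threads_per_warp=16)

# Before the fix: the launch geometry is derived from the stale CLI value.
stale = local_size(args.num_warps, args.threads_per_warp)

# After the fix: the metadata value wins, so launch and kernel agree.
resolve_threads_per_warp(args, metadata)
fixed = local_size(args.num_warps, args.threads_per_warp)

print("before:", stale)
print("after: ", fixed)
```

Under these assumptions, the stale value reproduces the mismatched local size from the error, while the value taken from metadata matches the work-group size the compiled kernel requires.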
