Description
The problem
On Windows, the system driver ships without the L0 (Level Zero) development package, that is, without the L0 headers. Placing the SDK in the driver store is not a solution, because that location is not searchable: the LIBs and headers in the package would not be visible to applications. Currently, Triton uses the L0 API in its driver (host stub) to load and build modules, compile kernels, retrieve compiled kernel metadata, and perform platform discovery.
Solution
There are multiple approaches to resolving this. Here, I summarize the solution I explored: replacing L0 calls with their SYCL equivalents. This contradicts the direction that the runtime analysis points to (I wrote about the runtime choice quite some time ago: https://github.com/intel/intel-xpu-backend-for-triton/blob/main/docs/ARCHITECTURE.md#runtime; the analysis considers L0, UR, and SYCL), but it does solve the problem at hand. It is also not a great approach if
Most of the APIs required to replace L0 are already present in SYCL. A prototype that removes the dependency on L0 in the default path of Triton's runtime is available here: https://github.com/intel/intel-xpu-backend-for-triton/tree/pakurapo/sycl-runtime. It currently works only for the SPIR-V path (native binary caching is not supported) and lacks spill querying. Compilation is performed via the experimental kernel compiler SYCL extension (see https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/experimental/sycl_ext_oneapi_kernel_compiler_spirv.asciidoc).
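For illustration, here is a minimal sketch of how a SPIR-V module could be compiled into a kernel through the kernel compiler extension. It is not taken from the prototype branch. The helper names `looks_like_spirv` and `build_spirv_kernel` are hypothetical, and the SYCL part assumes a DPC++ build that provides the experimental `sycl_ext_oneapi_kernel_compiler_spirv` extension, so it is only compiled when a SYCL compiler is in use.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Every SPIR-V module starts with this magic number.
constexpr std::uint32_t kSpirvMagic = 0x07230203u;

// Hypothetical helper: a cheap sanity check before handing a blob to the
// compiler. Assumes the module is stored in the host's byte order.
inline bool looks_like_spirv(const std::vector<std::byte>& blob) {
  // A SPIR-V module is a stream of 32-bit words.
  if (blob.size() < sizeof(std::uint32_t) ||
      blob.size() % sizeof(std::uint32_t) != 0) {
    return false;
  }
  std::uint32_t first_word = 0;
  std::memcpy(&first_word, blob.data(), sizeof(first_word));
  return first_word == kSpirvMagic;
}

// The SYCL part is only compiled by a SYCL compiler (e.g. DPC++ with -fsycl).
#ifdef SYCL_LANGUAGE_VERSION
#include <sycl/sycl.hpp>

namespace syclex = sycl::ext::oneapi::experimental;

// Hypothetical helper: builds the SPIR-V module for the queue's context and
// returns the kernel with the given name, without calling L0 directly.
inline sycl::kernel build_spirv_kernel(sycl::queue& q,
                                       const std::vector<std::byte>& spirv,
                                       const std::string& kernel_name) {
  assert(looks_like_spirv(spirv) && "expected a SPIR-V module");

  // Wrap the SPIR-V binary in a "source" kernel bundle.
  auto source_bundle = syclex::create_kernel_bundle_from_source(
      q.get_context(), syclex::source_language::spirv, spirv);

  // Compile and link it into an executable kernel bundle.
  auto exe_bundle = syclex::build(source_bundle);

  // Look the kernel up by the name used in the SPIR-V module.
  return exe_bundle.ext_oneapi_get_kernel(kernel_name);
}
#endif  // SYCL_LANGUAGE_VERSION
```

The returned `sycl::kernel` can then be launched through a regular SYCL queue. Spill querying and native binaries are outside the scope of this sketch, matching the limitations listed above.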
Current status
The two missing pieces to cover all the required capabilities are:
- Querying the amount of spilled memory from a compiled SYCL kernel. I have already added support for this to UR (Add UR_KERNEL_INFO_SPILL_MEM_SIZE kernel info prop oneapi-src/unified-runtime#2614). On the SYCL side, a new extension was proposed and approved; the implementation is in progress (Add sycl ext intel kernel queries extension llvm#16834).
- For native binary support, the in-progress module API was considered but proved not to be a good fit. The kernel compiler extension will need a tweak instead. This is in progress; there are no PRs yet.