# SGL Kernel for XPU

A fork of the [Kernel Library](https://github.com/sgl-project/sglang/tree/main/sgl-kernel) for SGLang support on the Intel GPU backend.

[PyPI](https://pypi.org/project/sgl-kernel)
## Installation

Currently we only support building from source. To use this library on Intel GPUs, install the Intel GPU driver first; see the [Intel GPU Driver Installation guide](https://www.intel.com/content/www/us/en/developer/articles/tool/pytorch-prerequisites-for-intel-gpu/2-8.html#driver-installation).

## Build from source

Development build:

```bash
source /PATH/TO/ONEAPI/setvars.sh
pip install -v .
```

### Build with [ccache](https://github.com/ccache/ccache)

```bash
# Install ccache (or `yum install -y ccache` on RPM-based systems).
apt-get install -y ccache
# Building with ccache is enabled when ccache is installed and CCACHE_DIR is set.
export CCACHE_DIR=/path/to/your/ccache/dir
export CCACHE_BACKEND=""
export CCACHE_KEEP_LOCAL_STORAGE="TRUE"
unset CCACHE_READONLY
python -m uv build --wheel -Cbuild-dir=build --color=always .
```

### Parallel Build

We highly recommend building sgl-kernel-xpu with Ninja, which parallelizes the build automatically. If you build sgl-kernel-xpu with CMake's default generator instead, set `CMAKE_BUILD_PARALLEL_LEVEL` for a parallel build:

```bash
CMAKE_BUILD_PARALLEL_LEVEL=$(nproc) python -m uv build --wheel -Cbuild-dir=build --color=always .
```

### Kernel Development

Steps to add a new kernel:

1. Implement the kernel in [src](https://github.com/sgl-project/sgl-kernel-xpu/tree/main/src/)
2. Expose the interface in [include/sgl_kernel_ops.h](https://github.com/sgl-project/sgl-kernel-xpu/blob/main/include/sgl_kernel_ops.h)
3. Create the torch extension in [src/torch_extension_sycl.cc](https://github.com/sgl-project/sgl-kernel-xpu/blob/main/src/torch_extension_sycl.cc)
4. Update [CMakeLists.txt](https://github.com/sgl-project/sgl-kernel-xpu/blob/main/CMakeLists.txt) to include the new source files
5. Expose the Python interface in [python/sgl_kernel](https://github.com/sgl-project/sgl-kernel-xpu/blob/main/python/sgl_kernel)

### Development Tips

1. When implementing kernels, define only pure SYCL files and C++ interfaces. If you need to use `torch::Tensor`, use `<torch/all.h>` instead of `<torch/extension.h>`; using `<torch/extension.h>` causes compilation errors when building against SABI.

2. When creating torch extensions, add the function definition with `m.def`, and the device binding with `m.impl`:

- Using `torch.compile` requires `m.def` with a schema; the schema helps it auto-capture the custom kernel. Reference: [How to add FakeTensor](https://docs.google.com/document/d/1_W62p8WJOQQUzPsJYa7s701JXt0qf2OfLub2sbkHOaU/edit?tab=t.0#heading=h.ptttacy8y1u9)
- How to write a schema: [Schema reference](https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/native/README.md#func)
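
As a minimal sketch of the `m.def`/`m.impl` pattern (the op name `my_add` and its body are hypothetical placeholders, not kernels from this repo; a real implementation would dispatch to SYCL code):

```cpp
#include <torch/all.h>
#include <torch/library.h>

// Placeholder implementation; a real kernel would call into a SYCL file.
at::Tensor my_add(const at::Tensor& a, const at::Tensor& b) {
  return a + b;
}

TORCH_LIBRARY_FRAGMENT(sgl_kernel, m) {
  // Function definition with a schema: torch.compile relies on this
  // to capture the custom op.
  m.def("my_add(Tensor a, Tensor b) -> Tensor");
}

TORCH_LIBRARY_IMPL(sgl_kernel, XPU, m) {
  // Device binding: this implementation is selected for XPU tensors.
  m.impl("my_add", &my_add);
}
```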

### Integrating Third-Party Libraries with Data Type Conversion

When integrating new third-party libraries such as flash-attention, you may encounter data type compatibility issues between the C++ interface and the PyTorch bindings. For example, the third-party code might use `float` or `int` types, while PyTorch requires `double` and `int64_t`.

> The reason we need `double` and `int64_t` in the torch binding is that TORCH_LIBRARY handles the Python-to-C++ conversion: Python's `float` corresponds to `double` in C++, while Python's `int` corresponds to `int64_t`.

To address this issue, we provide the `make_pytorch_shim` function in [sgl_kernel_torch_shim](https://github.com/sgl-project/sgl-kernel-xpu/blob/main/include/sgl_kernel_torch_shim.h), which handles data type conversions automatically.

When you need to support new data type conversions, you can add conversion functions like this:

```cpp
// Map `int` -> `int64_t`
template <>
struct pytorch_library_compatible_type<int> {
  using type = int64_t;
  static int convert_from_type(int64_t arg) {
    TORCH_CHECK(arg <= std::numeric_limits<int>::max(), "int64_t value is too large to be converted to int");
    TORCH_CHECK(arg >= std::numeric_limits<int>::min(), "int64_t value is too small to be converted to int");
    return static_cast<int>(arg);
  }
};
```

To use this with your library functions, simply wrap them with `make_pytorch_shim`:

```cpp
/*
 * From flash-attention
 */
m.impl("fwd", torch::kXPU, make_pytorch_shim(&mha_fwd));
```

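For context, a sketch of why the shim is needed; the `mha_fwd` signature below is hypothetical and shown only for illustration:

```cpp
// Hypothetical third-party signature using narrow C++ types.
std::vector<at::Tensor> mha_fwd(const at::Tensor& q, const at::Tensor& k,
                                const at::Tensor& v, float softmax_scale,
                                int window_size);

// Python floats and ints arrive at the binding as double and int64_t, so
// TORCH_LIBRARY cannot bind mha_fwd directly. make_pytorch_shim generates a
// wrapper that accepts the wide types and converts each argument (via the
// pytorch_library_compatible_type specializations) before calling mha_fwd.
m.impl("fwd", torch::kXPU, make_pytorch_shim(&mha_fwd));
```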
### Testing & Benchmarking

1. Add pytest tests in [tests/](https://github.com/sgl-project/sgl-kernel-xpu/tree/main/tests). If you need to skip a test, use `@pytest.mark.skipif`:

```python
@pytest.mark.skipif(
    skip_condition, reason="Nvfp4 requires compute capability of 10 or above."
)
```

2. Add benchmarks using [triton benchmark](https://triton-lang.org/main/python-api/generated/triton.testing.Benchmark.html) in [benchmark/](https://github.com/sgl-project/sglang/tree/main/sgl-kernel/benchmark)
3. Run the test suite

### Release new version

Update the version in [pyproject.toml](https://github.com/sgl-project/sgl-kernel-xpu/blob/main/pyproject.toml) and [version.py](https://github.com/sgl-project/sgl-kernel-xpu/blob/main/python/sgl_kernel/version.py).