SGL Kernel for XPU

A fork of the SGLang Kernel Library that adds support for the Intel GPU backend.

Installation

Currently, we only support building from source. To run on Intel GPUs, install the Intel GPU driver first. For an installation guide, see Intel GPUs Driver Installation.

Build from source

Development build:

source /PATH/TO/ONEAPI/setvars.sh
pip install -v .

Build with ccache

# or `yum install -y ccache`.
apt-get install -y ccache
# Building with ccache is enabled when ccache is installed and CCACHE_DIR is set.
export CCACHE_DIR=/path/to/your/ccache/dir
export CCACHE_BACKEND=""
export CCACHE_KEEP_LOCAL_STORAGE="TRUE"
unset CCACHE_READONLY
python -m uv build --wheel -Cbuild-dir=build --color=always .
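The condition stated in the comment above (ccache is installed and CCACHE_DIR is set) can be checked before starting a build. The following is a minimal sketch of that check only; the helper name `ccache_enabled` is hypothetical and is not part of this repository, and the actual build scripts may decide differently.

```python
import os
import shutil


def ccache_enabled(env=None):
    """Return True when a ccache executable is on PATH and CCACHE_DIR is non-empty.

    `env` is a mapping of environment variables; it defaults to os.environ.
    This mirrors the rule described in the README, not the real build logic.
    """
    env = os.environ if env is None else env
    has_binary = shutil.which("ccache", path=env.get("PATH")) is not None
    return has_binary and bool(env.get("CCACHE_DIR"))


if __name__ == "__main__":
    state = "enabled" if ccache_enabled() else "disabled"
    print(f"ccache would be {state} for this build")
```

Running the script in the shell where you plan to build shows whether the exports above took effect.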

Parallel Build

We highly recommend building sgl-kernel-xpu with Ninja, which builds in parallel automatically. If you build sgl-kernel-xpu with CMake instead, set CMAKE_BUILD_PARALLEL_LEVEL to enable a parallel build, for example:

CMAKE_BUILD_PARALLEL_LEVEL=$(nproc) python -m uv build --wheel -Cbuild-dir=build --color=always .
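If you drive the build from a Python script rather than a shell, the same variable can be set in the environment passed to the build command. This is a sketch under assumptions: `parallel_build_env` and `BUILD_CMD` are hypothetical names introduced here, and `os.cpu_count()` is used as a rough stand-in for `$(nproc)` (the two can differ when CPU affinity is restricted).

```python
import os

# The build command from this README, split into an argument list.
BUILD_CMD = [
    "python", "-m", "uv", "build", "--wheel",
    "-Cbuild-dir=build", "--color=always", ".",
]


def parallel_build_env(jobs=None, base_env=None):
    """Return a copy of the environment with CMAKE_BUILD_PARALLEL_LEVEL set.

    `jobs` defaults to the number of CPUs reported by os.cpu_count().
    """
    env = dict(os.environ if base_env is None else base_env)
    if jobs is None:
        jobs = os.cpu_count() or 1
    if jobs < 1:
        raise ValueError("jobs must be a positive integer")
    env["CMAKE_BUILD_PARALLEL_LEVEL"] = str(jobs)
    return env


if __name__ == "__main__":
    env = parallel_build_env()
    print("CMAKE_BUILD_PARALLEL_LEVEL =", env["CMAKE_BUILD_PARALLEL_LEVEL"])
    # To actually build, one could run:
    #   import subprocess
    #   subprocess.run(BUILD_CMD, env=env, check=True)
```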

Kernel Development

Steps to add a new kernel:

Implement the kernel in csrc
Expose the interface in include/sgl_kernel_ops.h
Create torch extension in csrc/common_extension.cc
Update CMakeLists.txt to include new source files
Expose Python interface in python

Development Tips

When implementing kernels, define only pure SYCL files and C++ interfaces. If you need torch::Tensor, include <torch/all.h> instead of <torch/extension.h>. Including <torch/extension.h> causes compilation errors when building with SABI.
When creating torch extensions, add the function definition with m.def and the device binding with m.impl:

torch.compile requires m.def to be given a schema, which lets it capture the custom kernel automatically. Reference: How to add FakeTensor
How to write a schema: Schema reference

Integrating Third-Party Libraries with Data Type Conversion

When integrating new third-party libraries like flash-attention, you may encounter data type compatibility issues between the C++ interface and PyTorch bindings. For example, the third-party code might use float or int types, while PyTorch requires double and int64_t.

The torch binding needs double and int64_t because TORCH_LIBRARY handles the Python-to-C++ conversion: Python's float corresponds to double in C++, and Python's int corresponds to int64_t.

To address this issue, we provide the make_pytorch_shim function in sgl_kernel_torch_shim that handles data type conversions automatically.

When you need to support new data type conversions, you can easily add conversion functions like this:

// Map `int` -> `int64_t`
template <>
struct pytorch_library_compatible_type<int> {
  using type = int64_t;
  // Narrow the int64_t received from PyTorch back to int, rejecting out-of-range values.
  static int convert_from_type(int64_t arg) {
    TORCH_CHECK(arg <= std::numeric_limits<int>::max(), "int64_t value is too large to be converted to int");
    TORCH_CHECK(arg >= std::numeric_limits<int>::min(), "int64_t value is too small to be converted to int");
    return static_cast<int>(arg);
  }
};

To use this with your library functions, simply wrap them with make_pytorch_shim:

/*
 * From flash-attention
 */
 m.impl("fwd", torch::kXPU, make_pytorch_shim(&mha_fwd));
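To make the idea behind the shim concrete, here is a rough Python analogue of the range-checked narrowing and the wrapping step. It is only an illustration: `to_c_int`, `make_shim`, and `scale_rows` are hypothetical names, the sketch assumes a 32-bit C++ `int`, and it does not reflect how make_pytorch_shim is implemented in C++.

```python
import inspect
from functools import wraps

# Assumed range of a 32-bit C++ `int`.
INT_MIN, INT_MAX = -(2**31), 2**31 - 1


def to_c_int(value):
    """Reject values that would not fit in a 32-bit int, then pass them through."""
    if value > INT_MAX:
        raise OverflowError("int64_t value is too large to be converted to int")
    if value < INT_MIN:
        raise OverflowError("int64_t value is too small to be converted to int")
    return value


def make_shim(func, narrow_params):
    """Wrap `func` so the named integer parameters are range-checked first."""
    sig = inspect.signature(func)

    @wraps(func)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        for name in narrow_params:
            if name in bound.arguments:
                bound.arguments[name] = to_c_int(bound.arguments[name])
        return func(*bound.args, **bound.kwargs)

    return wrapper


def scale_rows(num_rows, factor):
    """Stand-in for a third-party function that expects a 32-bit int."""
    return num_rows * factor


shim = make_shim(scale_rows, ["num_rows"])
```

Calling `shim(3, 0.5)` forwards to `scale_rows` unchanged, while `shim(2**31, 0.5)` raises `OverflowError` before the wrapped function runs, which is the same fail-early behavior the TORCH_CHECK calls provide.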

Testing & Benchmarking

Add pytest tests in tests/. If you need to skip a test, use @pytest.mark.skipif:

@pytest.mark.skipif(
    skip_condition, reason="Nvfp4 Requires compute capability of 10 or above."
)

Add benchmarks using triton benchmark in benchmark/
Run test suite
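As a fuller illustration of the skipif usage above, the sketch below computes a skip condition for machines without a usable XPU device. The names `xpu_unavailable` and `test_needs_xpu` are hypothetical, and the check assumes a PyTorch build that exposes `torch.xpu.is_available()`; adapt the condition to whatever your test actually requires.

```python
import importlib.util

import pytest


def xpu_unavailable():
    """Return True when PyTorch is missing or reports no usable XPU device."""
    if importlib.util.find_spec("torch") is None:
        return True
    import torch

    xpu = getattr(torch, "xpu", None)
    return xpu is None or not xpu.is_available()


skip_condition = xpu_unavailable()


@pytest.mark.skipif(skip_condition, reason="Requires an Intel GPU (XPU) device.")
def test_needs_xpu():
    import torch

    # Allocate on the XPU device and check a trivial reduction.
    x = torch.ones(4, device="xpu")
    assert x.sum().item() == 4.0
```

Computing the condition once at import time keeps the decorator readable and lets several tests share the same `skip_condition`.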

Release new version

Update version in pyproject.toml and version.py
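Because the version lives in two files, a small consistency check can catch a release where only one was updated. This is a sketch under assumptions: it assumes pyproject.toml contains a line of the form `version = "..."` and that version.py defines `__version__`; the function names are hypothetical and the file paths are left to the caller.

```python
import re
from pathlib import Path


def pyproject_version(text):
    """Extract the first `version = "..."` line from pyproject.toml text."""
    match = re.search(r'^version\s*=\s*"([^"]+)"', text, re.MULTILINE)
    if match is None:
        raise ValueError("no version field found in pyproject.toml text")
    return match.group(1)


def module_version(text):
    """Extract `__version__ = "..."` from version.py text."""
    match = re.search(r'^__version__\s*=\s*["\']([^"\']+)["\']', text, re.MULTILINE)
    if match is None:
        raise ValueError("no __version__ assignment found in version.py text")
    return match.group(1)


def versions_match(pyproject_text, version_py_text):
    """Return True when both files declare the same version string."""
    return pyproject_version(pyproject_text) == module_version(version_py_text)


def check_release(pyproject_path, version_py_path):
    """Read both files from disk and compare their versions."""
    return versions_match(
        Path(pyproject_path).read_text(encoding="utf-8"),
        Path(version_py_path).read_text(encoding="utf-8"),
    )
```

Call `check_release` with the locations of the two files in your checkout before tagging a release.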
