Commit e36af02: Init the first version (#2)

* init
* remove cuda related code
* minor update of README
* clean the rest cuda code

1 parent: 636f1a5

File tree: 93 files changed, +14123 -1 lines


.clang-format

Lines changed: 15 additions & 0 deletions

```yaml
BasedOnStyle: Google
IndentWidth: 2
ColumnLimit: 120
AllowShortFunctionsOnASingleLine: Empty
DerivePointerAlignment: false
PointerAlignment: Left
NamespaceIndentation: None
SortIncludes: true
AllowShortLoopsOnASingleLine: false
BinPackParameters: false                # Prevents packing parameters in declarations
BinPackArguments: false                 # Prevents packing arguments in function calls
AlignAfterOpenBracket: AlwaysBreak      # Forces a break after the opening parenthesis
AlignOperands: Align                    # Aligns operands of wrapped expressions vertically
PenaltyBreakBeforeFirstCallParameter: 1 # Encourages breaking before the first argument
PenaltyReturnTypeOnItsOwnLine: 100      # Keeps the return type with the function name
```

.gitignore

Lines changed: 6 additions & 0 deletions

```diff
@@ -39,3 +39,9 @@

 # debug information files
 *.dwo
+
+# python build files
+*.pyc
+*.pyo
+
+build
```

CMakeLists.txt

Lines changed: 46 additions & 0 deletions

```cmake
cmake_minimum_required(VERSION 3.19.2)
project(sgl_kernel)

set(CMAKE_POSITION_INDEPENDENT_CODE ON)
set(CMAKE_CXX_STANDARD 17)

# Torch
find_package(Python3 COMPONENTS Interpreter Development.Module ${SKBUILD_SABI_COMPONENT} REQUIRED)

# find_package(Python3) defines Python3_EXECUTABLE (variable names are case-sensitive)
execute_process(
  COMMAND ${Python3_EXECUTABLE}
          -c "import torch; print(torch.utils.cmake_prefix_path)"
  OUTPUT_VARIABLE TORCH_PY_PREFIX
  OUTPUT_STRIP_TRAILING_WHITESPACE
)

message(STATUS ${TORCH_PY_PREFIX})
list(APPEND CMAKE_PREFIX_PATH ${TORCH_PY_PREFIX}/Torch)
find_package(Torch REQUIRED)

set(SGL_OPS_XPU_ROOT ${PROJECT_SOURCE_DIR})
list(APPEND CMAKE_MODULE_PATH ${SGL_OPS_XPU_ROOT}/cmake/Modules)

include(${SGL_OPS_XPU_ROOT}/cmake/SYCL.cmake)
include(${SGL_OPS_XPU_ROOT}/cmake/BuildFlags.cmake)

include(FetchContent)

# # cutlass
# FetchContent_Declare(
#   repo-cutlass-sycl
#   GIT_REPOSITORY https://github.com/codeplaysoftware/cutlass-sycl.git
#   GIT_TAG ef9797f4327886ad231bfe853099ca022060c293
#   GIT_SHALLOW OFF
# )
# FetchContent_Populate(repo-cutlass-sycl)

include_directories(
  ${CMAKE_CURRENT_SOURCE_DIR}/include
  ${CMAKE_CURRENT_SOURCE_DIR}/src
  # ${repo-cutlass-sycl_SOURCE_DIR}/include
  # ${repo-cutlass-sycl_SOURCE_DIR}/tools/util/include
)

add_subdirectory(${SGL_OPS_XPU_ROOT}/src)
```

README.md

Lines changed: 108 additions & 1 deletion (replaces the previous one-line README, `# sgl-kernel-xpu`)

# SGL Kernel for XPU

A fork of the [SGLang Kernel Library](https://github.com/sgl-project/sglang/tree/main/sgl-kernel) providing SGLang support on the Intel GPU backend.

[![PyPI](https://img.shields.io/pypi/v/sgl-kernel)](https://pypi.org/project/sgl-kernel)

## Installation

Currently we only support building from source. To use this library on Intel GPUs, you need to install the Intel GPU driver first; see the [Intel GPU Driver Installation](https://www.intel.com/content/www/us/en/developer/articles/tool/pytorch-prerequisites-for-intel-gpu/2-8.html#driver-installation) guide.
## Build from source

Development build:

```bash
source /PATH/TO/ONEAPI/setvars.sh
pip install -v .
```
### Build with [ccache](https://github.com/ccache/ccache)

```bash
apt-get install -y ccache  # or `yum install -y ccache`
# Building with ccache is enabled when ccache is installed and CCACHE_DIR is set.
export CCACHE_DIR=/path/to/your/ccache/dir
export CCACHE_BACKEND=""
export CCACHE_KEEP_LOCAL_STORAGE="TRUE"
unset CCACHE_READONLY
python -m uv build --wheel -Cbuild-dir=build --color=always .
```
### Parallel Build

We highly recommend building sgl-kernel-xpu with Ninja, which parallelizes the build automatically. If you build with CMake's default generator instead, set `CMAKE_BUILD_PARALLEL_LEVEL` to enable a parallel build:

```bash
CMAKE_BUILD_PARALLEL_LEVEL=$(nproc) python -m uv build --wheel -Cbuild-dir=build --color=always .
```
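`$(nproc)` expands to the number of CPUs available to the shell. The same value can be read from Python, which is handy when choosing a parallelism level programmatically (a stdlib-only sketch; `nproc` additionally honors CPU affinity, which `os.cpu_count()` does not):

```python
import os

# Roughly what `$(nproc)` reports on Linux; fall back to 1 if undetectable.
jobs = os.cpu_count() or 1
print(f"CMAKE_BUILD_PARALLEL_LEVEL={jobs}")
```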
### Kernel Development

Steps to add a new kernel:

1. Implement the kernel in [src](https://github.com/sgl-project/sgl-kernel-xpu/tree/main/src/)
2. Expose the interface in [include/sgl_kernel_ops.h](https://github.com/sgl-project/sgl-kernel-xpu/blob/main/include/sgl_kernel_ops.h)
3. Create the torch extension in [src/torch_extension_sycl.cc](https://github.com/sgl-project/sgl-kernel-xpu/blob/main/src/torch_extension_sycl.cc)
4. Update [CMakeLists.txt](https://github.com/sgl-project/sgl-kernel-xpu/blob/main/CMakeLists.txt) to include the new source files
5. Expose the Python interface in [python](https://github.com/sgl-project/sgl-kernel-xpu/blob/main/python/sgl_kernel)
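The last step is typically a thin Python wrapper over the registered op. A minimal sketch, assuming a hypothetical op `sgl_kernel::my_op` (the file path, op name, and signature are illustrative, not taken from the repo):

```python
# Hypothetical wrapper, e.g. python/sgl_kernel/my_op.py
def my_op(x):
    """Forward to the custom kernel registered under sgl_kernel::my_op."""
    import torch  # imported lazily so this sketch stays importable without torch
    return torch.ops.sgl_kernel.my_op.default(x)
```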
### Development Tips

1. When implementing kernels, only define pure SYCL files and C++ interfaces. If you need to use `Torch::tensor`, use `<torch/all.h>` instead of `<torch/extension.h>`; including `<torch/extension.h>` causes compilation errors under SABI builds.

2. When creating torch extensions, add the function definition with `m.def`, and the device binding with `m.impl`:

   - `torch.compile` requires `m.def` with a schema, which lets it automatically capture the custom kernel. Reference: [How to add FakeTensor](https://docs.google.com/document/d/1_W62p8WJOQQUzPsJYa7s701JXt0qf2OfLub2sbkHOaU/edit?tab=t.0#heading=h.ptttacy8y1u9)
   - How to write a schema: [Schema reference](https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/native/README.md#func)
### Integrating Third-Party Libraries with Data Type Conversion

When integrating new third-party libraries such as flash-attention, you may encounter data type compatibility issues between the C++ interface and the PyTorch bindings. For example, the third-party code might use `float` or `int`, while PyTorch requires `double` and `int64_t`.

> The reason we need `double` and `int64_t` in torch bindings is that `TORCH_LIBRARY` handles the Python-to-C++ conversion: Python's `float` corresponds to `double` in C++, and Python's `int` corresponds to `int64_t` in C++.
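This mapping can be verified from Python itself. A stdlib-only sketch showing why `float` arguments arrive as 8-byte doubles, and why unbounded Python ints must be pinned to a fixed-width C++ type:

```python
import ctypes

# Python floats are C doubles: 8 bytes wide, with a 53-bit mantissa.
assert ctypes.sizeof(ctypes.c_double) == 8

# 2**53 + 1 is not representable as a double, so both ints round to the same value.
assert float(2**53) == float(2**53 + 1)

# Python ints are arbitrary-precision, which is why the binding layer fixes
# them to int64_t and must range-check any narrower C++ target type.
assert 2**100 + 1 - 2**100 == 1  # no overflow at the Python level
```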
To address this issue, we provide the `make_pytorch_shim` function in [sgl_kernel_torch_shim](https://github.com/sgl-project/sgl-kernel-xpu/blob/main/include/sgl_kernel_torch_shim.h), which handles the data type conversions automatically.

When you need to support a new data type conversion, you can add a conversion specialization like this:
```cpp
// Map `int` -> `int64_t`
template <>
struct pytorch_library_compatible_type<int> {
  using type = int64_t;
  static int convert_from_type(int64_t arg) {
    TORCH_CHECK(arg <= std::numeric_limits<int>::max(), "int64_t value is too large to be converted to int");
    TORCH_CHECK(arg >= std::numeric_limits<int>::min(), "int64_t value is too small to be converted to int");
    return arg;
  }
};
```
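For intuition, the same narrowing check can be mirrored in plain Python (a sketch only; the function name is illustrative, and the real conversion lives in the C++ shim):

```python
# Mirror of the int64_t -> int narrowing check, assuming a 32-bit C `int`.
INT32_MAX = 2**31 - 1
INT32_MIN = -(2**31)

def convert_int64_to_int(arg: int) -> int:
    """Reject values that do not fit a 32-bit int, as the TORCH_CHECKs do."""
    if arg > INT32_MAX:
        raise OverflowError("int64_t value is too large to be converted to int")
    if arg < INT32_MIN:
        raise OverflowError("int64_t value is too small to be converted to int")
    return arg
```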
To use this with your library functions, simply wrap them with `make_pytorch_shim`:

```cpp
/*
 * From flash-attention
 */
m.impl("fwd", torch::kXPU, make_pytorch_shim(&mha_fwd));
```
### Testing & Benchmarking

1. Add pytest tests in [tests/](https://github.com/sgl-project/sgl-kernel-xpu/tree/main/tests). If you need to skip a test, use `@pytest.mark.skipif`:

```python
@pytest.mark.skipif(
    skip_condition, reason="Nvfp4 Requires compute capability of 10 or above."
)
```
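A complete (hypothetical) test illustrating the pattern; `HAS_XPU` and the test body are placeholders, where a real test would probe the device (e.g. an availability check) and compare the kernel against a reference implementation:

```python
import pytest

HAS_XPU = False  # placeholder: a real test would query device availability

@pytest.mark.skipif(not HAS_XPU, reason="requires an Intel XPU device")
def test_add_kernel():
    # Placeholder body; would compare kernel output against a reference.
    assert 1 + 1 == 2
```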
2. Add benchmarks using the [triton benchmark](https://triton-lang.org/main/python-api/generated/triton.testing.Benchmark.html) utilities in [benchmark/](https://github.com/sgl-project/sglang/tree/main/sgl-kernel/benchmark)
3. Run the test suite

### Release new version

Update the version in [pyproject.toml](https://github.com/sgl-project/sgl-kernel-xpu/blob/main/pyproject.toml) and [version.py](https://github.com/sgl-project/sgl-kernel-xpu/blob/main/python/sgl_kernel/version.py)
