Summary
Add support for Baidu KunLun XPU accelerators as a new compute backend in vLLM Playground, alongside the existing NVIDIA (CUDA), AMD (ROCm), and Google TPU options.
Motivation
Baidu KunLun XPUs are increasingly used for LLM inference in China-based deployments. vLLM already supports KunLun devices through the vllm_kunlun plugin, but vLLM Playground currently has no UI or container management support for this hardware.
Scope
UI Changes
- Add "Baidu KunLun (XPU)" option to the accelerator selector dropdown
- Enforce Container mode when KunLun is selected (subprocess/remote not supported)
- XPU-specific help text and GPU device label ("XPU Device")
- XPU device monitoring via `xpu_smi` in the GPU Status panel
Container Management
- KunLun-specific container image (`ccr-2nf6531g-pub.cnc.bj.baidubce.com/hac-aiacc/aiak-inference-llm`)
- Device passthrough for all 8 XPU devices (`/dev/xpu0`–`/dev/xpu7` plus `/dev/xpuctrl`)
- KunLun-specific container flags: `--net=host`, `--cap-add=SYS_PTRACE`, `--security-opt seccomp=unconfined`, `--tmpfs /dev/shm` (instead of `--ipc=host`)
- Podman runtime support (not Docker)
- `bash -lc` entrypoint wrapping to ensure conda environment activation inside the container
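The container flags above can be sketched as a command builder. The image tag, device nodes, and flag set are taken from this issue; the function and variable names are illustrative, not vLLM Playground's actual code.

```python
import shlex

KUNLUN_IMAGE = "ccr-2nf6531g-pub.cnc.bj.baidubce.com/hac-aiacc/aiak-inference-llm"

def build_kunlun_podman_cmd(server_cmd: str, num_devices: int = 8) -> list[str]:
    """Assemble a `podman run` invocation with the KunLun-specific flags."""
    cmd = [
        "podman", "run", "--rm",
        "--net=host",                    # host networking; no -p port mapping
        "--cap-add=SYS_PTRACE",
        "--security-opt", "seccomp=unconfined",
        "--tmpfs", "/dev/shm",           # instead of --ipc=host
    ]
    # Pass through the XPU control node and each data-plane device.
    cmd += ["--device", "/dev/xpuctrl"]
    for i in range(num_devices):
        cmd += ["--device", f"/dev/xpu{i}"]
    # bash -lc so the container's login rc files activate the conda env.
    cmd += [KUNLUN_IMAGE, "bash", "-lc", server_cmd]
    return cmd

print(shlex.join(build_kunlun_podman_cmd("echo ready", num_devices=2)))
```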
Environment Variables
- `VLLM_TARGET_DEVICE=kunlun`
- `VLLM_HOST_IP` (auto-resolved)
- XPU-specific tuning flags (`XPU_USE_MOE_SORTED_THRES`, `XFT_USE_FAST_SWIGLU`, `XMLIR_CUDNN_ENABLED`, etc.)
- `XPU_VISIBLE_DEVICES` and `CUDA_VISIBLE_DEVICES` for device selection
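A minimal sketch of assembling this environment block. The variable names come from the issue; the `"1"` defaults for the tuning flags and the host-IP resolution strategy are assumptions for illustration.

```python
import socket

def kunlun_env(visible_devices: str = "0,1,2,3,4,5,6,7") -> dict[str, str]:
    """Build the KunLun-specific environment for the container process."""
    # "Auto-resolved" host IP; real resolution logic may differ.
    host_ip = socket.gethostbyname(socket.gethostname())
    return {
        "VLLM_TARGET_DEVICE": "kunlun",
        "VLLM_HOST_IP": host_ip,
        # Mirror the selection into CUDA_VISIBLE_DEVICES too, since the
        # C-level device remapping keys off it (see Known Limitations).
        "XPU_VISIBLE_DEVICES": visible_devices,
        "CUDA_VISIBLE_DEVICES": visible_devices,
        # Tuning-flag names are from the issue; the values are assumed.
        "XPU_USE_MOE_SORTED_THRES": "1",
        "XFT_USE_FAST_SWIGLU": "1",
        "XMLIR_CUDNN_ENABLED": "1",
    }
```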
vLLM Server Configuration
- Launch via `python -m vllm.entrypoints.openai.api_server` (not `vllm serve`)
- Auto-force `dtype=float16` when dtype is `auto` (KunLun XPU kernels don't support bfloat16)
- Port passed directly via `--port` (no `-p` mapping due to `--net=host`)
- Local model path mounted at its original path (not `/models`)
- Auto-adjust `tensor_parallel_size` to match the selected device count
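The launch adjustments above can be expressed as a small argument builder. The flags are standard vLLM server arguments named in this issue; the function itself is a hypothetical sketch, not the Playground's implementation.

```python
def kunlun_server_args(model_path: str, port: int, dtype: str,
                       tensor_parallel_size: int, num_devices: int) -> list[str]:
    """Build the in-container server command for a KunLun XPU launch."""
    # KunLun XPU kernels don't support bfloat16, so never leave dtype on auto.
    if dtype == "auto":
        dtype = "float16"
    # Clamp tensor parallelism to the number of selected XPU devices.
    tensor_parallel_size = min(tensor_parallel_size, num_devices)
    return [
        "python", "-m", "vllm.entrypoints.openai.api_server",  # not `vllm serve`
        "--model", model_path,            # mounted at its original path
        "--port", str(port),              # passed directly, since --net=host
        "--dtype", dtype,
        "--tensor-parallel-size", str(tensor_parallel_size),
    ]
```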
XPU Monitoring
- Parse `xpu_smi` output for device metrics (temperature, memory usage, utilization)
- Fallback detection of KunLun XPU in `/api/gpu-capabilities`
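The actual `xpu_smi` output format isn't shown in this issue, so the parser below assumes a hypothetical one-device-per-line table (`index  temp  used/total  util`) purely to illustrate where the parsing hook would slot in; the real format will need its own regex.

```python
import re

# Assumed line shape, e.g.:  0  45C  1024MiB / 32768MiB  37%
LINE_RE = re.compile(
    r"^\s*(?P<idx>\d+)\s+(?P<temp>\d+)C\s+"
    r"(?P<used>\d+)MiB\s*/\s*(?P<total>\d+)MiB\s+(?P<util>\d+)%"
)

def parse_xpu_smi(text: str) -> list[dict[str, int]]:
    """Extract per-device metrics from (assumed) xpu_smi tabular output."""
    devices = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            devices.append({k: int(v) for k, v in m.groupdict().items()})
    return devices
```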
Known Limitations
- Device selection (`XPU_VISIBLE_DEVICES`) interacts with vLLM's model-inspection subprocess; `CUDA_VISIBLE_DEVICES` is used alongside it to control C-level device remapping
- `setup_env.sh` referenced in the container's `.bashrc` may not exist; this is not fatal but logs a warning
- `tensor_parallel_size=8` may hit the system's `ulimit nproc` limit
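For the `ulimit nproc` pitfall, a preflight check could warn before launch. This is a hypothetical sketch: the per-worker thread estimate is an assumption, not a measured figure from vLLM.

```python
import resource

def nproc_headroom_ok(tensor_parallel_size: int,
                      threads_per_worker: int = 64) -> bool:
    """Return False if the soft RLIMIT_NPROC looks too low for the launch.

    With tensor_parallel_size=8, the worker processes plus their threads
    can exceed a low nproc limit; threads_per_worker=64 is a rough guess.
    """
    soft, _hard = resource.getrlimit(resource.RLIMIT_NPROC)
    if soft == resource.RLIM_INFINITY:
        return True
    return soft >= tensor_parallel_size * threads_per_worker
```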
Branch
feat/kunlun-support (based on v0.1.7)