Feature: Baidu KunLun XPU Accelerator Support #48

@micytao
Summary

Add support for Baidu KunLun XPU accelerators as a new compute backend in vLLM Playground, alongside the existing NVIDIA (CUDA), AMD (ROCm), and Google TPU options.

Motivation

Baidu KunLun XPUs are increasingly used for LLM inference in China-based deployments. vLLM already supports KunLun devices through the vllm_kunlun plugin, but vLLM Playground currently has no UI or container management support for this hardware.

Scope

UI Changes

  • Add "Baidu KunLun (XPU)" option to the accelerator selector dropdown
  • Enforce Container mode when KunLun is selected (subprocess/remote not supported)
  • XPU-specific help text and GPU device label ("XPU Device")
  • XPU device monitoring via xpu_smi in the GPU Status panel

Container Management

  • KunLun-specific container image (ccr-2nf6531g-pub.cnc.bj.baidubce.com/hac-aiacc/aiak-inference-llm)
  • Device passthrough for all 8 XPU devices (/dev/xpu0 through /dev/xpu7, plus /dev/xpuctrl)
  • KunLun-specific container flags: --net=host, --cap-add=SYS_PTRACE, --security-opt seccomp=unconfined, --tmpfs /dev/shm (instead of --ipc=host)
  • Podman runtime support (not Docker)
  • bash -lc entrypoint wrapping to ensure conda environment activation inside the container
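The flag set above can be sketched as a command builder. This is an illustrative sketch only; the function name and argument layout are assumptions, not the Playground's actual implementation.

```python
# Hypothetical sketch of assembling the `podman run` invocation for a
# KunLun XPU container, using the flags listed in this issue.

def build_kunlun_podman_cmd(image, model_cmd, num_devices=8):
    cmd = [
        "podman", "run", "--rm", "-d",
        # KunLun-specific flags: host networking, ptrace capability,
        # relaxed seccomp, and tmpfs-backed /dev/shm (instead of --ipc=host).
        "--net=host",
        "--cap-add=SYS_PTRACE",
        "--security-opt", "seccomp=unconfined",
        "--tmpfs", "/dev/shm",
    ]
    # Pass through every XPU device plus the control device.
    for i in range(num_devices):
        cmd += ["--device", f"/dev/xpu{i}"]
    cmd += ["--device", "/dev/xpuctrl"]
    # bash -lc wrapping so the login shell activates the conda
    # environment inside the container before launching the server.
    cmd += [image, "bash", "-lc", model_cmd]
    return cmd
```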

Environment Variables

  • VLLM_TARGET_DEVICE=kunlun
  • VLLM_HOST_IP (auto-resolved)
  • XPU-specific tuning flags (XPU_USE_MOE_SORTED_THRES, XFT_USE_FAST_SWIGLU, XMLIR_CUDNN_ENABLED, etc.)
  • XPU_VISIBLE_DEVICES and CUDA_VISIBLE_DEVICES for device selection
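A minimal sketch of how these variables might be assembled, assuming the helper name and defaults shown here (the specific tuning-flag values are not specified in this issue and are omitted):

```python
import socket

def kunlun_env(visible_devices="0,1,2,3,4,5,6,7"):
    """Illustrative environment dict for a KunLun container."""
    # VLLM_HOST_IP is auto-resolved; fall back to loopback if the
    # hostname cannot be resolved.
    try:
        host_ip = socket.gethostbyname(socket.gethostname())
    except OSError:
        host_ip = "127.0.0.1"
    return {
        "VLLM_TARGET_DEVICE": "kunlun",
        "VLLM_HOST_IP": host_ip,
        # XPU_VISIBLE_DEVICES selects XPU devices; CUDA_VISIBLE_DEVICES
        # is mirrored so vLLM's model-inspection subprocess sees the
        # same device set (see Known Limitations).
        "XPU_VISIBLE_DEVICES": visible_devices,
        "CUDA_VISIBLE_DEVICES": visible_devices,
    }
```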

vLLM Server Configuration

  • Launch via python -m vllm.entrypoints.openai.api_server (not vllm serve)
  • Auto-force dtype=float16 when dtype is auto (KunLun XPU kernels don't support bfloat16)
  • Port passed directly via --port (no -p mapping due to --net=host)
  • Local model path mounted at original path (not /models)
  • Auto-adjust tensor_parallel_size to match selected device count
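The dtype forcing and tensor-parallel adjustment above can be sketched as follows; the function and parameter names are illustrative, not the actual code:

```python
def build_server_args(model, port, dtype="auto", num_devices=1):
    # KunLun XPU kernels don't support bfloat16, so "auto" is
    # forced to float16.
    if dtype == "auto":
        dtype = "float16"
    # Keep tensor parallelism consistent with the selected device count.
    tensor_parallel_size = num_devices
    return [
        # Launched via the module entrypoint rather than `vllm serve`.
        "python", "-m", "vllm.entrypoints.openai.api_server",
        "--model", model,
        # --port is passed directly; with --net=host there is no
        # container-level -p port mapping.
        "--port", str(port),
        "--dtype", dtype,
        "--tensor-parallel-size", str(tensor_parallel_size),
    ]
```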

XPU Monitoring

  • Parse xpu_smi output for device metrics (temperature, memory usage, utilization)
  • Fallback detection of KunLun XPU in /api/gpu-capabilities
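A parsing sketch for the metrics listed above. The actual xpu_smi output format is not shown in this issue, so the line shape assumed below is purely hypothetical; a real implementation must match the tool's actual output.

```python
import re

# ASSUMED line shape for illustration only, e.g.:
#   "0  45C  1024MiB / 32768MiB  17%"
LINE_RE = re.compile(
    r"(?P<idx>\d+)\s+(?P<temp>\d+)C\s+"
    r"(?P<used>\d+)MiB\s*/\s*(?P<total>\d+)MiB\s+(?P<util>\d+)%"
)

def parse_xpu_smi(text):
    """Extract per-device temperature, memory usage, and utilization."""
    devices = []
    for line in text.splitlines():
        m = LINE_RE.search(line)
        if m:
            devices.append({
                "index": int(m["idx"]),
                "temperature_c": int(m["temp"]),
                "memory_used_mib": int(m["used"]),
                "memory_total_mib": int(m["total"]),
                "utilization_pct": int(m["util"]),
            })
    return devices
```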

Known Limitations

  • Device selection (XPU_VISIBLE_DEVICES) interacts with vLLM's model-inspection subprocess; CUDA_VISIBLE_DEVICES is used alongside to control C-level device remapping
  • setup_env.sh referenced in the container's .bashrc may not exist; not fatal but logs a warning
  • tensor_parallel_size=8 may hit system ulimit nproc limits

Branch

feat/kunlun-support (based on v0.1.7)
