Feature: Baidu KunLun XPU Accelerator Support #48

@micytao
Summary

Add support for Baidu KunLun XPU accelerators as a new compute backend in vLLM Playground, alongside the existing NVIDIA (CUDA), AMD (ROCm), and Google TPU options.

Motivation

Baidu KunLun XPUs are increasingly used for LLM inference in China-based deployments. vLLM already supports KunLun devices through the vllm_kunlun plugin, but vLLM Playground currently has no UI or container management support for this hardware.

Scope

UI Changes

  • Add "Baidu KunLun (XPU)" option to the accelerator selector dropdown
  • Enforce Container mode when KunLun is selected (subprocess/remote not supported)
  • XPU-specific help text and GPU device label ("XPU Device")
  • XPU device monitoring via xpu_smi in the GPU Status panel

Container Management

  • KunLun-specific container image (ccr-2nf6531g-pub.cnc.bj.baidubce.com/hac-aiacc/aiak-inference-llm)
  • Device passthrough for all 8 XPU devices (/dev/xpu0 through /dev/xpu7, plus /dev/xpuctrl)
  • KunLun-specific container flags: --net=host, --cap-add=SYS_PTRACE, --security-opt seccomp=unconfined, --tmpfs /dev/shm (instead of --ipc=host)
  • Podman runtime support (not Docker)
  • bash -lc entrypoint wrapping to ensure conda environment activation inside the container
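The flag set above can be sketched as a command builder. This is an illustrative sketch only; the function name and argument layout are assumptions, not the Playground's actual implementation.

```python
# Hypothetical sketch of assembling the `podman run` invocation for a
# KunLun XPU container, using the flags listed in this issue.

def build_kunlun_podman_cmd(image, model_cmd, num_devices=8):
    cmd = [
        "podman", "run", "--rm", "-d",
        # KunLun-specific flags: host networking, ptrace capability,
        # relaxed seccomp, and tmpfs-backed /dev/shm (instead of --ipc=host).
        "--net=host",
        "--cap-add=SYS_PTRACE",
        "--security-opt", "seccomp=unconfined",
        "--tmpfs", "/dev/shm",
    ]
    # Pass through every XPU device plus the control device.
    for i in range(num_devices):
        cmd += ["--device", f"/dev/xpu{i}"]
    cmd += ["--device", "/dev/xpuctrl"]
    # bash -lc wrapping so the login shell activates the conda
    # environment inside the container before launching the server.
    cmd += [image, "bash", "-lc", model_cmd]
    return cmd
```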

Environment Variables

  • VLLM_TARGET_DEVICE=kunlun
  • VLLM_HOST_IP (auto-resolved)
  • XPU-specific tuning flags (XPU_USE_MOE_SORTED_THRES, XFT_USE_FAST_SWIGLU, XMLIR_CUDNN_ENABLED, etc.)
  • XPU_VISIBLE_DEVICES and CUDA_VISIBLE_DEVICES for device selection
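A minimal sketch of how these variables might be assembled, assuming the helper name and defaults shown here (the specific tuning-flag values are not specified in this issue and are omitted):

```python
import socket

def kunlun_env(visible_devices="0,1,2,3,4,5,6,7"):
    """Illustrative environment dict for a KunLun container."""
    # VLLM_HOST_IP is auto-resolved; fall back to loopback if the
    # hostname cannot be resolved.
    try:
        host_ip = socket.gethostbyname(socket.gethostname())
    except OSError:
        host_ip = "127.0.0.1"
    return {
        "VLLM_TARGET_DEVICE": "kunlun",
        "VLLM_HOST_IP": host_ip,
        # XPU_VISIBLE_DEVICES selects XPU devices; CUDA_VISIBLE_DEVICES
        # is mirrored so vLLM's model-inspection subprocess sees the
        # same device set (see Known Limitations).
        "XPU_VISIBLE_DEVICES": visible_devices,
        "CUDA_VISIBLE_DEVICES": visible_devices,
    }
```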

vLLM Server Configuration

  • Launch via python -m vllm.entrypoints.openai.api_server (not vllm serve)
  • Auto-force dtype=float16 when dtype is auto (KunLun XPU kernels don't support bfloat16)
  • Port passed directly via --port (no -p mapping due to --net=host)
  • Local model path mounted at original path (not /models)
  • Auto-adjust tensor_parallel_size to match selected device count
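The dtype forcing and tensor-parallel adjustment above can be sketched as follows; the function and parameter names are illustrative, not the actual code:

```python
def build_server_args(model, port, dtype="auto", num_devices=1):
    # KunLun XPU kernels don't support bfloat16, so "auto" is
    # forced to float16.
    if dtype == "auto":
        dtype = "float16"
    # Keep tensor parallelism consistent with the selected device count.
    tensor_parallel_size = num_devices
    return [
        # Launched via the module entrypoint rather than `vllm serve`.
        "python", "-m", "vllm.entrypoints.openai.api_server",
        "--model", model,
        # --port is passed directly; with --net=host there is no
        # container-level -p port mapping.
        "--port", str(port),
        "--dtype", dtype,
        "--tensor-parallel-size", str(tensor_parallel_size),
    ]
```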

XPU Monitoring

  • Parse xpu_smi output for device metrics (temperature, memory usage, utilization)
  • Fallback detection of KunLun XPU in /api/gpu-capabilities
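A parsing sketch for the metrics listed above. The actual xpu_smi output format is not shown in this issue, so the line shape assumed below is purely hypothetical; a real implementation must match the tool's actual output.

```python
import re

# ASSUMED line shape for illustration only, e.g.:
#   "0  45C  1024MiB / 32768MiB  17%"
LINE_RE = re.compile(
    r"(?P<idx>\d+)\s+(?P<temp>\d+)C\s+"
    r"(?P<used>\d+)MiB\s*/\s*(?P<total>\d+)MiB\s+(?P<util>\d+)%"
)

def parse_xpu_smi(text):
    """Extract per-device temperature, memory usage, and utilization."""
    devices = []
    for line in text.splitlines():
        m = LINE_RE.search(line)
        if m:
            devices.append({
                "index": int(m["idx"]),
                "temperature_c": int(m["temp"]),
                "memory_used_mib": int(m["used"]),
                "memory_total_mib": int(m["total"]),
                "utilization_pct": int(m["util"]),
            })
    return devices
```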

Known Limitations

  • Device selection (XPU_VISIBLE_DEVICES) interacts with vLLM's model-inspection subprocess; CUDA_VISIBLE_DEVICES is used alongside to control C-level device remapping
  • setup_env.sh referenced in the container's .bashrc may not exist; not fatal but logs a warning
  • tensor_parallel_size=8 may hit system ulimit nproc limits

Branch

feat/kunlun-support (based on v0.1.7)
