Xorbits Inference (Xinference): an open-source model platform

xinference

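Xorbits Inference (Xinference) is an open-source platform that simplifies running and integrating a wide range of AI models. With Xinference, you can run inference with any open-source LLM, embedding model, or multimodal model, in the cloud or on your own hardware, and build AI applications on top of it.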

Set the PyPI mirror

  pip config set global.index-url https://mirrors.aliyun.com/pypi/simple/

Install command (all optional backends)
```
  pip install "xinference[all]"
```
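Step-by-step installation
- Install the core library. First, install only the core Xinference package, which leaves out the heavy backend dependencies.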
```
  pip install xinference -i https://mirrors.aliyun.com/pypi/simple/
```
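- Install backends as needed. Then, depending on the models you plan to run, install one or more backends.
  - Install the Transformers backend (supports most common models)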
```
  pip install "xinference[transformers]" -i https://mirrors.aliyun.com/pypi/simple/
```
  - If you need the vLLM backend (for high-performance inference)
```
  pip install "xinference[vllm]" -i https://mirrors.aliyun.com/pypi/simple/
```
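  - If you need models in GGML format (newer llama.cpp models use GGUF, and the name of this extra may differ between Xinference releases)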
```
  pip install "xinference[ggml]" -i https://mirrors.aliyun.com/pypi/simple/
```
  - If you need specific embedding models, install the embedding extra on demand
```
  pip install "xinference[embedding]" -i https://mirrors.aliyun.com/pypi/simple/
```
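The extras listed above can be combined in a single `pip install`, since pip accepts a comma-separated list inside the brackets. The following is a minimal sketch of that idea, not part of Xinference: `pip_install_command` and `KNOWN_EXTRAS` are hypothetical names, and the set of extras is limited to the ones this cheat sheet mentions.

```python
from typing import List, Optional

# Extras named in the list above; your Xinference version may offer others.
KNOWN_EXTRAS = ("transformers", "vllm", "ggml", "embedding")


def pip_install_command(extras: List[str], index_url: Optional[str] = None) -> List[str]:
    """Build the argv for `pip install "xinference[...]"`, optionally with a mirror."""
    unknown = [name for name in extras if name not in KNOWN_EXTRAS]
    if unknown:
        raise ValueError(f"extras not covered by this sketch: {unknown}")
    # No extras means the core package only.
    package = f"xinference[{','.join(extras)}]" if extras else "xinference"
    command = ["pip", "install", package]
    if index_url:
        command += ["-i", index_url]
    return command


if __name__ == "__main__":
    mirror = "https://mirrors.aliyun.com/pypi/simple/"
    print(" ".join(pip_install_command(["transformers", "embedding"], mirror)))
```

Returning a list rather than a string keeps the result safe to pass to `subprocess.run` without shell quoting.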

Configure XINFERENCE_HOME (the directory where Xinference keeps its cache and other data)

vim ~/.bashrc

  export XINFERENCE_HOME=/home/main_data/Xinference

source ~/.bashrc
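
To confirm which directory will be used after reloading the shell, a small check like the one below may help. It is a sketch under one assumption: that Xinference falls back to `~/.xinference` when `XINFERENCE_HOME` is unset, which should be verified against the documentation for your version. `resolve_xinference_home` is a hypothetical helper, not an Xinference API.

```python
import os
from pathlib import Path


def resolve_xinference_home(env=None) -> Path:
    """Return XINFERENCE_HOME if it is set, else the assumed default ~/.xinference."""
    env = os.environ if env is None else env
    configured = env.get("XINFERENCE_HOME")
    if configured:
        return Path(configured)
    # Assumed default location; check the docs for your Xinference version.
    return Path.home() / ".xinference"


if __name__ == "__main__":
    home = resolve_xinference_home()
    print(f"Xinference home: {home} (exists: {home.is_dir()})")
```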

Start from the command line

  # XINFERENCE_MODEL_SRC=modelscope xinference-local -H 0.0.0.0 --port 9997
  /usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you.
    import pynvml  # type: ignore[import]
  /usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you.
    import pynvml  # type: ignore[import]
  2025-11-28 18:47:12,521 xinference.core.supervisor 124924 INFO     Xinference supervisor 0.0.0.0:54882 started
  2025-11-28 18:47:12,547 xinference.core.worker 124924 INFO     Starting metrics export server at 0.0.0.0:None
  2025-11-28 18:47:12,549 xinference.core.worker 124924 INFO     Checking metrics export server...
  2025-11-28 18:47:13,844 xinference.deploy.local 124803 INFO     No response from process after 10 seconds
  2025-11-28 18:47:14,489 xinference.core.worker 124924 INFO     Metrics server is started at: http://0.0.0.0:31841
  2025-11-28 18:47:14,489 xinference.core.worker 124924 INFO     Purge cache directory: /home/main_data/Xinference/cache
  2025-11-28 18:47:14,491 xinference.core.worker 124924 INFO     Connected to supervisor as a fresh worker
  2025-11-28 18:47:14,514 xinference.core.worker 124924 INFO     Xinference worker 0.0.0.0:54882 started
  2025-11-28 18:47:21,079 xinference.api.restful_api 124803 INFO     Starting Xinference at endpoint: http://0.0.0.0:9997
  2025-11-28 18:47:21,250 uvicorn.error 124803 INFO     Uvicorn running on http://0.0.0.0:9997 (Press CTRL+C to quit)
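
Once the log reports that Uvicorn is running, the endpoint can be probed from a client. The sketch below assumes the server exposes an OpenAI-compatible model-listing route at `/v1/models` and that authentication is not enabled; `models_url` and `server_is_up` are hypothetical helper names. Note that `0.0.0.0` is a bind address meaning "all interfaces", so a client on the same machine would connect to `127.0.0.1` instead.

```python
import json
import urllib.error
import urllib.request


def models_url(host: str = "127.0.0.1", port: int = 9997) -> str:
    """URL of the model-listing route (assumed to be /v1/models)."""
    return f"http://{host}:{port}/v1/models"


def server_is_up(url: str, timeout: float = 3.0) -> bool:
    """Return True if the URL answers with HTTP 200 and a JSON body."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            json.load(response)  # raises ValueError if the body is not JSON
            return response.status == 200
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error status, or a non-JSON body.
        return False


if __name__ == "__main__":
    url = models_url()
    print(f"{url} reachable: {server_is_up(url)}")
```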

Detailed Xinference installation

Check the current Python version (3.8 or later is required)
```
  # python3 --version
  Python 3.10.12
```
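The same check can be scripted, for example at the top of a setup script. This is a small sketch that takes the 3.8 minimum from the note above; newer Xinference releases may require a later Python, so treat `MINIMUM` as an assumption to adjust. `meets_minimum` is a hypothetical helper.

```python
import sys
from typing import Tuple

MINIMUM = (3, 8)  # taken from the requirement stated above


def meets_minimum(version: Tuple[int, ...], minimum: Tuple[int, int] = MINIMUM) -> bool:
    """Compare (major, minor) pairs; tuple comparison is lexicographic."""
    return tuple(version[:2]) >= minimum


if __name__ == "__main__":
    current = sys.version_info
    status = "OK" if meets_minimum(current) else "too old"
    print(f"Python {current.major}.{current.minor}.{current.micro}: {status}")
```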

Install the Python virtual environment tools

  # sudo apt update
  # sudo apt install python3-venv python3-pip -y

Create and activate the virtual environment

  # python3 -m venv /root/xinference-env
  # source /root/xinference-env/bin/activate

Once activated, the shell prompt is prefixed with (xinference-env).
```
  (xinference-env) root@jichenggpu:~#
```

Upgrade pip, setuptools, and wheel inside the virtual environment
```
  # pip install --upgrade pip setuptools wheel
```

Install a PyTorch build that matches your GPU

Check the CUDA version

  # nvidia-smi
  Mon Mar  2 10:22:12 2026       
  +-----------------------------------------------------------------------------------------+
  | NVIDIA-SMI 580.95.05              Driver Version: 580.95.05      CUDA Version: 13.0     |
  +-----------------------------------------+------------------------+----------------------+
  | GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
  | Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
  |                                         |                        |               MIG M. |
  |=========================================+========================+======================|
  |   0  NVIDIA GeForce RTX 3090        Off |   00000000:03:00.0  On |                  N/A |
  |  0%   44C    P8             30W /  370W |       9MiB /  24576MiB |      0%      Default |
  |                                         |                        |                  N/A |
  +-----------------------------------------+------------------------+----------------------+
  
  +-----------------------------------------------------------------------------------------+
  | Processes:                                                                              |
  |  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
  |        ID   ID                                                               Usage      |
  |=========================================================================================|
  |  No running processes found                                                             |
  +-----------------------------------------------------------------------------------------+

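Install the stable PyTorch build for CUDA 12.4. The CUDA version that nvidia-smi reports (13.0 here) is the highest version the driver supports, so a cu124 wheel runs on it.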

  # pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
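
The compatibility rule behind this choice is that the driver's reported CUDA version must be at least the CUDA version the wheel was built for. A minimal sketch of that comparison follows; the helper names are hypothetical, and `wheel_cuda_version` assumes wheel tags of the form `cu` plus digits with a one-digit minor version (such as `cu124` or `cu118`).

```python
import re
from typing import Optional, Tuple


def driver_cuda_version(nvidia_smi_output: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from the 'CUDA Version: X.Y' field of the header."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", nvidia_smi_output)
    return (int(match.group(1)), int(match.group(2))) if match else None


def wheel_cuda_version(tag: str) -> Tuple[int, int]:
    """Turn a wheel tag such as 'cu124' into (12, 4); assumes a one-digit minor."""
    if not (tag.startswith("cu") and tag[2:].isdigit() and len(tag) >= 4):
        raise ValueError(f"unexpected wheel tag: {tag!r}")
    digits = tag[2:]
    return int(digits[:-1]), int(digits[-1])


def wheel_is_supported(nvidia_smi_output: str, tag: str) -> bool:
    """True when the driver's maximum CUDA version is at least the wheel's."""
    driver = driver_cuda_version(nvidia_smi_output)
    return driver is not None and driver >= wheel_cuda_version(tag)


if __name__ == "__main__":
    header = "| NVIDIA-SMI 580.95.05   Driver Version: 580.95.05   CUDA Version: 13.0 |"
    print(wheel_is_supported(header, "cu124"))
```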

Verify that PyTorch detects the GPU

  # python -c "import torch; print(torch.cuda.is_available()); print(torch.cuda.get_device_name(0))"
  True
  NVIDIA GeForce RTX 3090
  (xinference-env) root@jichenggpu:~#
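
The one-liner above raises an error on a machine without a usable GPU, because `torch.cuda.get_device_name(0)` is called unconditionally. A slightly more defensive variant, offered as a sketch, reports the situation instead; `gpu_report` is a hypothetical helper name.

```python
import importlib.util


def gpu_report() -> str:
    """Describe what PyTorch can see, without raising when torch or a GPU is missing."""
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed in this environment"
    import torch  # imported lazily so the check above works without it

    if not torch.cuda.is_available():
        return f"torch {torch.__version__}: no CUDA device available"
    names = [torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())]
    return f"torch {torch.__version__} (CUDA {torch.version.cuda}): {', '.join(names)}"


if __name__ == "__main__":
    print(gpu_report())
```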

Install Xinference

  # pip install "xinference[transformers]"

Start the Xinference service

  # xinference-local --host 0.0.0.0 --port 9997

Additional tips
- Activate the virtual environment before each use of Xinference:
```
  # source /root/xinference-env/bin/activate
```
- To keep Xinference running in the background, use nohup or screen:
```
  # nohup xinference-local --host 0.0.0.0 --port 9997 > xinference.log 2>&1 &
```
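When the server is started in the background, a script that follows it may run before the port is open; the startup log earlier in this sheet shows several seconds between the supervisor starting and the endpoint coming up. One way to handle this, sketched below with a hypothetical `wait_for_port` helper, is to poll the TCP port until it accepts connections. This only shows that something is listening, not that the API is fully ready.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not listening yet (or unreachable); wait and try again.
            time.sleep(interval)
    return False


if __name__ == "__main__":
    ready = wait_for_port("127.0.0.1", 9997, timeout=60.0)
    print("port 9997 is open" if ready else "port 9997 did not open in time")
```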
