Xorbits Inference (Xinference) Open-Source Model Platform

  • Xorbits Inference (Xinference) is an open-source platform that simplifies running and integrating a wide range of AI models. With Xinference, you can serve any open-source LLM, embedding model, or multimodal model, run inference in the cloud or on your own hardware, and build powerful AI applications.
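Once the server is running and a model has been launched, Xinference exposes an OpenAI-compatible REST API. A minimal sketch using only the standard library; the host/port and the model name `qwen2-instruct` are placeholders for whatever you actually deploy:

```python
import json
from urllib import request

# Placeholders: adjust host/port and model name to match your deployment.
ENDPOINT = "http://127.0.0.1:9997/v1/chat/completions"

payload = {
    "model": "qwen2-instruct",  # hypothetical: the name of a launched model
    "messages": [{"role": "user", "content": "Hello"}],
}
req = request.Request(
    ENDPOINT,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
# Uncomment once the server (started below) is running:
# with request.urlopen(req) as resp:
#     reply = json.loads(resp.read())
#     print(reply["choices"][0]["message"]["content"])
```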

  • Configure a PyPI mirror (Aliyun)

      pip config set global.index-url https://mirrors.aliyun.com/pypi/simple/
    
  • Installation command (all backends)

      pip install "xinference[all]"
    
  • Step-by-step installation

    • Install the core library. First, install only the core Xinference package, without the heavy backend dependencies.

        pip install xinference -i https://mirrors.aliyun.com/pypi/simple/
      
    • Install backends as needed. Then, depending on the models you plan to use, selectively install one or more backends.

      • Transformers backend (supports most common models)

          pip install "xinference[transformers]" -i https://mirrors.aliyun.com/pypi/simple/
        
      • vLLM backend, if you need high-performance inference

          pip install "xinference[vllm]" -i https://mirrors.aliyun.com/pypi/simple/
        
      • GGML backend, if you need models in GGML format

          pip install "xinference[ggml]" -i https://mirrors.aliyun.com/pypi/simple/
        
      • Embedding support, if you need specific embedding models

          pip install "xinference[embedding]" -i https://mirrors.aliyun.com/pypi/simple/
        
  • Configure XINFERENCE_HOME (the directory for downloaded models and caches)

    • vim ~/.bashrc

        export XINFERENCE_HOME=/home/main_data/Xinference
      
    • source ~/.bashrc
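XINFERENCE_HOME controls where Xinference stores downloaded models and caches. A small illustrative sketch of the resolution logic (the `~/.xinference` default matches the project's documentation; the function itself is a sketch, not the actual source):

```python
import os

def xinference_home(env=None):
    """Return the directory Xinference will use for models and caches:
    XINFERENCE_HOME if set, otherwise the documented default ~/.xinference."""
    env = os.environ if env is None else env
    return env.get("XINFERENCE_HOME") or os.path.join(
        os.path.expanduser("~"), ".xinference"
    )

print(xinference_home({"XINFERENCE_HOME": "/home/main_data/Xinference"}))
```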

  • Start the service from the command line

      # XINFERENCE_MODEL_SRC=modelscope xinference-local -H 0.0.0.0 --port 9997
      /usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you.
        import pynvml  # type: ignore[import]
      /usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you.
        import pynvml  # type: ignore[import]
      2025-11-28 18:47:12,521 xinference.core.supervisor 124924 INFO     Xinference supervisor 0.0.0.0:54882 started
      2025-11-28 18:47:12,547 xinference.core.worker 124924 INFO     Starting metrics export server at 0.0.0.0:None
      2025-11-28 18:47:12,549 xinference.core.worker 124924 INFO     Checking metrics export server...
      2025-11-28 18:47:13,844 xinference.deploy.local 124803 INFO     No response from process after 10 seconds
      2025-11-28 18:47:14,489 xinference.core.worker 124924 INFO     Metrics server is started at: http://0.0.0.0:31841
      2025-11-28 18:47:14,489 xinference.core.worker 124924 INFO     Purge cache directory: /home/main_data/Xinference/cache
      2025-11-28 18:47:14,491 xinference.core.worker 124924 INFO     Connected to supervisor as a fresh worker
      2025-11-28 18:47:14,514 xinference.core.worker 124924 INFO     Xinference worker 0.0.0.0:54882 started
      2025-11-28 18:47:21,079 xinference.api.restful_api 124803 INFO     Starting Xinference at endpoint: http://0.0.0.0:9997
      2025-11-28 18:47:21,250 uvicorn.error 124803 INFO     Uvicorn running on http://0.0.0.0:9997 (Press CTRL+C to quit)
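
The last two log lines show the REST endpoint. A quick health check from Python (standard library only) that returns an empty list instead of raising when the server is not reachable — the `/v1/models` route is part of Xinference's OpenAI-compatible API:

```python
import json
from urllib import error, request

def list_models(host="127.0.0.1", port=9997):
    """List launched models via the OpenAI-compatible /v1/models route;
    returns [] when the server is unreachable."""
    url = f"http://{host}:{port}/v1/models"
    try:
        with request.urlopen(url, timeout=5) as resp:
            return json.loads(resp.read()).get("data", [])
    except (error.URLError, OSError):
        return []

print(list_models())
```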
    

Detailed Xinference Installation

  • Confirm the current Python version (3.8 or later is required)

      # python3 --version
      Python 3.10.12
    
  • Install Python virtual-environment tooling

      # sudo apt update
      # sudo apt install python3-venv python3-pip -y
    
  • Create and activate a virtual environment

      # python3 -m venv /root/xinference-env
      # source /root/xinference-env/bin/activate
    
    • After activation, the shell prompt is prefixed with (xinference-env).

        (xinference-env) root@jichenggpu:~#
      
  • Upgrade pip, setuptools, and wheel inside the virtual environment

      # pip install --upgrade pip setuptools wheel
    
  • Install a PyTorch build that matches your GPU

    • Check the CUDA version

        # nvidia-smi
        Mon Mar  2 10:22:12 2026       
        +-----------------------------------------------------------------------------------------+
        | NVIDIA-SMI 580.95.05              Driver Version: 580.95.05      CUDA Version: 13.0     |
        +-----------------------------------------+------------------------+----------------------+
        | GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
        | Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
        |                                         |                        |               MIG M. |
        |=========================================+========================+======================|
        |   0  NVIDIA GeForce RTX 3090        Off |   00000000:03:00.0  On |                  N/A |
        |  0%   44C    P8             30W /  370W |       9MiB /  24576MiB |      0%      Default |
        |                                         |                        |                  N/A |
        +-----------------------------------------+------------------------+----------------------+
        
        +-----------------------------------------------------------------------------------------+
        | Processes:                                                                              |
        |  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
        |        ID   ID                                                               Usage      |
        |=========================================================================================|
        |  No running processes found                                                             |
        +-----------------------------------------------------------------------------------------+
      
    • Install the latest stable PyTorch build for CUDA 12.4 (the cu124 wheels run fine under the CUDA 13.0 driver shown above, since NVIDIA drivers are backward-compatible with older CUDA runtimes)

        # pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
      
    • Verify that PyTorch detects the GPU

        # python -c "import torch; print(torch.cuda.is_available()); print(torch.cuda.get_device_name(0))"
        True
        NVIDIA GeForce RTX 3090
        (xinference-env) root@jichenggpu:~# 
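
For a slightly fuller check than the one-liner above, a small diagnostic sketch (assumes the PyTorch wheel installed in the previous step; degrades gracefully when torch is missing):

```python
# Diagnostic sketch; assumes the PyTorch wheel from the previous step.
try:
    import torch
except ImportError:  # torch not installed: still report something useful
    torch = None

def cuda_summary():
    """Describe the CUDA setup PyTorch sees, one line per GPU."""
    if torch is None or not torch.cuda.is_available():
        return "CUDA not available"
    lines = [f"torch {torch.__version__}, CUDA runtime {torch.version.cuda}"]
    for i in range(torch.cuda.device_count()):
        lines.append(f"GPU {i}: {torch.cuda.get_device_name(i)}")
    return "\n".join(lines)

print(cuda_summary())
```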
      
  • Install Xinference

      # pip install "xinference[transformers]"
    
  • Start the Xinference service

      # xinference-local --host 0.0.0.0 --port 9997
    
  • Additional tips

    • Activate the virtual environment before each Xinference session:

        # source /root/xinference-env/bin/activate
      
    • To keep Xinference running in the background, use nohup or screen:

        # nohup xinference-local --host 0.0.0.0 --port 9997 > xinference.log 2>&1 &