Iluvatar (天数) BI-V150 GPUs: GLM-4.5-Air deploys successfully but requests fail

## Device information
16 × Iluvatar (天数) BI-V150 GPUs
+-----------------------------------------------------------------------------+
|  IX-ML: 4.3.8       Driver Version: 4.3.0       CUDA Version: 10.2          |
|-------------------------------+----------------------+----------------------|
| GPU  Name                     | Bus-Id               | Clock-SM  Clock-Mem  |
| Fan  Temp  Perf  Pwr:Usage/Cap|      Memory-Usage    | GPU-Util  Compute M. |
|===============================+======================+======================|
| 0    Iluvatar BI-V150         | 00000000:45:00.0     | 1600MHz   1600MHz    |
| N/A  36C   P0    N/A / N/A    | 12734MiB / 32768MiB  | 6%        Default    |
+-------------------------------+----------------------+----------------------+
| 1    Iluvatar BI-V150         | 00000000:48:00.0     | 1600MHz   1600MHz    |
| N/A  34C   P0    117W / 350W  | 12862MiB / 32768MiB  | 6%        Default    |
+-------------------------------+----------------------+----------------------+
| 2    Iluvatar BI-V150         | 00000000:4E:00.0     | 1600MHz   1600MHz    |
| N/A  31C   P0    N/A / N/A    | 12606MiB / 32768MiB  | 6%        Default    |
+-------------------------------+----------------------+----------------------+
| 3    Iluvatar BI-V150         | 00000000:51:00.0     | 1600MHz   1600MHz    |
| N/A  34C   P0    114W / 350W  | 12990MiB / 32768MiB  | 6%        Default    |
+-------------------------------+----------------------+----------------------+
| 4    Iluvatar BI-V150         | 00000000:5B:00.0     | 1600MHz   1600MHz    |
| N/A  32C   P0    N/A / N/A    | 12738MiB / 32768MiB  | 6%        Default    |
+-------------------------------+----------------------+----------------------+
| 5    Iluvatar BI-V150         | 00000000:5E:00.0     | 1600MHz   1600MHz    |
| N/A  34C   P0    114W / 350W  | 12862MiB / 32768MiB  | 6%        Default    |
+-------------------------------+----------------------+----------------------+
| 6    Iluvatar BI-V150         | 00000000:66:00.0     | 1600MHz   1600MHz    |
| N/A  33C   P0    N/A / N/A    | 12606MiB / 32768MiB  | 6%        Default    |
+-------------------------------+----------------------+----------------------+
| 7    Iluvatar BI-V150         | 00000000:69:00.0     | 1600MHz   1600MHz    |
| N/A  33C   P0    112W / 350W  | 12990MiB / 32768MiB  | 6%        Default    |
+-------------------------------+----------------------+----------------------+
| 8    Iluvatar BI-V150         | 00000000:73:00.0     | 1600MHz   1600MHz    |
| N/A  32C   P0    N/A / N/A    | 12738MiB / 32768MiB  | 6%        Default    |
+-------------------------------+----------------------+----------------------+
| 9    Iluvatar BI-V150         | 00000000:76:00.0     | 1600MHz   1600MHz    |
| N/A  32C   P0    115W / 350W  | 12862MiB / 32768MiB  | 6%        Default    |
+-------------------------------+----------------------+----------------------+
| 10   Iluvatar BI-V150         | 00000000:81:00.0     | 1600MHz   1600MHz    |
| N/A  36C   P0    N/A / N/A    | 12606MiB / 32768MiB  | 6%        Default    |
+-------------------------------+----------------------+----------------------+
| 11   Iluvatar BI-V150         | 00000000:84:00.0     | 1600MHz   1600MHz    |
| N/A  36C   P0    113W / 350W  | 12990MiB / 32768MiB  | 6%        Default    |
+-------------------------------+----------------------+----------------------+
| 12   Iluvatar BI-V150         | 00000000:8C:00.0     | 1600MHz   1600MHz    |
| N/A  37C   P0    N/A / N/A    | 12738MiB / 32768MiB  | 6%        Default    |
+-------------------------------+----------------------+----------------------+
| 13   Iluvatar BI-V150         | 00000000:8F:00.0     | 1600MHz   1600MHz    |
| N/A  37C   P0    116W / 350W  | 12862MiB / 32768MiB  | 6%        Default    |
+-------------------------------+----------------------+----------------------+
| 14   Iluvatar BI-V150         | 00000000:95:00.0     | 1600MHz   1600MHz    |
| N/A  35C   P0    N/A / N/A    | 12606MiB / 32768MiB  | 6%        Default    |
+-------------------------------+----------------------+----------------------+
| 15   Iluvatar BI-V150         | 00000000:98:00.0     | 1600MHz   1600MHz    |
| N/A  36C   P0    116W / 350W  | 12734MiB / 32768MiB  | 6%        Default    |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU        PID      Process name                                Usage(MiB) |
|=============================================================================|
|    0    3314328      /usr/local/bin/python -u /usr/local/lib...  12658      |
|    1    3314329      /usr/local/bin/python -u /usr/local/lib...  12786      |
|    2    3314330      /usr/local/bin/python -u /usr/local/lib...  12530      |
|    3    3314331      /usr/local/bin/python -u /usr/local/lib...  12914      |
|    4    3314332      /usr/local/bin/python -u /usr/local/lib...  12658      |
|    5    3314333      /usr/local/bin/python -u /usr/local/lib...  12786      |
|    6    3314334      /usr/local/bin/python -u /usr/local/lib...  12530      |
|    7    3314337      /usr/local/bin/python -u /usr/local/lib...  12914      |
|    8    3314340      /usr/local/bin/python -u /usr/local/lib...  12658      |
|    9    3314343      /usr/local/bin/python -u /usr/local/lib...  12786      |
|   10    3314346      /usr/local/bin/python -u /usr/local/lib...  12530      |
|   11    3314349      /usr/local/bin/python -u /usr/local/lib...  12914      |
|   12    3314352      /usr/local/bin/python -u /usr/local/lib...  12658      |
|   13    3314355      /usr/local/bin/python -u /usr/local/lib...  12786      |
|   14    3314358      /usr/local/bin/python -u /usr/local/lib...  12530      |
|   15    3314363      /usr/local/bin/python -u /usr/local/lib...  12658      |
+-----------------------------------------------------------------------------+
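
As a quick sanity check on the snapshot above, the per-process figures can be summed to see how close the cards are to their 32768 MiB capacity. This is only a sketch of the arithmetic: the values are copied from the "Processes" table, and the snapshot does not rule out a memory spike at request time.

```python
# Per-process GPU memory in MiB, copied from the "Processes" table (GPU 0..15).
USAGE_MIB = [
    12658, 12786, 12530, 12914,
    12658, 12786, 12530, 12914,
    12658, 12786, 12530, 12914,
    12658, 12786, 12530, 12658,
]
CAPACITY_MIB = 32768  # per-GPU capacity from the Memory-Usage column

total_used = sum(USAGE_MIB)
total_capacity = CAPACITY_MIB * len(USAGE_MIB)
fraction_used = total_used / total_capacity

print(f"used {total_used} MiB of {total_capacity} MiB ({fraction_used:.1%})")
print(f"per-GPU min/max: {min(USAGE_MIB)} / {max(USAGE_MIB)} MiB")
```

At the time of the snapshot each card holds roughly 12.5 to 12.9 GiB, well under half of its capacity.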

## Problem
### Deploying the model with the following command

export PADDLE_XCCL_BACKEND=iluvatar_gpu
export INFERENCE_MSG_QUEUE_ID=232132
export LD_PRELOAD=/usr/local/corex/lib64/libcuda.so.1
export FD_SAMPLING_CLASS=rejection
export FD_DEBUG=1
export ENABLE_V1_KVCACHE_SCHEDULER=1

python -m fastdeploy.entrypoints.openai.api_server \
    --model ZhipuAI/GLM-4.5-Air \
    --tensor-parallel-size 16 \
    --port 8185 \
    --block-size 16 \
    --quantization wfp8afp8 \
    --swap-space 50

### The model appears to deploy successfully 👇
/usr/local/lib/python3.10/site-packages/paddle/utils/cpp_extension/extension_utils.py:718: UserWarning: No ccache found. Please be aware that recompiling all source files may be required. You can download and install ccache from: https://github.com/ccache/ccache/blob/master/doc/INSTALL.md
  warnings.warn(warning_message)

WARNING  2025-12-11 17:05:12,778 3314133 prometheus_multiprocess_setup.py[line:41] Found PROMETHEUS_MULTIPROC_DIR:/tmp/fd_prom_bace7b92-2e12-4d3e-9265-883cd8eef466 was set by user. you will find inaccurate metrics. Unset the variable will properly handle cleanup.
[2025-12-11 17:05:12,778] [ WARNING] prometheus_multiprocess_setup.py:41 - Found PROMETHEUS_MULTIPROC_DIR:/tmp/fd_prom_bace7b92-2e12-4d3e-9265-883cd8eef466 was set by user. you will find inaccurate metrics. Unset the variable will properly handle cleanup.
WARNING  2025-12-11 17:05:12,951 3314133 prometheus_multiprocess_setup.py[line:41] Found PROMETHEUS_MULTIPROC_DIR:/tmp/fd_prom_bace7b92-2e12-4d3e-9265-883cd8eef466 was set by user. you will find inaccurate metrics. Unset the variable will properly handle cleanup.
[2025-12-11 17:05:12,951] [ WARNING] prometheus_multiprocess_setup.py:41 - Found PROMETHEUS_MULTIPROC_DIR:/tmp/fd_prom_bace7b92-2e12-4d3e-9265-883cd8eef466 was set by user. you will find inaccurate metrics. Unset the variable will properly handle cleanup.
[2025-12-11 17:05:12,981] [   DEBUG] utils.py:35 - No plugins for group fastdeploy.reasoning_parser_plugins found.
[2025-12-11 17:05:13,139] [   DEBUG] utils.py:35 - No plugins for group fastdeploy.token_processor_plugins found.
INFO     2025-12-11 17:05:13,466 3314133 api_server.py[line:80] Number of api-server workers: 1.
/usr/local/corex-4.3.8/lib64/python3/dist-packages/torch/cuda/__init__.py:58: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you.
  import pynvml  # type: ignore[import]
Downloading Model from https://www.modelscope.cn to directory: /data/projects/modelscope/ZhipuAI/GLM-4.5-Air
2025-12-11 17:05:16,016 - modelscope - INFO - Target directory already exists, skipping creation.
/usr/local/lib/python3.10/site-packages/fastdeploy/engine/args_utils.py:65: ResourceWarning: unclosed file <_io.TextIOWrapper name='/data/projects/modelscope/ZhipuAI/GLM-4.5-Air/config.json' mode='r' encoding='utf-8'>
  model_config = json.load(open(config_path, "r", encoding="utf-8"))
ResourceWarning: Enable tracemalloc to get the object allocation traceback
[2025-12-11 17:05:16,027] [    INFO] - Using download source: huggingface
[2025-12-11 17:05:16,027] [    INFO] - Loading configuration file /data/projects/modelscope/ZhipuAI/GLM-4.5-Air/config.json
[2025-12-11 17:05:16,027] [ WARNING] - You are using a model of type glm4_moe to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
/usr/local/lib/python3.10/site-packages/_distutils_hack/__init__.py:30: UserWarning: Setuptools is replacing distutils. Support for replacing an already imported distutils is deprecated. In the future, this condition will fail. Register concerns at https://github.com/pypa/setuptools/issues/new?template=distutils-deprecation.yml
  warnings.warn(
/usr/local/lib/python3.10/site-packages/paddle/jit/sot/opcode_translator/skip_files.py:105: DeprecationWarning: The distutils package is deprecated and slated for removal in Python 3.12. Use setuptools or check PEP 632 for potential alternatives
  import distutils
/usr/local/lib/python3.10/site-packages/fastdeploy/logger/logger.py:190: ResourceWarning: unclosed file <_io.BufferedWriter name='log/cudagraph_piecewise_backend.log.2025-12-11'>
  for handler in logger.handlers[:]:
ResourceWarning: Enable tracemalloc to get the object allocation traceback
[2025-12-11 17:05:16,215] [ WARNING] - import noaux_tc Failed!
/usr/local/lib/python3.10/site-packages/fastdeploy/config.py:340: ResourceWarning: unclosed file <_io.TextIOWrapper name='/data/projects/modelscope/ZhipuAI/GLM-4.5-Air/config.json' mode='r' encoding='utf-8'>
  self.model_config = json.load(open(config_path, "r", encoding="utf-8"))
ResourceWarning: Enable tracemalloc to get the object allocation traceback
<frozen importlib._bootstrap>:241: DeprecationWarning: builtin type SwigPyPacked has no __module__ attribute
<frozen importlib._bootstrap>:241: DeprecationWarning: builtin type SwigPyObject has no __module__ attribute
[2025-12-11 17:05:17,166] [    INFO] - Using download source: huggingface
[2025-12-11 17:05:17,788] [    INFO] - Using download source: huggingface
INFO     2025-12-11 17:05:18,790 3314133 engine.py[line:146] Waiting for worker processes to be ready...
Loading Weights: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:51<00:00,  1.94it/s]
Loading Layers: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:07<00:00, 14.27it/s]
INFO     2025-12-11 17:06:24,861 3314133 engine.py[line:197] Worker processes are launched with 68.54115271568298 seconds.
INFO     2025-12-11 17:06:24,861 3314133 api_server.py[line:729] Launching metrics service at http://0.0.0.0:8185/metrics
INFO     2025-12-11 17:06:24,861 3314133 api_server.py[line:730] Launching chat completion service at http://0.0.0.0:8185/v1/chat/completions
INFO     2025-12-11 17:06:24,861 3314133 api_server.py[line:731] Launching completion service at http://0.0.0.0:8185/v1/completions
[2025-12-11 17:06:25 +0800] [3314133] [INFO] Starting gunicorn 23.0.0
[2025-12-11 17:06:25 +0800] [3314133] [INFO] Listening at: http://0.0.0.0:8185 (3314133)
[2025-12-11 17:06:25 +0800] [3314133] [INFO] Using worker: uvicorn.workers.UvicornWorker
[2025-12-11 17:06:25 +0800] [4089232] [INFO] Booting worker with pid: 4089232
/usr/local/lib/python3.10/site-packages/websockets/legacy/__init__.py:6: DeprecationWarning: websockets.legacy is deprecated; see https://websockets.readthedocs.io/en/stable/howto/upgrade.html for upgrade instructions
  warnings.warn(  # deprecated in 14.0 - 2024-11-09
/usr/local/lib/python3.10/site-packages/uvicorn/protocols/websockets/websockets_impl.py:14: DeprecationWarning: websockets.server.WebSocketServerProtocol is deprecated
  from websockets.server import WebSocketServerProtocol
[2025-12-11 17:06:25 +0800] [4089232] [INFO] Started server process [4089232]
[2025-12-11 17:06:25 +0800] [4089232] [INFO] Waiting for application startup.
/usr/local/lib/python3.10/site-packages/fastdeploy/engine/args_utils.py:65: ResourceWarning: unclosed file <_io.TextIOWrapper name='/data/projects/modelscope/ZhipuAI/GLM-4.5-Air/config.json' mode='r' encoding='utf-8'>
  model_config = json.load(open(config_path, "r", encoding="utf-8"))
ResourceWarning: Enable tracemalloc to get the object allocation traceback
[2025-12-11 17:06:25,926] [    INFO] - Using download source: huggingface
[2025-12-11 17:06:25,926] [    INFO] - Loading configuration file /data/projects/modelscope/ZhipuAI/GLM-4.5-Air/config.json
[2025-12-11 17:06:25,926] [ WARNING] - You are using a model of type glm4_moe to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
/usr/local/lib/python3.10/site-packages/fastdeploy/config.py:340: ResourceWarning: unclosed file <_io.TextIOWrapper name='/data/projects/modelscope/ZhipuAI/GLM-4.5-Air/config.json' mode='r' encoding='utf-8'>
  self.model_config = json.load(open(config_path, "r", encoding="utf-8"))
ResourceWarning: Enable tracemalloc to get the object allocation traceback
[2025-12-11 17:06:25,954] [    INFO] - Using download source: huggingface
[2025-12-11 17:06:25,955] [    INFO] - Using download source: huggingface
[2025-12-11 17:06:26 +0800] [4089232] [INFO] Application startup complete.
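
The startup log above advertises a metrics endpoint at `http://0.0.0.0:8185/metrics`. A minimal sketch, assuming that endpoint answers plain HTTP GET with status 200 while the server is healthy, can check whether the API server is still alive before and after sending a request. The helper name `server_is_up` is hypothetical and not part of FastDeploy.

```python
import http.client
import urllib.error
import urllib.request

# URL taken from the "Launching metrics service" line in the startup log.
METRICS_URL = "http://0.0.0.0:8185/metrics"


def server_is_up(url: str = METRICS_URL, timeout: float = 5.0) -> bool:
    """Return True if the endpoint answers with HTTP 200, False on any failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, reset, or a non-2xx HTTP status.
        return False
```

Calling `server_is_up()` right after "Application startup complete." and again after a failed request helps separate "the server never accepted connections" from "the server died while handling the request".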

### Sending a request fails with an error

curl -X POST "http://0.0.0.0:8185/v1/chat/completions" \
-H "Content-Type: application/json" \
-d '{
  "messages": [
    {"role": "user", "content": "什么是集成电路？"}
  ]
}'

The result is `curl: (52) Empty reply from server`.
The server log shows:
/usr/local/lib/python3.10/site-packages/fastdeploy/entrypoints/openai/protocol.py:692: DeprecationWarning: max_tokens is deprecated in favor of the max_completion_tokens field
  req_dict["max_tokens"] = self.max_completion_tokens or self.max_tokens
/usr/local/lib/python3.10/site-packages/fastdeploy/entrypoints/openai/protocol.py:708: PydanticDeprecatedSince20: The `dict` method is deprecated; use `model_dump` instead. Deprecated in Pydantic V2.0 to be removed in V3.0. See Pydantic V2 Migration Guide at https://errors.pydantic.dev/2.11/migration/
  for key, value in self.dict().items():
[2025-12-11 17:18:13 +0800] [3314133] [ERROR] Worker (pid:3314315) exited with code 1
[2025-12-11 17:18:13 +0800] [3314133] [ERROR] Worker (pid:3314315) exited with code 1.
ERROR    2025-12-11 17:18:15,407 3314133 api_server.py[line:704] Worker process has died in the background (code=0). API server is forced to stop.
[2025-12-11 17:18:15 +0800] [3314133] [INFO] Handling signal: int
[2025-12-11 17:18:15 +0800] [3314133] [INFO] Shutting down: Master
ERROR    2025-12-11 17:18:15,412 3314133 engine.py[line:435] Error extracting sub services: [Errno 3] No such process, Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/fastdeploy/engine/engine.py", line 432, in _exit_sub_services
    pgid = os.getpgid(self.worker_proc.pid)
ProcessLookupError: [Errno 3] No such process

[2025-12-11 17:18:15 +0800] [3314133] [ERROR] Worker (pid:4089232) was sent SIGKILL! Perhaps out of memory?
sys:1: DeprecationWarning: builtin type swigvarlink has no __module__ attribute
sys:1: ResourceWarning: unclosed file <_io.BufferedReader name=80>
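
To reproduce the failing call from a script and capture the failure mode explicitly, the curl request above can be sketched with the Python standard library. The payload and URL mirror the curl command; the helper names `build_request` and `send` and the 60-second timeout are illustrative assumptions, not part of FastDeploy.

```python
import http.client
import json
import urllib.error
import urllib.request

# Endpoint from the "Launching chat completion service" log line.
URL = "http://0.0.0.0:8185/v1/chat/completions"


def build_request(url: str, question: str) -> urllib.request.Request:
    """Build the same POST request as the curl command."""
    payload = {"messages": [{"role": "user", "content": question}]}
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def send(req: urllib.request.Request, timeout: float = 60.0) -> str:
    """Send the request and describe the outcome as text."""
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.read().decode("utf-8")
    except http.client.RemoteDisconnected as exc:
        # Python's counterpart of curl's "(52) Empty reply from server".
        return f"empty reply from server: {exc}"
    except urllib.error.HTTPError as exc:
        return f"HTTP {exc.code}: {exc.read().decode('utf-8', 'replace')}"
    except (urllib.error.URLError, OSError) as exc:
        return f"connection error: {exc}"
```

For example, `print(send(build_request(URL, "什么是集成电路？")))` sends the same question as the curl command and prints either the response body or a one-line description of the failure.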

### Help needed
Do Iluvatar GPUs not support this model yet? Or are additional environment variables or launch parameters required? Thanks!
