Iluvatar (天数) BI-V150 GPUs: GLM-4.5-Air deploys successfully but requests fail #5507

@01RK

Device information

16 × Iluvatar (天数) BI-V150 GPUs
+-----------------------------------------------------------------------------+
| IX-ML: 4.3.8 Driver Version: 4.3.0 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------|
| GPU Name | Bus-Id | Clock-SM Clock-Mem |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Iluvatar BI-V150 | 00000000:45:00.0 | 1600MHz 1600MHz |
| N/A 36C P0 N/A / N/A | 12734MiB / 32768MiB | 6% Default |
+-------------------------------+----------------------+----------------------+
| 1 Iluvatar BI-V150 | 00000000:48:00.0 | 1600MHz 1600MHz |
| N/A 34C P0 117W / 350W | 12862MiB / 32768MiB | 6% Default |
+-------------------------------+----------------------+----------------------+
| 2 Iluvatar BI-V150 | 00000000:4E:00.0 | 1600MHz 1600MHz |
| N/A 31C P0 N/A / N/A | 12606MiB / 32768MiB | 6% Default |
+-------------------------------+----------------------+----------------------+
| 3 Iluvatar BI-V150 | 00000000:51:00.0 | 1600MHz 1600MHz |
| N/A 34C P0 114W / 350W | 12990MiB / 32768MiB | 6% Default |
+-------------------------------+----------------------+----------------------+
| 4 Iluvatar BI-V150 | 00000000:5B:00.0 | 1600MHz 1600MHz |
| N/A 32C P0 N/A / N/A | 12738MiB / 32768MiB | 6% Default |
+-------------------------------+----------------------+----------------------+
| 5 Iluvatar BI-V150 | 00000000:5E:00.0 | 1600MHz 1600MHz |
| N/A 34C P0 114W / 350W | 12862MiB / 32768MiB | 6% Default |
+-------------------------------+----------------------+----------------------+
| 6 Iluvatar BI-V150 | 00000000:66:00.0 | 1600MHz 1600MHz |
| N/A 33C P0 N/A / N/A | 12606MiB / 32768MiB | 6% Default |
+-------------------------------+----------------------+----------------------+
| 7 Iluvatar BI-V150 | 00000000:69:00.0 | 1600MHz 1600MHz |
| N/A 33C P0 112W / 350W | 12990MiB / 32768MiB | 6% Default |
+-------------------------------+----------------------+----------------------+
| 8 Iluvatar BI-V150 | 00000000:73:00.0 | 1600MHz 1600MHz |
| N/A 32C P0 N/A / N/A | 12738MiB / 32768MiB | 6% Default |
+-------------------------------+----------------------+----------------------+
| 9 Iluvatar BI-V150 | 00000000:76:00.0 | 1600MHz 1600MHz |
| N/A 32C P0 115W / 350W | 12862MiB / 32768MiB | 6% Default |
+-------------------------------+----------------------+----------------------+
| 10 Iluvatar BI-V150 | 00000000:81:00.0 | 1600MHz 1600MHz |
| N/A 36C P0 N/A / N/A | 12606MiB / 32768MiB | 6% Default |
+-------------------------------+----------------------+----------------------+
| 11 Iluvatar BI-V150 | 00000000:84:00.0 | 1600MHz 1600MHz |
| N/A 36C P0 113W / 350W | 12990MiB / 32768MiB | 6% Default |
+-------------------------------+----------------------+----------------------+
| 12 Iluvatar BI-V150 | 00000000:8C:00.0 | 1600MHz 1600MHz |
| N/A 37C P0 N/A / N/A | 12738MiB / 32768MiB | 6% Default |
+-------------------------------+----------------------+----------------------+
| 13 Iluvatar BI-V150 | 00000000:8F:00.0 | 1600MHz 1600MHz |
| N/A 37C P0 116W / 350W | 12862MiB / 32768MiB | 6% Default |
+-------------------------------+----------------------+----------------------+
| 14 Iluvatar BI-V150 | 00000000:95:00.0 | 1600MHz 1600MHz |
| N/A 35C P0 N/A / N/A | 12606MiB / 32768MiB | 6% Default |
+-------------------------------+----------------------+----------------------+
| 15 Iluvatar BI-V150 | 00000000:98:00.0 | 1600MHz 1600MHz |
| N/A 36C P0 116W / 350W | 12734MiB / 32768MiB | 6% Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Process name Usage(MiB) |
|=============================================================================|
| 0 3314328 /usr/local/bin/python -u /usr/local/lib... 12658 |
| 1 3314329 /usr/local/bin/python -u /usr/local/lib... 12786 |
| 2 3314330 /usr/local/bin/python -u /usr/local/lib... 12530 |
| 3 3314331 /usr/local/bin/python -u /usr/local/lib... 12914 |
| 4 3314332 /usr/local/bin/python -u /usr/local/lib... 12658 |
| 5 3314333 /usr/local/bin/python -u /usr/local/lib... 12786 |
| 6 3314334 /usr/local/bin/python -u /usr/local/lib... 12530 |
| 7 3314337 /usr/local/bin/python -u /usr/local/lib... 12914 |
| 8 3314340 /usr/local/bin/python -u /usr/local/lib... 12658 |
| 9 3314343 /usr/local/bin/python -u /usr/local/lib... 12786 |
| 10 3314346 /usr/local/bin/python -u /usr/local/lib... 12530 |
| 11 3314349 /usr/local/bin/python -u /usr/local/lib... 12914 |
| 12 3314352 /usr/local/bin/python -u /usr/local/lib... 12658 |
| 13 3314355 /usr/local/bin/python -u /usr/local/lib... 12786 |
| 14 3314358 /usr/local/bin/python -u /usr/local/lib... 12530 |
| 15 3314363 /usr/local/bin/python -u /usr/local/lib... 12658 |
+-----------------------------------------------------------------------------+
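
As a quick sanity check on the table above, the Memory-Usage column can be totalled. This is only a sketch over the figures in this snapshot (taken while the server was idle after loading); it says nothing about memory use while a request is being processed, and nothing about host RAM.

```python
# Memory-Usage column (MiB) for GPUs 0-15, copied from the table above.
used_mib = [
    12734, 12862, 12606, 12990,
    12738, 12862, 12606, 12990,
    12738, 12862, 12606, 12990,
    12738, 12862, 12606, 12734,
]
capacity_mib = 32768  # per-device capacity from the same table

total_used = sum(used_mib)
total_capacity = capacity_mib * len(used_mib)
fraction = total_used / total_capacity

print(f"used: {total_used} MiB of {total_capacity} MiB ({fraction:.1%})")
print(f"peak single device: {max(used_mib)} MiB ({max(used_mib) / capacity_mib:.1%})")
```

Each device sits at roughly 39% of its 32768 MiB in this snapshot, so device memory does not look exhausted after loading.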

Problem

I deployed the model with the following command:

export PADDLE_XCCL_BACKEND=iluvatar_gpu
export INFERENCE_MSG_QUEUE_ID=232132
export LD_PRELOAD=/usr/local/corex/lib64/libcuda.so.1
export FD_SAMPLING_CLASS=rejection
export FD_DEBUG=1
export ENABLE_V1_KVCACHE_SCHEDULER=1

python -m fastdeploy.entrypoints.openai.api_server \
  --model ZhipuAI/GLM-4.5-Air \
  --tensor-parallel-size 16 \
  --port 8185 \
  --block-size 16 \
  --quantization wfp8afp8 \
  --swap-space 50

The model appears to deploy successfully 👇

/usr/local/lib/python3.10/site-packages/paddle/utils/cpp_extension/extension_utils.py:718: UserWarning: No ccache found. Please be aware that recompiling all source files may be required. You can download and install ccache from: https://github.com/ccache/ccache/blob/master/doc/INSTALL.md
warnings.warn(warning_message)

WARNING 2025-12-11 17:05:12,778 3314133 prometheus_multiprocess_setup.py[line:41] Found PROMETHEUS_MULTIPROC_DIR:/tmp/fd_prom_bace7b92-2e12-4d3e-9265-883cd8eef466 was set by user. you will find inaccurate metrics. Unset the variable will properly handle cleanup.
[2025-12-11 17:05:12,778] [ WARNING] prometheus_multiprocess_setup.py:41 - Found PROMETHEUS_MULTIPROC_DIR:/tmp/fd_prom_bace7b92-2e12-4d3e-9265-883cd8eef466 was set by user. you will find inaccurate metrics. Unset the variable will properly handle cleanup.
WARNING 2025-12-11 17:05:12,951 3314133 prometheus_multiprocess_setup.py[line:41] Found PROMETHEUS_MULTIPROC_DIR:/tmp/fd_prom_bace7b92-2e12-4d3e-9265-883cd8eef466 was set by user. you will find inaccurate metrics. Unset the variable will properly handle cleanup.
[2025-12-11 17:05:12,951] [ WARNING] prometheus_multiprocess_setup.py:41 - Found PROMETHEUS_MULTIPROC_DIR:/tmp/fd_prom_bace7b92-2e12-4d3e-9265-883cd8eef466 was set by user. you will find inaccurate metrics. Unset the variable will properly handle cleanup.
[2025-12-11 17:05:12,981] [ DEBUG] utils.py:35 - No plugins for group fastdeploy.reasoning_parser_plugins found.
[2025-12-11 17:05:13,139] [ DEBUG] utils.py:35 - No plugins for group fastdeploy.token_processor_plugins found.
INFO 2025-12-11 17:05:13,466 3314133 api_server.py[line:80] Number of api-server workers: 1.
/usr/local/corex-4.3.8/lib64/python3/dist-packages/torch/cuda/__init__.py:58: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you.
import pynvml # type: ignore[import]
Downloading Model from https://www.modelscope.cn to directory: /data/projects/modelscope/ZhipuAI/GLM-4.5-Air
2025-12-11 17:05:16,016 - modelscope - INFO - Target directory already exists, skipping creation.
/usr/local/lib/python3.10/site-packages/fastdeploy/engine/args_utils.py:65: ResourceWarning: unclosed file <_io.TextIOWrapper name='/data/projects/modelscope/ZhipuAI/GLM-4.5-Air/config.json' mode='r' encoding='utf-8'>
model_config = json.load(open(config_path, "r", encoding="utf-8"))
ResourceWarning: Enable tracemalloc to get the object allocation traceback
[2025-12-11 17:05:16,027] [ INFO] - Using download source: huggingface
[2025-12-11 17:05:16,027] [ INFO] - Loading configuration file /data/projects/modelscope/ZhipuAI/GLM-4.5-Air/config.json
[2025-12-11 17:05:16,027] [ WARNING] - You are using a model of type glm4_moe to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
/usr/local/lib/python3.10/site-packages/_distutils_hack/__init__.py:30: UserWarning: Setuptools is replacing distutils. Support for replacing an already imported distutils is deprecated. In the future, this condition will fail. Register concerns at https://github.com/pypa/setuptools/issues/new?template=distutils-deprecation.yml
warnings.warn(
/usr/local/lib/python3.10/site-packages/paddle/jit/sot/opcode_translator/skip_files.py:105: DeprecationWarning: The distutils package is deprecated and slated for removal in Python 3.12. Use setuptools or check PEP 632 for potential alternatives
import distutils
/usr/local/lib/python3.10/site-packages/fastdeploy/logger/logger.py:190: ResourceWarning: unclosed file <_io.BufferedWriter name='log/cudagraph_piecewise_backend.log.2025-12-11'>
for handler in logger.handlers[:]:
ResourceWarning: Enable tracemalloc to get the object allocation traceback
[2025-12-11 17:05:16,215] [ WARNING] - import noaux_tc Failed!
/usr/local/lib/python3.10/site-packages/fastdeploy/config.py:340: ResourceWarning: unclosed file <_io.TextIOWrapper name='/data/projects/modelscope/ZhipuAI/GLM-4.5-Air/config.json' mode='r' encoding='utf-8'>
self.model_config = json.load(open(config_path, "r", encoding="utf-8"))
ResourceWarning: Enable tracemalloc to get the object allocation traceback
:241: DeprecationWarning: builtin type SwigPyPacked has no __module__ attribute
:241: DeprecationWarning: builtin type SwigPyObject has no __module__ attribute
[2025-12-11 17:05:17,166] [ INFO] - Using download source: huggingface
[2025-12-11 17:05:17,788] [ INFO] - Using download source: huggingface
INFO 2025-12-11 17:05:18,790 3314133 engine.py[line:146] Waiting for worker processes to be ready...
Loading Weights: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:51<00:00, 1.94it/s]
Loading Layers: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:07<00:00, 14.27it/s]
INFO 2025-12-11 17:06:24,861 3314133 engine.py[line:197] Worker processes are launched with 68.54115271568298 seconds.
INFO 2025-12-11 17:06:24,861 3314133 api_server.py[line:729] Launching metrics service at http://0.0.0.0:8185/metrics
INFO 2025-12-11 17:06:24,861 3314133 api_server.py[line:730] Launching chat completion service at http://0.0.0.0:8185/v1/chat/completions
INFO 2025-12-11 17:06:24,861 3314133 api_server.py[line:731] Launching completion service at http://0.0.0.0:8185/v1/completions
[2025-12-11 17:06:25 +0800] [3314133] [INFO] Starting gunicorn 23.0.0
[2025-12-11 17:06:25 +0800] [3314133] [INFO] Listening at: http://0.0.0.0:8185 (3314133)
[2025-12-11 17:06:25 +0800] [3314133] [INFO] Using worker: uvicorn.workers.UvicornWorker
[2025-12-11 17:06:25 +0800] [4089232] [INFO] Booting worker with pid: 4089232
/usr/local/lib/python3.10/site-packages/websockets/legacy/__init__.py:6: DeprecationWarning: websockets.legacy is deprecated; see https://websockets.readthedocs.io/en/stable/howto/upgrade.html for upgrade instructions
warnings.warn( # deprecated in 14.0 - 2024-11-09
/usr/local/lib/python3.10/site-packages/uvicorn/protocols/websockets/websockets_impl.py:14: DeprecationWarning: websockets.server.WebSocketServerProtocol is deprecated
from websockets.server import WebSocketServerProtocol
[2025-12-11 17:06:25 +0800] [4089232] [INFO] Started server process [4089232]
[2025-12-11 17:06:25 +0800] [4089232] [INFO] Waiting for application startup.
/usr/local/lib/python3.10/site-packages/fastdeploy/engine/args_utils.py:65: ResourceWarning: unclosed file <_io.TextIOWrapper name='/data/projects/modelscope/ZhipuAI/GLM-4.5-Air/config.json' mode='r' encoding='utf-8'>
model_config = json.load(open(config_path, "r", encoding="utf-8"))
ResourceWarning: Enable tracemalloc to get the object allocation traceback
[2025-12-11 17:06:25,926] [ INFO] - Using download source: huggingface
[2025-12-11 17:06:25,926] [ INFO] - Loading configuration file /data/projects/modelscope/ZhipuAI/GLM-4.5-Air/config.json
[2025-12-11 17:06:25,926] [ WARNING] - You are using a model of type glm4_moe to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
/usr/local/lib/python3.10/site-packages/fastdeploy/config.py:340: ResourceWarning: unclosed file <_io.TextIOWrapper name='/data/projects/modelscope/ZhipuAI/GLM-4.5-Air/config.json' mode='r' encoding='utf-8'>
self.model_config = json.load(open(config_path, "r", encoding="utf-8"))
ResourceWarning: Enable tracemalloc to get the object allocation traceback
[2025-12-11 17:06:25,954] [ INFO] - Using download source: huggingface
[2025-12-11 17:06:25,955] [ INFO] - Using download source: huggingface
[2025-12-11 17:06:26 +0800] [4089232] [INFO] Application startup complete.
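
The startup log above advertises a metrics endpoint at http://0.0.0.0:8185/metrics. A minimal sketch of a probe that polls that URL can confirm the HTTP layer is reachable before sending a chat request. The function name `wait_until_up` and its timeout defaults are hypothetical, chosen for illustration; a 200 from /metrics only shows the API server is listening, not that the worker processes are healthy.

```python
import time
import urllib.error
import urllib.request


def wait_until_up(url: str, timeout_s: float = 60.0, interval_s: float = 2.0) -> bool:
    """Poll `url` until it answers with HTTP 200 or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # not reachable yet, or the connection was dropped; retry
        time.sleep(interval_s)
    return False


# Example (port taken from the launch command above):
# wait_until_up("http://127.0.0.1:8185/metrics")
```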

Sending a request then fails:

curl -X POST "http://0.0.0.0:8185/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "什么是集成电路?"}
    ]
  }'
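
The same request can also be sent from Python using only the standard library, which makes it easier to tell a dropped connection apart from an HTTP error or an unreachable port. This is a sketch: the helper names `build_request` and `send` and the 120-second timeout are assumptions for illustration, while the URL path and payload mirror the curl call.

```python
import http.client
import json
import urllib.error
import urllib.request


def build_request(base_url: str, prompt: str) -> urllib.request.Request:
    """Build the same POST body as the curl command (a single user message)."""
    payload = {"messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req: urllib.request.Request, timeout_s: float = 120.0) -> str:
    """Send the request and describe the outcome as a string."""
    try:
        with urllib.request.urlopen(req, timeout=timeout_s) as resp:
            return resp.read().decode("utf-8")
    except http.client.RemoteDisconnected as exc:
        # Python's counterpart of curl's "(52) Empty reply from server".
        return f"server closed the connection without replying: {exc}"
    except urllib.error.HTTPError as exc:
        return f"HTTP {exc.code}: {exc.read().decode('utf-8', 'replace')}"
    except urllib.error.URLError as exc:
        return f"could not reach server: {exc.reason}"
    except OSError as exc:
        return f"connection error: {exc}"


# Example:
# print(send(build_request("http://127.0.0.1:8185", "什么是集成电路?")))
```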

The result is: curl: (52) Empty reply from server
The server log shows:
/usr/local/lib/python3.10/site-packages/fastdeploy/engine/args_utils.py:65: ResourceWarning: unclosed file <_io.TextIOWrapper name='/data/projects/modelscope/ZhipuAI/GLM-4.5-Air/config.json' mode='r' encoding='utf-8'>
model_config = json.load(open(config_path, "r", encoding="utf-8"))
ResourceWarning: Enable tracemalloc to get the object allocation traceback
[2025-12-11 17:06:25,926] [ INFO] - Using download source: huggingface
[2025-12-11 17:06:25,926] [ INFO] - Loading configuration file /data/projects/modelscope/ZhipuAI/GLM-4.5-Air/config.json
[2025-12-11 17:06:25,926] [ WARNING] - You are using a model of type glm4_moe to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
/usr/local/lib/python3.10/site-packages/fastdeploy/config.py:340: ResourceWarning: unclosed file <_io.TextIOWrapper name='/data/projects/modelscope/ZhipuAI/GLM-4.5-Air/config.json' mode='r' encoding='utf-8'>
self.model_config = json.load(open(config_path, "r", encoding="utf-8"))
ResourceWarning: Enable tracemalloc to get the object allocation traceback
[2025-12-11 17:06:25,954] [ INFO] - Using download source: huggingface
[2025-12-11 17:06:25,955] [ INFO] - Using download source: huggingface
[2025-12-11 17:06:26 +0800] [4089232] [INFO] Application startup complete.
/usr/local/lib/python3.10/site-packages/fastdeploy/entrypoints/openai/protocol.py:692: DeprecationWarning: max_tokens is deprecated in favor of the max_completion_tokens field
req_dict["max_tokens"] = self.max_completion_tokens or self.max_tokens
/usr/local/lib/python3.10/site-packages/fastdeploy/entrypoints/openai/protocol.py:708: PydanticDeprecatedSince20: The dict method is deprecated; use model_dump instead. Deprecated in Pydantic V2.0 to be removed in V3.0. See Pydantic V2 Migration Guide at https://errors.pydantic.dev/2.11/migration/
for key, value in self.dict().items():
[2025-12-11 17:18:13 +0800] [3314133] [ERROR] Worker (pid:3314315) exited with code 1
[2025-12-11 17:18:13 +0800] [3314133] [ERROR] Worker (pid:3314315) exited with code 1.
ERROR 2025-12-11 17:18:15,407 3314133 api_server.py[line:704] Worker process has died in the background (code=0). API server is forced to stop.
[2025-12-11 17:18:15 +0800] [3314133] [INFO] Handling signal: int
[2025-12-11 17:18:15 +0800] [3314133] [INFO] Shutting down: Master
ERROR 2025-12-11 17:18:15,412 3314133 engine.py[line:435] Error extracting sub services: [Errno 3] No such process, Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/fastdeploy/engine/engine.py", line 432, in _exit_sub_services
pgid = os.getpgid(self.worker_proc.pid)
ProcessLookupError: [Errno 3] No such process

[2025-12-11 17:18:15 +0800] [3314133] [ERROR] Worker (pid:4089232) was sent SIGKILL! Perhaps out of memory?
sys:1: DeprecationWarning: builtin type swigvarlink has no __module__ attribute
sys:1: ResourceWarning: unclosed file <_io.BufferedReader name=80>
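
Most of the output above is deprecation and resource warnings, so a small filter helps isolate the lines that matter. The sketch below keeps only lines carrying an ERROR level in either of the two styles seen in this log (a leading "ERROR " or a bracketed "[ERROR]"). The function name `error_lines` is hypothetical, and the pattern is an assumption based on this log alone.

```python
import re

# Two level styles appear in the log above: "ERROR 2025-..." at line start,
# and a bracketed "[ERROR]" (optionally padded with spaces) mid-line.
ERROR_RE = re.compile(r"(^ERROR\s)|(\[\s*ERROR\s*\])")


def error_lines(log_text: str) -> list[str]:
    """Return only the lines of `log_text` that carry an ERROR level marker."""
    return [line for line in log_text.splitlines() if ERROR_RE.search(line)]


sample = """\
[2025-12-11 17:06:26 +0800] [4089232] [INFO] Application startup complete.
[2025-12-11 17:18:13 +0800] [3314133] [ERROR] Worker (pid:3314315) exited with code 1
ERROR 2025-12-11 17:18:15,407 3314133 api_server.py[line:704] Worker process has died in the background (code=0). API server is forced to stop.
ProcessLookupError: [Errno 3] No such process
ResourceWarning: Enable tracemalloc to get the object allocation traceback"""

for line in error_lines(sample):
    print(line)
```

The same filter can be run over files in the log/ directory relative to the working directory (the startup output mentions log/cudagraph_piecewise_backend.log.2025-12-11), where the reason the worker exited with code 1 may be recorded.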

Help needed

Is this model not yet supported on Iluvatar GPUs? Or are additional environment variables or launch arguments required? Thanks!
