
Commit a3bdea8

youkaichao committed: polish
Signed-off-by: youkaichao <[email protected]>
1 parent 2922fec commit a3bdea8

File tree

1 file changed: +17 -21 lines


_posts/2025-08-15-glm45-vllm.md

Lines changed: 17 additions & 21 deletions
@@ -1,15 +1,15 @@
 ---
 layout: post
-title: "Use vLLM to deploy GLM-4.5 and GLM-4.5V model"
+title: "GLM-4.5 Meets vLLM: Built for Intelligent Agents"
 author: "Yuxuan Zhang"
 image: /assets/logos/vllm-logo-text-light.png
 ---
 
-# Use vLLM to deploy GLM-4.5 and GLM-4.5V model
+## Introduction
 
-## Model Introduction
+[General Language Model (GLM)](https://aclanthology.org/2022.acl-long.26/) is a family of foundation models created by Zhipu.ai (now renamed to [Z.ai](https://z.ai/)). The GLM team has a long-term collaboration with the vLLM team, dating back to the early days of vLLM and the popular [ChatGLM model series](https://github.com/zai-org/ChatGLM-6B). Recently, the GLM team released the GLM-4.5 and GLM-4.5V model series, which are designed for intelligent agents. They are the top trending models on the Hugging Face model hub right now.
 
-The GLM-4.5 series models are foundation models designed for intelligent agents. GLM-4.5 has 355 billion total
+GLM-4.5 has 355 billion total
 parameters with 32 billion active parameters, while GLM-4.5-Air adopts a more compact design with 106 billion total
 parameters and 12 billion active parameters. GLM-4.5 models unify reasoning, coding, and intelligent agent capabilities
 to meet the complex demands of intelligent agent applications.
@@ -28,10 +28,10 @@ among models of the same scale on 42 public vision-language benchmarks.
 
 ![bench_45v](https://raw.githubusercontent.com/zai-org/GLM-V/refs/heads/main/resources/bench_45v.jpeg)
 
-To get more information about GLM-4.5 and GLM-4.5V, please refer to the [GLM-4.5](https://github.com/zai-org/GLM-4.5)
+To get more information about GLM-4.5 and GLM-4.5V, please refer to [GLM-4.5](https://github.com/zai-org/GLM-4.5)
 and [GLM-V](https://github.com/zai-org/GLM-V).
 
-this blog will guide users on how to use vLLM to accelerate inference for the GLM-4.5V and GLM-4.5 model series on
+This blog will guide users on how to use vLLM to accelerate inference for the GLM-4.5V and GLM-4.5 model series on
 NVIDIA Blackwell and Hopper GPUs.
 
 ## Installation
@@ -78,19 +78,19 @@ vllm serve zai-org/GLM-4.5V \
 `extra_body={"chat_template_kwargs": {"enable_thinking": False}}`
 + If you're using 8x H100 GPUs and encounter insufficient memory when running the GLM-4.5 model, you'll need
 `--cpu-offload-gb 16`.
-+ If you encounter `flash infer` issues, use `VLLM_ATTENTION_BACKEND=XFORMERS` as a temporary replacement. You can also
-specify `TORCH_CUDA_ARCH_LIST='9.0+PTX'` to use `flash infer`, different GPUs have different TORCH_CUDA_ARCH_LIST
++ If you encounter `flash_infer` issues, use `VLLM_ATTENTION_BACKEND=XFORMERS` as a temporary replacement. You can also
+specify `TORCH_CUDA_ARCH_LIST='9.0+PTX'` to use `flash_infer`; different GPUs have different `TORCH_CUDA_ARCH_LIST`
 values, so please check accordingly.
-+ vllm v0 is not support our model.
++ vLLM v0 does not support our model.
 
 ### Grounding in GLM-4.5V
 
 GLM-4.5V is equipped with precise grounding capabilities. Given a prompt that requests the location of a specific object, GLM-4.5V
 is able to reason step-by-step and identify the bounding boxes of the target object. The query prompt supports
-complex descriptions of the target object as well as specified output formats, for example:
+complex descriptions of the target object as well as specified output formats. Example prompts are:
 
-> - Help me to locate <expr> in the image and give me its bounding boxes.
-> - Please pinpoint the bounding box [[x1,y1,x2,y2], …] in the image as per the given description. <expr>
+- Help me to locate `<expr>` in the image and give me its bounding boxes.
+- Please pinpoint the bounding box `[[x1,y1,x2,y2], …]` in the image as per the given description. `<expr>`
 
 Here, `<expr>` is the description of the target object. The output bounding box is a quadruple $$[x_1,y_1,x_2,y_2]$$
 composed of the coordinates of the top-left and bottom-right corners, where each value is normalized by the image
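
As a companion to the `enable_thinking` note in the hunk above, here is a minimal sketch of how that flag could be passed through the OpenAI-compatible endpoint of a locally running `vllm serve zai-org/GLM-4.5V` instance. The base URL, API key, and prompt are illustrative assumptions for a default local deployment, not content taken from the post.

```python
# Hypothetical usage sketch: query a local vLLM server and disable thinking
# via chat_template_kwargs, as described in the deployment notes above.
from openai import OpenAI

# Assumes `vllm serve zai-org/GLM-4.5V` is listening on the default port 8000.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="zai-org/GLM-4.5V",
    messages=[{"role": "user", "content": "Summarize what GLM-4.5V is good at in one sentence."}],
    # Forwarded by vLLM to the chat template; set to True to keep thinking enabled.
    extra_body={"chat_template_kwargs": {"enable_thinking": False}},
)
print(response.choices[0].message.content)
```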
@@ -100,16 +100,12 @@ In the response, the special tokens `<|begin_of_box|>` and `<|end_of_box|>` are
 the answer. The bracket style may vary ([], [[]], (), <>, etc.), but the meaning is the same: to enclose the coordinates
 of the box.
 
-## Cooperation with vLLM and Z.ai Team
+## Cooperation with vLLM and GLM Team
 
-During the release of the GLM-4.5 and GLM-4.5V models, the vLLM team worked closely with the Z.ai team, providing
-extensive support in addressing issues related to the model launch.
-The GLM-4.5 and GLM-4.5V models provided by the Z.ai team were modified in the vLLM implementation PR, including (but
-not limited to) resolving [CUDA Core Dump](./2025-08-11-cuda-debugging.md) debugging issues and FP8 model accuracy
-alignment problems.
-They also ensured that the vLLM `main` branch had full support for the open-source GLM-4.5 series before the models were
-released.
+Before the release of the GLM-4.5 and GLM-4.5V models, the vLLM team worked closely with the GLM team, providing
+extensive support in addressing issues related to the model launch, ensuring that the vLLM `main` branch had full
+support for the open-source GLM-4.5 series before the models were released.
 
 ## Acknowledgement
 
-We would like to thank the vLLM team members who contributed to this effort are: Simon Mo, Kaichao You.
+We would like to thank the vLLM team members who contributed to this effort, including Simon Mo and Kaichao You.
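
The grounding passages above describe responses whose bounding boxes are wrapped in `<|begin_of_box|>` and `<|end_of_box|>`, with varying bracket styles around the coordinates. A small, assumption-laden helper (not part of this commit) could extract the coordinate quadruples like this:

```python
# Hypothetical helper: pull normalized [x1, y1, x2, y2] quadruples out of a
# GLM-4.5V grounding answer, tolerating the varying bracket styles noted above.
import re

def extract_boxes(answer: str) -> list[list[float]]:
    boxes = []
    for segment in re.findall(r"<\|begin_of_box\|>(.*?)<\|end_of_box\|>", answer, re.DOTALL):
        # Ignore the surrounding brackets ([], [[]], (), <>) and keep only the numbers.
        numbers = [float(n) for n in re.findall(r"-?\d+(?:\.\d+)?", segment)]
        # Group the numbers into top-left / bottom-right coordinate quadruples.
        boxes.extend(numbers[i:i + 4] for i in range(0, len(numbers) - 3, 4))
    return boxes

print(extract_boxes("<|begin_of_box|>[[120, 340, 560, 890]]<|end_of_box|>"))
# -> [[120.0, 340.0, 560.0, 890.0]]
```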
