Skip to content

Latest commit

 

History

History
62 lines (51 loc) · 2.44 KB

File metadata and controls

62 lines (51 loc) · 2.44 KB

Glyph Usage Guide

Introduction

Glyph is a framework from Zhipu AI for scaling the context length through visual-text compression. It renders long textual sequences into images and processes them using vision–language models. In this guide, we demonstrate how to use vLLM to deploy the zai-org/Glyph model as a key component in this framework for image understanding tasks.

Installing vLLM

uv venv
source .venv/bin/activate
uv pip install -U vllm --torch-backend auto

Installing vLLM (For AMD ROCm: MI300x/MI325x/MI355x)

We recommend to use the official package for AMD GPUs (MI300x/MI325x/MI355x).

uv pip install vllm --extra-index-url https://wheels.vllm.ai/rocm

⚠️ The vLLM wheel for ROCm is compatible with Python 3.12, ROCm 7.0, and glibc >= 2.35. If your environment is incompatible, please use docker flow in vLLM.

Deploying Glyph

Serving Glyph Model on 1xH100 GPU

vllm serve zai-org/Glyph \
    --no-enable-prefix-caching \
    --mm-processor-cache-gb 0 \
    --reasoning-parser glm45 \
    --limit-mm-per-prompt.video 0

Serving Glyph Model on 1xMI300x/MI325x

VLLM_ROCM_USE_AITER=1 \
SAFETENSORS_FAST_GPU=1 \
vllm serve zai-org/Glyph \
    --no-enable-prefix-caching \
    --mm-processor-cache-gb 0 \
    --reasoning-parser glm45 \
    --limit-mm-per-prompt.video 0

Configuration Tips

  • zai-org/Glyph itself is a reasoning multimodal model, therefore we recommend using --reasoning-parser glm45 for parsing reasoning traces from model outputs.
  • Unlike multi-turn chat use cases, we do not expect OCR tasks to benefit significantly from prefix caching or image reuse, therefore it's recommended to turn off these features to avoid unnecessary hashing and caching.
  • Depending on your hardware capability, adjust max_num_batched_tokens for better throughput performance.
  • Check out the official Glyph documentation for more details on utilizing the vLLM deployment inside the end-to-end Glyph framework.

Run Benchmark

Open a new terminal and run the following command to execute the benchmark script:

vllm bench serve \
  --model zai-org/Glyph \
  --dataset-name random \
  --random-input-len 8192 \
  --random-output-len 512 \
  --request-rate 10000 \
  --num-prompts 16 \
  --ignore-eos