75 changes: 75 additions & 0 deletions glm-ocr/README.md
# GLM-OCR Truss Model

This is a Truss deployment of the [GLM-OCR](https://huggingface.co/zai-org/GLM-OCR) model for optical character recognition, served with the vLLM engine on an L4 GPU on Baseten. With only 0.9B parameters, GLM-OCR delivers strong OCR performance while remaining lightweight enough for high-concurrency and edge deployments.

GLM-OCR integrates the CogViT visual encoder, a lightweight cross-modal connector with efficient token downsampling, and a GLM-0.5B language decoder. It supports a two-stage pipeline (layout analysis + parallel recognition) for processing complex documents.

## Quick Start

### 1. Deploy to Baseten

```bash
# Clone this repo and cd into this folder
git clone https://github.com/basetenlabs/truss-examples.git
cd truss-examples/glm-ocr

# Deploy the model
truss push --publish
# This assumes you have truss installed; if not, follow the instructions at:
# https://docs.baseten.co/development/model/build-your-first-model
```

### 2. Test with the OpenAI Client

Replace the `api_key` and `base_url` in `test.py` with your specific deployment credentials and URL.

```bash
pip install openai
python test.py
```

### 3. Project Structure

```
glm-ocr/
├── config.yaml # Truss configuration
├── test.py # Test script (OpenAI client)
└── README.md # Documentation
```

## Model Information

- **Model**: [zai-org/GLM-OCR](https://huggingface.co/zai-org/GLM-OCR)
- **Parameters**: 0.9B
- **Framework**: vLLM (OpenAI-compatible API)
- **GPU**: L4 (24GB)
- **API**: OpenAI Chat Completions (`/v1/chat/completions`)

## Usage

### Using the OpenAI Client

```python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_BASETEN_API_KEY",
    base_url="https://model-XXXX.api.baseten.co/deployment/YYYY/sync/v1"
)

response = client.chat.completions.create(
    model="zai-org/GLM-OCR",
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": "https://example.com/document.png"}},
            {"type": "text", "text": "Text Recognition:"}
        ]
    }],
    max_tokens=4096
)

print(response.choices[0].message.content)
```

The model accepts images as URLs or as base64-encoded data URIs and returns the recognized text in Markdown format.
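For local files, you can build a data URI yourself and pass it in the `image_url` field. Below is a minimal sketch using only the Python standard library; the helper name `to_data_uri` is illustrative and not part of the Truss or OpenAI client API:

```python
import base64
import mimetypes


def to_data_uri(path: str) -> str:
    """Encode a local image file as a base64 data URI suitable for the
    image_url field of a chat completion request."""
    # Guess the MIME type from the file extension; fall back to PNG.
    mime = mimetypes.guess_type(path)[0] or "image/png"
    with open(path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("ascii")
    return f"data:{mime};base64,{b64}"
```

The returned string drops into the same request shape shown above, e.g. `{"type": "image_url", "image_url": {"url": to_data_uri("scan.png")}}`.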
2 changes: 1 addition & 1 deletion glm-ocr/config.yaml

```diff
@@ -28,6 +28,6 @@ resources:
   accelerator: L4
   use_gpu: true
 runtime:
-  predict_concurrency: 128
+  predict_concurrency: 32
 secrets:
   hf_access_token: null
```