# GLM-OCR Truss Model

This is a Truss deployment of the [GLM-OCR](https://huggingface.co/zai-org/GLM-OCR) model for optical character recognition, served on Baseten with the vLLM engine on an L4 GPU. With only 0.9B parameters, GLM-OCR delivers strong OCR performance while being lightweight enough for high-concurrency and edge deployments.

GLM-OCR integrates the CogViT visual encoder, a lightweight cross-modal connector with efficient token downsampling, and a GLM-0.5B language decoder. It supports a two-stage pipeline (layout analysis + parallel recognition) for processing complex documents.

## Quick Start

### 1. Deploy to Baseten

```bash
# Clone this repo and cd into this folder
git clone https://github.com/basetenlabs/truss-examples.git
cd truss-examples/glm-ocr

# Deploy the model (requires the truss CLI; install instructions:
# https://docs.baseten.co/development/model/build-your-first-model)
truss push --publish
```

### 2. Test with the OpenAI Client

Replace the `api_key` and `base_url` in `test.py` with your specific deployment credentials and URL.

```bash
pip install openai
python test.py
```

### 3. Project Structure

```
glm-ocr/
├── config.yaml   # Truss configuration
├── test.py       # Test script (OpenAI client)
└── README.md     # Documentation
```
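
As a rough sketch of what `config.yaml` covers (the exact vLLM engine settings live in the file itself; the values below are illustrative assumptions, not the shipped configuration):

```yaml
# Illustrative sketch only -- see config.yaml in this repo for the real settings.
model_name: GLM-OCR
resources:
  accelerator: L4   # matches the GPU listed under Model Information
  use_gpu: true
```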

## Model Information

- **Model**: [zai-org/GLM-OCR](https://huggingface.co/zai-org/GLM-OCR)
- **Parameters**: 0.9B
- **Framework**: vLLM (OpenAI-compatible API)
- **GPU**: L4 (24GB)
- **API**: OpenAI Chat Completions (`/v1/chat/completions`)

## Usage

### Using the OpenAI Client

```python
from openai import OpenAI

# Point the client at your Baseten deployment's OpenAI-compatible endpoint.
client = OpenAI(
    api_key="YOUR_BASETEN_API_KEY",
    base_url="https://model-XXXX.api.baseten.co/deployment/YYYY/sync/v1",
)

response = client.chat.completions.create(
    model="zai-org/GLM-OCR",
    messages=[{
        "role": "user",
        "content": [
            # The image to recognize, followed by the OCR prompt.
            {"type": "image_url", "image_url": {"url": "https://example.com/document.png"}},
            {"type": "text", "text": "Text Recognition:"},
        ],
    }],
    max_tokens=4096,
)

print(response.choices[0].message.content)
```

The model accepts images via URL or as base64-encoded data URIs, and returns the recognized text in Markdown format.
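
For local files, the image can be inlined as a data URI instead of a URL. A minimal sketch (the `image_to_data_uri` helper name and the PNG media type are illustrative assumptions, not part of this repo):

```python
import base64


def image_to_data_uri(path: str, media_type: str = "image/png") -> str:
    """Read a local image file and encode it as a base64 data URI."""
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("utf-8")
    return f"data:{media_type};base64,{encoded}"


# The resulting string drops into the same request shape as above:
# {"type": "image_url", "image_url": {"url": image_to_data_uri("document.png")}}
```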