Run Clawdbot with GLM-4.7 and other open-source coding models on RunPod using vLLM. Chat with your AI assistant via Telegram!
| Model | GPU | VRAM | Cost/hr | Context | Folder |
|---|---|---|---|---|---|
| Base (Qwen2.5-7B) | Any | 16GB | $0.50 | 16k | Dockerfile |
| GLM-4.7-Flash FP16 | H100/A100 80GB | 56GB | $1.20-1.99 | 32k-64k | models/glm47-flash-fp16/ |
| GLM-4.7-Flash AWQ 4-bit | A100 80GB | 71GB | $1.19 | 114k | models/glm47-flash-awq-4bit/ |
| GLM-4.7-REAP W4A16 | B200 | 108GB | $5.19 | 32k | models/glm47-reap-w4a16/ |
The AWQ 4-bit build is the best-value option, offering the full 114k context window at $1.19/hr on an A100 80GB.
# GLM-4.7-Flash AWQ 4-bit (Best value, A100 80GB)
IMAGE=yourusername/clawdbot-glm47-flash-awq-4bit:latest
# GLM-4.7-Flash FP16 (Full precision, H100/A100 80GB)
IMAGE=yourusername/clawdbot-glm47-flash-fp16:latest
# GLM-4.7-REAP W4A16 (High-end, B200)
IMAGE=yourusername/clawdbot-glm47-reap-w4a16:latest
# Base (Qwen2.5-7B, any GPU)
IMAGE=yourusername/clawdbot-vllm:latest

- Image: Your chosen image from above
- GPU: Match model requirements
- Volume: 150GB at `/workspace`
- Container Disk: 50-100GB (depending on model)
- Ports: `8000/http`, `18789/http`, `22/tcp`
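RunPod normally exposes HTTP ports through its proxy, so the vLLM API should also be reachable from outside the pod. The URL pattern below (`{pod-id}-{port}.proxy.runpod.net`) is RunPod's usual scheme; confirm the exact URL in your pod's connection details:

```bash
# Health check through the RunPod HTTP proxy (replace POD_ID with your pod's ID)
curl https://POD_ID-8000.proxy.runpod.net/health
```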
VLLM_API_KEY=your-secure-key # Required
TELEGRAM_BOT_TOKEN=your-telegram-token # Optional
GITHUB_TOKEN=ghp_xxx # Optional
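Once the pod is running, you can confirm these variables are visible inside the container with a quick loop in the pod shell (purely illustrative, not part of the repo's scripts):

```bash
# Check which of the expected variables are set in the container shell
for var in VLLM_API_KEY TELEGRAM_BOT_TOKEN GITHUB_TOKEN; do
  if [ -n "${!var}" ]; then echo "$var is set"; else echo "$var is NOT set"; fi
done
```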
# Health check
curl http://localhost:8000/health
# Chat completion
curl http://localhost:8000/v1/chat/completions \
-H "Authorization: Bearer $VLLM_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "glm-4.7-flash",
"messages": [{"role": "user", "content": "Hello!"}]
}'
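If `jq` is installed, the same request collapses to a one-liner that prints only the assistant's reply (a convenience, not something the repo ships):

```bash
# Compact variant of the request above, printing just the reply text
curl -s http://localhost:8000/v1/chat/completions \
  -H "Authorization: Bearer $VLLM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "glm-4.7-flash", "messages": [{"role": "user", "content": "Hello!"}]}' \
  | jq -r '.choices[0].message.content'
```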
Images are automatically built and pushed to Docker Hub via GitHub Actions.

| Image | Description |
|---|---|
| `clawdbot-glm47-flash-awq-4bit` | GLM-4.7-Flash AWQ 4-bit for A100 80GB |
| `clawdbot-glm47-flash-fp16` | GLM-4.7-Flash FP16 for H100/A100 80GB |
| `clawdbot-glm47-reap-w4a16` | GLM-4.7-REAP W4A16 for B200 |
| `clawdbot-vllm` | Base image with Qwen2.5-7B |
runpod-clawdbot/
├── README.md # This file
├── .github/
│ └── workflows/
│ └── docker-build.yml # Build & push to Docker Hub
│
├── models/
│ ├── glm47-flash-fp16/ # Full precision FP16 (H100/A100 80GB)
│ │ ├── README.md
│ │ ├── Dockerfile
│ │ └── entrypoint.sh
│ │
│ ├── glm47-flash-awq-4bit/ # AWQ 4-bit quantized (A100 80GB)
│ │ ├── README.md
│ │ ├── Dockerfile
│ │ └── entrypoint.sh
│ │
│ └── glm47-reap-w4a16/ # Pruned W4A16 quantized (B200)
│ ├── README.md
│ ├── Dockerfile
│ └── entrypoint.sh
│
├── scripts/
│ ├── setup-clawdbot.sh
│ └── start-vllm.sh
│
├── config/
│ ├── clawdbot.json
│ └── workspace/
│
├── templates/
│ └── clawdbot-vllm.json
│
├── tests/
│ ├── test-vllm.sh
│ └── test-tool-calling.sh
│
├── Dockerfile # Base image (Qwen2.5-7B)
├── docker-compose.yml
└── .env.example
Images are built automatically on:
- Push to `main` → tagged as `:latest`
- Push to other branches → tagged as `:dev-{branch-name}` (e.g., `:dev-feature-xyz`)
- Push git tag (e.g., `v1.0.0`) → tagged as `:v1.0.0` + `:latest` (see the example below)
- Pull requests → build only, no push (validation)
- Manual workflow dispatch → select specific model
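For example, publishing a versioned build is just a matter of pushing a tag (the version number here is illustrative):

```bash
# Tag the current commit and push it to trigger the versioned build
git tag v1.0.0
git push origin v1.0.0
```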
Secrets (Repository → Settings → Secrets → Actions):

| Secret | Description |
|---|---|
| `DOCKERHUB_USERNAME` | Your Docker Hub username |
| `DOCKERHUB_TOKEN` | Docker Hub access token (not password) |

Variables (Repository → Settings → Variables → Actions):

| Variable | Description |
|---|---|
| `DOCKERHUB_REPO` | (Optional) Custom repo name, defaults to username |
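If you use the GitHub CLI, the same secrets and variable can be set from a terminal (assumes a `gh` version recent enough to include `gh variable`):

```bash
# Set repository secrets (you will be prompted for the values)
gh secret set DOCKERHUB_USERNAME
gh secret set DOCKERHUB_TOKEN

# Optional: override the Docker Hub repo name
gh variable set DOCKERHUB_REPO --body "my-custom-repo"
```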
# Build locally
docker build -t clawdbot-glm47-flash-awq-4bit models/glm47-flash-awq-4bit/
docker build -t clawdbot-glm47-flash-fp16 models/glm47-flash-fp16/
docker build -t clawdbot-glm47-reap-w4a16 models/glm47-reap-w4a16/
# Push to Docker Hub
docker tag clawdbot-glm47-flash-awq-4bit yourusername/clawdbot-glm47-flash-awq-4bit:latest
docker push yourusername/clawdbot-glm47-flash-awq-4bit:latest
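To build and push all three model images in one pass, a small loop over the model folders works (a convenience, not one of the repo's scripts):

```bash
# Build and push every model image (replace "yourusername")
for model in glm47-flash-awq-4bit glm47-flash-fp16 glm47-reap-w4a16; do
  docker build -t yourusername/clawdbot-$model:latest models/$model/
  docker push yourusername/clawdbot-$model:latest
done
```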
| Variable | Default | Description |
|---|---|---|
| `VLLM_API_KEY` | `changeme` | API key for vLLM authentication |
| `MODEL_NAME` | Model-specific | HuggingFace model ID |
| `SERVED_MODEL_NAME` | `glm-4.7-flash` | Model name in API responses |
| `MAX_MODEL_LEN` | Auto-detected | Maximum context length |
| `GPU_MEMORY_UTILIZATION` | `0.92` | Fraction of GPU memory to use |
| `TELEGRAM_BOT_TOKEN` | (unset) | Telegram bot token from @BotFather |
| `GITHUB_TOKEN` | (unset) | GitHub PAT for git/gh operations |
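The same variables can be exercised outside RunPod with a plain `docker run` (values and the host volume path are illustrative; an NVIDIA GPU and the NVIDIA Container Toolkit are required):

```bash
# Run the AWQ image locally, overriding a few defaults
docker run --gpus all \
  -p 8000:8000 -p 18789:18789 \
  -v /data/workspace:/workspace \
  -e VLLM_API_KEY=your-secure-key \
  -e MAX_MODEL_LEN=65536 \
  -e GPU_MEMORY_UTILIZATION=0.90 \
  yourusername/clawdbot-glm47-flash-awq-4bit:latest
```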
Config is auto-generated at `/workspace/.clawdbot/clawdbot.json`:
{
"models": {
"providers": {
"local-vllm": {
"baseUrl": "http://localhost:8000/v1",
"apiKey": "your-vllm-api-key",
"api": "openai-completions"
}
}
}
}
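To confirm the generated config points at the local vLLM endpoint, you can inspect the provider block (assumes `jq` is available in the container):

```bash
# Print the provider block of the generated config
jq '.models.providers."local-vllm"' /workspace/.clawdbot/clawdbot.json
```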
- Create a bot with @BotFather
- Copy the bot token
- Set `TELEGRAM_BOT_TOKEN` environment variable
- Start or restart the pod
- Message your bot on Telegram!
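If the bot stays silent, a quick sanity check is Telegram's standard `getMe` endpoint, which is independent of this repo:

```bash
# Should return {"ok":true, ...} with your bot's username
curl -s "https://api.telegram.org/bot${TELEGRAM_BOT_TOKEN}/getMe"
```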
For git operations inside the container:
- Create a GitHub Personal Access Token
- Select scopes: `repo`, `read:org`, `workflow`
- Set `GITHUB_TOKEN` environment variable
- Token is auto-configured on startup
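To verify the token from inside the container, query the GitHub API directly; this is a generic check rather than part of the startup script:

```bash
# Should return your GitHub user object if the token is valid
curl -s -H "Authorization: token $GITHUB_TOKEN" https://api.github.com/user
```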
# Basic health check
curl http://localhost:8000/health
# List models
curl http://localhost:8000/v1/models \
-H "Authorization: Bearer $VLLM_API_KEY"
# Tool calling test
curl http://localhost:8000/v1/chat/completions \
-H "Authorization: Bearer $VLLM_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "glm-4.7-flash",
"messages": [{"role": "user", "content": "What is 2+2?"}],
"tools": [{
"type": "function",
"function": {
"name": "calculate",
"description": "Perform a calculation",
"parameters": {
"type": "object",
"properties": {
"expression": {"type": "string"}
}
}
}
}]
}'
- Check GPU availability: `nvidia-smi`
- Verify VRAM is sufficient for the model
- Check logs: `journalctl -u vllm` or container logs
- First load downloads the model from HuggingFace (can be 18-60GB)
- Use a network volume to persist the model across restarts
- AWQ 4-bit model (18GB) loads faster than FP16 (31GB)
- Verify `--enable-auto-tool-choice` is set
- Check tool parser matches model (`glm47` for GLM-4.7)
- Run test script: `./tests/test-tool-calling.sh`
- If vLLM crashes, GPU memory may stay allocated
- Restart the pod to clear memory
- Check with: `nvidia-smi`
- RunPod assigns random SSH ports after restart
- Check port via RunPod console or API
- Use RunPod web terminal as alternative
- GGUF not supported - vLLM doesn't support GLM-4.7's GGUF format. Use AWQ.
- Container disk doesn't persist - Only `/workspace` survives restarts.
- B200 requires CUDA 13.1+ - The REAP image includes this automatically.
- Use AWQ 4-bit - Same model, lower VRAM, cheaper GPU ($1.19 vs $1.99/hr)
- Stop pods when idle - RunPod charges per minute (example below)
- Use network volumes - Avoid re-downloading models
- Consider spot instances - Up to 80% cheaper
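For the idle-pod tip, RunPod's CLI can stop a pod from a terminal. The subcommands below reflect my understanding of `runpodctl`, so treat them as an assumption and check `runpodctl --help` on your version:

```bash
# List pods, then stop the one you are not using (assumed runpodctl syntax)
runpodctl get pod
runpodctl stop pod <pod-id>
```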
MIT