This document provides detailed information about running and configuring local models with Kon.
| Model | Quantization | Context Length | TPS | System Specs |
|---|---|---|---|---|
qwen/qwen3-coder-next |
Q4_K_M | 64,000 | N/A | i7-14700F × 28, 64GB RAM, 24GB VRAM (RTX 3090) |
zai-org/glm-4.7-flash |
Q4_K_M | 64,000 | ~80-90 | i7-14700F × 28, 64GB RAM, 24GB VRAM (RTX 3090) |
Run a local model using llama-server with the following command:
./llama-server -m <models-dir>/GLM-4.7-Flash-GGUF/GLM-4.7-Flash-Q4_K_M.gguf -n 8192 -c 64000Then start kon:
kon --model zai-org/glm-4.7-flash --provider openai --base-url http://localhost:8080/v1 --api-key ""Note
I was not able to run qwen-coder-next reliably on my system. Either the provider config had some issues or it's too big for my system (i'm not sure)