# Local Models

This document provides detailed information about running and configuring local models with Kon.

## Tested Models

| Model | Quantization | Context Length (tokens) | TPS (tokens/sec) | System Specs |
| --- | --- | --- | --- | --- |
| qwen/qwen3-coder-next | Q4_K_M | 64,000 | N/A | i7-14700F × 28, 64GB RAM, 24GB VRAM (RTX 3090) |
| zai-org/glm-4.7-flash | Q4_K_M | 64,000 | ~80-90 | i7-14700F × 28, 64GB RAM, 24GB VRAM (RTX 3090) |

Run a local model using llama-server with the following command:

```sh
./llama-server -m <models-dir>/GLM-4.7-Flash-GGUF/GLM-4.7-Flash-Q4_K_M.gguf -n 8192 -c 64000
```
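If your llama.cpp build has GPU support, offloading layers usually improves throughput considerably on a card like the RTX 3090. A hedged variant of the command above; `-ngl` (`--n-gpu-layers`) is a standard llama-server flag, but the right value depends on your VRAM:

```sh
# Same invocation with layers offloaded to the GPU. -ngl 99 offloads as many
# layers as the model has; lower the value if you run out of VRAM.
./llama-server -m <models-dir>/GLM-4.7-Flash-GGUF/GLM-4.7-Flash-Q4_K_M.gguf \
  -n 8192 -c 64000 -ngl 99
```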

Then start kon:

```sh
kon --model zai-org/glm-4.7-flash --provider openai --base-url http://localhost:8080/v1 --api-key ""
```
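If kon fails to connect, it can help to sanity-check the endpoint first. This sketch uses llama-server's standard OpenAI-compatible `/v1` routes and is not kon-specific:

```sh
# List the loaded model(s); should return the GGUF started above.
curl http://localhost:8080/v1/models

# One-off chat completion against the same endpoint kon will use.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"zai-org/glm-4.7-flash","messages":[{"role":"user","content":"Say hi"}]}'
```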

> [!NOTE]
> I was not able to run qwen3-coder-next reliably on my system. Either the provider configuration had issues or the model is too large for my hardware; I'm not sure which.