A local, containerized development environment for running the pi coding agent backed by llama.cpp. This stack lets you run local models and the agent without any external API dependencies, keeping your code and data private. It supports both CPU and NVIDIA GPU setups via a unified interface.
- Docker
- Docker Compose
- NVIDIA Container Toolkit (if you want to use the GPU mode)
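A quick way to confirm the prerequisites are in place is a small shell check (a sketch; `nvidia-smi` only matters if you plan to use GPU mode):

```shell
# Report whether a required tool is on PATH.
check_tool() {
  if command -v "$1" > /dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: missing"
  fi
}

check_tool docker
check_tool nvidia-smi   # only needed for MODE=gpu

# The compose plugin ships with recent Docker releases; check it separately.
docker compose version > /dev/null 2>&1 && echo "docker compose: found" \
  || echo "docker compose: missing"
```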
Settings are managed via the .env file. Copy the example file to get started:
```shell
cp .env.example .env
```

You can change the Hugging Face repo and model file in `.env` to try different models. By default, it downloads the Qwen 3.5 models using the UD-Q4_K_XL quantization, a good balance of speed and precision.
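For reference, the relevant entries might look like this (the variable names and filename here are illustrative only, not necessarily the ones in `.env.example` — check that file for the actual keys):

```shell
# Illustrative keys only -- consult .env.example for the real variable names.
HF_REPO=unsloth/Qwen3.5-4B-GGUF
MODEL_FILE=Qwen3.5-4B-UD-Q4_K_XL.gguf
```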
The environment is managed through a Makefile. The default mode is CPU. To use GPU acceleration, just append MODE=gpu to any command.
Build the images before starting:
```shell
make build
make build MODE=gpu
```

If you need to pull fresh base images and rebuild without cache:
```shell
make upgrade
```

Start the llama.cpp server in the background. It will automatically download the models specified in your `.env` file to the local `models` directory on its first run.
```shell
make start
make start MODE=gpu
```

You can check the download progress or server status by tailing the logs:
```shell
make logs
```

Once the LLM backend is up and running, you can drop into the interactive agent terminal. This spins up a temporary container that attaches to your current TTY and cleans itself up when you exit.
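Rather than watching logs, you can also poll the server's health endpoint to know when the backend is ready (a sketch: assumes `curl` on the host and that the server's port 8001 is published to localhost — adjust the host/port to match your compose file):

```shell
# Poll the llama.cpp server's /health endpoint until it responds,
# or give up after a number of tries.
wait_for_llm() {
  host="${1:-localhost}"; port="${2:-8001}"; tries="${3:-60}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    if curl -sf "http://$host:$port/health" > /dev/null 2>&1; then
      echo "llama.cpp server is up"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "llama.cpp server did not come up in time" >&2
  return 1
}

# wait_for_llm          # defaults: localhost:8001, 60 tries
```

In scripts, calling `wait_for_llm` before `make agent` avoids attaching to a backend that is still downloading models.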
```shell
make agent
make agent MODE=gpu
```

To spin down the background services:
```shell
make stop
```

To nuke all containers, networks, and volumes (this will not delete your downloaded models or workspace code):
```shell
make clean
```

Volumes are mapped to your host machine for persistence:
- `workspace/` - Your actual codebase. Mounted inside the agent.
- `models/` - Hugging Face cache. Shared with the llama.cpp container so you don't redownload models.
- `agent_data/` - Holds the agent's history, auth, and state.
`agent_data/agent/models.json`:

```json
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://llm:8001/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "unsloth/Qwen3.5-4B-GGUF"
        },
        {
          "id": "unsloth/Qwen3.5-35B-A3B-GGUF"
        }
      ]
    }
  }
}
```
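A quick way to confirm the config parses before launching the agent (path as listed above; assumes `python3` is available on the host):

```shell
# Sanity-check that a config file is valid JSON.
check_json() {
  if python3 -m json.tool "$1" > /dev/null 2>&1; then
    echo "$1: valid JSON"
  else
    echo "$1: invalid or missing"
  fi
}

check_json agent_data/agent/models.json
```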
`agent_data/agent/settings.json`:

```json
{
  "defaultProvider": "llama-cpp",
  "defaultModel": "unsloth/Qwen3.5-4B-GGUF",
  "lastChangelogVersion": "0.55.4"
}
```

Brought to you by brain.fr