A local, containerized development environment for running the pi coding agent backed by llama.cpp. This stack lets you run local models and the agent without any external API dependencies, keeping your code and data private. It supports both CPU and NVIDIA GPU setups via a unified interface.
- Docker
- Docker Compose
- NVIDIA Container Toolkit (if you want to use the GPU mode)
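A quick way to confirm the prerequisites are in place is a small shell check (a sketch; `nvidia-smi` only matters if you plan to use GPU mode):

```shell
# Report whether a required tool is on PATH.
check_tool() {
  if command -v "$1" > /dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: missing"
  fi
}

check_tool docker
check_tool nvidia-smi   # only needed for MODE=gpu

# The compose plugin ships with recent Docker releases; check it separately.
docker compose version > /dev/null 2>&1 && echo "docker compose: found" \
  || echo "docker compose: missing"
```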
Settings are managed via the .env file. Copy the example file to get started:
```shell
cp .env.example .env
```

You can change the Hugging Face repo and model file in `.env` to try different models. By default, it downloads the Qwen 3.5 models using the UD-Q4_K_XL quantization, a good balance of speed and precision.
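For reference, the relevant entries might look like this (the variable names and filename here are illustrative only, not necessarily the ones in `.env.example` — check that file for the actual keys):

```shell
# Illustrative keys only -- consult .env.example for the real variable names.
HF_REPO=unsloth/Qwen3.5-4B-GGUF
MODEL_FILE=Qwen3.5-4B-UD-Q4_K_XL.gguf
```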
The environment is managed through a Makefile. The default mode is CPU. To use GPU acceleration, just append MODE=gpu to any command.
Build the images before starting:
```shell
make build
make build MODE=gpu
```

If you need to pull fresh base images and rebuild without cache:
```shell
make upgrade
```

Start the llama.cpp server in the background. It will automatically download the models specified in your `.env` file to the local `models` directory on its first run.
```shell
make start
make start MODE=gpu
```

You can check the download progress or server status by tailing the logs:
```shell
make logs
```

Once the LLM backend is up and running, you can drop into the interactive agent terminal. This spins up a temporary container that attaches to your current TTY and cleans itself up when you exit.
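Rather than watching logs, you can also poll the server's health endpoint to know when the backend is ready (a sketch: assumes `curl` on the host and that the server's port 8001 is published to localhost — adjust the host/port to match your compose file):

```shell
# Poll the llama.cpp server's /health endpoint until it responds,
# or give up after a number of tries.
wait_for_llm() {
  host="${1:-localhost}"; port="${2:-8001}"; tries="${3:-60}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    if curl -sf "http://$host:$port/health" > /dev/null 2>&1; then
      echo "llama.cpp server is up"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "llama.cpp server did not come up in time" >&2
  return 1
}

# wait_for_llm          # defaults: localhost:8001, 60 tries
```

In scripts, calling `wait_for_llm` before `make agent` avoids attaching to a backend that is still downloading models.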
```shell
make agent
make agent MODE=gpu
```

To spin down the background services:
```shell
make stop
```

To nuke all containers, networks, and volumes (this will not delete your downloaded models or workspace code):
```shell
make clean
```

Volumes are mapped to your host machine for persistence:
- `workspace/` - Your actual codebase. Mounted inside the agent.
- `models/` - Hugging Face cache. Shared with the llama.cpp container so you don't redownload models.
- `agent_data/` - Holds the agent's history, auth, and state.
`agent_data/agent/models.json`:

```json
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://llm:8001/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "unsloth/Qwen3.5-4B-GGUF"
        },
        {
          "id": "unsloth/Qwen3.5-35B-A3B-GGUF"
        }
      ]
    }
  }
}
```
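A quick way to confirm the config parses before launching the agent (path as listed above; assumes `python3` is available on the host):

```shell
# Sanity-check that a config file is valid JSON.
check_json() {
  if python3 -m json.tool "$1" > /dev/null 2>&1; then
    echo "$1: valid JSON"
  else
    echo "$1: invalid or missing"
  fi
}

check_json agent_data/agent/models.json
```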
`agent_data/agent/settings.json`:

```json
{
  "defaultProvider": "llama-cpp",
  "defaultModel": "unsloth/Qwen3.5-4B-GGUF",
  "lastChangelogVersion": "0.55.4"
}
```

Brought to you by brain.fr