Eval Workshop

A hands-on workshop for building and evaluating LLM-powered agents using a helpdesk routing system as the example.

What You'll Build

A helpdesk agent that:

Routes requests to the right department (IT, HR, Facilities, Finance, Legal, Security)
Answers questions using a knowledge base (HR policies)
Escalates to humans when it can't help

Along the way, you'll learn evaluation-driven development: running experiments, analyzing failures, and iterating on prompts.
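As a rough illustration of that loop, the sketch below scores a router against a few labeled requests and lists the failures to inspect. It is not the workshop's code: the keyword table, the route function, and the "ESCALATE" label are hypothetical stand-ins for the real agent, and only the department names come from this README.

```python
from collections import Counter

DEPARTMENTS = {"IT", "HR", "Facilities", "Finance", "Legal", "Security"}

# Hypothetical stand-in for the agent: a keyword router, not the workshop's implementation.
KEYWORDS = {
    "laptop": "IT",
    "password": "IT",
    "vacation": "HR",
    "invoice": "Finance",
    "badge": "Security",
}
assert set(KEYWORDS.values()) <= DEPARTMENTS


def route(request: str) -> str:
    """Return a department name, or "ESCALATE" when no keyword matches."""
    text = request.lower()
    for keyword, department in KEYWORDS.items():
        if keyword in text:
            return department
    return "ESCALATE"


def evaluate(examples):
    """Run the router over labeled examples; return accuracy and the failures."""
    failures = []
    for request, expected in examples:
        actual = route(request)
        if actual != expected:
            failures.append((request, expected, actual))
    accuracy = 1 - len(failures) / len(examples)
    return accuracy, failures


EXAMPLES = [
    ("My laptop won't turn on", "IT"),
    ("How many vacation days do I get?", "HR"),
    ("The office heating is broken", "Facilities"),
    ("I lost my badge", "Security"),
]

accuracy, failures = evaluate(EXAMPLES)
print(f"accuracy: {accuracy:.2f}")
for request, expected, actual in failures:
    print(f"FAIL {request!r}: expected {expected}, got {actual}")
# Group failures by expected label to decide what to fix next.
print(Counter(expected for _, expected, _ in failures))
```

Each iteration then changes one thing (here, the keyword table; in the workshop, a prompt) and reruns the same examples so the scores stay comparable.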

Get Started

Start the Workshop (~60 minutes, guided): docs/workshop.md

Quick Start

Option A: GitHub Codespaces (recommended)

Open this repository in GitHub Codespaces to get a fully configured environment in your browser. No local setup is required.

Once it opens, add your API key:

cp .env.example .env
# Edit .env and add your ANTHROPIC_API_KEY (or OPENAI_API_KEY)

Option B: VS Code Dev Container

1. Clone the repo and open it in VS Code
2. When prompted, click "Reopen in Container"
3. Add your API key to .env (same as above)

Python, uv, and all dependencies are installed automatically. CAT Cafe starts on container launch.

Option C: Local Setup

# 1. Clone and install dependencies
git clone <repo-url>
cd eval-workshop-ext
uv sync

# 2. Set up environment variables
cp .env.example .env
# Edit .env and add your ANTHROPIC_API_KEY (https://console.anthropic.com/settings/keys)
# Or OPENAI_API_KEY if using OpenAI (https://platform.openai.com/api-keys)

# 3. Start local services (pull latest images first)
docker compose pull && docker compose up -d

# 4. Verify services are running
docker compose ps
# Should show cat-cafe as running/healthy

Requires Python 3.13+, uv, and Docker.
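A quick way to confirm that step 2 worked is to check that .env defines at least one of the two keys. The sketch below is a simplified assumption about the file's shape (plain KEY=VALUE lines with # comments). It is not part of the repo, and it ignores richer .env syntax such as export prefixes or inline comments.

```python
from pathlib import Path

API_KEY_NAMES = ("ANTHROPIC_API_KEY", "OPENAI_API_KEY")


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("\"'")
    return values


def configured_keys(text: str) -> list[str]:
    """Return the API key names that have a non-empty value."""
    env = parse_env(text)
    return [name for name in API_KEY_NAMES if env.get(name)]


if __name__ == "__main__":
    env_file = Path(".env")
    found = configured_keys(env_file.read_text()) if env_file.exists() else []
    print("configured:", ", ".join(found) if found else "none (edit .env first)")
```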

Try the Agent

# Baseline (routing and escalation only)
uv run helpdesk-agent -c configs/baseline.yaml "My laptop won't turn on"

# With HR specialist (tuned concierge + specialist)
uv run helpdesk-agent -c configs/tuned.yaml "How many vacation days do I get?"

Project Structure

eval-workshop-ext/
├── configs/                 # Agent configurations + models.yaml
├── data/                    # Datasets (JSONL format)
├── docs/
│   └── workshop.md          # Workshop guide
├── experiments/             # Experiment definitions
├── kb/                      # Knowledge base documents
├── prompts/                 # Prompt files (versioned)
├── scripts/                 # CLI tools
└── src/
    ├── agent_platform/      # Custom agent runtime
    └── helpdesk/            # Helpdesk domain
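The datasets in data/ are JSONL, meaning one JSON object per line. A minimal reader might look like the sketch below. The field names "input" and "expected" are hypothetical, since the README does not show the actual schema, and the sample is an in-memory stream, not a file from the repo.

```python
import json
from io import StringIO


def read_jsonl(stream):
    """Yield one dict per non-empty line of a JSONL stream."""
    for line_number, line in enumerate(stream, start=1):
        line = line.strip()
        if not line:
            continue
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {line_number}: invalid JSON ({exc.msg})") from exc


# Hypothetical records; the workshop's real field names may differ.
SAMPLE = StringIO(
    '{"input": "My laptop won\'t turn on", "expected": "IT"}\n'
    "\n"
    '{"input": "How many vacation days do I get?", "expected": "HR"}\n'
)

records = list(read_jsonl(SAMPLE))
print(len(records), "records")
```

To read a real file, pass an open file handle in place of the StringIO object.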

Service URLs

Service	URL	Purpose
CAT Cafe	http://localhost:8000	Experiment tracking, traces, datasets
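If docker compose ps looks healthy but the UI does not load, a small probe can confirm whether anything answers at that URL. This is a generic standard-library check and not a CAT Cafe API call. It treats any HTTP response, including an error status, as "up", and it assumes nothing about specific endpoints.

```python
import urllib.error
import urllib.request

CAT_CAFE_URL = "http://localhost:8000"  # from the Service URLs table


def is_reachable(url: str, timeout: float = 2.0) -> bool:
    """Return True if a server answers at all, even with an HTTP error status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server responded (for example with 404), so it is running.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, or timeout.
        return False


if __name__ == "__main__":
    state = "reachable" if is_reachable(CAT_CAFE_URL) else "not reachable"
    print(f"CAT Cafe at {CAT_CAFE_URL} is {state}")
```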

Prerequisites

Anthropic API key (https://console.anthropic.com/settings/keys) or OpenAI API key (https://platform.openai.com/api-keys)
One of: GitHub Codespaces (browser only), VS Code with Dev Containers extension, or local Python 3.13+ with Docker
