A hands-on workshop for building and evaluating LLM-powered agents, using a helpdesk routing system as the example.
A helpdesk agent that:
- Routes requests to the right department (IT, HR, Facilities, Finance, Legal, Security)
- Answers questions using a knowledge base (HR policies)
- Escalates to humans when it can't help
Along the way, you'll learn evaluation-driven development: running experiments, analyzing failures, and iterating on prompts.
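To make the evaluation loop concrete, here is a minimal sketch of one experiment run: score a router against labeled examples, then list the failures to analyze. Everything in it is an assumption for illustration (`toy_router`, `Example`, `run_experiment`, and the tiny dataset are hypothetical and not part of this repo); the workshop's real agent and experiment tooling will differ.

```python
from dataclasses import dataclass


@dataclass
class Example:
    request: str
    expected: str  # expected department, or "ESCALATE"


def toy_router(request: str) -> str:
    """Hypothetical stand-in for the agent: keyword match, else escalate."""
    keywords = {
        "laptop": "IT",
        "password": "IT",
        "vacation": "HR",
        "invoice": "Finance",
        "badge": "Security",
    }
    text = request.lower()
    for word, department in keywords.items():
        if word in text:
            return department
    return "ESCALATE"


def run_experiment(router, dataset):
    """Return (accuracy, failures) for a router over a labeled dataset."""
    failures = []
    for example in dataset:
        got = router(example.request)
        if got != example.expected:
            failures.append((example, got))
    accuracy = 1 - len(failures) / len(dataset)
    return accuracy, failures


# Illustrative examples only; not taken from the workshop datasets.
DATASET = [
    Example("My laptop won't turn on", "IT"),
    Example("How many vacation days do I get?", "HR"),
    Example("The office AC is broken", "Facilities"),
    Example("I need to dispute an invoice", "Finance"),
]

if __name__ == "__main__":
    accuracy, failures = run_experiment(toy_router, DATASET)
    print(f"accuracy: {accuracy:.2f}")
    for example, got in failures:
        print(f"FAIL {example.request!r}: expected {example.expected}, got {got}")
```

The failure list is the useful part: each miss (here, a Facilities request that the toy router escalates) points at a specific change to try in the next iteration.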
Start the Workshop (~60 minutes, guided)
Click the button below to open a fully configured environment in your browser — no local setup required.
Once it opens, add your API key:
cp .env.example .env
# Edit .env and add your ANTHROPIC_API_KEY (or OPENAI_API_KEY)
- Clone the repo and open in VS Code
- When prompted, click "Reopen in Container"
- Add your API key to .env (same as above)
Python, uv, and all dependencies are installed automatically. CAT Cafe starts on container launch.
# 1. Clone and install dependencies
git clone <repo-url>
cd eval-workshop-ext
uv sync
# 2. Set up environment variables
cp .env.example .env
# Edit .env and add your ANTHROPIC_API_KEY (https://console.anthropic.com/settings/keys)
# Or OPENAI_API_KEY if using OpenAI (https://platform.openai.com/api-keys)
# 3. Start local services (pull latest images first)
docker compose pull && docker compose up -d
# 4. Verify services are running
docker compose ps
# Should show cat-cafe as running/healthy
Requires Python 3.13+, uv, and Docker.
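If you want to confirm that step 2 worked before running the agent, a small check like the following can help. It is a sketch that assumes `.env` uses plain `KEY=value` lines (the actual layout of `.env.example` is not shown here), and the helper names `parse_env` and `has_api_key` are hypothetical.

```python
from pathlib import Path


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def has_api_key(env: dict[str, str]) -> bool:
    """True if either supported provider key is present and non-empty."""
    return any(env.get(name) for name in ("ANTHROPIC_API_KEY", "OPENAI_API_KEY"))


if __name__ == "__main__":
    env_path = Path(".env")
    if not env_path.exists():
        print("No .env found; run: cp .env.example .env")
    elif has_api_key(parse_env(env_path.read_text())):
        print("API key found in .env")
    else:
        print("Set ANTHROPIC_API_KEY or OPENAI_API_KEY in .env")
```

The check only confirms that a key is present; it does not verify that the provider accepts it.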
# Baseline (routing only, escalation only)
uv run helpdesk-agent -c configs/baseline.yaml "My laptop won't turn on"
# With HR specialist (tuned concierge + specialist)
uv run helpdesk-agent -c configs/tuned.yaml "How many vacation days do I get?"
eval-workshop-ext/
├── configs/ # Agent configurations + models.yaml
├── data/ # Datasets (JSONL format)
├── docs/
│ └── workshop.md # Workshop guide
├── experiments/ # Experiment definitions
├── kb/ # Knowledge base documents
├── prompts/ # Prompt files (versioned)
├── scripts/ # CLI tools
└── src/
    ├── agent_platform/ # Custom agent runtime
    └── helpdesk/ # Helpdesk domain
| Service | URL | Purpose |
|---|---|---|
| CAT Cafe | http://localhost:8000 | Experiment tracking, traces, datasets |
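Besides `docker compose ps`, you can check from Python that something is answering at the URL in the table. This is a minimal sketch using only the standard library; `is_reachable` is a hypothetical helper, and it assumes nothing about CAT Cafe's routes beyond the base URL listed above, so any HTTP response (even an error status) counts as reachable.

```python
import urllib.error
import urllib.request


def is_reachable(url: str, timeout: float = 3.0) -> bool:
    """Return True if an HTTP server responds at url within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a success status.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, and similar.
        return False


if __name__ == "__main__":
    url = "http://localhost:8000"
    status = "reachable" if is_reachable(url) else "not reachable"
    print(f"{url} is {status}")
```

If the service is not reachable, `docker compose ps` and the container logs are the next places to look.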
- Anthropic API key (https://console.anthropic.com/settings/keys) or OpenAI API key (https://platform.openai.com/api-keys)
- One of: GitHub Codespaces (browser only), VS Code with Dev Containers extension, or local Python 3.13+ with Docker