Comprehensive examples showcasing HoloDeck's core capabilities for building, testing, and deploying AI agents through pure YAML configuration.
This directory contains 4 production-ready samples demonstrating HoloDeck's features:
| Sample | Description | Key Features |
|---|---|---|
| Ticket Routing | Classify and route support tickets | Structured output, classification, confidence scoring |
| Customer Support | Context-aware support chatbot | RAG, conversation memory, escalation workflows |
| Content Moderation | Filter user-generated content | Multi-category classification, policy enforcement |
| Legal Summarization | Summarize legal documents | Document analysis, clause extraction, risk identification |
Each sample is available for 3 LLM providers:
- OpenAI (gpt-4o) - Cloud-based, highest quality
- Azure OpenAI (gpt-4o) - Enterprise cloud with compliance
- Ollama (llama3.1:8b) - 100% local, no API keys required
HoloDeck requires Python 3.10+ and uv (the fast Python package installer).
```bash
# Install uv (if not already installed)
curl -LsSf https://astral.sh/uv/install.sh | sh
# or: brew install uv

# Install HoloDeck with ChromaDB support
uv tool install "holodeck-ai[chromadb]@latest" --prerelease allow --python 3.10

# Verify installation
holodeck --version
```

- Docker - For running ChromaDB and Aspire Dashboard
- Node.js 18+ - For CopilotKit frontend
- LLM Provider - API keys for OpenAI/Azure, or Ollama for local
- VS Code (recommended) - Install the YAML extension for schema validation and autocomplete in agent.yaml files
All samples require ChromaDB (vector store) and Aspire Dashboard (observability):
```bash
cd samples
./start-infra.sh
```

This starts:
- ChromaDB at http://localhost:8000
- Aspire Dashboard at http://localhost:18888 (OTLP at 4317)

```bash
cd ticket-routing/openai   # or azure-openai, ollama
cp .env.example .env
# Edit .env with your API keys (not needed for Ollama)
holodeck serve agent.yaml --port 8001
```

```bash
cd copilotkit
npm install
npm run dev
```

```yaml
services:
  chromadb:          # Vector store at http://localhost:8000
  aspire-dashboard:  # OpenTelemetry UI at http://localhost:18888
```

```bash
./start-infra.sh   # Start all infrastructure
./stop-infra.sh    # Stop all infrastructure
```

Each sample follows this structure:
```text
<use-case>/<provider>/
├── agent.yaml               # Agent configuration
├── config.yaml              # Provider-specific config
├── .env.example             # Environment template
├── README.md                # Sample documentation
├── instructions/
│   └── system-prompt.md     # Agent instructions
├── data/
│   └── *.json, *.md         # Knowledge base data
└── copilotkit/              # Next.js frontend
    ├── src/app/
    ├── package.json
    └── ...
```
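As one way to picture how these files fit together, a minimal `agent.yaml` might reference the instructions and data directories like this. The field names below are illustrative assumptions, not the authoritative HoloDeck schema — consult a generated sample for the exact shape:

```yaml
# Hypothetical agent.yaml sketch — field names are illustrative only
name: ticket-router
description: Classifies and routes support tickets
model:
  provider: openai        # or azure-openai, ollama
  name: gpt-4o
  temperature: 0.2
instructions:
  file: instructions/system-prompt.md
```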
| Feature | OpenAI | Azure OpenAI | Ollama |
|---|---|---|---|
| Model | gpt-4o | gpt-4o | llama3.1:8b |
| Embeddings | text-embedding-3-small | text-embedding-ada-002 | nomic-embed-text |
| API Key | Required | Required | None |
| Privacy | Cloud | Enterprise Cloud | 100% Local |
| Latency | Low | Low | Hardware dependent |
| Evaluation Thresholds | Standard | Standard | Slightly lower |
- Multi-provider LLM configuration
- System prompts from files
- Structured response formats
- Temperature and token limits
- VectorStore: RAG with ChromaDB for knowledge retrieval
- MCP: Model Context Protocol for external services
- Prompt: Templated prompts for structured operations
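The tool types above might be wired up roughly as follows. This is a hedged sketch, not the exact HoloDeck schema; `top_k` mirrors the tunable mentioned later in the tuning section, and the remaining keys are assumptions:

```yaml
# Illustrative tool configuration — exact keys may differ from HoloDeck's schema
tools:
  - type: vectorstore        # RAG over data/ via ChromaDB
    name: knowledge_base
    source: data/
    top_k: 5
  - type: mcp                # Model Context Protocol server
    name: filesystem
```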
- Standard Metrics: F1, BLEU, ROUGE, METEOR
- RAG Metrics: Faithfulness, Answer Relevancy, Contextual Precision
- G-Eval: Custom LLM-as-judge evaluations
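A sketch of how these metric families might appear in an evaluation block — metric and key names here are assumptions based on the list above, not the verified schema:

```yaml
# Illustrative evaluation block — names mirror the metric families above
evaluations:
  metrics:
    - faithfulness
    - answer_relevancy
  geval:
    - name: professionalism
      criteria: "Response maintains a professional, helpful tone"
      threshold: 0.7
```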
- OpenTelemetry traces, metrics, and logs
- Aspire Dashboard visualization
- Per-request tracing
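An observability section would plausibly point at the local Aspire Dashboard's OTLP endpoint started by `start-infra.sh`. The endpoint matches the port documented above; the key names are illustrative:

```yaml
# Illustrative observability settings targeting the local Aspire Dashboard
observability:
  otlp_endpoint: http://localhost:4317
  traces: true
  metrics: true
  logs: true
```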
```bash
holodeck test agent.yaml --verbose
holodeck test agent.yaml --output results.md
```

Classifies customer support tickets by category and urgency.

Response Format:

```json
{
  "category": "billing|technical|sales|account|general",
  "urgency": "critical|high|medium|low",
  "routing_department": "...",
  "confidence_score": 0.95,
  "reasoning": "..."
}
```

Evaluations: Classification accuracy, confidence calibration
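A downstream consumer of this structured output might parse the reply and gate auto-routing on the confidence score. The field names come from the response format above; the function, queue names, and 0.7 cutoff are illustrative assumptions:

```python
import json

def route_ticket(raw: str) -> str:
    """Parse the classifier's JSON reply and pick a routing queue.

    Low-confidence results go to a human triage queue (hypothetical
    name) instead of being auto-routed.
    """
    reply = json.loads(raw)
    if reply["confidence_score"] < 0.7:
        return "human-triage"
    return f'{reply["routing_department"]}:{reply["urgency"]}'

sample = ('{"category": "billing", "urgency": "high", '
          '"routing_department": "billing", '
          '"confidence_score": 0.95, "reasoning": "Invoice dispute"}')
print(route_ticket(sample))  # billing:high
```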
Context-aware chatbot with knowledge base and FAQ lookup.
Tools:
- `knowledge_base` - Product documentation
- `faq` - Frequently asked questions
- `memory` - Conversation persistence
- `escalate_to_human` - Escalation templates
Evaluations: Faithfulness, helpfulness, professionalism
Classifies user content against community guidelines.
Categories: Hate speech, harassment, violence, misinformation, adult content, spam, illegal activities
Response Format:

```json
{
  "decision": "approve|warning|remove|suspend|escalate",
  "violations": [...],
  "reasoning": "...",
  "confidence_score": 0.9
}
```

Evaluations: Moderation accuracy, reasoning quality, consistency
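When consistency matters, a caller might run the moderator more than once and keep the strictest decision. Only the decision labels come from the format above; the severity ordering and helper are illustrative assumptions:

```python
import json

# Ordered least to most severe; this ordering is an illustrative
# assumption — only the labels come from the response format above.
SEVERITY = ["approve", "warning", "remove", "suspend", "escalate"]

def strictest_decision(raw_replies: list[str]) -> str:
    """Given several moderation replies (e.g. re-runs of the same
    content), return the strictest decision among them."""
    decisions = [json.loads(r)["decision"] for r in raw_replies]
    return max(decisions, key=SEVERITY.index)

replies = ['{"decision": "approve"}', '{"decision": "remove"}']
print(strictest_decision(replies))  # remove
```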
Analyzes and summarizes legal documents.
Tools:
- `legal_templates` - Standard clause definitions
- `sample_contracts` - Reference contracts
- `filesystem` - Document access
- `extract_clauses` - Structured extraction
Evaluations: ROUGE, BLEU, completeness, legal accuracy
For local LLM samples:
```bash
# Install Ollama
brew install ollama

# Start Ollama
ollama serve

# Pull required models
ollama pull llama3.1:8b
ollama pull nomic-embed-text:latest
```

```bash
# Ensure uv tools are in PATH
export PATH="$HOME/.local/bin:$PATH"

# Reinstall HoloDeck
uv tool install "holodeck-ai[chromadb]@latest" --prerelease allow --python 3.10 --force
```

```bash
docker logs holodeck-chromadb
docker logs holodeck-aspire
```

Ensure ChromaDB is healthy before initializing:

```bash
curl http://localhost:8000/api/v2/heartbeat
```

```bash
ollama serve                              # Ensure Ollama is running
curl http://localhost:11434/api/version   # Verify connection
```

- Customize a sample - Modify prompts, add data, adjust evaluations
- Add new providers - Copy an existing variant and update configs
- Build your own - Use these samples as templates for new use cases
- Deploy - Use `holodeck deploy` for production deployment
This repository includes slash commands for Claude Code to streamline agent development. These commands provide guided, conversational workflows for creating and optimizing HoloDeck agents.
| Command | Description |
|---|---|
| `/holodeck.create` | Conversational wizard to scaffold and configure a new agent |
| `/holodeck.tune` | Iteratively tune an existing agent to improve test performance |
A step-by-step wizard that guides you through creating a new HoloDeck agent:
- Basic Information - Agent name, description, use case category, LLM provider
- Model Configuration - Temperature, max tokens based on use case
- System Prompt Design - Role, guidelines, process, output format
- Response Format - JSON schema for structured output (if needed)
- Tools Configuration - Vectorstore (RAG), MCP servers
- Evaluation Metrics - RAG metrics, GEval custom criteria
- Test Cases - Comprehensive test scenarios
- Observability - OpenTelemetry configuration
- Finalization - Validation and next steps
Usage:
```
/holodeck.create
```
The wizard will ask clarifying questions at each phase before proceeding.
An iterative tuning assistant that improves agent test performance:
- Initial Assessment - Runs baseline tests, records pass/fail rates
- Diagnosis - Analyzes failures and identifies root causes
- Tuning Loop - Proposes changes, applies them, measures improvement
- Changelog Tracking - Records all modifications in `changelog.md`
What it CAN modify:
- System prompt content
- Model settings (temperature, max_tokens)
- Tool configuration (top_k, chunk_size, min_similarity_score)
- Evaluation thresholds
What it CANNOT modify:
- Test cases (inputs, ground_truth, expected_tools)
- Agent name and description
Usage:
```
/holodeck.tune path/to/agent.yaml
```
The tuner will ask for maximum iterations and priority tests before starting.
The slash commands are defined in:
```text
.claude/commands/
├── holodeck.create.md   # Agent creation wizard
└── holodeck.tune.md     # Agent tuning assistant
```
This repository also includes prompt files for GitHub Copilot in VS Code, Visual Studio, and JetBrains IDEs. These prompts provide the same guided workflows as the Claude Code commands.
| Prompt | Description |
|---|---|
| `holodeck-create` | Conversational wizard to scaffold and configure a new agent |
| `holodeck-tune` | Iteratively tune an existing agent to improve test performance |
- Open the repository in VS Code (or a supported IDE) with GitHub Copilot Chat
- Type `/` in the chat input to see available prompts
- Select `holodeck-create` or `holodeck-tune`
- Follow the conversational wizard
Alternatively, type `#prompt:` and select the prompt from the dropdown.
A step-by-step wizard that guides you through creating a new HoloDeck agent:
- Basic Information - Agent name, description, use case category, LLM provider
- Model Configuration - Temperature, max tokens based on use case
- System Prompt Design - Role, guidelines, process, output format
- Response Format - JSON schema for structured output (if needed)
- Tools Configuration - Vectorstore (RAG), MCP servers
- Evaluation Metrics - RAG metrics, GEval custom criteria
- Test Cases - Comprehensive test scenarios
- Observability - OpenTelemetry configuration
- Finalization - Validation and next steps
An iterative tuning assistant that improves agent test performance:
- Initial Assessment - Analyzes baseline test results
- Diagnosis - Identifies failure patterns and root causes
- Tuning Loop - Proposes changes, guides you through applying them
- Changelog Tracking - Helps maintain `changelog.md` with all modifications
The GitHub Copilot prompts are defined in:
```text
.github/prompts/
├── holodeck-create.prompt.md   # Agent creation wizard
└── holodeck-tune.prompt.md     # Agent tuning assistant
```
Note: GitHub Copilot prompts provide guidance-focused workflows. Unlike Claude Code commands which can execute tools directly, Copilot prompts suggest commands for you to run manually.