|
| 1 | +# CLAUDE.md |
| 2 | + |
| 3 | +This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository. |
| 4 | + |
| 5 | +## Development Commands |
| 6 | + |
| 7 | +### Environment Setup |
| 8 | +- `uv sync` - Install dependencies to virtual environment |
| 9 | +- `uv sync --dev` - Install dependencies including dev tools (pytest, ruff) |
| 10 | +- Copy `.env.example` to `.env` and configure API keys |
| 11 | +- `lk app env -w .env` - Auto-load LiveKit environment using CLI |
| 12 | + |
| 13 | +### Running the Agent |
| 14 | +- `uv run python src/agent.py download-files` - Download required models (Silero VAD, LiveKit turn detector) before first run |
| 15 | +- `uv run python src/agent.py console` - Run agent in terminal for direct interaction |
| 16 | +- `uv run python src/agent.py dev` - Run agent for frontend/telephony integration |
| 17 | +- `uv run python src/agent.py start` - Production mode |
| 18 | + |
| 19 | +### Code Quality |
| 20 | +- `uv run ruff check .` - Run linter |
| 21 | +- `uv run ruff format .` - Format code |
| 22 | +- `uv run ruff check --output-format=github .` - Lint with GitHub Actions format |
| 23 | +- `uv run ruff format --check --diff .` - Check formatting without applying changes |
| 24 | + |
| 25 | +### Testing |
| 26 | +- `uv run pytest` - Run full test suite including evaluations |
| 27 | +- `uv run pytest tests/test_agent.py::test_offers_assistance` - Run specific test |
| 28 | + |
| 29 | +## Architecture |
| 30 | + |
| 31 | +### Core Components |
| 32 | +- `src/agent.py` - Main agent implementation with `Assistant` class inheriting from `Agent` |
| 33 | +- `Assistant` class contains agent instructions and function tools (e.g., `lookup_weather`) |
| 34 | +- `entrypoint()` function sets up the voice AI pipeline with STT/LLM/TTS components |
| 35 | + |
| 36 | +### Voice AI Pipeline |
| 37 | +The agent uses a modular pipeline approach: |
| 38 | +- **STT**: Deepgram Nova-3 model with multilingual support |
| 39 | +- **LLM**: OpenAI GPT-4o-mini (easily swappable) |
| 40 | +- **TTS**: Cartesia for voice synthesis |
| 41 | +- **Turn Detection**: LiveKit's multilingual turn detection model |
| 42 | +- **VAD**: Silero VAD for voice activity detection |
| 43 | +- **Noise Cancellation**: LiveKit Cloud BVC (can be omitted for self-hosting) |
| 44 | + |
| 45 | +### Testing Framework |
| 46 | +Uses LiveKit Agents testing framework with evaluation-based tests: |
| 47 | +- Tests use `AgentSession` with real LLM interactions |
| 48 | +- `.judge()` method evaluates agent responses against intent descriptions |
| 49 | +- Mock tools available for testing error conditions |
| 50 | +- Supports both unit tests and end-to-end evaluations |
| 51 | + |
| 52 | +### Configuration |
| 53 | +- Environment variables loaded via `python-dotenv` |
| 54 | +- Required API keys: LIVEKIT_URL, LIVEKIT_API_KEY, LIVEKIT_API_SECRET, OPENAI_API_KEY, DEEPGRAM_API_KEY, CARTESIA_API_KEY |
| 55 | +- Alternative providers can be swapped by modifying the session setup in `entrypoint()` |
| 56 | + |
| 57 | +### Function Tools |
| 58 | +Functions decorated with `@function_tool` are automatically passed to the LLM: |
| 59 | +- Must be async methods on the Agent class |
| 60 | +- Include docstrings with tool descriptions and argument specifications |
| 61 | +- Example: `lookup_weather()` for weather information retrieval |
| 62 | + |
| 63 | +### Metrics and Logging |
| 64 | +- Integrated usage collection and metrics logging |
| 65 | +- Metrics collected via `MetricsCollectedEvent` handlers |
| 66 | +- Usage summaries logged on session shutdown |
| 67 | +- Room context automatically included in log entries |
| 68 | + |
| 69 | +## Key Patterns |
| 70 | + |
| 71 | +### Agent Customization |
| 72 | +To modify agent behavior: |
| 73 | +1. Update `instructions` in `Assistant.__init__()` |
| 74 | +2. Add new `@function_tool` methods for custom capabilities |
| 75 | +3. Swap STT/LLM/TTS providers in the `AgentSession` setup |
| 76 | + |
| 77 | +### Testing New Features |
| 78 | +1. Add unit tests to `tests/test_agent.py` |
| 79 | +2. Use `.judge()` evaluations for response quality |
| 80 | +3. Mock external dependencies with `mock_tools()` |
| 81 | +4. Test both success and error conditions |
| 82 | + |
| 83 | +### Deployment |
| 84 | +- Production-ready with included `Dockerfile` |
| 85 | +- Uses `uv` for dependency management |
| 86 | +- CI/CD workflows for linting (`ruff.yml`) and testing (`tests.yml`) |
| 87 | + |
| 88 | +## LiveKit Documentation & Examples |
| 89 | + |
| 90 | +The LiveKit documentation is comprehensive and provides detailed guidance for all aspects of agent development. **All documentation URLs support `.md` suffix for markdown format** and the docs follow the **llms.txt standard** for AI-friendly consumption. |
| 91 | + |
| 92 | +**Core Documentation**: https://docs.livekit.io/agents/ |
| 93 | +- **Quick Start**: https://docs.livekit.io/agents/start/voice-ai/ |
| 94 | +- **Building Agents**: https://docs.livekit.io/agents/build/ |
| 95 | +- **Integrations**: https://docs.livekit.io/agents/integrations/ |
| 96 | +- **Operations & Deployment**: https://docs.livekit.io/agents/ops/ |
| 97 | + |
| 98 | +**Practical Examples Repository**: https://github.com/livekit-examples/python-agents-examples |
| 99 | +- Contains dozens of real-world agent implementations |
| 100 | +- Advanced patterns and use cases beyond the starter template |
| 101 | +- Integration examples with various AI providers and tools |
| 102 | +- Production-ready code samples |
| 103 | + |
| 104 | +## Extending Agent Functionality |
| 105 | + |
| 106 | +### Swapping AI Providers |
| 107 | + |
| 108 | +#### LLM Providers ([docs](https://docs.livekit.io/agents/integrations/llm/)) |
| 109 | +Available providers with consistent interface: |
| 110 | +- **OpenAI**: `openai.LLM(model="gpt-4o-mini")` ([docs](https://docs.livekit.io/agents/integrations/llm/openai/)) |
| 111 | +- **Anthropic**: `anthropic.LLM(model="claude-3-haiku")` ([docs](https://docs.livekit.io/agents/integrations/llm/anthropic/)) |
| 112 | +- **Google Gemini**: `google.LLM(model="gemini-1.5-flash")` ([docs](https://docs.livekit.io/agents/integrations/llm/google/)) |
| 113 | +- **Azure OpenAI**: `azure_openai.LLM(model="gpt-4o")` ([docs](https://docs.livekit.io/agents/integrations/llm/azure-openai/)) |
| 114 | +- **Groq**: ([docs](https://docs.livekit.io/agents/integrations/llm/groq/)) |
| 115 | +- **Fireworks**: ([docs](https://docs.livekit.io/agents/integrations/llm/fireworks/)) |
| 116 | +- **DeepSeek, Cerebras, Amazon Bedrock**, and others |
| 117 | + |
| 118 | +#### STT Providers ([docs](https://docs.livekit.io/agents/integrations/stt/)) |
| 119 | +All support low-latency multilingual transcription: |
| 120 | +- **Deepgram**: `deepgram.STT(model="nova-3", language="multi")` ([docs](https://docs.livekit.io/agents/integrations/stt/deepgram/)) |
| 121 | +- **AssemblyAI**: `assemblyai.STT()` ([docs](https://docs.livekit.io/agents/integrations/stt/assemblyai/)) |
| 122 | +- **Azure AI Speech**: `azure_ai_speech.STT()` ([docs](https://docs.livekit.io/agents/integrations/stt/azure-ai-speech/)) |
| 123 | +- **Google Cloud**: `google.STT()` ([docs](https://docs.livekit.io/agents/integrations/stt/google/)) |
| 124 | +- **OpenAI**: `openai.STT()` ([docs](https://docs.livekit.io/agents/integrations/stt/openai/)) |
| 125 | + |
| 126 | +#### TTS Providers ([docs](https://docs.livekit.io/agents/integrations/tts/)) |
| 127 | +High-quality, low-latency voice synthesis: |
| 128 | +- **Cartesia**: `cartesia.TTS(model="sonic-english")` ([docs](https://docs.livekit.io/agents/integrations/tts/cartesia/)) |
| 129 | +- **ElevenLabs**: `elevenlabs.TTS()` ([docs](https://docs.livekit.io/agents/integrations/tts/elevenlabs/)) |
| 130 | +- **Azure AI Speech**: `azure_ai_speech.TTS()` ([docs](https://docs.livekit.io/agents/integrations/tts/azure-ai-speech/)) |
| 131 | +- **Amazon Polly**: `polly.TTS()` ([docs](https://docs.livekit.io/agents/integrations/tts/polly/)) |
| 132 | +- **Google Cloud**: `google.TTS()` ([docs](https://docs.livekit.io/agents/integrations/tts/google/)) |
| 133 | + |
| 134 | +### Alternative Pipeline Configurations |
| 135 | + |
| 136 | +#### OpenAI Realtime API ([docs](https://docs.livekit.io/agents/integrations/realtime/openai)) |
| 137 | +Replace entire STT-LLM-TTS pipeline with single provider: |
| 138 | +```python |
| 139 | +session = AgentSession( |
| 140 | + llm=openai.realtime.RealtimeModel( |
| 141 | + model="gpt-4o-realtime-preview", |
| 142 | + voice="alloy", |
| 143 | + temperature=0.8, |
| 144 | + ) |
| 145 | +) |
| 146 | +``` |
| 147 | +- Built-in VAD with server or semantic modes |
| 148 | +- Lower latency than traditional pipeline |
| 149 | +- Supports audio and text processing |
| 150 | + |
| 151 | +#### Custom Turn Detection |
| 152 | +**LiveKit Turn Detector** ([docs](https://docs.livekit.io/agents/build/turns/turn-detector/)): |
| 153 | +- **English Model**: `EnglishModel()` (66MB, ~15-45ms per turn) |
| 154 | +- **Multilingual Model**: `MultilingualModel()` (281MB, ~50-160ms, 14 languages) |
| 155 | +- Adds conversational context to VAD for better end-of-turn detection |
| 156 | + |
| 157 | +### Function Tools and Capabilities |
| 158 | + |
| 159 | +#### Adding Custom Tools |
| 160 | +Functions decorated with `@function_tool` become available to the LLM: |
| 161 | +```python |
| 162 | +@function_tool |
| 163 | +async def get_stock_price(self, context: RunContext, symbol: str): |
| 164 | + """Get current stock price for a symbol. |
| 165 | + |
| 166 | + Args: |
| 167 | + symbol: Stock ticker symbol (e.g., AAPL, GOOGL) |
| 168 | + """ |
| 169 | + # Implementation here |
| 170 | + return f"Stock price for {symbol}: $150.00" |
| 171 | +``` |
| 172 | + |
| 173 | +#### Tool Integration Patterns |
| 174 | +- Use `logger.info()` for debugging tool calls |
| 175 | +- Return simple strings or structured data |
| 176 | +- Handle errors gracefully with try/catch |
| 177 | +- Tools run asynchronously and can access external APIs |
| 178 | + |
| 179 | +### Testing and Evaluation ([docs](https://docs.livekit.io/agents/build/testing/)) |
| 180 | + |
| 181 | +#### Writing Agent Tests |
| 182 | +Use LiveKit's evaluation framework with LLM-based judgment: |
| 183 | +```python |
| 184 | +@pytest.mark.asyncio |
| 185 | +async def test_custom_feature(): |
| 186 | + async with AgentSession(llm=openai.LLM()) as session: |
| 187 | + await session.start(Assistant()) |
| 188 | + result = await session.run(user_input="Test query") |
| 189 | + |
| 190 | + await result.expect.next_event().is_message(role="assistant").judge( |
| 191 | + llm, intent="Expected behavior description" |
| 192 | + ) |
| 193 | +``` |
| 194 | + |
| 195 | +#### Mock Tools for Testing |
| 196 | +Test error conditions and edge cases: |
| 197 | +```python |
| 198 | +with mock_tools(Assistant, {"tool_name": lambda: "mocked_response"}): |
| 199 | + result = await session.run(user_input="test") |
| 200 | +``` |
| 201 | + |
| 202 | +#### Test Categories to Implement |
| 203 | +- **Expected Behavior**: Core functionality works correctly |
| 204 | +- **Tool Usage**: Function calls with proper arguments |
| 205 | +- **Error Handling**: Graceful failure responses |
| 206 | +- **Factual Grounding**: Accurate information, admits unknowns |
| 207 | +- **Misuse Resistance**: Refuses inappropriate requests |
| 208 | + |
| 209 | +### Metrics and Monitoring ([docs](https://docs.livekit.io/agents/build/metrics/)) |
| 210 | + |
| 211 | +#### Built-in Metrics Collection |
| 212 | +Automatic tracking of: |
| 213 | +- **STT Metrics**: Audio duration, transcript time, streaming mode |
| 214 | +- **LLM Metrics**: Completion duration, token usage, TTFT |
| 215 | +- **TTS Metrics**: Audio duration, character count, generation time |
| 216 | + |
| 217 | +#### Custom Metrics Implementation |
| 218 | +```python |
| 219 | +@session.on("metrics_collected") |
| 220 | +def _on_metrics_collected(ev: MetricsCollectedEvent): |
| 221 | + metrics.log_metrics(ev.metrics) |
| 222 | + # Add custom metric processing |
| 223 | + custom_usage_tracker.track(ev.metrics) |
| 224 | +``` |
| 225 | + |
| 226 | +#### Usage Tracking |
| 227 | +```python |
| 228 | +usage_collector = metrics.UsageCollector() |
| 229 | +# Collect throughout session |
| 230 | +summary = usage_collector.get_summary() # Get final usage stats |
| 231 | +``` |
| 232 | + |
| 233 | +### Frontend Integration ([docs](https://docs.livekit.io/agents/start/frontend/)) |
| 234 | + |
| 235 | +#### Starter App Templates |
| 236 | +Ready-to-use starter apps with full source code: |
| 237 | +- **Web (React/Next.js)**: https://github.com/livekit-examples/agent-starter-react |
| 238 | +- **iOS/macOS (Swift)**: https://github.com/livekit-examples/agent-starter-swift |
| 239 | +- **Android (Kotlin)**: https://github.com/livekit-examples/agent-starter-android |
| 240 | +- **Flutter**: https://github.com/livekit-examples/agent-starter-flutter |
| 241 | +- **React Native**: https://github.com/livekit-examples/voice-assistant-react-native |
| 242 | +- **Web Embed Widget**: https://github.com/livekit-examples/agent-starter-embed |
| 243 | + |
| 244 | +#### Custom Frontend Development |
| 245 | +- Use LiveKit SDKs (JavaScript, Swift, Android, Flutter, React Native) |
| 246 | +- Subscribe to audio/video tracks and transcription streams |
| 247 | +- Implement WebRTC for realtime connectivity |
| 248 | +- Add features like audio visualizers, virtual avatars, RPC calls |
| 249 | + |
| 250 | +### Telephony Integration ([docs](https://docs.livekit.io/agents/start/telephony/)) |
| 251 | +Add inbound or outbound calling capabilities to your agent with SIP integration. |
| 252 | + |
| 253 | +### Production Considerations |
| 254 | + |
| 255 | +#### Environment Configuration |
| 256 | +Required environment variables: |
| 257 | +- `LIVEKIT_URL`, `LIVEKIT_API_KEY`, `LIVEKIT_API_SECRET` |
| 258 | +- Provider-specific keys: `OPENAI_API_KEY`, `DEEPGRAM_API_KEY`, `CARTESIA_API_KEY` |
| 259 | + |
| 260 | +#### Deployment Options ([docs](https://docs.livekit.io/agents/ops/deployment/)) |
| 261 | +- **LiveKit Cloud**: Managed hosting with enhanced features |
| 262 | +- **Self-hosting**: Use provided `Dockerfile` |
| 263 | +- **Telephony**: SIP integration for phone calls |
| 264 | +- **Scaling**: Handle multiple concurrent sessions |
| 265 | + |
| 266 | +#### Key Files to Track in Production |
| 267 | +- Commit `uv.lock` for reproducible builds |
| 268 | +- Commit `livekit.toml` if using LiveKit Cloud |
| 269 | +- Remove template-specific CI checks |
0 commit comments