Skip to content

Commit f79dda7

Browse files
committed
Add first claude.md version
1 parent 279d1f1 commit f79dda7

File tree

3 files changed

+3221
-0
lines changed

3 files changed

+3221
-0
lines changed

CLAUDE.md

Lines changed: 269 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,269 @@
1+
# CLAUDE.md
2+
3+
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
4+
5+
## Development Commands
6+
7+
### Environment Setup
8+
- `uv sync` - Install dependencies to virtual environment
9+
- `uv sync --dev` - Install dependencies including dev tools (pytest, ruff)
10+
- Copy `.env.example` to `.env` and configure API keys
11+
- `lk app env -w .env` - Auto-load LiveKit environment using CLI
12+
13+
### Running the Agent
14+
- `uv run python src/agent.py download-files` - Download required models (Silero VAD, LiveKit turn detector) before first run
15+
- `uv run python src/agent.py console` - Run agent in terminal for direct interaction
16+
- `uv run python src/agent.py dev` - Run agent for frontend/telephony integration
17+
- `uv run python src/agent.py start` - Production mode
18+
19+
### Code Quality
20+
- `uv run ruff check .` - Run linter
21+
- `uv run ruff format .` - Format code
22+
- `uv run ruff check --output-format=github .` - Lint with GitHub Actions format
23+
- `uv run ruff format --check --diff .` - Check formatting without applying changes
24+
25+
### Testing
26+
- `uv run pytest` - Run full test suite including evaluations
27+
- `uv run pytest tests/test_agent.py::test_offers_assistance` - Run specific test
28+
29+
## Architecture
30+
31+
### Core Components
32+
- `src/agent.py` - Main agent implementation with `Assistant` class inheriting from `Agent`
33+
- `Assistant` class contains agent instructions and function tools (e.g., `lookup_weather`)
34+
- `entrypoint()` function sets up the voice AI pipeline with STT/LLM/TTS components
35+
36+
### Voice AI Pipeline
37+
The agent uses a modular pipeline approach:
38+
- **STT**: Deepgram Nova-3 model with multilingual support
39+
- **LLM**: OpenAI GPT-4o-mini (easily swappable)
40+
- **TTS**: Cartesia for voice synthesis
41+
- **Turn Detection**: LiveKit's multilingual turn detection model
42+
- **VAD**: Silero VAD for voice activity detection
43+
- **Noise Cancellation**: LiveKit Cloud BVC (can be omitted for self-hosting)
44+
45+
### Testing Framework
46+
Uses LiveKit Agents testing framework with evaluation-based tests:
47+
- Tests use `AgentSession` with real LLM interactions
48+
- `.judge()` method evaluates agent responses against intent descriptions
49+
- Mock tools available for testing error conditions
50+
- Supports both unit tests and end-to-end evaluations
51+
52+
### Configuration
53+
- Environment variables loaded via `python-dotenv`
54+
- Required API keys: LIVEKIT_URL, LIVEKIT_API_KEY, LIVEKIT_API_SECRET, OPENAI_API_KEY, DEEPGRAM_API_KEY, CARTESIA_API_KEY
55+
- Alternative providers can be swapped by modifying the session setup in `entrypoint()`
56+
57+
### Function Tools
58+
Functions decorated with `@function_tool` are automatically passed to the LLM:
59+
- Must be async methods on the Agent class
60+
- Include docstrings with tool descriptions and argument specifications
61+
- Example: `lookup_weather()` for weather information retrieval
62+
63+
### Metrics and Logging
64+
- Integrated usage collection and metrics logging
65+
- Metrics collected via `MetricsCollectedEvent` handlers
66+
- Usage summaries logged on session shutdown
67+
- Room context automatically included in log entries
68+
69+
## Key Patterns
70+
71+
### Agent Customization
72+
To modify agent behavior:
73+
1. Update `instructions` in `Assistant.__init__()`
74+
2. Add new `@function_tool` methods for custom capabilities
75+
3. Swap STT/LLM/TTS providers in the `AgentSession` setup
76+
77+
### Testing New Features
78+
1. Add unit tests to `tests/test_agent.py`
79+
2. Use `.judge()` evaluations for response quality
80+
3. Mock external dependencies with `mock_tools()`
81+
4. Test both success and error conditions
82+
83+
### Deployment
84+
- Production-ready with included `Dockerfile`
85+
- Uses `uv` for dependency management
86+
- CI/CD workflows for linting (`ruff.yml`) and testing (`tests.yml`)
87+
88+
## LiveKit Documentation & Examples
89+
90+
The LiveKit documentation is comprehensive and provides detailed guidance for all aspects of agent development. **All documentation URLs support `.md` suffix for markdown format** and the docs follow the **llms.txt standard** for AI-friendly consumption.
91+
92+
**Core Documentation**: https://docs.livekit.io/agents/
93+
- **Quick Start**: https://docs.livekit.io/agents/start/voice-ai/
94+
- **Building Agents**: https://docs.livekit.io/agents/build/
95+
- **Integrations**: https://docs.livekit.io/agents/integrations/
96+
- **Operations & Deployment**: https://docs.livekit.io/agents/ops/
97+
98+
**Practical Examples Repository**: https://github.com/livekit-examples/python-agents-examples
99+
- Contains dozens of real-world agent implementations
100+
- Advanced patterns and use cases beyond the starter template
101+
- Integration examples with various AI providers and tools
102+
- Production-ready code samples
103+
104+
## Extending Agent Functionality
105+
106+
### Swapping AI Providers
107+
108+
#### LLM Providers ([docs](https://docs.livekit.io/agents/integrations/llm/))
109+
Available providers with consistent interface:
110+
- **OpenAI**: `openai.LLM(model="gpt-4o-mini")` ([docs](https://docs.livekit.io/agents/integrations/llm/openai/))
111+
- **Anthropic**: `anthropic.LLM(model="claude-3-haiku")` ([docs](https://docs.livekit.io/agents/integrations/llm/anthropic/))
112+
- **Google Gemini**: `google.LLM(model="gemini-1.5-flash")` ([docs](https://docs.livekit.io/agents/integrations/llm/google/))
113+
- **Azure OpenAI**: `azure_openai.LLM(model="gpt-4o")` ([docs](https://docs.livekit.io/agents/integrations/llm/azure-openai/))
114+
- **Groq**: ([docs](https://docs.livekit.io/agents/integrations/llm/groq/))
115+
- **Fireworks**: ([docs](https://docs.livekit.io/agents/integrations/llm/fireworks/))
116+
- **DeepSeek, Cerebras, Amazon Bedrock**, and others
117+
118+
#### STT Providers ([docs](https://docs.livekit.io/agents/integrations/stt/))
119+
All support low-latency multilingual transcription:
120+
- **Deepgram**: `deepgram.STT(model="nova-3", language="multi")` ([docs](https://docs.livekit.io/agents/integrations/stt/deepgram/))
121+
- **AssemblyAI**: `assemblyai.STT()` ([docs](https://docs.livekit.io/agents/integrations/stt/assemblyai/))
122+
- **Azure AI Speech**: `azure_ai_speech.STT()` ([docs](https://docs.livekit.io/agents/integrations/stt/azure-ai-speech/))
123+
- **Google Cloud**: `google.STT()` ([docs](https://docs.livekit.io/agents/integrations/stt/google/))
124+
- **OpenAI**: `openai.STT()` ([docs](https://docs.livekit.io/agents/integrations/stt/openai/))
125+
126+
#### TTS Providers ([docs](https://docs.livekit.io/agents/integrations/tts/))
127+
High-quality, low-latency voice synthesis:
128+
- **Cartesia**: `cartesia.TTS(model="sonic-english")` ([docs](https://docs.livekit.io/agents/integrations/tts/cartesia/))
129+
- **ElevenLabs**: `elevenlabs.TTS()` ([docs](https://docs.livekit.io/agents/integrations/tts/elevenlabs/))
130+
- **Azure AI Speech**: `azure_ai_speech.TTS()` ([docs](https://docs.livekit.io/agents/integrations/tts/azure-ai-speech/))
131+
- **Amazon Polly**: `polly.TTS()` ([docs](https://docs.livekit.io/agents/integrations/tts/polly/))
132+
- **Google Cloud**: `google.TTS()` ([docs](https://docs.livekit.io/agents/integrations/tts/google/))
133+
134+
### Alternative Pipeline Configurations
135+
136+
#### OpenAI Realtime API ([docs](https://docs.livekit.io/agents/integrations/realtime/openai))
137+
Replace entire STT-LLM-TTS pipeline with single provider:
138+
```python
139+
session = AgentSession(
140+
llm=openai.realtime.RealtimeModel(
141+
model="gpt-4o-realtime-preview",
142+
voice="alloy",
143+
temperature=0.8,
144+
)
145+
)
146+
```
147+
- Built-in VAD with server or semantic modes
148+
- Lower latency than traditional pipeline
149+
- Supports audio and text processing
150+
151+
#### Custom Turn Detection
152+
**LiveKit Turn Detector** ([docs](https://docs.livekit.io/agents/build/turns/turn-detector/)):
153+
- **English Model**: `EnglishModel()` (66MB, ~15-45ms per turn)
154+
- **Multilingual Model**: `MultilingualModel()` (281MB, ~50-160ms, 14 languages)
155+
- Adds conversational context to VAD for better end-of-turn detection
156+
157+
### Function Tools and Capabilities
158+
159+
#### Adding Custom Tools
160+
Functions decorated with `@function_tool` become available to the LLM:
161+
```python
162+
@function_tool
163+
async def get_stock_price(self, context: RunContext, symbol: str):
164+
"""Get current stock price for a symbol.
165+
166+
Args:
167+
symbol: Stock ticker symbol (e.g., AAPL, GOOGL)
168+
"""
169+
# Implementation here
170+
return f"Stock price for {symbol}: $150.00"
171+
```
172+
173+
#### Tool Integration Patterns
174+
- Use `logger.info()` for debugging tool calls
175+
- Return simple strings or structured data
176+
- Handle errors gracefully with try/catch
177+
- Tools run asynchronously and can access external APIs
178+
179+
### Testing and Evaluation ([docs](https://docs.livekit.io/agents/build/testing/))
180+
181+
#### Writing Agent Tests
182+
Use LiveKit's evaluation framework with LLM-based judgment:
183+
```python
184+
@pytest.mark.asyncio
185+
async def test_custom_feature():
186+
async with AgentSession(llm=openai.LLM()) as session:
187+
await session.start(Assistant())
188+
result = await session.run(user_input="Test query")
189+
190+
await result.expect.next_event().is_message(role="assistant").judge(
191+
llm, intent="Expected behavior description"
192+
)
193+
```
194+
195+
#### Mock Tools for Testing
196+
Test error conditions and edge cases:
197+
```python
198+
with mock_tools(Assistant, {"tool_name": lambda: "mocked_response"}):
199+
result = await session.run(user_input="test")
200+
```
201+
202+
#### Test Categories to Implement
203+
- **Expected Behavior**: Core functionality works correctly
204+
- **Tool Usage**: Function calls with proper arguments
205+
- **Error Handling**: Graceful failure responses
206+
- **Factual Grounding**: Accurate information, admits unknowns
207+
- **Misuse Resistance**: Refuses inappropriate requests
208+
209+
### Metrics and Monitoring ([docs](https://docs.livekit.io/agents/build/metrics/))
210+
211+
#### Built-in Metrics Collection
212+
Automatic tracking of:
213+
- **STT Metrics**: Audio duration, transcript time, streaming mode
214+
- **LLM Metrics**: Completion duration, token usage, TTFT
215+
- **TTS Metrics**: Audio duration, character count, generation time
216+
217+
#### Custom Metrics Implementation
218+
```python
219+
@session.on("metrics_collected")
220+
def _on_metrics_collected(ev: MetricsCollectedEvent):
221+
metrics.log_metrics(ev.metrics)
222+
# Add custom metric processing
223+
custom_usage_tracker.track(ev.metrics)
224+
```
225+
226+
#### Usage Tracking
227+
```python
228+
usage_collector = metrics.UsageCollector()
229+
# Collect throughout session
230+
summary = usage_collector.get_summary() # Get final usage stats
231+
```
232+
233+
### Frontend Integration ([docs](https://docs.livekit.io/agents/start/frontend/))
234+
235+
#### Starter App Templates
236+
Ready-to-use starter apps with full source code:
237+
- **Web (React/Next.js)**: https://github.com/livekit-examples/agent-starter-react
238+
- **iOS/macOS (Swift)**: https://github.com/livekit-examples/agent-starter-swift
239+
- **Android (Kotlin)**: https://github.com/livekit-examples/agent-starter-android
240+
- **Flutter**: https://github.com/livekit-examples/agent-starter-flutter
241+
- **React Native**: https://github.com/livekit-examples/voice-assistant-react-native
242+
- **Web Embed Widget**: https://github.com/livekit-examples/agent-starter-embed
243+
244+
#### Custom Frontend Development
245+
- Use LiveKit SDKs (JavaScript, Swift, Android, Flutter, React Native)
246+
- Subscribe to audio/video tracks and transcription streams
247+
- Implement WebRTC for realtime connectivity
248+
- Add features like audio visualizers, virtual avatars, RPC calls
249+
250+
### Telephony Integration ([docs](https://docs.livekit.io/agents/start/telephony/))
251+
Add inbound or outbound calling capabilities to your agent with SIP integration.
252+
253+
### Production Considerations
254+
255+
#### Environment Configuration
256+
Required environment variables:
257+
- `LIVEKIT_URL`, `LIVEKIT_API_KEY`, `LIVEKIT_API_SECRET`
258+
- Provider-specific keys: `OPENAI_API_KEY`, `DEEPGRAM_API_KEY`, `CARTESIA_API_KEY`
259+
260+
#### Deployment Options ([docs](https://docs.livekit.io/agents/ops/deployment/))
261+
- **LiveKit Cloud**: Managed hosting with enhanced features
262+
- **Self-hosting**: Use provided `Dockerfile`
263+
- **Telephony**: SIP integration for phone calls
264+
- **Scaling**: Handle multiple concurrent sessions
265+
266+
#### Key Files to Track in Production
267+
- Commit `uv.lock` for reproducible builds
268+
- Commit `livekit.toml` if using LiveKit Cloud
269+
- Remove template-specific CI checks

livekit.toml

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,5 @@
1+
[project]
2+
subdomain = "agent-starter-python-tt3tc1cm"
3+
4+
[agent]
5+
id = "CA_UL2R2ToWshYg"

0 commit comments

Comments
 (0)