
Commit c30a438

committed
cursor rules
1 parent f79dda7 commit c30a438

File tree: 5 files changed, +812 -0 lines changed
Lines changed: 171 additions & 0 deletions
@@ -0,0 +1,171 @@
# LiveKit Agent Workflows

## Agent Architecture Overview

LiveKit Agents implement conversational AI workflows through a structured pipeline:
- **Speech-to-Text (STT)**: Convert audio input to text
- **Large Language Model (LLM)**: Process the conversation and generate responses
- **Text-to-Speech (TTS)**: Convert text responses to audio
- **Turn Detection**: Determine when the user has finished speaking
- **Voice Activity Detection (VAD)**: Detect speech presence

## Agent Implementation Patterns

### Core Agent Class
```python
from livekit.agents import Agent, RunContext, function_tool


class ConversationalAgent(Agent):
    def __init__(self):
        # Define agent behavior through instructions
        super().__init__(
            instructions="""
            System prompt defining:
            - Agent personality and role
            - Available capabilities
            - Communication style
            - Behavioral boundaries
            """
        )

    @function_tool
    async def custom_capability(self, context: RunContext, parameter: str):
        """Function tools extend agent capabilities beyond conversation.

        Args:
            parameter: Clear description for LLM understanding
        """
        # Implementation logic
        return "Tool result"
```

### Agent Lifecycle & Context

#### RunContext Usage
- **Session Access**: `context.room` for room information
- **State Management**: Track conversation state across turns
- **Event Handling**: Respond to room events and participant actions
- **Resource Management**: Handle cleanup and resource disposal
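
A minimal sketch of turn-to-turn state management via `RunContext` (illustrative only: `SessionState` and `note_topic` are hypothetical names, and it assumes the session is created with `AgentSession(userdata=SessionState())` so that `context.userdata` is populated and `RunContext` can be parameterized by that type):

```python
from dataclasses import dataclass, field

from livekit.agents import Agent, RunContext, function_tool


@dataclass
class SessionState:
    # Hypothetical per-session state shared across turns
    topics_discussed: list[str] = field(default_factory=list)


class StatefulAgent(Agent):
    def __init__(self):
        super().__init__(instructions="Track the topics the user brings up.")

    @function_tool
    async def note_topic(self, context: RunContext[SessionState], topic: str):
        """Record a topic the user mentioned so later turns can reference it.

        Args:
            topic: Short label for the topic the user raised
        """
        # context.userdata holds the session-scoped state object
        context.userdata.topics_discussed.append(topic)
        return f"Noted. Topics so far: {', '.join(context.userdata.topics_discussed)}"
```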

#### Conversation Flow
1. **Audio Reception**: Agent receives participant audio stream
2. **Speech Processing**: STT converts audio to text transcript
3. **LLM Processing**: Language model generates response using instructions and tools
4. **Audio Generation**: TTS converts response to audio
5. **Turn Management**: System detects conversation turns and manages interruptions

## Pipeline Configuration Patterns

### Session Setup
```python
from livekit.agents import AgentSession, AutoSubscribe, JobContext

# "provider" is a placeholder for concrete plugin components (STT, LLM, TTS, etc.)

async def entrypoint(ctx: JobContext):
    await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)

    # Configure the conversational AI pipeline
    session = AgentSession(
        stt=provider.STT(),           # Speech recognition
        llm=provider.LLM(),           # Language understanding/generation
        tts=provider.TTS(),           # Speech synthesis
        turn_detector=provider.TD(),  # End-of-turn detection
        vad=provider.VAD(),           # Voice activity detection
    )

    # Start the agent workflow
    await session.start(agent=YourAgent(), room=ctx.room)
```

### Pipeline Variations

#### Traditional Multi-Provider Pipeline
- Separate providers for each component (STT, LLM, TTS)
- Maximum flexibility in provider selection
- Optimized for specific use cases (latency, quality, cost)

#### Unified Provider Pipeline (e.g., OpenAI Realtime)
- Single provider handles the entire conversation flow
- Reduced latency through integrated processing
- Built-in voice activity detection and turn management
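
A sketch of the unified-provider setup, assuming the `livekit.plugins.openai` package and its realtime model class (the exact class path and options may differ between plugin versions):

```python
from livekit.agents import AgentSession
from livekit.plugins import openai

# One model handles STT, LLM, TTS, VAD, and turn-taking internally,
# so no separate pipeline components are configured.
session = AgentSession(
    llm=openai.realtime.RealtimeModel(),
)
```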

## Function Tool Patterns

### Tool Design Principles
- **Clear Documentation**: LLM uses docstrings to understand tool purpose
- **Error Handling**: Graceful failure with meaningful user feedback
- **Async Implementation**: Non-blocking execution for real-time performance
- **Context Awareness**: Leverage RunContext for session-specific behavior
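
A sketch applying these principles in a single tool: async I/O, a docstring the LLM can read, and graceful failure. `get_order_status` and `OrderServiceError` are hypothetical stand-ins for an external dependency, not part of the LiveKit SDK:

```python
from livekit.agents import Agent, RunContext, function_tool


class OrderServiceError(Exception):
    """Raised by the hypothetical order backend when a lookup fails."""


async def get_order_status(order_id: str) -> str:
    # Stand-in for a real async call (HTTP client, database, etc.)
    return "shipped"


class SupportAgent(Agent):
    def __init__(self):
        super().__init__(instructions="Help users check on their orders.")

    @function_tool
    async def fetch_order_status(self, context: RunContext, order_id: str):
        """Look up the current status of an order.

        Args:
            order_id: The order identifier the user provided
        """
        try:
            # Async call keeps the voice pipeline responsive
            status = await get_order_status(order_id)
            return f"Order {order_id} is currently: {status}"
        except OrderServiceError:
            # Graceful failure: give the LLM something useful to relay to the user
            return "I couldn't reach the order system right now. Please try again shortly."
```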

### Tool Categories
- **Information Retrieval**: API calls, database queries, web searches
- **Actions**: External system integration, state changes
- **Computation**: Data processing, calculations, transformations
- **Media Processing**: Image analysis, file handling, content generation

## Voice Pipeline Optimization

### Turn Detection Strategies
- **VAD-Only**: Simple voice activity detection
- **Semantic Turn Detection**: Context-aware conversation boundaries
- **Hybrid Approach**: VAD + semantic analysis for optimal user experience
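
One way to wire the hybrid approach, assuming the Silero VAD plugin and the LiveKit turn-detector plugin are installed; module paths, class names, and the keyword argument (`turn_detection` in recent SDK versions) may differ between versions:

```python
from livekit.agents import AgentSession
from livekit.plugins import silero
from livekit.plugins.turn_detector.multilingual import MultilingualModel

session = AgentSession(
    # stt / llm / tts configured as in the Session Setup example above
    vad=silero.VAD.load(),               # fast speech-presence signal
    turn_detection=MultilingualModel(),  # semantic end-of-turn decisions
)
```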

### Latency Optimization
- **Model Selection**: Balance capability vs. response time
- **Streaming**: Real-time processing where supported
- **Caching**: Reduce repeated processing overhead
- **Connection Management**: Maintain persistent connections
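
As one concrete latency technique, the sketch below prewarms a heavyweight component once per worker process so each session skips the load cost; it assumes the Silero plugin and the `prewarm_fnc` hook on `WorkerOptions` (names may vary by SDK version):

```python
from livekit.agents import AgentSession, JobContext, JobProcess, WorkerOptions, cli
from livekit.plugins import silero


def prewarm(proc: JobProcess):
    # Loaded once per process and reused by every job it handles
    proc.userdata["vad"] = silero.VAD.load()


async def entrypoint(ctx: JobContext):
    await ctx.connect()
    # Reuse the cached model instead of loading it per session,
    # then add stt/llm/tts and start the agent as in Session Setup
    session = AgentSession(vad=ctx.proc.userdata["vad"])


if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint, prewarm_fnc=prewarm))
```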

## Error Handling & Resilience

### Common Failure Modes
- **Provider Outages**: Network issues, service unavailability
- **Audio Quality**: Poor input affecting transcription accuracy
- **Tool Failures**: External service errors, timeout conditions
- **Resource Limits**: Rate limiting, quota exhaustion

### Resilience Patterns
- **Graceful Degradation**: Reduced functionality during partial failures
- **Retry Logic**: Intelligent retry with backoff strategies
- **Fallback Providers**: Alternative services for critical components
- **User Communication**: Clear error messages and recovery guidance
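
A sketch of retry-with-backoff plus a user-facing fallback inside a tool; `query_backend` and `BackendUnavailable` are hypothetical stand-ins for an external dependency, while the retry loop itself is plain `asyncio`:

```python
import asyncio

from livekit.agents import Agent, RunContext, function_tool


class BackendUnavailable(Exception):
    """Raised by the hypothetical backend when it cannot serve a request."""


async def query_backend(query: str) -> str:
    # Stand-in for a flaky external service
    return "result"


class ResilientAgent(Agent):
    def __init__(self):
        super().__init__(instructions="Answer questions using the backend service.")

    @function_tool
    async def lookup(self, context: RunContext, query: str):
        """Query the backend, retrying transient failures with backoff.

        Args:
            query: The user's question, verbatim
        """
        delay = 0.5
        for attempt in range(3):
            try:
                return await query_backend(query)
            except BackendUnavailable:
                if attempt == 2:
                    # Graceful degradation: tell the user instead of failing silently
                    return "The lookup service is unavailable right now; please try again in a moment."
                await asyncio.sleep(delay)
                delay *= 2  # exponential backoff between retries
```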

## Testing Conversational Agents

### LLM-Based Evaluation
```python
# Test conversational behavior with semantic evaluation
async def test_agent_response():
    async with AgentSession(llm=test_llm) as session:
        await session.start(YourAgent())
        result = await session.run(user_input="test scenario")

        # Evaluate response quality using LLM judgment
        await result.expect.next_event().is_message(role="assistant").judge(
            llm=judge_llm,
            intent="Expected behavior description",
        )
```

### Tool Testing
```python
# Mock external dependencies for reliable testing
with mock_tools(YourAgent, {"external_api": mock_response}):
    # Test tool behavior under controlled conditions
    result = await session.run(user_input="test scenario")
```

## Monitoring & Observability

### Built-in Metrics
- **Performance**: Latency, throughput, error rates
- **Usage**: Token consumption, API calls, session duration
- **Quality**: Turn accuracy, interruption handling, user satisfaction

### Custom Metrics Collection
```python
@session.on("metrics_collected")
def handle_metrics(event: MetricsCollectedEvent):
    # Process and forward metrics to monitoring systems
    custom_analytics.track(event.metrics)
```

Metrics reported per pipeline stage:
- STT: Audio duration, transcript time, streaming mode
- LLM: Completion duration, token usage, TTFT (time to first token)
- TTS: Audio duration, character count, generation time
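
Building on the handler above, a sketch of aggregating per-stage usage with the SDK's metrics helpers; it assumes `livekit.agents.metrics` exposes `UsageCollector` and `log_metrics`, so verify the names against your SDK version:

```python
from livekit.agents import MetricsCollectedEvent, metrics

usage_collector = metrics.UsageCollector()


@session.on("metrics_collected")
def _on_metrics_collected(event: MetricsCollectedEvent):
    metrics.log_metrics(event.metrics)      # structured log of per-stage metrics
    usage_collector.collect(event.metrics)  # accumulate usage across the session


async def log_usage():
    # Typically registered as a shutdown callback so the summary is emitted once per job
    summary = usage_collector.get_summary()
    print(f"Usage summary: {summary}")
```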
