Commit d234bb0

updated claude.md
1 parent 9b3f36d commit d234bb0

1 file changed: +45 -7 lines changed

CLAUDE.md

Lines changed: 45 additions & 7 deletions
@@ -4,7 +4,7 @@ This file provides guidance to Claude Code (claude.ai/code) when working with Li

 ## Project Overview

-This covers voice AI agent development with LiveKit Agents for Python. The concepts and patterns described here apply to building, extending, and improving LiveKit-based conversational AI agents.
+This covers multimodal AI agent development with LiveKit Agents, a realtime framework for production-grade voice, text, and vision AI agents. While this guide focuses on Python development, LiveKit also supports Node.js (beta). The concepts and patterns described here apply to building, extending, and improving LiveKit-based conversational AI agents across multiple platforms and use cases.

 ## Development Commands

@@ -38,14 +38,18 @@ This covers voice AI agent development with LiveKit Agents for Python. The conce
 - **Function Tools** - Methods decorated with `@function_tool` that extend agent capabilities
 - **Entrypoint Function** - Sets up the voice AI pipeline with STT/LLM/TTS components

-### Voice AI Pipeline Architecture
+### Multimodal AI Pipeline Architecture
 LiveKit agents use a modular pipeline approach with swappable components:
 - **STT (Speech-to-Text)**: Converts audio input to text transcripts
-- **LLM (Large Language Model)**: Processes conversations and generates responses
+- **LLM (Large Language Model)**: Processes conversations, text, and vision inputs to generate responses
 - **TTS (Text-to-Speech)**: Converts text responses back to synthesized speech
+- **Vision Processing**: Handles image and video understanding for multimodal interactions
 - **Turn Detection**: Determines when users finish speaking for natural conversation flow
 - **VAD (Voice Activity Detection)**: Detects when users are speaking vs silent
+- **Background Audio Handling**: Manages background audio and interruption scenarios
+- **Interrupt Management**: Handles conversation interruptions and context switching
 - **Noise Cancellation**: Optional audio enhancement (LiveKit Cloud BVC or self-hosted alternatives)
+- **Real-time Audio/Video Processing**: Low-latency multimedia stream handling
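
The "modular pipeline with swappable components" idea in the list above can be illustrated with a minimal Python sketch. This is not the LiveKit Agents API: `Pipeline`, `EchoSTT`, `UpperLLM`, and `BytesTTS` are hypothetical stand-ins that only show how the STT, LLM, and TTS stages can each be replaced without touching the others.

```python
from dataclasses import dataclass
from typing import Protocol


class STT(Protocol):
    def transcribe(self, audio: bytes) -> str: ...


class LLM(Protocol):
    def reply(self, transcript: str) -> str: ...


class TTS(Protocol):
    def synthesize(self, text: str) -> bytes: ...


@dataclass
class Pipeline:
    """Chains STT -> LLM -> TTS; each stage can be swapped independently."""

    stt: STT
    llm: LLM
    tts: TTS

    def handle_turn(self, audio: bytes) -> bytes:
        transcript = self.stt.transcribe(audio)
        response = self.llm.reply(transcript)
        return self.tts.synthesize(response)


# Toy stand-ins (hypothetical) so the sketch runs without provider credentials.
class EchoSTT:
    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8")


class UpperLLM:
    def reply(self, transcript: str) -> str:
        return transcript.upper()


class BytesTTS:
    def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")


pipeline = Pipeline(stt=EchoSTT(), llm=UpperLLM(), tts=BytesTTS())
print(pipeline.handle_turn(b"hello"))  # b'HELLO'
```

Because the stages are typed as protocols, any object with the matching method can be dropped in; a real agent would pass provider plugin instances instead of these toys.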

 ### Testing Framework Concepts
 LiveKit Agents provide evaluation-based testing:
@@ -117,7 +121,12 @@ All LLM providers follow consistent interfaces for easy swapping:
 - **Anthropic**: `anthropic.LLM(model="claude-3-haiku")` ([docs](https://docs.livekit.io/agents/integrations/llm/anthropic/))
 - **Google Gemini**: `google.LLM(model="gemini-1.5-flash")` ([docs](https://docs.livekit.io/agents/integrations/llm/google/))
 - **Azure OpenAI**: `azure_openai.LLM(model="gpt-4o")` ([docs](https://docs.livekit.io/agents/integrations/llm/azure-openai/))
-- **Additional Providers**: Groq, Fireworks, DeepSeek, Cerebras, Amazon Bedrock, and others
+- **Groq**: `groq.LLM()` ([docs](https://docs.livekit.io/agents/integrations/llm/groq/))
+- **Fireworks**: `fireworks.LLM()` ([docs](https://docs.livekit.io/agents/integrations/llm/fireworks/))
+- **DeepSeek**: `deepseek.LLM()` ([docs](https://docs.livekit.io/agents/integrations/llm/deepseek/))
+- **Cerebras**: `cerebras.LLM()` ([docs](https://docs.livekit.io/agents/integrations/llm/cerebras/))
+- **Amazon Bedrock**: `bedrock.LLM()` ([docs](https://docs.livekit.io/agents/integrations/llm/bedrock/))
+- **And others**: Additional providers regularly added
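
One way to take advantage of a consistent provider interface is to pick the provider by name at startup. The sketch below is an assumption-laden illustration, not LiveKit code: `FakeLLM`, `PROVIDERS`, and `make_llm` are hypothetical names, and a real project would register the actual plugin classes (for example `groq.LLM`) instead of the stand-in.

```python
from functools import partial
from typing import Callable, Dict


class FakeLLM:
    """Hypothetical stand-in for a provider plugin's LLM class."""

    def __init__(self, provider: str, model: str = "default") -> None:
        self.provider = provider
        self.model = model


# Registry mapping a provider name to a constructor with the same call shape.
PROVIDERS: Dict[str, Callable[..., FakeLLM]] = {
    "groq": partial(FakeLLM, "groq"),
    "fireworks": partial(FakeLLM, "fireworks"),
    "deepseek": partial(FakeLLM, "deepseek"),
    "cerebras": partial(FakeLLM, "cerebras"),
}


def make_llm(name: str, **kwargs: str) -> FakeLLM:
    """Build an LLM by provider name, failing loudly on unknown names."""
    try:
        factory = PROVIDERS[name]
    except KeyError:
        raise ValueError(f"unknown LLM provider: {name!r}") from None
    return factory(**kwargs)


llm = make_llm("groq", model="example-model")
print(llm.provider, llm.model)  # groq example-model
```

Keeping the lookup in one place means switching providers becomes a configuration change rather than a code change.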

 ### STT Provider Options ([docs](https://docs.livekit.io/agents/integrations/stt/))
 All support low-latency multilingual transcription:
@@ -248,19 +257,48 @@ Complete application templates with full source code:
 ### Custom Frontend Development Patterns
 - **LiveKit SDK Integration**: Use platform-specific SDKs for real-time connectivity
 - **Audio/Video Streaming**: Subscribe to agent tracks and transcription streams
-- **WebRTC Implementation**: Handle real-time communication protocols
-- **Enhanced UX Features**: Audio visualizers, virtual avatars, custom controls
+- **WebRTC Implementation**: Handle real-time communication protocols with NAT traversal
+- **Enhanced UX Features**:
+  - Audio visualizers and waveform displays
+  - Virtual avatars and character animations
+  - Custom UI controls and interaction patterns
+  - Real-time transcription overlays
+  - Visual feedback for agent processing states
+  - Interactive chat interfaces alongside voice
+- **Cross-Platform Support**: Consistent experience across web, mobile, and desktop
+
+## Workflow Modeling and Advanced Features
+
+### Workflow Modeling ([docs](https://docs.livekit.io/agents/build/workflows/))
+LiveKit supports sophisticated workflow modeling for complex agent behaviors:
+- **State Management**: Define agent states and transitions
+- **Conditional Logic**: Implement branching conversation flows
+- **Context Preservation**: Maintain conversation context across workflow steps
+- **Error Recovery**: Handle failures and provide graceful fallbacks
+- **Multi-Step Processes**: Guide users through complex tasks
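
The workflow concepts listed above (states and transitions, branching, preserved context, fallback on failure) can be sketched as a small state machine. This is a generic Python illustration under stated assumptions, not the LiveKit workflow API: the `State` values, the event names, and the `Workflow` class are all hypothetical.

```python
from enum import Enum, auto
from typing import Dict, Tuple


class State(Enum):
    GREETING = auto()
    COLLECT_INFO = auto()
    CONFIRM = auto()
    DONE = auto()
    FALLBACK = auto()


# Allowed transitions: (current state, event) -> next state.
TRANSITIONS: Dict[Tuple[State, str], State] = {
    (State.GREETING, "greeted"): State.COLLECT_INFO,
    (State.COLLECT_INFO, "info_received"): State.CONFIRM,
    (State.CONFIRM, "confirmed"): State.DONE,
    (State.CONFIRM, "rejected"): State.COLLECT_INFO,  # branch back
    (State.FALLBACK, "retry"): State.GREETING,
}


class Workflow:
    def __init__(self) -> None:
        self.state = State.GREETING
        self.context: Dict[str, str] = {}  # survives every transition

    def handle(self, event: str, **data: str) -> State:
        """Record any new data, then move to the next state.

        Events with no registered transition send the workflow to FALLBACK
        instead of raising, which models graceful error recovery.
        """
        self.context.update(data)
        self.state = TRANSITIONS.get((self.state, event), State.FALLBACK)
        return self.state


wf = Workflow()
wf.handle("greeted")
wf.handle("info_received", name="Ada")
print(wf.handle("confirmed"), wf.context)
```

Keeping transitions in a table rather than in nested conditionals makes the allowed conversation paths easy to review and extend.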
+
+### Background Processing Capabilities
+- **Background Audio Handling**: Process audio while maintaining conversation flow
+- **Parallel Task Execution**: Handle multiple operations simultaneously
+- **Context Switching**: Seamlessly transition between different conversation topics
+- **Asynchronous Operations**: Non-blocking external API calls and computations
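
As a rough illustration of parallel, non-blocking operations, the sketch below runs two simulated external calls concurrently with `asyncio.gather`. The coroutine names (`fetch_weather`, `lookup_account`, `gather_context`) and their return values are hypothetical; the `asyncio.sleep` calls stand in for real network I/O.

```python
import asyncio


async def fetch_weather(city: str) -> str:
    await asyncio.sleep(0.05)  # simulates a non-blocking external API call
    return f"sunny in {city}"


async def lookup_account(user_id: int) -> dict:
    await asyncio.sleep(0.05)  # simulates a database or CRM lookup
    return {"user_id": user_id, "tier": "pro"}


async def gather_context() -> tuple:
    # Both coroutines run concurrently, so the total wait is roughly the
    # duration of the slowest call rather than the sum of both.
    weather, account = await asyncio.gather(
        fetch_weather("Oslo"),
        lookup_account(42),
    )
    return weather, account


result = asyncio.run(gather_context())
print(result)
```

In an agent, the same pattern lets slow lookups proceed while the event loop stays free to keep handling the conversation.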

 ## Advanced Integration Capabilities

 ### Telephony Integration ([docs](https://docs.livekit.io/agents/start/telephony/))
-Add voice calling capabilities with SIP integration for inbound/outbound phone support.
+Add voice calling capabilities with SIP integration:
+- **Inbound/Outbound Calling**: Handle phone-based interactions
+- **SIP Protocol Support**: Industry-standard telephony integration
+- **Call Management**: Handle call routing, transfers, and conferencing
+- **Phone Number Provisioning**: Manage virtual phone numbers for agents

 ### Production Deployment ([docs](https://docs.livekit.io/agents/ops/deployment/))
 - **LiveKit Cloud**: Managed hosting with enterprise features
 - **Self-Hosting**: Container-based deployment with provided Docker configurations
+- **Kubernetes Support**: Production-grade orchestration and scaling
 - **Scaling Strategies**: Handle multiple concurrent sessions and load balancing
 - **Security Configuration**: API key management and access control
+- **High Availability**: Multi-region deployment and failover capabilities

 ### Environment Configuration Standards
 Required environment variables for different provider integrations:
