LiveKit Coffee Barista Voice Agent

This is a production-ready Python voice agent built using the LiveKit Agents Framework with OpenAI Realtime API integration. The agent acts as a helpful coffee barista robot at a blockchain conference, supporting wake word detection for hands-free interaction.

🎯 Features

  • 🎙️ MultimodalAgent Architecture: Built with LiveKit's advanced MultimodalAgent framework
  • 🔊 Smart Wake Word Detection: "Hey Barista" activation with intelligent conversation management
  • 🤖 OpenAI Realtime API: Ultra-low latency voice-to-voice interaction (~200ms)
  • ⚡ Thread-Safe State Management: Robust multi-threaded wake word detection
  • 🔄 Smart Timer Management: Automatic conversation timeout with user engagement detection
  • 🛡️ Duplicate Protection: Prevents multiple wake word activations during conversation
  • ☕ Coffee Barista Theme: Specialized for coffee ordering and blockchain conference context
  • 🛠️ Function Tools: Built-in time/date functions with easy extensibility
  • 📞 Multi-mode Support: Terminal, development, and production modes
  • 🎡 Audio Processing: Advanced VAD, turn detection, and conversation flow

🚀 Quick Start

1. Install Dependencies

pip install -r requirements.txt

2. Configure Environment Variables

Create a .env file in the project root:

# Required: OpenAI API Configuration
OPENAI_API_KEY=your_openai_api_key_here

# Optional: Wake Word Detection
PORCUPINE_ACCESS_KEY=your_porcupine_access_key_here

# Optional: LiveKit Cloud (recommended for production)
LIVEKIT_API_KEY=your_livekit_api_key_here
LIVEKIT_API_SECRET=your_livekit_api_secret_here
LIVEKIT_URL=wss://your-livekit-server.com

# Agent Configuration
VOICE_AGENT_TEMPERATURE=0.7  # AI response creativity (0.0-1.0)
VOICE_AGENT_VOICE=nova      # OpenAI voice: alloy, echo, fable, onyx, nova, shimmer

3. Get API Keys

🔑 OpenAI API Key (Required)

  1. Sign up at OpenAI Platform
  2. Create an API key
  3. Add to your .env file

🎙️ Porcupine Access Key (Optional - for wake words)

  1. Sign up at Picovoice Console
  2. Create a project and get your Access Key
  3. Add to your .env file

☁️ LiveKit Credentials (Optional - for cloud features)

  1. Sign up at LiveKit Cloud
  2. Create a project
  3. Copy API Key and Secret

4. Download Model Files

python livekit.py download-files

5. Run the Agent

# Terminal mode (local testing)
python livekit.py console

# Development mode (connect to LiveKit)
python livekit.py dev

# Production mode
python livekit.py start

🏗️ Architecture

The coffee barista agent uses LiveKit's advanced MultimodalAgent framework with thread-safe wake word detection:

User Voice → LiveKit Room → MultimodalAgent Session → OpenAI Realtime API → Voice Response
                              ↑
                    Wake Word Detection Thread

🎯 Operational Modes

Wake Word Mode (when PORCUPINE_ACCESS_KEY is set):

  • Continuously monitors for "hey barista"
  • Activates conversation on wake word detection
  • Intelligent conversation state management
  • Automatic wake word pausing during conversation
  • Smart timer-based conversation timeout

Always-On Mode (when no wake word key):

  • Immediately ready for conversation
  • Greets user as coffee barista
  • Higher engagement but more power usage
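
The choice between the two modes comes down to whether PORCUPINE_ACCESS_KEY is set. A minimal sketch of that check, assuming a hypothetical helper named select_mode (the project's own function name and return values may differ):

```python
import os
from typing import Mapping, Optional


def select_mode(env: Optional[Mapping[str, str]] = None) -> str:
    """Return "wake_word" when a Porcupine key is configured, else "always_on".

    select_mode and its return strings are illustrative, not the project's API.
    """
    env = os.environ if env is None else env
    # Treat a blank or whitespace-only value the same as an unset variable.
    key = env.get("PORCUPINE_ACCESS_KEY", "").strip()
    return "wake_word" if key else "always_on"
```

Passing a plain dict instead of os.environ makes the check easy to exercise without touching the real environment.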

🚀 Key Technical Improvements

Thread-Safe Wake Word Detection:

  • Uses asyncio.run_coroutine_threadsafe() for thread safety
  • Prevents race conditions between wake word and main threads
  • Duplicate activation protection

Smart Timer Management:

  • Single timer tracking (prevents multiple concurrent timers)
  • Automatic cancellation when user speaks
  • Intelligent restart after agent responses
  • Proper cleanup on conversation end

Conversation State Management:

  • Pauses wake word detection during active conversation
  • Prevents multiple simultaneous activations
  • Graceful conversation ending with timer-based timeout
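
The duplicate-activation guard described above can be sketched as a small lock-protected flag. ConversationState, try_activate, and end are hypothetical names for illustration; the real agent may track this state differently.

```python
import threading


class ConversationState:
    """Tracks whether a conversation is active; safe to call from any thread.

    Illustrative only: the class and method names are assumptions.
    """

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._active = False

    def try_activate(self) -> bool:
        """Return True only for the caller that actually starts the conversation."""
        with self._lock:
            if self._active:
                return False  # duplicate wake word while a conversation is running
            self._active = True
            return True

    def end(self) -> None:
        """Mark the conversation as finished so the wake word can fire again."""
        with self._lock:
            self._active = False

    @property
    def wake_word_paused(self) -> bool:
        """Wake word detection is paused for as long as a conversation is active."""
        with self._lock:
            return self._active
```

Because the check and the update happen under one lock, two near-simultaneous detections cannot both start a conversation.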

📖 Usage Examples

Terminal Mode

python livekit.py console

Perfect for development and testing. Speaks directly through your computer's speakers.

Development Mode

python livekit.py dev

Connects to LiveKit server. Access via:

Example Conversation

Wake Word Mode:

User: "hey barista"
Barista: "Hey there! Welcome to the blockchain conference coffee station! I'm your friendly robot barista. How can I help you today?"
User: "What time is it?"
Barista: "The current time is 2:30 PM. Perfect time for an afternoon coffee! Would you like me to recommend something?"

Always-On Mode:

Barista: "Hello! I'm your robot barista here at the blockchain conference! Ready to help with coffee orders, questions about the event, or just chat!"
User: "What's today's date?"
Barista: "Today's date is January 15, 2025. Great day for the conference! Can I get you something to drink?"

⚙️ Configuration

Environment Variables

| Variable | Required | Description |
|---|---|---|
| OPENAI_API_KEY | ✅ | OpenAI API key for Realtime API |
| PORCUPINE_ACCESS_KEY | ❌ | Enable "hey barista" wake word detection |
| LIVEKIT_API_KEY | ❌ | LiveKit Cloud API key (recommended) |
| LIVEKIT_API_SECRET | ❌ | LiveKit Cloud API secret |
| VOICE_AGENT_TEMPERATURE | ❌ | AI creativity level (0.0-1.0, default: 0.7) |
| VOICE_AGENT_VOICE | ❌ | OpenAI voice (default: nova) |
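
The defaults listed here can be applied in one place when the agent starts. The following is a sketch only: AgentConfig and load_config are hypothetical names, and the range check simply mirrors the 0.0-1.0 range documented in this README.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class AgentConfig:
    """Illustrative container for the documented settings (not the project's class)."""

    openai_api_key: str
    porcupine_access_key: Optional[str]
    temperature: float
    voice: str


def load_config(env: Optional[Mapping[str, str]] = None) -> AgentConfig:
    """Read settings from the environment, applying the documented defaults."""
    env = os.environ if env is None else env
    api_key = env.get("OPENAI_API_KEY", "").strip()
    if not api_key:
        raise ValueError("OPENAI_API_KEY is required")
    temperature = float(env.get("VOICE_AGENT_TEMPERATURE", "0.7"))
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("VOICE_AGENT_TEMPERATURE must be between 0.0 and 1.0")
    return AgentConfig(
        openai_api_key=api_key,
        # An empty string counts as "not configured".
        porcupine_access_key=env.get("PORCUPINE_ACCESS_KEY") or None,
        temperature=temperature,
        voice=env.get("VOICE_AGENT_VOICE", "nova"),
    )
```

Failing fast on a missing key or an out-of-range temperature surfaces configuration mistakes before any audio session is opened.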

Customizing Wake Words

The agent currently uses "hey barista" as the wake word. To customize, edit the start_wake_word_detection() method:

self.porcupine = pvporcupine.create(
    access_key=self.porcupine_access_key,
    # keywords= accepts only Porcupine's built-in keywords (see pvporcupine.KEYWORDS).
    # Custom phrases such as "hey barista" need .ppn files trained in the Picovoice Console:
    keyword_paths=["./wake_words/custom_wake_word.ppn"],
)

Note: The agent will automatically pause wake word detection during active conversations to prevent duplicate activations.

Adding Custom Functions

The agent supports function tools that can be called during conversation:

@function_tool
async def get_weather(self, location: str) -> str:
    """Get weather information for a location."""
    # Your weather API integration here
    return f"The weather in {location} is sunny and 72°F"
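
The Features list mentions built-in time/date functions. A minimal sketch of what such tools could look like as plain coroutines follows. The names get_current_time and get_current_date and the optional now parameter are assumptions, not the project's code; inside the agent they would be methods carrying the @function_tool decorator.

```python
import asyncio
from datetime import datetime
from typing import Optional


async def get_current_time(now: Optional[datetime] = None) -> str:
    """Return the current time as a sentence the barista can speak."""
    now = now or datetime.now()
    # %I is zero-padded ("02:30 PM"); strip the leading zero for natural speech.
    return f"The current time is {now.strftime('%I:%M %p').lstrip('0')}"


async def get_current_date(now: Optional[datetime] = None) -> str:
    """Return today's date as a sentence the barista can speak."""
    now = now or datetime.now()
    return f"Today's date is {now.strftime('%B')} {now.day}, {now.year}"
```

The now parameter exists only so the functions can be checked against a fixed timestamp; a tool exposed to the model would normally take no arguments.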

Customizing the AI Model

The agent uses OpenAI's Realtime API with the MultimodalAgent architecture:

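# Imports assume livekit-agents 0.x, where MultimodalAgent lives in livekit.agents.multimodal
import os

from livekit.agents.multimodal import MultimodalAgent
from livekit.plugins import openai

# In the agent configuration
model = openai.realtime.RealtimeModel(
    model="gpt-4o-realtime-preview",
    voice=os.getenv("VOICE_AGENT_VOICE", "nova"),
    temperature=float(os.getenv("VOICE_AGENT_TEMPERATURE", "0.7")),
    instructions="""You are a helpful robot barista at a blockchain conference..."""
)

agent = MultimodalAgent(model=model)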

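Available Voices: alloy, echo, fable, onyx, nova, shimmer

Temperature Range: 0.0 (deterministic) to 1.0 (creative)

Supported voices and temperature bounds depend on the Realtime model, so check OpenAI's documentation for the model you use.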

🔧 Advanced Features

Thread-Safe Wake Word Management

The agent implements sophisticated thread safety:

# Safe state transitions between threads
asyncio.run_coroutine_threadsafe(
    self.activate_conversation(room), 
    self.event_loop
)

# Automatic wake word pausing during conversation
self.wake_word_paused = True  # Prevents duplicate activations
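
To make the hand-off concrete, here is a self-contained sketch that replaces the Porcupine loop with a plain thread firing three fake detections. WakeWordBridge, on_wake_word, and demo_bridge are hypothetical names, and activate_conversation takes no room argument here to keep the example runnable without LiveKit.

```python
import asyncio
import threading


class WakeWordBridge:
    """Hands wake word events from a worker thread to the asyncio loop."""

    def __init__(self, loop: asyncio.AbstractEventLoop) -> None:
        self.event_loop = loop
        self.wake_word_paused = False
        self.activations = 0

    async def activate_conversation(self) -> None:
        # Runs on the event loop thread, so plain attribute access is safe here.
        if self.wake_word_paused:
            return  # already in a conversation; ignore the duplicate
        self.wake_word_paused = True
        self.activations += 1

    def on_wake_word(self) -> None:
        """Called from the detection thread; never mutates agent state directly."""
        future = asyncio.run_coroutine_threadsafe(
            self.activate_conversation(), self.event_loop
        )
        future.result(timeout=5)  # re-raise any exception from the loop side


async def demo_bridge() -> int:
    loop = asyncio.get_running_loop()
    bridge = WakeWordBridge(loop)

    def detector() -> None:
        for _ in range(3):  # pretend the wake word fired three times
            bridge.on_wake_word()

    thread = threading.Thread(target=detector)
    thread.start()
    # Join in an executor so the loop stays free to run the scheduled coroutines.
    await loop.run_in_executor(None, thread.join)
    return bridge.activations
```

Running asyncio.run(demo_bridge()) yields a single activation: the first detection starts the conversation and the other two are dropped because wake_word_paused is already set.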

Smart Timer Management

Intelligent conversation timeout with user engagement detection:

# Single timer tracking prevents multiple concurrent timers
if self.timeout_timer:
    self.timeout_timer.cancel()

# Restarts after agent speaks, cancelled when user speaks
self.timeout_timer = asyncio.create_task(self.conversation_timeout())
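
A fuller sketch of single-timer tracking follows, assuming a hypothetical ConversationTimer helper. It holds at most one asyncio task, so calling restart() repeatedly never leaves stale timers behind.

```python
import asyncio
from typing import Awaitable, Callable, Optional


class ConversationTimer:
    """Holds at most one pending timeout task (illustrative helper, not project code)."""

    def __init__(self, timeout: float, on_timeout: Callable[[], Awaitable[None]]) -> None:
        self._timeout = timeout
        self._on_timeout = on_timeout
        self._task: Optional[asyncio.Task] = None

    def restart(self) -> None:
        """Call after the agent finishes speaking."""
        self.cancel()
        self._task = asyncio.create_task(self._run())

    def cancel(self) -> None:
        """Call when the user starts speaking or the conversation ends."""
        if self._task is not None and not self._task.done():
            self._task.cancel()
        self._task = None

    async def _run(self) -> None:
        await asyncio.sleep(self._timeout)
        # Clear the reference first so a cancel() inside the callback is a no-op.
        self._task = None
        await self._on_timeout()
```

restart() must be called from code running on the event loop, since it uses asyncio.create_task().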

Event-Driven Architecture

Direct session management with real-time event handling:

@session.on("user_speech_committed")
def on_user_speech_committed(message: llm.ChatMessage):  # from livekit.agents import llm
    # LiveKit's .on() expects a plain (non-async) callback; hand any async
    # work to asyncio.create_task() from inside the handler.
    pass
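
One way to run async work from an event handler is to keep the handler itself synchronous and schedule a task. The sketch below uses TinyEmitter, a hypothetical stand-in for the session object, so it runs without LiveKit installed; handle_speech and demo_events are likewise illustrative names.

```python
import asyncio
from collections import defaultdict
from typing import Callable, DefaultDict, List


class TinyEmitter:
    """Hypothetical stand-in for a session object that offers an .on() decorator."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Callable]] = defaultdict(list)

    def on(self, event: str) -> Callable[[Callable], Callable]:
        def register(handler: Callable) -> Callable:
            self._handlers[event].append(handler)
            return handler

        return register

    def emit(self, event: str, *args) -> None:
        for handler in list(self._handlers[event]):
            handler(*args)


async def demo_events() -> List[str]:
    session = TinyEmitter()
    log: List[str] = []
    pending: List[asyncio.Task] = []

    async def handle_speech(text: str) -> None:
        await asyncio.sleep(0)  # placeholder for real async work
        log.append(f"user said: {text}")

    @session.on("user_speech_committed")
    def on_user_speech_committed(text: str) -> None:
        # Keep the handler synchronous and push async work onto the loop.
        pending.append(asyncio.create_task(handle_speech(text)))

    session.emit("user_speech_committed", "one latte please")
    await asyncio.gather(*pending)
    return log
```

Keeping a reference to each created task (here in pending) prevents it from being garbage-collected before it finishes.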

📞 Integration Options

Web & Mobile Apps

Use LiveKit's client SDKs:

  • JavaScript/React: livekit-client
  • iOS/macOS: client-sdk-swift
  • Android: client-sdk-android
  • Flutter: client-sdk-flutter

Telephony

Connect to phone systems using LiveKit SIP:

# Your agent can receive phone calls!

WebRTC

Direct browser integration without downloads or apps.

🚀 Deployment

Development

python livekit.py dev

Production

python livekit.py start

Docker Deployment

FROM python:3.9-slim
COPY . /app
WORKDIR /app
RUN pip install -r requirements.txt
CMD ["python", "livekit.py", "start"]

Kubernetes

LiveKit agents support horizontal scaling and load balancing.

🛠️ Troubleshooting

Common Issues

Wake Word Not Working:

  • Check PORCUPINE_ACCESS_KEY is set correctly
  • Ensure microphone permissions are granted
  • Try speaking clearly: "hey barista"
  • Check that wake word detection isn't paused during conversation

No Audio Output:

  • Check system audio settings
  • Verify microphone/speaker permissions
  • Test with python livekit.py console first

Connection Issues:

  • Verify LiveKit credentials if using cloud features
  • Check network connectivity
  • Try local mode first: python livekit.py console

Multiple Wake Word Activations:

  • Agent automatically prevents duplicate activations
  • Wake word detection pauses during active conversation
  • If issues persist, check for race conditions in logs

Timer Issues:

  • Agent uses single timer tracking to prevent conflicts
  • Timer cancels when user speaks, restarts after agent responds
  • Check logs for timer management debug information

Thread Safety Issues:

  • All state changes use asyncio.run_coroutine_threadsafe()
  • Wake word detection runs in separate thread
  • If experiencing crashes, check for proper event loop handling

Debug Mode

import logging

logging.basicConfig(level=logging.DEBUG)

Testing Components

# List audio devices (confirms the microphone is visible)
python -c "import sounddevice as sd; print(sd.query_devices())"

# Check that the Porcupine package is installed
python -c "import pvporcupine; print('Porcupine available')"

# Check that the OpenAI package is installed (does not contact the API)
python -c "from openai import OpenAI; print('OpenAI available')"

# Check that asyncio is importable
python -c "import asyncio; print('Event loop support available')"
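
The same checks can be combined into one preflight helper. This is a sketch with a hypothetical function name, check_modules; it only confirms that each package can be located and does not open audio devices or contact any API.

```python
import importlib.util
from typing import Dict, Iterable


def check_modules(names: Iterable[str]) -> Dict[str, bool]:
    """Map each top-level module name to True if it can be found without importing it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}
```

Calling check_modules(["sounddevice", "pvporcupine", "openai"]) returns a dict showing which of the three dependencies are missing from the current environment.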

🎨 Customization Examples

Coffee Shop Personality

The current implementation uses a coffee barista theme:

instructions = """You are a helpful robot barista at a blockchain conference. 
You're enthusiastic about coffee and technology. Keep responses conversational 
and offer coffee recommendations when appropriate."""

Wake Word Customization

# Multiple wake words: one .ppn file per custom phrase (placeholder file names)
self.porcupine = pvporcupine.create(
    access_key=self.porcupine_access_key,
    keyword_paths=[
        "./wake_words/hey_barista.ppn",
        "./wake_words/coffee_bot.ppn",
        "./wake_words/hey_coffee.ppn",
    ],
)

Custom Function Tools

@function_tool
async def get_coffee_menu(self) -> str:
    """Get the current coffee menu."""
    return "Today's specials: Blockchain Blend, Crypto Cappuccino, DeFi Decaf"

@function_tool  
async def place_order(self, drink: str, size: str) -> str:
    """Place a coffee order."""
    return f"Perfect! I've noted your order for a {size} {drink}. It'll be ready shortly!"
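
An order tool usually needs some input checking before it confirms anything. The sketch below validates the drink against the menu from get_coffee_menu; the SIZES tuple is an assumption, since this README does not list sizes, and the function is shown as a plain coroutine rather than a decorated agent method.

```python
import asyncio

MENU = ("Blockchain Blend", "Crypto Cappuccino", "DeFi Decaf")
SIZES = ("small", "medium", "large")  # assumed sizes; not specified by the project


async def place_order(drink: str, size: str) -> str:
    """Confirm an order only when both the drink and the size are recognized."""
    size_key = size.strip().lower()
    # Match the drink case-insensitively against the menu.
    match = next((d for d in MENU if d.lower() == drink.strip().lower()), None)
    if match is None:
        return f"Sorry, we don't have {drink}. Today's specials: {', '.join(MENU)}"
    if size_key not in SIZES:
        return f"What size would you like: {', '.join(SIZES)}?"
    return f"Perfect! I've noted your order for a {size_key} {match}. It'll be ready shortly!"
```

Returning a follow-up question instead of raising lets the model continue the conversation naturally when the order is incomplete.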

🔄 Recent Improvements & Fixes

Version History

Latest Update: Production-Ready Voice Agent

  • ✅ Timer Race Conditions Fixed: Implemented single timer tracking with proper cancellation
  • ✅ Thread Safety Resolved: Added asyncio.run_coroutine_threadsafe() for safe state transitions
  • ✅ Multiple Wake Word Protection: Prevents duplicate activations during conversation
  • ✅ Smart Conversation Management: Wake word detection pauses during active conversation
  • ✅ Event-Driven Architecture: Direct session management with real-time event handling
  • ✅ Coffee Barista Theme: Specialized for blockchain conference coffee station

Key Technical Improvements:

  • Upgraded from basic Agent to MultimodalAgent architecture
  • Implemented sophisticated timer management system
  • Added comprehensive thread safety measures
  • Built intelligent conversation state tracking
  • Enhanced wake word detection with pause/resume capability

Performance Characteristics:

  • ⚑ Ultra-Low Latency: ~200ms with OpenAI Realtime API
  • πŸ›‘οΈ Thread-Safe: Robust multi-threaded wake word detection
  • 🎯 Smart Activation: Intelligent wake word activation without duplicates
  • ⏱️ Automatic Timeout: Graceful conversation ending with timer management
  • πŸ”„ State Management: Clean conversation lifecycle with proper cleanup

📈 Performance Optimization

  • OpenAI Realtime API: Lowest latency (~200ms) with built-in turn detection
  • MultimodalAgent: Production-grade architecture with advanced session management
  • Thread Safety: Prevents race conditions and crashes in multi-threaded environment
  • Smart Timers: Single timer tracking eliminates multiple concurrent timer conflicts

📝 License & Credits

  • LiveKit: Apache 2.0 License
  • OpenAI: Commercial API usage
  • Porcupine: Free tier available
  • Framework: Open source and extensible

Built with ❤️ using the LiveKit Agents Framework.