This is a production-ready Python voice agent built using the LiveKit Agents Framework with OpenAI Realtime API integration. The agent acts as a helpful coffee barista robot at a blockchain conference, supporting wake word detection for hands-free interaction.
- MultimodalAgent Architecture: Built with LiveKit's advanced MultimodalAgent framework
- Smart Wake Word Detection: "Hey Barista" activation with intelligent conversation management
- OpenAI Realtime API: Ultra-low latency voice-to-voice interaction (~200ms)
- Thread-Safe State Management: Robust multi-threaded wake word detection
- Smart Timer Management: Automatic conversation timeout with user engagement detection
- Duplicate Protection: Prevents multiple wake word activations during conversation
- Coffee Barista Theme: Specialized for coffee ordering and blockchain conference context
- Function Tools: Built-in time/date functions with easy extensibility
- Multi-mode Support: Terminal, development, and production modes
- Audio Processing: Advanced VAD, turn detection, and conversation flow
pip install -r requirements.txt
Create a .env file in the project root:
# Required: OpenAI API Configuration
OPENAI_API_KEY=your_openai_api_key_here
# Optional: Wake Word Detection
PORCUPINE_ACCESS_KEY=your_porcupine_access_key_here
# Optional: LiveKit Cloud (recommended for production)
LIVEKIT_API_KEY=your_livekit_api_key_here
LIVEKIT_API_SECRET=your_livekit_api_secret_here
LIVEKIT_URL=wss://your-livekit-server.com
# Agent Configuration
VOICE_AGENT_TEMPERATURE=0.7 # AI response creativity (0.0-1.0)
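VOICE_AGENT_VOICE=nova # OpenAI voice: alloy, echo, fable, onyx, nova, shimmer
OpenAI API key (required):
- Sign up at OpenAI Platform
- Create an API key
- Add it to your .env file
Porcupine access key (optional, for wake word detection):
- Sign up at Picovoice Console
- Create a project and get your Access Key
- Add it to your .env file
LiveKit Cloud credentials (optional, recommended for production):
- Sign up at LiveKit Cloud
- Create a project
- Copy API Key and Secret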
python livekit.py download-files
# Terminal mode (local testing)
python livekit.py console
# Development mode (connect to LiveKit)
python livekit.py dev
# Production mode
python livekit.py start
The coffee barista agent uses LiveKit's advanced MultimodalAgent framework with thread-safe wake word detection:
User Voice → LiveKit Room → MultimodalAgent Session → OpenAI Realtime API → Voice Response
                  |
      Wake Word Detection Thread
Wake Word Mode (when PORCUPINE_ACCESS_KEY is set):
- Continuously monitors for "hey barista"
- Activates conversation on wake word detection
- Intelligent conversation state management
- Automatic wake word pausing during conversation
- Smart timer-based conversation timeout
Always-On Mode (when no wake word key):
- Immediately ready for conversation
- Greets user as coffee barista
- Higher engagement but more power usage
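The choice between the two modes comes down to whether a Porcupine key is present. A minimal sketch of that check follows; the function name `select_mode` and the mode strings are hypothetical and not taken from the agent's source.

```python
import os


def select_mode(env=None):
    """Return "wake_word" when a Porcupine key is configured, else "always_on".

    `env` defaults to os.environ; passing a dict makes the check easy to test.
    """
    env = os.environ if env is None else env
    key = (env.get("PORCUPINE_ACCESS_KEY") or "").strip()
    # A blank or whitespace-only value is treated the same as a missing key.
    return "wake_word" if key else "always_on"


if __name__ == "__main__":
    print(f"Starting in {select_mode()} mode")
```

Treating a blank value as "not set" avoids starting wake word detection with an empty key left over from the .env template.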
Thread-Safe Wake Word Detection:
- Uses asyncio.run_coroutine_threadsafe() for thread safety
- Prevents race conditions between wake word and main threads
- Duplicate activation protection
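The handoff described above can be sketched with the standard library alone. In this illustration a plain thread stands in for the Porcupine detector, and the class name `WakeWordBridge` and its attributes are hypothetical; only `asyncio.run_coroutine_threadsafe()` is taken from the text.

```python
import asyncio
import threading


class WakeWordBridge:
    """Hands wake word events from a detector thread to the asyncio loop."""

    def __init__(self, loop):
        self.loop = loop
        self.conversation_active = False  # only touched on the event loop thread
        self.activations = 0

    async def activate_conversation(self):
        # Runs on the event loop thread, so the check-and-set below cannot
        # interleave with another activation.
        if self.conversation_active:
            return False  # duplicate activation ignored
        self.conversation_active = True
        self.activations += 1
        return True

    def on_wake_word(self):
        # Called from the detector thread: schedule the coroutine on the loop
        # and wait for its result instead of mutating state directly.
        future = asyncio.run_coroutine_threadsafe(
            self.activate_conversation(), self.loop
        )
        return future.result(timeout=5)


async def demo():
    bridge = WakeWordBridge(asyncio.get_running_loop())
    results = []

    def detector():
        # Simulates three detections in quick succession.
        for _ in range(3):
            results.append(bridge.on_wake_word())

    # Run the fake detector in a worker thread while the loop stays free.
    await asyncio.to_thread(detector)
    return bridge.activations, results


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Because every state change runs on the event loop, the duplicate check needs no lock; only the first of the three simulated detections activates a conversation.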
Smart Timer Management:
- Single timer tracking (prevents multiple concurrent timers)
- Automatic cancellation when user speaks
- Intelligent restart after agent responses
- Proper cleanup on conversation end
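One way to get the single-timer behavior listed above is to keep at most one asyncio task and cancel it before starting a new one. The sketch below assumes a hypothetical `ConversationTimer` class and a very short timeout for demonstration; the agent's real timeout value is not stated in this document.

```python
import asyncio


class ConversationTimer:
    """Tracks at most one pending timeout task."""

    def __init__(self, timeout_seconds, on_timeout):
        self.timeout_seconds = timeout_seconds
        self.on_timeout = on_timeout  # async callable run when the timer expires
        self._task = None

    def cancel(self):
        # Called when the user speaks or the conversation ends.
        if self._task is not None and not self._task.done():
            self._task.cancel()
        self._task = None

    def restart(self):
        # Called after the agent responds; replaces any pending timer.
        self.cancel()
        self._task = asyncio.create_task(self._run())

    async def _run(self):
        await asyncio.sleep(self.timeout_seconds)
        self._task = None  # clear first so on_timeout may safely call cancel()
        await self.on_timeout()


async def demo():
    fired = []

    async def end_conversation():
        fired.append("timeout")

    timer = ConversationTimer(0.05, end_conversation)
    timer.restart()            # agent finished speaking
    timer.restart()            # a second restart replaces the first timer
    await asyncio.sleep(0.01)
    timer.cancel()             # user spoke before the timeout
    await asyncio.sleep(0.1)
    assert fired == []         # nothing fired while cancelled
    timer.restart()            # agent responded again, user stays silent
    await asyncio.sleep(0.15)
    return fired


if __name__ == "__main__":
    print(asyncio.run(demo()))
```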
Conversation State Management:
- Pauses wake word detection during active conversation
- Prevents multiple simultaneous activations
- Graceful conversation ending with timer-based timeout
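The pause/resume behavior can be pictured as a small state holder. This sketch uses a `threading.Lock` so it can be called from any thread, which is a simplification and not the `asyncio.run_coroutine_threadsafe()` handoff the agent itself uses; the class and method names are hypothetical.

```python
import threading


class ConversationState:
    """Minimal conversation lifecycle: idle -> active -> idle."""

    def __init__(self):
        self._lock = threading.Lock()
        self.conversation_active = False
        self.wake_word_paused = False

    def try_activate(self):
        """Start a conversation unless one is running. Returns True on success."""
        with self._lock:
            if self.conversation_active or self.wake_word_paused:
                return False  # duplicate activation is ignored
            self.conversation_active = True
            self.wake_word_paused = True  # stop listening for the wake word
            return True

    def end_conversation(self):
        """Called on timeout: resume wake word detection."""
        with self._lock:
            self.conversation_active = False
            self.wake_word_paused = False


if __name__ == "__main__":
    state = ConversationState()
    print(state.try_activate())  # first wake word starts a conversation
    print(state.try_activate())  # second wake word during it is rejected
    state.end_conversation()
    print(state.try_activate())  # accepted again after the conversation ends
```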
python livekit.py console
Perfect for development and testing. The agent speaks directly through your computer's speakers.
python livekit.py dev
Connects to a LiveKit server. Access via:
- Agents Playground
- Custom web/mobile apps
- Phone calls (with SIP integration)
Wake Word Mode:
User: "hey barista"
Barista: "Hey there! Welcome to the blockchain conference coffee station! I'm your friendly robot barista. How can I help you today?"
User: "What time is it?"
Barista: "The current time is 2:30 PM. Perfect time for an afternoon coffee! Would you like me to recommend something?"
Always-On Mode:
Barista: "Hello! I'm your robot barista here at the blockchain conference! Ready to help with coffee orders, questions about the event, or just chat!"
User: "What's today's date?"
Barista: "Today's date is January 15, 2025. Great day for the conference! Can I get you something to drink?"
| Variable | Required | Description |
|---|---|---|
| OPENAI_API_KEY | Yes | OpenAI API key for Realtime API |
| PORCUPINE_ACCESS_KEY | No | Enable "hey barista" wake word detection |
| LIVEKIT_API_KEY | No | LiveKit Cloud API key (recommended) |
| LIVEKIT_API_SECRET | No | LiveKit Cloud API secret |
| VOICE_AGENT_TEMPERATURE | No | AI creativity level (0.0-1.0, default: 0.7) |
| VOICE_AGENT_VOICE | No | OpenAI voice (default: nova) |
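The variables in the table could be read and validated once at startup. The following is a sketch, not the agent's actual loader: `AgentConfig` and `load_config` are hypothetical names, and the range check simply mirrors the 0.0-1.0 range stated in the table.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass
class AgentConfig:
    openai_api_key: str
    porcupine_access_key: Optional[str]
    temperature: float
    voice: str


def load_config(env=None):
    """Build an AgentConfig from environment variables, applying the defaults."""
    env = os.environ if env is None else env

    api_key = env.get("OPENAI_API_KEY", "").strip()
    if not api_key:
        raise ValueError("OPENAI_API_KEY is required")

    temperature = float(env.get("VOICE_AGENT_TEMPERATURE", "0.7"))
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("VOICE_AGENT_TEMPERATURE must be between 0.0 and 1.0")

    return AgentConfig(
        openai_api_key=api_key,
        # An empty or missing key means wake word detection stays disabled.
        porcupine_access_key=env.get("PORCUPINE_ACCESS_KEY", "").strip() or None,
        temperature=temperature,
        voice=env.get("VOICE_AGENT_VOICE", "nova"),
    )


if __name__ == "__main__":
    print(load_config({"OPENAI_API_KEY": "sk-example"}))
```

Failing fast on a missing OPENAI_API_KEY gives a clearer error than a rejected API call later in the session.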
The agent currently uses "hey barista" as the wake word. To customize, edit the start_wake_word_detection() method:
self.porcupine = pvporcupine.create(
    access_key=self.porcupine_access_key,
    # Only Porcupine's built-in keywords can be passed by name.
    keywords=["hey barista"],
    # Custom phrases such as "coffee bot" or "hey coffee" need keyword files
    # trained in Picovoice Console, passed via keyword_paths instead:
    # keyword_paths=["./wake_words/custom_wake_word.ppn"]
)
Note: The agent will automatically pause wake word detection during active conversations to prevent duplicate activations.
The agent supports function tools that can be called during conversation:
@function_tool
async def get_weather(self, location: str) -> str:
    """Get weather information for a location."""
    # Your weather API integration here
    return f"The weather in {location} is sunny and 72°F"
The agent uses OpenAI's Realtime API with the MultimodalAgent architecture:
# In the agent configuration
model = openai.realtime.RealtimeModel(
model="gpt-4o-realtime-preview",
voice=os.getenv("VOICE_AGENT_VOICE", "nova"),
temperature=float(os.getenv("VOICE_AGENT_TEMPERATURE", "0.7")),
instructions="""You are a helpful robot barista at a blockchain conference..."""
)
agent = MultimodalAgent(model=model)
Available Voices: alloy, echo, fable, onyx, nova, shimmer
Temperature Range: 0.0 (deterministic) to 1.0 (creative)
The agent implements sophisticated thread safety:
# Safe state transitions between threads
asyncio.run_coroutine_threadsafe(
self.activate_conversation(room),
self.event_loop
)
# Automatic wake word pausing during conversation
self.wake_word_paused = True  # Prevents duplicate activations
Intelligent conversation timeout with user engagement detection:
# Single timer tracking prevents multiple concurrent timers
if self.timeout_timer:
self.timeout_timer.cancel()
# Restarts after agent speaks, cancelled when user speaks
self.timeout_timer = asyncio.create_task(self.conversation_timeout())
Direct session management with real-time event handling:
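@session.on("user_speech_committed")
def on_user_speech_committed(message: rtc.ChatMessage):
    # Handlers registered with .on() are plain functions that receive only the
    # event payload; schedule any async follow-up with asyncio.create_task().
    # Process user speech and manage conversation flow
    pass
Use LiveKit's client SDKs: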
- JavaScript/React: livekit-client
- iOS/macOS: client-sdk-swift
- Android: client-sdk-android
- Flutter: client-sdk-flutter
Connect to phone systems using LiveKit SIP:
# Your agent can receive phone calls!
Direct browser integration without downloads or apps.
Development:
python livekit.py dev
Production:
python livekit.py start
Docker:
FROM python:3.9-slim
COPY . /app
WORKDIR /app
RUN pip install -r requirements.txt
CMD ["python", "livekit.py", "start"]
LiveKit agents support horizontal scaling and load balancing.
Wake Word Not Working:
- Check that PORCUPINE_ACCESS_KEY is set correctly
- Ensure microphone permissions are granted
- Try speaking clearly: "hey barista"
- Check that wake word detection isn't paused during conversation
No Audio Output:
- Check system audio settings
- Verify microphone/speaker permissions
- Test with python livekit.py console first
Connection Issues:
- Verify LiveKit credentials if using cloud features
- Check network connectivity
- Try local mode first: python livekit.py console
Multiple Wake Word Activations:
- Agent automatically prevents duplicate activations
- Wake word detection pauses during active conversation
- If issues persist, check for race conditions in logs
Timer Issues:
- Agent uses single timer tracking to prevent conflicts
- Timer cancels when user speaks, restarts after agent responds
- Check logs for timer management debug information
Thread Safety Issues:
- All state changes use asyncio.run_coroutine_threadsafe()
- Wake word detection runs in a separate thread
- If experiencing crashes, check for proper event loop handling
Enable debug logging:
import logging
logging.basicConfig(level=logging.DEBUG)
Test individual components:
# Test microphone
python -c "import sounddevice as sd; print(sd.query_devices())"
# Test wake word detection
python -c "import pvporcupine; print('Porcupine available')"
# Test OpenAI connection
python -c "from openai import OpenAI; print('OpenAI available')"
# Test thread safety
python -c "import asyncio; print('Event loop support available')"
The current implementation uses a coffee barista theme:
instructions = """You are a helpful robot barista at a blockchain conference.
You're enthusiastic about coffee and technology. Keep responses conversational
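and offer coffee recommendations when appropriate."""
# Wake word for coffee context. Only built-in Porcupine keywords can be passed
# by name; phrases such as "coffee bot" or "hey coffee" need custom .ppn files
# passed via keyword_paths.
self.porcupine = pvporcupine.create(
    access_key=self.porcupine_access_key,
    keywords=["hey barista"]
)
@function_tool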
async def get_coffee_menu(self) -> str:
"""Get the current coffee menu."""
return "Today's specials: Blockchain Blend, Crypto Cappuccino, DeFi Decaf"
@function_tool
async def place_order(self, drink: str, size: str) -> str:
"""Place a coffee order."""
    return f"Perfect! I've noted your order for a {size} {drink}. It'll be ready shortly!"
Latest Update: Production-Ready Voice Agent
- Timer Race Conditions Fixed: Implemented single timer tracking with proper cancellation
- Thread Safety Resolved: Added asyncio.run_coroutine_threadsafe() for safe state transitions
- Multiple Wake Word Protection: Prevents duplicate activations during conversation
- Smart Conversation Management: Wake word detection pauses during active conversation
- Event-Driven Architecture: Direct session management with real-time event handling
- Coffee Barista Theme: Specialized for blockchain conference coffee station
Key Technical Improvements:
- Upgraded from basic Agent to MultimodalAgent architecture
- Implemented sophisticated timer management system
- Added comprehensive thread safety measures
- Built intelligent conversation state tracking
- Enhanced wake word detection with pause/resume capability
Performance Characteristics:
- Ultra-Low Latency: ~200ms with OpenAI Realtime API
- Thread-Safe: Robust multi-threaded wake word detection
- Smart Activation: Intelligent wake word activation without duplicates
- Automatic Timeout: Graceful conversation ending with timer management
- State Management: Clean conversation lifecycle with proper cleanup
- OpenAI Realtime API: Lowest latency (~200ms) with built-in turn detection
- MultimodalAgent: Production-grade architecture with advanced session management
- Thread Safety: Prevents race conditions and crashes in multi-threaded environment
- Smart Timers: Single timer tracking eliminates multiple concurrent timer conflicts
- LiveKit: Apache 2.0 License
- OpenAI: Commercial API usage
- Porcupine: Free tier available
- Framework: Open source and extensible
Built with ❤️ using the LiveKit Agents Framework.