Update coffee_voice_agent README

kPatch · kPatch · commit 002494eaa915 · 2025-07-23T11:31:40.000-04:00
diff --git a/coffee_ws/src/coffee_voice_agent/README.md b/coffee_ws/src/coffee_voice_agent/README.md
@@ -1,168 +1,282 @@
 # Coffee Voice Agent ROS2 Package
 
-A ROS2 wrapper for the Coffee Barista Voice Agent that integrates LiveKit voice communication with the Coffee Buddy robot system.
+A ROS2 package that integrates the LiveKit Coffee Barista Voice Agent with the Coffee Buddy robot system through a WebSocket bridge node.
 
 ## Overview
 
-This package wraps the original LiveKit voice agent (`livekit_voice_agent.py`) with a ROS2 node (`voice_agent_node.py`) to provide seamless integration with the Coffee Buddy robot ecosystem.
+This package provides ROS2 integration for the Coffee Barista Voice Agent while preserving its interactive console mode functionality. The voice agent runs as a standalone application with full console controls, while a separate ROS2 bridge node provides system integration via WebSocket communication.
 
 ## Features
 
-- **🎙️ Wake Word Detection**: "Hey barista" activation
-- **🗣️ Voice Conversation**: STT, LLM, and TTS using LiveKit/OpenAI
+- **🎙️ Wake Word Detection**: "Hey barista" activation with Porcupine
+- **🗣️ Voice Conversation**: STT, LLM, and TTS using LiveKit/OpenAI  
 - **😊 Emotion Processing**: Emotion-aware responses with animated expressions
-- **☕ Coffee Functions**: Menu, recommendations, and coffee commands
-- **🤖 ROS2 Integration**: Publishes/subscribes to Coffee Buddy topics
-- **🔗 Virtual Requests**: External coffee requests via ROS2
+- **☕ Coffee Functions**: Menu, recommendations, and ordering guidance
+- **🖥️ Console Mode**: Full interactive controls (Ctrl+B, Q) in terminal
+- **🌐 ROS2 Bridge**: WebSocket-based integration with Coffee Buddy system
+- **📡 Virtual Requests**: External coffee requests via ROS2 topics
 
 ## Architecture
 
 ```
 coffee_voice_agent/
-├── livekit_voice_agent.py      # Original voice agent (933 lines)
-├── voice_agent_node.py         # ROS2 wrapper + integration
+├── scripts/
+│   ├── livekit_voice_agent.py         # Original LiveKit voice agent
+│   └── run_voice_agent.sh             # Smart bash launcher
+├── coffee_voice_agent/
+│   └── voice_agent_bridge.py          # ROS2 bridge node
 └── launch/
-    └── voice_agent.launch.py   # Launch file
+    ├── voice_agent_bridge.launch.py   # Bridge only
+    └── voice_agent_system.launch.py   # Voice agent + bridge together
+```
+
+### Communication Flow
+```
+┌─────────────────┐    WebSocket    ┌─────────────────┐    ROS2     ┌──────────────┐
+│  Voice Agent    │◄───────────────►│  ROS2 Bridge    │◄───────────►│ Coffee Buddy │
+│  (Console Mode) │    Port 8080    │  (Bridge Node)  │   Topics    │   System     │
+│  Interactive    │                 │  Integration    │             │              │
+└─────────────────┘                 └─────────────────┘             └──────────────┘
 ```
 
 ## Dependencies
 
 ### Environment Variables
 ```bash
-export OPENAI_API_KEY="your_openai_api_key"
-export PORCUPINE_ACCESS_KEY="your_porcupine_key"  # Optional for wake word
-export VOICE_AGENT_VOICE="nova"                   # Optional TTS voice
-export VOICE_AGENT_TEMPERATURE="0.7"              # Optional LLM temperature
+export OPENAI_API_KEY="your_openai_api_key"           # Required
+export PORCUPINE_ACCESS_KEY="your_porcupine_key"      # Optional for wake word
+export VOICE_AGENT_VOICE="nova"                       # Optional TTS voice  
+export VOICE_AGENT_TEMPERATURE="0.7"                  # Optional LLM temperature
+export WEBSOCKET_HOST="localhost"                     # Optional WebSocket host
+export WEBSOCKET_PORT="8080"                          # Optional WebSocket port
 ```
 
 ### ROS2 Dependencies
 - `rclpy`
-- `std_msgs`
-- `coffee_machine_control_msgs`
-- `coffee_expressions_msgs`
+- `std_msgs` 
+- `geometry_msgs`
+- `websockets` (Python package)
 
-### Python Dependencies
+### Python Dependencies (in setup.py)
 - `livekit`
 - `livekit-agents[openai,deepgram,silero,turn-detector]`
 - `pvporcupine==3.0.5`
 - `pvrecorder==1.2.7`
 - `python-dotenv`
+- `websockets`
 
-## Usage
+## Installation & Usage
 
-### Build the Package
+### 1. Build the Package
 ```bash
 cd coffee_ws
 colcon build --packages-select coffee_voice_agent
 source install/setup.bash
 ```
 
-### Run the Voice Agent
+### 2. Run Voice Agent (Console Mode)
+
+**Primary Method - Direct Execution:**
 ```bash
-# Using launch file (recommended)
-ros2 launch coffee_voice_agent voice_agent.launch.py
+# Run directly for full console mode with interactive controls
+./src/coffee_voice_agent/scripts/run_voice_agent.sh
 
-# Or run directly
-ros2 run coffee_voice_agent voice_agent_node
+# Or after building:
+./install/coffee_voice_agent/share/coffee_voice_agent/scripts/run_voice_agent.sh
 ```
 
-## ROS2 Topics
+**Console Controls:**
+- `[Ctrl+B]` - Toggle between Text/Audio mode
+- `[Q]` - Quit the application
+- Wake word: Say **"hey barista"** to activate
 
-### Publishers
-- `/coffee_voice_agent/state` (String) - Agent state (dormant, active, speaking, etc.)
-- `/coffee_voice_agent/emotion` (String) - Current emotion (excited, friendly, curious, etc.)
-- `/coffee_voice_agent/user_input` (String) - User speech input
-- `/coffee_voice_agent/agent_response` (String) - Agent responses
-- `/coffee_voice_agent/wake_word_detected` (Bool) - Wake word detection events
+### 3. ROS2 Integration (Optional)
 
-### Subscribers
-- `/coffee_voice_agent/virtual_request` (String) - External coffee requests (JSON format)
+**Bridge Only** (if voice agent running separately):
+```bash
+ros2 launch coffee_voice_agent voice_agent_bridge.launch.py
+```
 
-### Integration Topics
-- Publishes to `/robot/affective_state` for expression system
-- Can subscribe to `/coffee_machine/get_status` for machine updates
+**Complete System** (voice agent + bridge together):
+```bash  
+ros2 launch coffee_voice_agent voice_agent_system.launch.py
+```
+
+## ROS2 Topics (Bridge Node)
+
+### Publishers (Voice Agent → ROS2)
+- `/voice_agent/state` (String) - Agent state changes (JSON)
+- `/voice_agent/conversation` (String) - Conversation transcripts (JSON)
+- `/voice_agent/emotion` (String) - Emotion changes (JSON)
+- `/voice_agent/connected` (Bool) - Bridge connection status
+
+### Subscribers (ROS2 → Voice Agent)
+- `/voice_agent/virtual_requests` (String) - External coffee requests (JSON)
+- `/voice_agent/commands` (String) - Voice agent commands (JSON)
 
 ## Virtual Requests
 
-Send coffee requests via ROS2:
+Send coffee requests to the voice agent via ROS2:
 
 ```bash
-ros2 topic pub /coffee_voice_agent/virtual_request std_msgs/String '{
+# New coffee order (--once publishes a single message instead of repeating at 1 Hz)
+ros2 topic pub --once /voice_agent/virtual_requests std_msgs/String '{
+  "data": "{\"request_type\": \"NEW_COFFEE_REQUEST\", \"content\": \"Espresso\", \"priority\": \"normal\"}"
+}'
+
+# Order ready notification
+ros2 topic pub --once /voice_agent/virtual_requests std_msgs/String '{
+  "data": "{\"request_type\": \"ORDER_READY\", \"content\": \"Americano (Order: abc123)\", \"priority\": \"urgent\"}"
+}'
 ```
 
 ## Monitoring
 
-Monitor voice agent activity:
+### Voice Agent Status
+```bash
+# Check that the WebSocket port is reachable (any result other than "Connection refused" means a server is listening)
+curl -I http://localhost:8080
 
+# Monitor voice agent logs directly in console
+# (Console mode shows all logs in real-time)
+```
+
+### ROS2 Bridge Status  
 ```bash
-# Watch state changes
-ros2 topic echo /coffee_voice_agent/state
+# Check bridge connection
+ros2 topic echo /voice_agent/connected
 
-# Watch emotions
-ros2 topic echo /coffee_voice_agent/emotion
+# Monitor state changes
+ros2 topic echo /voice_agent/state
 
-# Watch user input
-ros2 topic echo /coffee_voice_agent/user_input
+# Monitor conversations
+ros2 topic echo /voice_agent/conversation
 
-# Watch agent responses  
-ros2 topic echo /coffee_voice_agent/agent_response
+# Monitor emotions
+ros2 topic echo /voice_agent/emotion
 ```
 
-## Integration with Coffee Buddy
+## Configuration
 
-The voice agent integrates with existing Coffee Buddy systems:
+### Launch File Parameters
+```bash
+# Bridge configuration
+ros2 launch coffee_voice_agent voice_agent_bridge.launch.py \
+    voice_agent_host:=192.168.1.100 \
+    voice_agent_port:=8080 \
+    reconnect_interval:=5.0
+
+# System configuration  
+ros2 launch coffee_voice_agent voice_agent_system.launch.py \
+    voice_agent_port:=8080 \
+    bridge_reconnect_interval:=3.0
+```
 
-1. **Expression System**: Publishes emotions to `/robot/affective_state`
-2. **Coffee Control**: Can send coffee commands (future enhancement)
-3. **Behavior System**: Receives and processes virtual requests
-4. **Head Control**: Emotion changes trigger head movements
+## Smart Mode Detection
 
-## Troubleshooting
+The bash launcher automatically detects the execution environment:
 
-### Missing API Keys
+- **Interactive Terminal**: Uses console mode with full controls
+- **Non-Interactive** (ROS2 launch): Automatically switches to start mode, where the console controls are unavailable
+- **Environment Setup**: Loads `.env` files and validates prerequisites
+
+## Integration with Coffee Buddy
+
+### 1. Expression System
+```bash
+# Bridge can forward emotion changes to expression system
+ros2 topic echo /voice_agent/emotion
 ```
-[ERROR] Missing required environment variables: ['OPENAI_API_KEY']
+
+### 2. Coffee Machine Integration  
+```bash
+# Send machine status updates to voice agent
+ros2 topic pub --once /voice_agent/virtual_requests std_msgs/String '{
+  "data": "{\"request_type\": \"ORDER_PROCESSING\", \"content\": \"Your Espresso is brewing!\"}"
+}'
 ```
-**Solution**: Set your OpenAI API key in environment or `.env` file
 
-### Wake Word Not Working
+### 3. System Orchestration
+```python
+# Include in larger system launch files
+IncludeLaunchDescription(
+    PythonLaunchDescriptionSource([
+        FindPackageShare('coffee_voice_agent'),
+        '/launch/voice_agent_system.launch.py'
+    ])
+)
 ```
+
+## Troubleshooting
+
+### Voice Agent Issues
+```text
+# Missing API keys
+[ERROR] Missing required environment variables: ['OPENAI_API_KEY']
+# Solution: Set OPENAI_API_KEY in environment or .env file
+
+# Wake word not working  
 [INFO] Wake Word Detection: ❌ Disabled (always-on mode)
+# Solution: Set PORCUPINE_ACCESS_KEY environment variable
+
+# Console mode not working via ROS2 launch
+[WARNING] Not running in an interactive terminal
+# Solution: Use direct script execution for console mode
 ```
-**Solution**: Set `PORCUPINE_ACCESS_KEY` environment variable
 
-### Build Errors
+### Bridge Connection Issues
+```bash
+# Bridge cannot connect to voice agent
+[ERROR] WebSocket connection error: [Errno 111] Connection refused
+# Solution: Ensure voice agent is running and WebSocket server is active
+
+# Check WebSocket server status
+curl -I http://localhost:8080
 ```
-AttributeError: module 'em' has no attribute 'BUFFERED_OPT'
+
+### Build Issues
+```bash
+# empy version conflict
+AttributeError: module 'em' has no attribute 'BUFFERED_OPT'  
+# Solution: pip install empy==3.3.4
 ```
-**Solution**: Install correct empy version: `pip install empy==3.3.4`
 
 ## Development
 
+### Package Structure
+- **Voice Agent**: Standalone CLI application in `scripts/`
+- **Bridge Node**: ROS2 integration in `coffee_voice_agent/`
+- **Launch Files**: System orchestration in `launch/`
+
 ### Adding New Features
-1. Modify `livekit_voice_agent.py` for voice agent functionality
-2. Modify `voice_agent_node.py` for ROS2 integration
-3. Add new topics/services as needed
+1. **Voice functionality**: Modify `livekit_voice_agent.py`
+2. **ROS2 integration**: Modify `voice_agent_bridge.py` 
+3. **System integration**: Update launch files
 
-### Testing
+### Testing Components
 ```bash
-# Test basic ROS2 functionality
-ros2 node info /coffee_voice_agent
+# Test voice agent directly (run from the package directory)
+./scripts/run_voice_agent.sh
 
-# Test topic publishing
-ros2 topic hz /coffee_voice_agent/state
+# Test bridge connection
+ros2 run coffee_voice_agent voice_agent_bridge
 
-# Test virtual requests
-ros2 topic pub /coffee_voice_agent/virtual_request std_msgs/String '{"data": "test"}'
+# Test complete system
+ros2 launch coffee_voice_agent voice_agent_system.launch.py
 ```
 
+## Design Principles
+
+- **Separation of Concerns**: Voice processing vs. system integration
+- **Console Mode Priority**: Interactive functionality preserved
+- **Clean Architecture**: WebSocket bridge avoids threading conflicts
+- **Flexibility**: Can run components separately or together
+- **ROS2 Native**: Bridge follows ROS2 patterns and conventions
+
 ## Future Enhancements
 
-- [ ] Add coffee machine control integration
-- [ ] Add behavior tree integration  
-- [ ] Add face recognition integration
-- [ ] Add gesture control
-- [ ] Add configuration via ROS2 parameters
-- [ ] Add diagnostics and monitoring 
+- [ ] Add service interfaces for synchronous voice agent control
+- [ ] Add parameter server integration for dynamic configuration
+- [ ] Add diagnostics and health monitoring
+- [ ] Add audio stream bridging for ROS2 audio topics
+- [ ] Add behavior tree integration for complex interaction flows
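
The virtual-request examples in this README nest a JSON document inside the `data` field of a `std_msgs/String` message, which makes the shell quoting easy to get wrong. The following is a minimal Python sketch of building the same payload programmatically. It assumes only the three fields shown in the README (`request_type`, `content`, `priority`); the helper names `make_virtual_request` and `publish_once` and the node name `virtual_request_sender` are hypothetical and not part of the package. How the bridge validates or routes these fields is not shown in the diff, so the sketch does no validation of its own.

```python
import json
import time
from typing import Optional


def make_virtual_request(request_type: str, content: str,
                         priority: Optional[str] = None) -> str:
    """Serialize one virtual request as the JSON string carried in String.data."""
    request = {"request_type": request_type, "content": content}
    if priority is not None:
        # The README's ORDER_PROCESSING example omits priority, so it is optional here.
        request["priority"] = priority
    return json.dumps(request)


def publish_once(payload: str,
                 topic: str = "/voice_agent/virtual_requests",
                 wait_sec: float = 2.0) -> None:
    """Publish a single request; requires a sourced ROS2 environment."""
    # Imported lazily so make_virtual_request() works without ROS2 installed.
    import rclpy
    from std_msgs.msg import String

    rclpy.init()
    node = rclpy.create_node("virtual_request_sender")  # hypothetical node name
    try:
        publisher = node.create_publisher(String, topic, 10)
        # Wait briefly for a subscriber (the bridge) so the message is not dropped.
        deadline = time.monotonic() + wait_sec
        while (publisher.get_subscription_count() == 0
               and time.monotonic() < deadline):
            time.sleep(0.1)
        publisher.publish(String(data=payload))
    finally:
        node.destroy_node()
        rclpy.shutdown()


if __name__ == "__main__":
    # Prints the same JSON string used in the README's first example.
    print(make_virtual_request("NEW_COFFEE_REQUEST", "Espresso", "normal"))
```

With the workspace sourced and the bridge running, calling `publish_once(make_virtual_request("ORDER_READY", "Americano (Order: abc123)", "urgent"))` should be equivalent to the second `ros2 topic pub` example, sent once.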