|
1 | 1 | # Coffee Voice Agent ROS2 Package |
2 | 2 |
|
3 | | -A ROS2 wrapper for the Coffee Barista Voice Agent that integrates LiveKit voice communication with the Coffee Buddy robot system. |
| 3 | +A ROS2 package that integrates the LiveKit Coffee Barista Voice Agent with the Coffee Buddy robot system through a clean bridge architecture. |
4 | 4 |
|
5 | 5 | ## Overview |
6 | 6 |
|
7 | | -This package wraps the original LiveKit voice agent (`livekit_voice_agent.py`) with a ROS2 node (`voice_agent_node.py`) to provide seamless integration with the Coffee Buddy robot ecosystem. |
| 7 | +This package provides ROS2 integration for the Coffee Barista Voice Agent while preserving its interactive console mode functionality. The voice agent runs as a standalone application with full console controls, while a separate ROS2 bridge node provides system integration via WebSocket communication. |
8 | 8 |
|
9 | 9 | ## Features |
10 | 10 |
|
11 | | -- **🎙️ Wake Word Detection**: "Hey barista" activation |
12 | | -- **🗣️ Voice Conversation**: STT, LLM, and TTS using LiveKit/OpenAI |
| 11 | +- **🎙️ Wake Word Detection**: "Hey barista" activation with Porcupine |
| 12 | +- **🗣️ Voice Conversation**: STT, LLM, and TTS using LiveKit/OpenAI |
13 | 13 | - **😊 Emotion Processing**: Emotion-aware responses with animated expressions |
14 | | -- **☕ Coffee Functions**: Menu, recommendations, and coffee commands |
15 | | -- **🤖 ROS2 Integration**: Publishes/subscribes to Coffee Buddy topics |
16 | | -- **🔗 Virtual Requests**: External coffee requests via ROS2 |
| 14 | +- **☕ Coffee Functions**: Menu, recommendations, and ordering guidance |
| 15 | +- **🖥️ Console Mode**: Full interactive controls (Ctrl+B, Q) in terminal |
| 16 | +- **🌐 ROS2 Bridge**: WebSocket-based integration with Coffee Buddy system |
| 17 | +- **📡 Virtual Requests**: External coffee requests via ROS2 topics |
17 | 18 |
|
18 | 19 | ## Architecture |
19 | 20 |
|
20 | 21 | ``` |
21 | 22 | coffee_voice_agent/ |
22 | | -├── livekit_voice_agent.py # Original voice agent (933 lines) |
23 | | -├── voice_agent_node.py # ROS2 wrapper + integration |
| 23 | +├── scripts/ |
| 24 | +│ ├── livekit_voice_agent.py # Original LiveKit voice agent |
| 25 | +│ └── run_voice_agent.sh # Smart bash launcher |
| 26 | +├── coffee_voice_agent/ |
| 27 | +│ └── voice_agent_bridge.py # ROS2 bridge node |
24 | 28 | └── launch/ |
25 | | - └── voice_agent.launch.py # Launch file |
| 29 | + ├── voice_agent_bridge.launch.py # Bridge only |
| 30 | + └── voice_agent_system.launch.py # Voice agent + bridge together |
| 31 | +``` |
| 32 | + |
| 33 | +### Communication Flow |
| 34 | +``` |
| 35 | +┌─────────────────┐ WebSocket ┌─────────────────┐ ROS2 ┌──────────────┐ |
| 36 | +│ Voice Agent │◄───────────────►│ ROS2 Bridge │◄───────────►│ Coffee Buddy │ |
| 37 | +│ (Console Mode) │ Port 8080 │ (Bridge Node) │ Topics │ System │ |
| 38 | +│ Interactive │ │ Integration │ │ │ |
| 39 | +└─────────────────┘ └─────────────────┘ └──────────────┘ |
26 | 40 | ``` |
27 | 41 |
|
28 | 42 | ## Dependencies |
29 | 43 |
|
30 | 44 | ### Environment Variables |
31 | 45 | ```bash |
32 | | -export OPENAI_API_KEY="your_openai_api_key" |
33 | | -export PORCUPINE_ACCESS_KEY="your_porcupine_key" # Optional for wake word |
34 | | -export VOICE_AGENT_VOICE="nova" # Optional TTS voice |
35 | | -export VOICE_AGENT_TEMPERATURE="0.7" # Optional LLM temperature |
| 46 | +export OPENAI_API_KEY="your_openai_api_key" # Required |
| 47 | +export PORCUPINE_ACCESS_KEY="your_porcupine_key" # Optional for wake word |
| 48 | +export VOICE_AGENT_VOICE="nova" # Optional TTS voice |
| 49 | +export VOICE_AGENT_TEMPERATURE="0.7" # Optional LLM temperature |
| 50 | +export WEBSOCKET_HOST="localhost" # Optional WebSocket host |
| 51 | +export WEBSOCKET_PORT="8080" # Optional WebSocket port |
36 | 52 | ``` |
37 | 53 |
|
38 | 54 | ### ROS2 Dependencies |
39 | 55 | - `rclpy` |
40 | | -- `std_msgs` |
41 | | -- `coffee_machine_control_msgs` |
42 | | -- `coffee_expressions_msgs` |
| 56 | +- `std_msgs` |
| 57 | +- `geometry_msgs` |
| 58 | +- `websockets` (Python package) |
43 | 59 |
|
44 | | -### Python Dependencies |
| 60 | +### Python Dependencies (in setup.py) |
45 | 61 | - `livekit` |
46 | 62 | - `livekit-agents[openai,deepgram,silero,turn-detector]` |
47 | 63 | - `pvporcupine==3.0.5` |
48 | 64 | - `pvrecorder==1.2.7` |
49 | 65 | - `python-dotenv` |
| 66 | +- `websockets` |
50 | 67 |
|
51 | | -## Usage |
| 68 | +## Installation & Usage |
52 | 69 |
|
53 | | -### Build the Package |
| 70 | +### 1. Build the Package |
54 | 71 | ```bash |
55 | 72 | cd coffee_ws |
56 | 73 | colcon build --packages-select coffee_voice_agent |
57 | 74 | source install/setup.bash |
58 | 75 | ``` |
59 | 76 |
|
60 | | -### Run the Voice Agent |
| 77 | +### 2. Run Voice Agent (Console Mode) |
| 78 | + |
| 79 | +**Primary Method - Direct Execution:** |
61 | 80 | ```bash |
62 | | -# Using launch file (recommended) |
63 | | -ros2 launch coffee_voice_agent voice_agent.launch.py |
| 81 | +# Run directly for full console mode with interactive controls |
| 82 | +./src/coffee_voice_agent/scripts/run_voice_agent.sh |
64 | 83 |
|
65 | | -# Or run directly |
66 | | -ros2 run coffee_voice_agent voice_agent_node |
| 84 | +# Or after building: |
| 85 | +./install/coffee_voice_agent/share/coffee_voice_agent/scripts/run_voice_agent.sh |
67 | 86 | ``` |
68 | 87 |
|
69 | | -## ROS2 Topics |
| 88 | +**Console Controls:** |
| 89 | +- `[Ctrl+B]` - Toggle between Text/Audio mode |
| 90 | +- `[Q]` - Quit the application |
| 91 | +- Wake word: Say **"hey barista"** to activate |
70 | 92 |
|
71 | | -### Publishers |
72 | | -- `/coffee_voice_agent/state` (String) - Agent state (dormant, active, speaking, etc.) |
73 | | -- `/coffee_voice_agent/emotion` (String) - Current emotion (excited, friendly, curious, etc.) |
74 | | -- `/coffee_voice_agent/user_input` (String) - User speech input |
75 | | -- `/coffee_voice_agent/agent_response` (String) - Agent responses |
76 | | -- `/coffee_voice_agent/wake_word_detected` (Bool) - Wake word detection events |
| 93 | +### 3. ROS2 Integration (Optional) |
77 | 94 |
|
78 | | -### Subscribers |
79 | | -- `/coffee_voice_agent/virtual_request` (String) - External coffee requests (JSON format) |
| 95 | +**Bridge Only** (if voice agent running separately): |
| 96 | +```bash |
| 97 | +ros2 launch coffee_voice_agent voice_agent_bridge.launch.py |
| 98 | +``` |
80 | 99 |
|
81 | | -### Integration Topics |
82 | | -- Publishes to `/robot/affective_state` for expression system |
83 | | -- Can subscribe to `/coffee_machine/get_status` for machine updates |
| 100 | +**Complete System** (voice agent + bridge together): |
| 101 | +```bash |
| 102 | +ros2 launch coffee_voice_agent voice_agent_system.launch.py |
| 103 | +``` |
| 104 | + |
| 105 | +## ROS2 Topics (Bridge Node) |
| 106 | + |
| 107 | +### Publishers (Voice Agent → ROS2) |
| 108 | +- `/voice_agent/state` (String) - Agent state changes (JSON) |
| 109 | +- `/voice_agent/conversation` (String) - Conversation transcripts (JSON) |
| 110 | +- `/voice_agent/emotion` (String) - Emotion changes (JSON) |
| 111 | +- `/voice_agent/connected` (Bool) - Bridge connection status |
| 112 | + |
| 113 | +### Subscribers (ROS2 → Voice Agent) |
| 114 | +- `/voice_agent/virtual_requests` (String) - External coffee requests (JSON) |
| 115 | +- `/voice_agent/commands` (String) - Voice agent commands (JSON) |
84 | 116 |
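As a rough illustration of how the bridge could map incoming WebSocket events onto the publisher topics listed above, here is a minimal sketch. The topic names come from this README; the `type` and `data` keys of the event and the name `route_event` are assumptions made for illustration, not the bridge's confirmed message schema.

```python
import json

# Topic names are taken from the README; the event "type" values are hypothetical.
TOPIC_BY_EVENT_TYPE = {
    "state": "/voice_agent/state",
    "conversation": "/voice_agent/conversation",
    "emotion": "/voice_agent/emotion",
}


def route_event(raw_message):
    """Return (topic, json_payload) for a WebSocket message, or None if it cannot be routed."""
    try:
        event = json.loads(raw_message)
    except (TypeError, ValueError):
        return None  # not valid JSON
    if not isinstance(event, dict):
        return None  # only JSON objects are routable
    event_type = event.get("type")
    if not isinstance(event_type, str):
        return None
    topic = TOPIC_BY_EVENT_TYPE.get(event_type)
    if topic is None:
        return None  # unknown event type
    # The payload is re-serialized because the topics carry JSON inside a String message.
    return topic, json.dumps(event.get("data", {}))
```

In a real node the returned payload would be assigned to the `data` field of a `std_msgs` `String` message and published on the returned topic.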
|
85 | 117 | ## Virtual Requests |
86 | 118 |
|
87 | | -Send coffee requests via ROS2: |
| 119 | +Send coffee requests to the voice agent via ROS2: |
88 | 120 |
|
89 | 121 | ```bash |
90 | | -ros2 topic pub /coffee_voice_agent/virtual_request std_msgs/String '{ |
| 122 | +# New coffee order (ros2 topic pub repeats at 1 Hz by default; add --once after "pub" to send it a single time) |
| 123 | +ros2 topic pub /voice_agent/virtual_requests std_msgs/String '{ |
91 | 124 | "data": "{\"request_type\": \"NEW_COFFEE_REQUEST\", \"content\": \"Espresso\", \"priority\": \"normal\"}" |
92 | 125 | }' |
| 126 | + |
| 127 | +# Order ready notification |
| 128 | +ros2 topic pub /voice_agent/virtual_requests std_msgs/String '{ |
| 129 | + "data": "{\"request_type\": \"ORDER_READY\", \"content\": \"Americano (Order: abc123)\", \"priority\": \"urgent\"}" |
| 130 | +}' |
93 | 131 | ``` |
94 | 132 |
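The request is JSON nested inside the `data` field of a `String` message, so the quoting is easy to get wrong by hand. The sketch below builds the same doubly encoded argument programmatically. The helper names are hypothetical, and defaulting `priority` to `"normal"` is an assumption based on the first example.

```python
import json


def build_virtual_request(request_type, content, priority="normal"):
    """Return the inner JSON string that goes into the String message's data field."""
    return json.dumps(
        {"request_type": request_type, "content": content, "priority": priority}
    )


def to_topic_pub_argument(request_json):
    """Wrap the inner JSON string in the outer {"data": ...} mapping for ros2 topic pub."""
    # json.dumps escapes the inner quotes, giving the \" sequences seen in the examples.
    return json.dumps({"data": request_json})


if __name__ == "__main__":
    print(to_topic_pub_argument(build_virtual_request("NEW_COFFEE_REQUEST", "Espresso")))
```

The printed value can be passed, wrapped in single quotes, as the last argument to `ros2 topic pub`.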
|
95 | 133 | ## Monitoring |
96 | 134 |
|
97 | | -Monitor voice agent activity: |
| 135 | +### Voice Agent Status |
| 136 | +```bash |
| 137 | +# Check if voice agent is running and WebSocket server is active |
| 138 | +curl -I http://localhost:8080 |
98 | 139 |
|
| 140 | +# Monitor voice agent logs directly in console |
| 141 | +# (Console mode shows all logs in real-time) |
| 142 | +``` |
| 143 | + |
| 144 | +### ROS2 Bridge Status |
99 | 145 | ```bash |
100 | | -# Watch state changes |
101 | | -ros2 topic echo /coffee_voice_agent/state |
| 146 | +# Check bridge connection |
| 147 | +ros2 topic echo /voice_agent/connected |
102 | 148 |
|
103 | | -# Watch emotions |
104 | | -ros2 topic echo /coffee_voice_agent/emotion |
| 149 | +# Monitor state changes |
| 150 | +ros2 topic echo /voice_agent/state |
105 | 151 |
|
106 | | -# Watch user input |
107 | | -ros2 topic echo /coffee_voice_agent/user_input |
| 152 | +# Monitor conversations |
| 153 | +ros2 topic echo /voice_agent/conversation |
108 | 154 |
|
109 | | -# Watch agent responses |
110 | | -ros2 topic echo /coffee_voice_agent/agent_response |
| 155 | +# Monitor emotions |
| 156 | +ros2 topic echo /voice_agent/emotion |
111 | 157 | ``` |
112 | 158 |
|
113 | | -## Integration with Coffee Buddy |
| 159 | +## Configuration |
114 | 160 |
|
115 | | -The voice agent integrates with existing Coffee Buddy systems: |
| 161 | +### Launch File Parameters |
| 162 | +```bash |
| 163 | +# Bridge configuration |
| 164 | +ros2 launch coffee_voice_agent voice_agent_bridge.launch.py \ |
| 165 | + voice_agent_host:=192.168.1.100 \ |
| 166 | + voice_agent_port:=8080 \ |
| 167 | + reconnect_interval:=5.0 |
| 168 | + |
| 169 | +# System configuration |
| 170 | +ros2 launch coffee_voice_agent voice_agent_system.launch.py \ |
| 171 | + voice_agent_port:=8080 \ |
| 172 | + bridge_reconnect_interval:=3.0 |
| 173 | +``` |
116 | 174 |
|
117 | | -1. **Expression System**: Publishes emotions to `/robot/affective_state` |
118 | | -2. **Coffee Control**: Can send coffee commands (future enhancement) |
119 | | -3. **Behavior System**: Receives and processes virtual requests |
120 | | -4. **Head Control**: Emotion changes trigger head movements |
| 175 | +## Smart Mode Detection |
121 | 176 |
|
122 | | -## Troubleshooting |
| 177 | +The bash launcher automatically detects the execution environment: |
123 | 178 |
|
124 | | -### Missing API Keys |
| 179 | +- **Interactive Terminal**: Uses console mode with full controls |
| 180 | +- **Non-Interactive** (ROS2 launch): Automatically switches to start mode |
| 181 | +- **Environment Setup**: Loads `.env` files and validates prerequisites |
| 182 | + |
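The launcher itself is a bash script and its exact check is not shown here. The following Python sketch only illustrates the idea of choosing between console mode and start mode based on whether the process is attached to a terminal. The function names are hypothetical.

```python
import sys


def is_interactive(stream):
    """True if the stream exists and is attached to a terminal."""
    return stream is not None and hasattr(stream, "isatty") and stream.isatty()


def choose_mode(stdin, stdout):
    """Pick 'console' for an interactive terminal, otherwise 'start'."""
    if is_interactive(stdin) and is_interactive(stdout):
        return "console"
    return "start"


if __name__ == "__main__":
    # Under ROS2 launch, stdin is usually not a TTY, so this would select 'start'.
    print(choose_mode(sys.stdin, sys.stdout))
```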
| 183 | +## Integration with Coffee Buddy |
| 184 | + |
| 185 | +### 1. Expression System |
| 186 | +```bash |
| 187 | +# Bridge can forward emotion changes to expression system |
| 188 | +ros2 topic echo /voice_agent/emotion |
125 | 189 | ``` |
126 | | -[ERROR] Missing required environment variables: ['OPENAI_API_KEY'] |
| 190 | + |
| 191 | +### 2. Coffee Machine Integration |
| 192 | +```bash |
| 193 | +# Send machine status updates to voice agent |
| 194 | +ros2 topic pub /voice_agent/virtual_requests std_msgs/String '{ |
| 195 | + "data": "{\"request_type\": \"ORDER_PROCESSING\", \"content\": \"Your Espresso is brewing!\"}" |
| 196 | +}' |
127 | 197 | ``` |
128 | | -**Solution**: Set your OpenAI API key in environment or `.env` file |
129 | 198 |
|
130 | | -### Wake Word Not Working |
| 199 | +### 3. System Orchestration |
| 200 | +```python |
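| 201 | +# Include in larger system launch files (imports needed: IncludeLaunchDescription from launch.actions, PythonLaunchDescriptionSource from launch.launch_description_sources, FindPackageShare from launch_ros.substitutions) |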
| 202 | +IncludeLaunchDescription( |
| 203 | + PythonLaunchDescriptionSource([ |
| 204 | + FindPackageShare('coffee_voice_agent'), |
| 205 | + '/launch/voice_agent_system.launch.py' |
| 206 | + ]) |
| 207 | +) |
131 | 208 | ``` |
| 209 | + |
| 210 | +## Troubleshooting |
| 211 | + |
| 212 | +### Voice Agent Issues |
| 213 | +```bash |
| 214 | +# Missing API keys |
| 215 | +[ERROR] Missing required environment variables: ['OPENAI_API_KEY'] |
| 216 | +# Solution: Set OPENAI_API_KEY in environment or .env file |
| 217 | + |
| 218 | +# Wake word not working |
132 | 219 | [INFO] Wake Word Detection: ❌ Disabled (always-on mode) |
| 220 | +# Solution: Set PORCUPINE_ACCESS_KEY environment variable |
| 221 | + |
| 222 | +# Console mode not working via ROS2 launch |
| 223 | +[WARNING] Not running in an interactive terminal |
| 224 | +# Solution: Use direct script execution for console mode |
133 | 225 | ``` |
134 | | -**Solution**: Set `PORCUPINE_ACCESS_KEY` environment variable |
135 | 226 |
|
136 | | -### Build Errors |
| 227 | +### Bridge Connection Issues |
| 228 | +```bash |
| 229 | +# Bridge cannot connect to voice agent |
| 230 | +[ERROR] WebSocket connection error: [Errno 111] Connection refused |
| 231 | +# Solution: Ensure voice agent is running and WebSocket server is active |
| 232 | + |
| 233 | +# Check WebSocket server status |
| 234 | +curl -I http://localhost:8080 |
137 | 235 | ``` |
138 | | -AttributeError: module 'em' has no attribute 'BUFFERED_OPT' |
| 236 | + |
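A `curl -I` request speaks plain HTTP, so its response from a WebSocket server can be hard to interpret. As an alternative, the sketch below checks only whether something accepts TCP connections on the port. It does not perform a WebSocket handshake, and the function name `is_port_open` is hypothetical. The host and port defaults mirror the `WEBSOCKET_HOST` and `WEBSOCKET_PORT` values shown earlier.

```python
import socket


def is_port_open(host="localhost", port=8080, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    status = "reachable" if is_port_open() else "not reachable"
    print(f"Voice agent WebSocket port is {status}")
```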
| 237 | +### Build Issues |
| 238 | +```bash |
| 239 | +# empy version conflict |
| 240 | +AttributeError: module 'em' has no attribute 'BUFFERED_OPT' |
| 241 | +# Solution: pip install empy==3.3.4 |
139 | 242 | ``` |
140 | | -**Solution**: Install correct empy version: `pip install empy==3.3.4` |
141 | 243 |
|
142 | 244 | ## Development |
143 | 245 |
|
| 246 | +### Package Structure |
| 247 | +- **Voice Agent**: Standalone CLI application in `scripts/` |
| 248 | +- **Bridge Node**: ROS2 integration in `coffee_voice_agent/` |
| 249 | +- **Launch Files**: System orchestration in `launch/` |
| 250 | + |
144 | 251 | ### Adding New Features |
145 | | -1. Modify `livekit_voice_agent.py` for voice agent functionality |
146 | | -2. Modify `voice_agent_node.py` for ROS2 integration |
147 | | -3. Add new topics/services as needed |
| 252 | +1. **Voice functionality**: Modify `livekit_voice_agent.py` |
| 253 | +2. **ROS2 integration**: Modify `voice_agent_bridge.py` |
| 254 | +3. **System integration**: Update launch files |
148 | 255 |
|
149 | | -### Testing |
| 256 | +### Testing Components |
150 | 257 | ```bash |
151 | | -# Test basic ROS2 functionality |
152 | | -ros2 node info /coffee_voice_agent |
| 258 | +# Test voice agent directly (path is relative to the package root, src/coffee_voice_agent) |
| 259 | +./scripts/run_voice_agent.sh |
153 | 260 |
|
154 | | -# Test topic publishing |
155 | | -ros2 topic hz /coffee_voice_agent/state |
| 261 | +# Test bridge connection |
| 262 | +ros2 run coffee_voice_agent voice_agent_bridge |
156 | 263 |
|
157 | | -# Test virtual requests |
158 | | -ros2 topic pub /coffee_voice_agent/virtual_request std_msgs/String '{"data": "test"}' |
| 264 | +# Test complete system |
| 265 | +ros2 launch coffee_voice_agent voice_agent_system.launch.py |
159 | 266 | ``` |
160 | 267 |
|
| 268 | +## Design Principles |
| 269 | + |
| 270 | +- **Separation of Concerns**: Voice processing vs. system integration |
| 271 | +- **Console Mode Priority**: Interactive functionality preserved |
| 272 | +- **Clean Architecture**: WebSocket bridge avoids threading conflicts |
| 273 | +- **Flexibility**: Can run components separately or together |
| 274 | +- **ROS2 Native**: Bridge follows ROS2 patterns and conventions |
| 275 | + |
161 | 276 | ## Future Enhancements |
162 | 277 |
|
163 | | -- [ ] Add coffee machine control integration |
164 | | -- [ ] Add behavior tree integration |
165 | | -- [ ] Add face recognition integration |
166 | | -- [ ] Add gesture control |
167 | | -- [ ] Add configuration via ROS2 parameters |
168 | | -- [ ] Add diagnostics and monitoring |
| 278 | +- [ ] Add service interfaces for synchronous voice agent control |
| 279 | +- [ ] Add parameter server integration for dynamic configuration |
| 280 | +- [ ] Add diagnostics and health monitoring |
| 281 | +- [ ] Add audio stream bridging for ROS2 audio topics |
| 282 | +- [ ] Add behavior tree integration for complex interaction flows |