Commit 002494e

Update coffee_voice_agent README
1 parent 1469810 commit 002494e

1 file changed: +193 -79 lines changed

@@ -1,168 +1,282 @@
11
# Coffee Voice Agent ROS2 Package
22

3-
A ROS2 wrapper for the Coffee Barista Voice Agent that integrates LiveKit voice communication with the Coffee Buddy robot system.
3+
A ROS2 package that integrates the LiveKit Coffee Barista Voice Agent with the Coffee Buddy robot system through a clean bridge architecture.
44

55
## Overview
66

7-
This package wraps the original LiveKit voice agent (`livekit_voice_agent.py`) with a ROS2 node (`voice_agent_node.py`) to provide seamless integration with the Coffee Buddy robot ecosystem.
7+
This package provides ROS2 integration for the Coffee Barista Voice Agent while preserving its interactive console mode functionality. The voice agent runs as a standalone application with full console controls, while a separate ROS2 bridge node provides system integration via WebSocket communication.
88

99
## Features
1010

11-
- **🎙️ Wake Word Detection**: "Hey barista" activation
12-
- **🗣️ Voice Conversation**: STT, LLM, and TTS using LiveKit/OpenAI
11+
- **🎙️ Wake Word Detection**: "Hey barista" activation with Porcupine
12+
- **🗣️ Voice Conversation**: STT, LLM, and TTS using LiveKit/OpenAI
1313
- **😊 Emotion Processing**: Emotion-aware responses with animated expressions
14-
- **☕ Coffee Functions**: Menu, recommendations, and coffee commands
15-
- **🤖 ROS2 Integration**: Publishes/subscribes to Coffee Buddy topics
16-
- **🔗 Virtual Requests**: External coffee requests via ROS2
14+
- **☕ Coffee Functions**: Menu, recommendations, and ordering guidance
15+
- **🖥️ Console Mode**: Full interactive controls (Ctrl+B, Q) in terminal
16+
- **🌐 ROS2 Bridge**: WebSocket-based integration with Coffee Buddy system
17+
- **📡 Virtual Requests**: External coffee requests via ROS2 topics
1718

1819
## Architecture
1920

2021
```
2122
coffee_voice_agent/
22-
├── livekit_voice_agent.py # Original voice agent (933 lines)
23-
├── voice_agent_node.py # ROS2 wrapper + integration
23+
├── scripts/
24+
│ ├── livekit_voice_agent.py # Original LiveKit voice agent
25+
│ └── run_voice_agent.sh # Smart bash launcher
26+
├── coffee_voice_agent/
27+
│ └── voice_agent_bridge.py # ROS2 bridge node
2428
└── launch/
25-
└── voice_agent.launch.py # Launch file
29+
├── voice_agent_bridge.launch.py # Bridge only
30+
└── voice_agent_system.launch.py # Voice agent + bridge together
31+
```
32+
33+
### Communication Flow
34+
```
35+
┌─────────────────┐ WebSocket ┌─────────────────┐ ROS2 ┌──────────────┐
36+
│ Voice Agent │◄───────────────►│ ROS2 Bridge │◄───────────►│ Coffee Buddy │
37+
│ (Console Mode) │ Port 8080 │ (Bridge Node) │ Topics │ System │
38+
│ Interactive │ │ Integration │ │ │
39+
└─────────────────┘ └─────────────────┘ └──────────────┘
2640
```
2741

2842
## Dependencies
2943

3044
### Environment Variables
3145
```bash
32-
export OPENAI_API_KEY="your_openai_api_key"
33-
export PORCUPINE_ACCESS_KEY="your_porcupine_key" # Optional for wake word
34-
export VOICE_AGENT_VOICE="nova" # Optional TTS voice
35-
export VOICE_AGENT_TEMPERATURE="0.7" # Optional LLM temperature
46+
export OPENAI_API_KEY="your_openai_api_key" # Required
47+
export PORCUPINE_ACCESS_KEY="your_porcupine_key" # Optional for wake word
48+
export VOICE_AGENT_VOICE="nova" # Optional TTS voice
49+
export VOICE_AGENT_TEMPERATURE="0.7" # Optional LLM temperature
50+
export WEBSOCKET_HOST="localhost" # Optional WebSocket host
51+
export WEBSOCKET_PORT="8080" # Optional WebSocket port
3652
```
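
As a rough illustration of how a process might consume the variables above, the following Python sketch reads them from an environment mapping and rejects a missing `OPENAI_API_KEY`. The function name `load_voice_agent_config` is hypothetical, and the fallback values are assumptions copied from the example exports, not confirmed defaults of the voice agent.

```python
import os
from typing import Mapping, Optional


def load_voice_agent_config(env: Optional[Mapping[str, str]] = None) -> dict:
    """Collect the documented settings from an environment mapping."""
    env = os.environ if env is None else env

    # OPENAI_API_KEY is the only variable the README marks as required.
    api_key = env.get("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("Missing required environment variable: OPENAI_API_KEY")

    return {
        "openai_api_key": api_key,
        # None here corresponds to running without wake word detection.
        "porcupine_access_key": env.get("PORCUPINE_ACCESS_KEY"),
        # Fallbacks below are assumptions taken from the example exports.
        "voice": env.get("VOICE_AGENT_VOICE", "nova"),
        "temperature": float(env.get("VOICE_AGENT_TEMPERATURE", "0.7")),
        "websocket_host": env.get("WEBSOCKET_HOST", "localhost"),
        "websocket_port": int(env.get("WEBSOCKET_PORT", "8080")),
    }
```

Passing a plain dictionary instead of `os.environ` keeps the function easy to exercise in isolation.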
3753

3854
### ROS2 Dependencies
3955
- `rclpy`
40-
- `std_msgs`
41-
- `coffee_machine_control_msgs`
42-
- `coffee_expressions_msgs`
56+
- `std_msgs`
57+
- `geometry_msgs`
58+
- `websockets` (Python package)
4359

44-
### Python Dependencies
60+
### Python Dependencies (in setup.py)
4561
- `livekit`
4662
- `livekit-agents[openai,deepgram,silero,turn-detector]`
4763
- `pvporcupine==3.0.5`
4864
- `pvrecorder==1.2.7`
4965
- `python-dotenv`
66+
- `websockets`
5067

51-
## Usage
68+
## Installation & Usage
5269

53-
### Build the Package
70+
### 1. Build the Package
5471
```bash
5572
cd coffee_ws
5673
colcon build --packages-select coffee_voice_agent
5774
source install/setup.bash
5875
```
5976

60-
### Run the Voice Agent
77+
### 2. Run Voice Agent (Console Mode)
78+
79+
**Primary Method - Direct Execution:**
6180
```bash
62-
# Using launch file (recommended)
63-
ros2 launch coffee_voice_agent voice_agent.launch.py
81+
# Run directly for full console mode with interactive controls
82+
./src/coffee_voice_agent/scripts/run_voice_agent.sh
6483

65-
# Or run directly
66-
ros2 run coffee_voice_agent voice_agent_node
84+
# Or after building:
85+
./install/coffee_voice_agent/share/coffee_voice_agent/scripts/run_voice_agent.sh
6786
```
6887

69-
## ROS2 Topics
88+
**Console Controls:**
89+
- `[Ctrl+B]` - Toggle between Text/Audio mode
90+
- `[Q]` - Quit the application
91+
- Wake word: Say **"hey barista"** to activate
7092

71-
### Publishers
72-
- `/coffee_voice_agent/state` (String) - Agent state (dormant, active, speaking, etc.)
73-
- `/coffee_voice_agent/emotion` (String) - Current emotion (excited, friendly, curious, etc.)
74-
- `/coffee_voice_agent/user_input` (String) - User speech input
75-
- `/coffee_voice_agent/agent_response` (String) - Agent responses
76-
- `/coffee_voice_agent/wake_word_detected` (Bool) - Wake word detection events
93+
### 3. ROS2 Integration (Optional)
7794

78-
### Subscribers
79-
- `/coffee_voice_agent/virtual_request` (String) - External coffee requests (JSON format)
95+
**Bridge Only** (if voice agent running separately):
96+
```bash
97+
ros2 launch coffee_voice_agent voice_agent_bridge.launch.py
98+
```
8099

81-
### Integration Topics
82-
- Publishes to `/robot/affective_state` for expression system
83-
- Can subscribe to `/coffee_machine/get_status` for machine updates
100+
**Complete System** (voice agent + bridge together):
101+
```bash
102+
ros2 launch coffee_voice_agent voice_agent_system.launch.py
103+
```
104+
105+
## ROS2 Topics (Bridge Node)
106+
107+
### Publishers (Voice Agent → ROS2)
108+
- `/voice_agent/state` (String) - Agent state changes (JSON)
109+
- `/voice_agent/conversation` (String) - Conversation transcripts (JSON)
110+
- `/voice_agent/emotion` (String) - Emotion changes (JSON)
111+
- `/voice_agent/connected` (Bool) - Bridge connection status
112+
113+
### Subscribers (ROS2 → Voice Agent)
114+
- `/voice_agent/virtual_requests` (String) - External coffee requests (JSON)
115+
- `/voice_agent/commands` (String) - Voice agent commands (JSON)
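
Because the topics above carry JSON inside a `std_msgs/String`, a subscriber has to decode the `data` string itself. The sketch below shows one defensive way to do that in Python; the helper name `decode_json_payload` is hypothetical, and no particular field names are assumed since the message schema is not shown in this README.

```python
import json
from typing import Optional


def decode_json_payload(raw: str) -> Optional[dict]:
    """Parse the `data` string of a std_msgs/String message as a JSON object.

    Returns None when the string is not valid JSON or is not a JSON object,
    so a subscriber callback can skip malformed messages instead of raising.
    """
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, dict) else None
```

In an `rclpy` subscriber callback this would typically be called as `decode_json_payload(msg.data)`.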
84116

85117
## Virtual Requests
86118

87-
Send coffee requests via ROS2:
119+
Send coffee requests to the voice agent via ROS2:
88120

89121
```bash
90-
ros2 topic pub /coffee_voice_agent/virtual_request std_msgs/String '{
122+
# New coffee order
123+
ros2 topic pub /voice_agent/virtual_requests std_msgs/String '{
91124
"data": "{\"request_type\": \"NEW_COFFEE_REQUEST\", \"content\": \"Espresso\", \"priority\": \"normal\"}"
92125
}'
126+
127+
# Order ready notification
128+
ros2 topic pub /voice_agent/virtual_requests std_msgs/String '{
129+
"data": "{\"request_type\": \"ORDER_READY\", \"content\": \"Americano (Order: abc123)\", \"priority\": \"urgent\"}"
130+
}'
93131
```
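
The nested quoting in the commands above is easy to get wrong by hand. As a minimal sketch, the payload can be built with `json.dumps` and the shell command rendered from it. The helper names `build_virtual_request` and `ros2_pub_command` are hypothetical; the field names and topic come from the examples above, and `--once` is added here so the command publishes a single message.

```python
import json
import shlex


def build_virtual_request(request_type: str, content: str, priority: str = "normal") -> str:
    """Return the JSON string that goes into the `data` field of std_msgs/String."""
    return json.dumps(
        {"request_type": request_type, "content": content, "priority": priority}
    )


def ros2_pub_command(payload: str, topic: str = "/voice_agent/virtual_requests") -> str:
    """Render a one-shot `ros2 topic pub` command line for the payload."""
    # json.dumps escapes the inner quotes; JSON is also valid YAML, which is
    # the format `ros2 topic pub` expects for the message argument.
    message = json.dumps({"data": payload})
    return f"ros2 topic pub --once {topic} std_msgs/String {shlex.quote(message)}"


if __name__ == "__main__":
    print(ros2_pub_command(build_virtual_request("NEW_COFFEE_REQUEST", "Espresso")))
```

The same payload string could be assigned to `String.data` in an `rclpy` publisher instead of going through the CLI.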
94132

95133
## Monitoring
96134

97-
Monitor voice agent activity:
135+
### Voice Agent Status
136+
```bash
137+
# Check if voice agent is running and WebSocket server is active
138+
curl -I http://localhost:8080
98139

140+
# Monitor voice agent logs directly in console
141+
# (Console mode shows all logs in real-time)
142+
```
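
A WebSocket server may answer a plain HTTP `HEAD` request with an error status rather than a success, so the `curl` output needs some interpretation. As an alternative sketch, a bare TCP connect only checks that something is listening on the port. The function name `port_is_open` is hypothetical, and the default host and port are the example values from this README.

```python
import socket


def port_is_open(host: str = "localhost", port: int = 8080, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection raises OSError (e.g. connection refused, timeout)
        # when nothing accepts the connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print("listening" if port_is_open() else "not reachable")
```

This confirms only that the port accepts connections, not that the peer actually speaks WebSocket.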
143+
144+
### ROS2 Bridge Status
99145
```bash
100-
# Watch state changes
101-
ros2 topic echo /coffee_voice_agent/state
146+
# Check bridge connection
147+
ros2 topic echo /voice_agent/connected
102148

103-
# Watch emotions
104-
ros2 topic echo /coffee_voice_agent/emotion
149+
# Monitor state changes
150+
ros2 topic echo /voice_agent/state
105151

106-
# Watch user input
107-
ros2 topic echo /coffee_voice_agent/user_input
152+
# Monitor conversations
153+
ros2 topic echo /voice_agent/conversation
108154

109-
# Watch agent responses
110-
ros2 topic echo /coffee_voice_agent/agent_response
155+
# Monitor emotions
156+
ros2 topic echo /voice_agent/emotion
111157
```
112158

113-
## Integration with Coffee Buddy
159+
## Configuration
114160

115-
The voice agent integrates with existing Coffee Buddy systems:
161+
### Launch File Parameters
162+
```bash
163+
# Bridge configuration
164+
ros2 launch coffee_voice_agent voice_agent_bridge.launch.py \
165+
voice_agent_host:=192.168.1.100 \
166+
voice_agent_port:=8080 \
167+
reconnect_interval:=5.0
168+
169+
# System configuration
170+
ros2 launch coffee_voice_agent voice_agent_system.launch.py \
171+
voice_agent_port:=8080 \
172+
bridge_reconnect_interval:=3.0
173+
```
116174

117-
1. **Expression System**: Publishes emotions to `/robot/affective_state`
118-
2. **Coffee Control**: Can send coffee commands (future enhancement)
119-
3. **Behavior System**: Receives and processes virtual requests
120-
4. **Head Control**: Emotion changes trigger head movements
175+
## Smart Mode Detection
121176

122-
## Troubleshooting
177+
The bash launcher automatically detects the execution environment:
123178

124-
### Missing API Keys
179+
- **Interactive Terminal**: Uses console mode with full controls
180+
- **Non-Interactive** (ROS2 launch): Automatically switches to start mode
181+
- **Environment Setup**: Loads `.env` files and validates prerequisites
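
The launcher itself is a bash script and its actual check is not shown in this diff. Purely as an illustration of the idea, the Python sketch below picks between the two modes named above based on whether standard input and output are attached to a terminal. The function name `choose_agent_mode` and the decision to require both streams to be TTYs are assumptions.

```python
import sys


def choose_agent_mode(stdin_is_tty: bool, stdout_is_tty: bool) -> str:
    """Pick the mode to run the voice agent in."""
    # Console mode needs a real terminal for its keyboard controls;
    # anything else (e.g. a process spawned by ROS2 launch) falls back to start mode.
    return "console" if (stdin_is_tty and stdout_is_tty) else "start"


if __name__ == "__main__":
    print(choose_agent_mode(sys.stdin.isatty(), sys.stdout.isatty()))
```

In bash, the equivalent terminal test is usually written with `[ -t 0 ]` for standard input.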
182+
183+
## Integration with Coffee Buddy
184+
185+
### 1. Expression System
186+
```bash
187+
# Bridge can forward emotion changes to expression system
188+
ros2 topic echo /voice_agent/emotion
125189
```
126-
[ERROR] Missing required environment variables: ['OPENAI_API_KEY']
190+
191+
### 2. Coffee Machine Integration
192+
```bash
193+
# Send machine status updates to voice agent
194+
ros2 topic pub /voice_agent/virtual_requests std_msgs/String '{
195+
"data": "{\"request_type\": \"ORDER_PROCESSING\", \"content\": \"Your Espresso is brewing!\"}"
196+
}'
127197
```
128-
**Solution**: Set your OpenAI API key in environment or `.env` file
129198

130-
### Wake Word Not Working
199+
### 3. System Orchestration
200+
```python
201+
# Include in larger system launch files
202+
IncludeLaunchDescription(
203+
PythonLaunchDescriptionSource([
204+
FindPackageShare('coffee_voice_agent'),
205+
'/launch/voice_agent_system.launch.py'
206+
])
207+
)
131208
```
209+
210+
## Troubleshooting
211+
212+
### Voice Agent Issues
213+
```bash
214+
# Missing API keys
215+
[ERROR] Missing required environment variables: ['OPENAI_API_KEY']
216+
# Solution: Set OPENAI_API_KEY in environment or .env file
217+
218+
# Wake word not working
132219
[INFO] Wake Word Detection: ❌ Disabled (always-on mode)
220+
# Solution: Set PORCUPINE_ACCESS_KEY environment variable
221+
222+
# Console mode not working via ROS2 launch
223+
[WARNING] Not running in an interactive terminal
224+
# Solution: Use direct script execution for console mode
133225
```
134-
**Solution**: Set `PORCUPINE_ACCESS_KEY` environment variable
135226

136-
### Build Errors
227+
### Bridge Connection Issues
228+
```bash
229+
# Bridge cannot connect to voice agent
230+
[ERROR] WebSocket connection error: [Errno 111] Connection refused
231+
# Solution: Ensure voice agent is running and WebSocket server is active
232+
233+
# Check WebSocket server status
234+
curl -I http://localhost:8080
137235
```
138-
AttributeError: module 'em' has no attribute 'BUFFERED_OPT'
236+
237+
### Build Issues
238+
```bash
239+
# empy version conflict
240+
AttributeError: module 'em' has no attribute 'BUFFERED_OPT'
241+
# Solution: pip install empy==3.3.4
139242
```
140-
**Solution**: Install correct empy version: `pip install empy==3.3.4`
141243

142244
## Development
143245

246+
### Package Structure
247+
- **Voice Agent**: Standalone CLI application in `scripts/`
248+
- **Bridge Node**: ROS2 integration in `coffee_voice_agent/`
249+
- **Launch Files**: System orchestration in `launch/`
250+
144251
### Adding New Features
145-
1. Modify `livekit_voice_agent.py` for voice agent functionality
146-
2. Modify `voice_agent_node.py` for ROS2 integration
147-
3. Add new topics/services as needed
252+
1. **Voice functionality**: Modify `livekit_voice_agent.py`
253+
2. **ROS2 integration**: Modify `voice_agent_bridge.py`
254+
3. **System integration**: Update launch files
148255

149-
### Testing
256+
### Testing Components
150257
```bash
151-
# Test basic ROS2 functionality
152-
ros2 node info /coffee_voice_agent
258+
# Test voice agent directly
259+
./scripts/run_voice_agent.sh
153260

154-
# Test topic publishing
155-
ros2 topic hz /coffee_voice_agent/state
261+
# Test bridge connection
262+
ros2 run coffee_voice_agent voice_agent_bridge
156263

157-
# Test virtual requests
158-
ros2 topic pub /coffee_voice_agent/virtual_request std_msgs/String '{"data": "test"}'
264+
# Test complete system
265+
ros2 launch coffee_voice_agent voice_agent_system.launch.py
159266
```
160267

268+
## Design Principles
269+
270+
- **Separation of Concerns**: Voice processing vs. system integration
271+
- **Console Mode Priority**: Interactive functionality preserved
272+
- **Clean Architecture**: WebSocket bridge avoids threading conflicts
273+
- **Flexibility**: Can run components separately or together
274+
- **ROS2 Native**: Bridge follows ROS2 patterns and conventions
275+
161276
## Future Enhancements
162277

163-
- [ ] Add coffee machine control integration
164-
- [ ] Add behavior tree integration
165-
- [ ] Add face recognition integration
166-
- [ ] Add gesture control
167-
- [ ] Add configuration via ROS2 parameters
168-
- [ ] Add diagnostics and monitoring
278+
- [ ] Add service interfaces for synchronous voice agent control
279+
- [ ] Add parameter server integration for dynamic configuration
280+
- [ ] Add diagnostics and health monitoring
281+
- [ ] Add audio stream bridging for ROS2 audio topics
282+
- [ ] Add behavior tree integration for complex interaction flows
