Commit 8b48bc6

Update README
1 parent da40f36 commit 8b48bc6

1 file changed, +159 -12 lines changed

coffee_ws/src/coffee_voice_agent/README.md

Lines changed: 159 additions & 12 deletions
@@ -6,6 +6,67 @@ A ROS2 package that integrates the LiveKit Coffee Barista Voice Agent with the C
66

77
This package provides ROS2 integration for the Coffee Barista Voice Agent while preserving its interactive console mode functionality. The voice agent runs as a standalone application with full console controls, while a separate ROS2 bridge node provides system integration via WebSocket communication.
88

9+
## Implementation Versions
10+
11+
This package now includes **two implementations** of the voice agent:
12+
13+
### **🏗️ Refactored Version (Recommended)**
14+
- **Files**: `main.py` + modular structure (`state/`, `agents/`, `tools/`)
15+
- **Launcher**: `./run_main.sh`
16+
- **Architecture**: Clean file-based modular organization
17+
- **Benefits**: Better maintainability, easier testing, cleaner separation of concerns
18+
- **Status**: ✅ **Production ready** - Same functionality with better organization
19+
20+
### **📚 Original Version (Reference)**
21+
- **Files**: `livekit_voice_agent.py` (monolithic, 1109 lines)
22+
- **Launcher**: `./run_voice_agent_original.sh`
23+
- **Architecture**: Single-file implementation
24+
- **Benefits**: Proven, stable, all logic in one place
25+
- **Status**: 📖 **Preserved for reference** - Fully functional but less maintainable
26+
27+
**Both implementations provide identical functionality** - choose based on your preference for code organization.
28+
29+
## Refactoring Details
30+
31+
The refactored version was created through careful **file-based modular extraction** while preserving all original functionality:
32+
33+
### **What Was Extracted**
34+
35+
| **Component** | **Original Location** | **New Location** | **Lines** | **Purpose** |
36+
|---------------|----------------------|------------------|-----------|-------------|
37+
| **StateManager** | Lines 40-566 in monolith | `state/state_manager.py` | 540 | State transitions, timeouts, virtual queue |
38+
| **CoffeeBaristaAgent** | Lines 568-1039 in monolith | `agents/simple_coffee_agent.py` | 358 | I/O services, TTS, wake word, WebSocket |
39+
| **Function Tools** | Agent methods | `tools/coffee_tools.py` | 82 | Coffee-related functions (menu, time, etc.) |
40+
| **Configuration** | Scattered constants | `config/settings.py` | 25 | Environment variables, timeouts |
41+
| **Instructions** | Large string | `config/instructions.py` | 50 | LLM system prompt |
42+
| **Utilities** | Helper methods | `utils/*.py` | 150 | Greetings, animations, announcements |
43+
44+
### **Key Improvements**
45+
46+
- **🧩 Modular Design**: 1109-line monolith → 7 focused files
47+
- **🔧 Clean Tool Registration**: Programmatic `function_tool()` registration vs duplicate methods
48+
- **🧪 Testable Components**: Each class can be unit tested independently
49+
- **📝 Maintainable**: Add features by editing specific files, not searching through monolith
50+
- **⚙️ Configuration Management**: Environment variables and settings centralized
51+
- **🛠️ Reusable Utilities**: Greeting selection, animation descriptions, announcement formatting
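The "programmatic registration vs duplicate methods" point can be illustrated with a small sketch. This is not LiveKit's `function_tool()` API, whose signature is not shown in this README; `build_tool_registry` is a hypothetical helper, and the bodies and signatures of the `*_impl` functions are assumptions made for illustration.

```python
from datetime import datetime
from typing import Callable, Dict, Optional

ToolFn = Callable[..., str]


def get_menu_impl() -> str:
    # Placeholder menu; the real contents live in tools/coffee_tools.py.
    return "Espresso, Latte, Cappuccino"


def get_current_time_impl(now: Optional[datetime] = None) -> str:
    # Accepting `now` is an assumption that makes the function easy to test.
    return (now or datetime.now()).strftime("%H:%M")


def build_tool_registry(*impls: ToolFn) -> Dict[str, ToolFn]:
    """Register each implementation once, keyed by its name without '_impl'."""
    registry: Dict[str, ToolFn] = {}
    for fn in impls:
        name = fn.__name__
        if name.endswith("_impl"):
            name = name[: -len("_impl")]
        if name in registry:
            raise ValueError(f"duplicate tool name: {name}")
        registry[name] = fn
    return registry


tools = build_tool_registry(get_menu_impl, get_current_time_impl)
print(sorted(tools))
```

Because each implementation is defined once and registered by reference, the agent class does not need a second, hand-written wrapper method per tool.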
52+
53+
### **What Was Preserved**
54+
55+
- **All complex logic**: State management, timeout handling, virtual request batching
56+
- **Threading model**: Same 3-thread architecture (main, wake word, WebSocket)
57+
- **Session events**: Conversation flow, goodbye detection, timer management
58+
- **TTS processing**: Emotion extraction from `emotion:text` format
59+
- **Resource management**: Proper cleanup, state transitions, error handling
60+
- **Behavior**: Identical user experience and functionality
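As an illustration of the `emotion:text` format mentioned above, here is a minimal parsing sketch. The function name `split_emotion`, the `"neutral"` fallback, and the rule for recognizing a tag are all assumptions; the agent's actual parsing code is not shown in this README.

```python
from typing import Tuple

DEFAULT_EMOTION = "neutral"  # hypothetical fallback label


def split_emotion(message: str, default: str = DEFAULT_EMOTION) -> Tuple[str, str]:
    """Split 'emotion:text' into (emotion, text).

    Falls back to `default` when the message has no usable emotion prefix.
    """
    prefix, sep, rest = message.partition(":")
    emotion = prefix.strip().lower()
    # Treat the prefix as an emotion tag only if it is a single alphabetic word,
    # so that text such as "It is 10:30 now" is not split by mistake.
    if sep and emotion.isalpha():
        return emotion, rest.strip()
    return default, message.strip()


print(split_emotion("excited:Your latte is ready!"))
```

A stricter variant could check the prefix against a fixed set of known emotions, which would avoid misreading ordinary sentences that begin with a single word followed by a colon.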
61+
62+
### **Refactoring Principles**
63+
64+
- **No Logic Changes**: Pure organizational refactoring, zero behavior modification
65+
- **Composition over Services**: Avoided over-engineering with service abstractions
66+
- **Single Responsibility**: Each file has a clear, focused purpose
67+
- **Dependency Injection**: Components accept dependencies for better testing
68+
- **Proven Patterns**: Used established LiveKit patterns (programmatic tool registration)
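The dependency injection principle can be sketched as follows. `TimeoutTracker` and `FakeClock` are hypothetical names used only for illustration and are not classes from this package; the idea is that a component that needs the current time accepts a clock as a parameter, so a test can substitute a fake one.

```python
import time
from typing import Callable


class TimeoutTracker:
    """Hypothetical component that tracks inactivity using an injected clock."""

    def __init__(self, timeout_s: float, clock: Callable[[], float] = time.monotonic):
        self._timeout_s = timeout_s
        self._clock = clock  # injected dependency; defaults to the real clock
        self._last_activity = clock()

    def touch(self) -> None:
        # Record activity at the current (possibly fake) time.
        self._last_activity = self._clock()

    def expired(self) -> bool:
        return self._clock() - self._last_activity >= self._timeout_s


class FakeClock:
    """Test double whose time only moves when the test sets `now`."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now


clock = FakeClock()
tracker = TimeoutTracker(timeout_s=30.0, clock=clock)
clock.now = 31.0
print(tracker.expired())
```

With the clock injected, timeout behavior can be tested without sleeping or patching global state.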
69+
970
## Features
1071

1172
- **🎙️ Wake Word Detection**: "Hey barista" activation with Porcupine
@@ -18,18 +79,42 @@ This package provides ROS2 integration for the Coffee Barista Voice Agent while
1879

1980
## Architecture
2081

82+
### **File Structure**
2183
```
2284
coffee_voice_agent/
2385
├── scripts/
24-
│ ├── livekit_voice_agent.py # Original LiveKit voice agent
25-
│ └── run_voice_agent.sh # Smart bash launcher
86+
│ ├── main.py # 🏗️ Refactored voice agent entry point
87+
│ ├── run_main.sh # 🏗️ Refactored version launcher
88+
│ ├── livekit_voice_agent.py # 📚 Original monolithic voice agent
89+
│ ├── run_voice_agent_original.sh # 📚 Original version launcher
90+
│ ├── state/
91+
│ │ └── state_manager.py # 🏗️ Extracted StateManager (540 lines)
92+
│ ├── agents/
93+
│ │ └── simple_coffee_agent.py # 🏗️ CoffeeBaristaAgent with programmatic tools
94+
│ ├── tools/
95+
│ │ └── coffee_tools.py # 🏗️ Function tool implementations
96+
│ ├── config/
97+
│ │ ├── settings.py # 🏗️ Configuration and environment variables
98+
│ │ └── instructions.py # 🏗️ LLM system instructions
99+
│ └── utils/
100+
│ ├── greeting_data.py # 🏗️ Greeting utilities
101+
│ ├── animation_data.py # 🏗️ Eye animation descriptions
102+
│ └── announcement_data.py # 🏗️ Order announcement templates
26103
├── coffee_voice_agent/
27104
│ └── voice_agent_bridge.py # ROS2 bridge node
28105
└── launch/
29106
├── voice_agent_bridge.launch.py # Bridge only
30107
└── voice_agent_system.launch.py # Voice agent + bridge together
31108
```
32109

110+
### **Refactored Architecture Benefits**
111+
- **🧩 Modular**: StateManager (540 lines) separate from Agent (358 lines)
112+
- **🔧 Clean Tools**: Programmatic function registration, no code duplication
113+
- **⚙️ Configuration**: Environment variables and instructions extracted
114+
- **🛠️ Utilities**: Reusable components for greetings, animations, announcements
115+
- **🧪 Testable**: Each component can be tested independently
116+
- **📝 Maintainable**: Easy to add/remove features, clear responsibilities
117+
33118
### Communication Flow
34119
```
35120
┌─────────────────┐ WebSocket ┌─────────────────┐ ROS2 ┌──────────────┐
@@ -76,20 +161,43 @@ source install/setup.bash
76161

77162
### 2. Run Voice Agent (Console Mode)
78163

79-
**Primary Method - Direct Execution:**
164+
#### **🏗️ Refactored Version (Recommended)**
80165
```bash
81-
# Run directly for full console mode with interactive controls
82-
./src/coffee_voice_agent/scripts/run_voice_agent.sh
166+
# Run refactored version with modular architecture
167+
./src/coffee_voice_agent/scripts/run_main.sh
83168

84169
# Or after building:
85-
./install/coffee_voice_agent/share/coffee_voice_agent/scripts/run_voice_agent.sh
170+
./install/coffee_voice_agent/share/coffee_voice_agent/scripts/run_main.sh
86171
```
87172

88-
**Console Controls:**
173+
#### **📚 Original Version (Reference)**
174+
```bash
175+
# Run original monolithic version
176+
./src/coffee_voice_agent/scripts/run_voice_agent_original.sh
177+
178+
# Or after building:
179+
./install/coffee_voice_agent/share/coffee_voice_agent/scripts/run_voice_agent_original.sh
180+
```
181+
182+
**Console Controls (Both Versions):**
89183
- `[Ctrl+B]` - Toggle between Text/Audio mode
90184
- `[Q]` - Quit the application
91185
- Wake word: Say **"hey barista"** to activate
92186

187+
**Which Version to Use?**
188+
- **🏗️ Use refactored version** (`./run_main.sh`) for new development, easier maintenance, better testing
189+
- **📚 Use original version** (`./run_voice_agent_original.sh`) if you prefer single-file simplicity or need proven stability
190+
191+
### **Quick Start Guide**
192+
193+
```bash
194+
# 🚀 RECOMMENDED: Run refactored modular version
195+
./run_main.sh
196+
197+
# 📚 REFERENCE: Run original monolithic version
198+
./run_voice_agent_original.sh
199+
```
200+
93201
### 3. ROS2 Integration (Optional)
94202

95203
**Bridge Only** (if voice agent running separately):
@@ -272,19 +380,46 @@ AttributeError: module 'em' has no attribute 'BUFFERED_OPT'
272380
## Development
273381

274382
### Package Structure
275-
- **Voice Agent**: Standalone CLI application in `scripts/`
383+
- **🏗️ Refactored Voice Agent**: Modular structure in `scripts/` (main.py + subdirectories)
384+
- **📚 Original Voice Agent**: Monolithic implementation in `scripts/livekit_voice_agent.py`
276385
- **Bridge Node**: ROS2 integration in `coffee_voice_agent/`
277386
- **Launch Files**: System orchestration in `launch/`
278387

279388
### Adding New Features
280-
1. **Voice functionality**: Modify `livekit_voice_agent.py`
281-
2. **ROS2 integration**: Modify `voice_agent_bridge.py`
389+
390+
#### **🏗️ Refactored Version (Recommended for Development)**
391+
1. **Function Tools**: Add to `tools/coffee_tools.py` and register in `agents/simple_coffee_agent.py`
392+
2. **State Logic**: Modify `state/state_manager.py` for conversation flow changes
393+
3. **Configuration**: Update `config/settings.py` or `config/instructions.py`
394+
4. **Utilities**: Add to appropriate `utils/*.py` file
395+
5. **Agent Behavior**: Modify `agents/simple_coffee_agent.py` for I/O changes
396+
6. **ROS2 Integration**: Modify `voice_agent_bridge.py`
397+
398+
#### **📚 Original Version**
399+
1. **Voice functionality**: Modify `livekit_voice_agent.py` (search through 1109 lines)
400+
2. **ROS2 integration**: Modify `voice_agent_bridge.py`
282401
3. **System integration**: Update launch files
283402

403+
### Development Benefits - Refactored Version
404+
- **🔍 Easy Navigation**: Find features in dedicated files vs searching monolith
405+
- **🧪 Component Testing**: Test StateManager, tools, utilities independently
406+
- **🔧 Clean Changes**: Modify specific files without side effects
407+
- **📝 Code Reviews**: Smaller, focused diffs instead of large file changes
408+
- **🏗️ Parallel Development**: Multiple developers can work on different components
409+
284410
### Testing Components
411+
412+
#### **🏗️ Refactored Version**
285413
```bash
286-
# Test voice agent directly
287-
./scripts/run_voice_agent.sh
414+
# Test refactored voice agent directly
415+
./scripts/run_main.sh
416+
417+
# Test individual components (one-off Python command; run from scripts/ or set PYTHONPATH so the imports resolve)
418+
python3 -c "
419+
from state.state_manager import StateManager
420+
from tools.coffee_tools import get_current_time_impl
421+
# Test components independently
422+
"
288423

289424
# Test bridge connection
290425
ros2 run coffee_voice_agent voice_agent_bridge
@@ -293,6 +428,18 @@ ros2 run coffee_voice_agent voice_agent_bridge
293428
ros2 launch coffee_voice_agent voice_agent_system.launch.py
294429
```
295430

431+
#### **📚 Original Version**
432+
```bash
433+
# Test original voice agent directly
434+
./scripts/run_voice_agent_original.sh
435+
436+
# Test bridge connection
437+
ros2 run coffee_voice_agent voice_agent_bridge
438+
439+
# Test complete system
440+
ros2 launch coffee_voice_agent voice_agent_system.launch.py
441+
```
442+
296443
## Design Principles
297444

298445
- **Separation of Concerns**: Voice processing vs. system integration
