@@ -6,6 +6,67 @@ A ROS2 package that integrates the LiveKit Coffee Barista Voice Agent with the C

 This package provides ROS2 integration for the Coffee Barista Voice Agent while preserving its interactive console mode functionality. The voice agent runs as a standalone application with full console controls, while a separate ROS2 bridge node provides system integration via WebSocket communication.

+## Implementation Versions
+
+This package now includes **two implementations** of the voice agent:
+
+### **🏗️ Refactored Version (Recommended)**
+- **Files**: `main.py` + modular structure (`state/`, `agents/`, `tools/`, `config/`, `utils/`)
+- **Launcher**: `./run_main.sh`
+- **Architecture**: Clean file-based modular organization
+- **Benefits**: Better maintainability, easier testing, cleaner separation of concerns
+- **Status**: ✅ **Production ready** - Same functionality with better organization
+
+### **📚 Original Version (Reference)**
+- **Files**: `livekit_voice_agent.py` (monolithic, 1109 lines)
+- **Launcher**: `./run_voice_agent_original.sh`
+- **Architecture**: Single-file implementation
+- **Benefits**: Proven, stable, all logic in one place
+- **Status**: 📖 **Preserved for reference** - Fully functional but less maintainable
+
+**Both implementations provide identical functionality** - choose based on your preference for code organization.
+
+## Refactoring Details
+
+The refactored version was created through careful **file-based modular extraction** while preserving all original functionality:
+
+### **What Was Extracted**
+
+| **Component** | **Original Location** | **New Location** | **Lines** | **Purpose** |
+|---------------|----------------------|------------------|-----------|-------------|
+| **StateManager** | Lines 40-566 in monolith | `state/state_manager.py` | 540 | State transitions, timeouts, virtual queue |
+| **CoffeeBaristaAgent** | Lines 568-1039 in monolith | `agents/simple_coffee_agent.py` | 358 | I/O services, TTS, wake word, WebSocket |
+| **Function Tools** | Agent methods | `tools/coffee_tools.py` | 82 | Coffee-related functions (menu, time, etc.) |
+| **Configuration** | Scattered constants | `config/settings.py` | 25 | Environment variables, timeouts |
+| **Instructions** | Large string | `config/instructions.py` | 50 | LLM system prompt |
+| **Utilities** | Helper methods | `utils/*.py` | 150 | Greetings, animations, announcements |
+
+### **Key Improvements**
+
+- **🧩 Modular Design**: 1109-line monolith → 7 focused files
+- **🔧 Clean Tool Registration**: Programmatic `function_tool()` registration replaces duplicated methods
+- **🧪 Testable Components**: Each class can be unit tested independently
+- **📝 Maintainable**: Add features by editing specific files instead of searching through the monolith
+- **⚙️ Configuration Management**: Environment variables and settings are centralized
+- **🛠️ Reusable Utilities**: Greeting selection, animation descriptions, announcement formatting
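As a rough illustration of the centralized configuration point above, a settings module could read environment variables once and expose typed constants. This is a minimal sketch, not the contents of `config/settings.py`: the helper `env_float` and the variable names `WEBSOCKET_HOST` and `USER_RESPONSE_TIMEOUT` are hypothetical and chosen only for the example.

```python
import os


def env_float(name: str, default: float) -> float:
    """Read a float from the environment, falling back to a default.

    Hypothetical helper: unset, empty, or non-numeric values yield the default.
    """
    raw = os.environ.get(name)
    if raw is None or raw.strip() == "":
        return default
    try:
        return float(raw)
    except ValueError:
        return default


# Hypothetical settings; the real names and defaults may differ.
WEBSOCKET_HOST = os.environ.get("WEBSOCKET_HOST", "localhost")
USER_RESPONSE_TIMEOUT = env_float("USER_RESPONSE_TIMEOUT", 15.0)
```

Other modules would then import these constants instead of calling `os.environ` themselves, which keeps defaults in one place.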
+
+### **What Was Preserved**
+
+- ✅ **All complex logic**: State management, timeout handling, virtual request batching
+- ✅ **Threading model**: Same 3-thread architecture (main, wake word, WebSocket)
+- ✅ **Session events**: Conversation flow, goodbye detection, timer management
+- ✅ **TTS processing**: Emotion extraction from `emotion:text` format
+- ✅ **Resource management**: Proper cleanup, state transitions, error handling
+- ✅ **Behavior**: Identical user experience and functionality
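The `emotion:text` convention mentioned above can be sketched as a small parser. This is an assumption-laden example rather than the agent's actual code: the function name `split_emotion`, the emotion set, and the default emotion are all hypothetical.

```python
from typing import Tuple

# Hypothetical emotion vocabulary; the real agent may use different labels.
KNOWN_EMOTIONS = {"friendly", "excited", "curious", "sleepy"}
DEFAULT_EMOTION = "friendly"


def split_emotion(message: str) -> Tuple[str, str]:
    """Split an 'emotion:text' string into (emotion, text).

    If the prefix before the first colon is not a known emotion, the whole
    message is treated as text and the default emotion is returned.
    """
    prefix, sep, rest = message.partition(":")
    emotion = prefix.strip().lower()
    if sep and emotion in KNOWN_EMOTIONS:
        return emotion, rest.strip()
    return DEFAULT_EMOTION, message.strip()
```

Checking the prefix against a known set keeps ordinary text that happens to contain a colon, such as a time of day, from being misread as an emotion tag.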
+
+### **Refactoring Principles**
+
+- **No Logic Changes**: Pure organizational refactoring, zero behavior modification
+- **Composition over Services**: Avoided over-engineering with service abstractions
+- **Single Responsibility**: Each file has a clear, focused purpose
+- **Dependency Injection**: Components accept dependencies for better testing
+- **Proven Patterns**: Used established LiveKit patterns (programmatic tool registration)
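To illustrate the dependency injection principle above, here is a minimal sketch of a timeout tracker that accepts its clock as a constructor argument, so a test can substitute a fake clock instead of sleeping. `TimeoutTracker` and `FakeClock` are hypothetical names for this example and are not taken from `state/state_manager.py`.

```python
import time
from typing import Callable


class TimeoutTracker:
    """Tracks whether a period of inactivity has exceeded a timeout.

    The clock is injected so tests can control time deterministically.
    """

    def __init__(self, timeout_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._timeout_s = timeout_s
        self._clock = clock
        self._last_activity = clock()

    def touch(self) -> None:
        """Record activity, restarting the timeout window."""
        self._last_activity = self._clock()

    def expired(self) -> bool:
        """Return True once timeout_s seconds have passed without activity."""
        return self._clock() - self._last_activity >= self._timeout_s


class FakeClock:
    """Test double: a callable clock whose time is set by hand."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now
```

In production the default `time.monotonic` is used; in a unit test, advancing `FakeClock.now` exercises the timeout path without any real waiting.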
+
 ## Features

 - **🎙️ Wake Word Detection**: "Hey barista" activation with Porcupine
@@ -18,18 +79,42 @@ This package provides ROS2 integration for the Coffee Barista Voice Agent while

 ## Architecture

+### **File Structure**
 ```
 coffee_voice_agent/
 ├── scripts/
-│   ├── livekit_voice_agent.py            # Original LiveKit voice agent
-│   └── run_voice_agent.sh                # Smart bash launcher
+│   ├── main.py                           # 🏗️ Refactored voice agent entry point
+│   ├── run_main.sh                       # 🏗️ Refactored version launcher
+│   ├── livekit_voice_agent.py            # 📚 Original monolithic voice agent
+│   ├── run_voice_agent_original.sh       # 📚 Original version launcher
+│   ├── state/
+│   │   └── state_manager.py              # 🏗️ Extracted StateManager (540 lines)
+│   ├── agents/
+│   │   └── simple_coffee_agent.py        # 🏗️ CoffeeBaristaAgent with programmatic tools
+│   ├── tools/
+│   │   └── coffee_tools.py               # 🏗️ Function tool implementations
+│   ├── config/
+│   │   ├── settings.py                   # 🏗️ Configuration and environment variables
+│   │   └── instructions.py               # 🏗️ LLM system instructions
+│   └── utils/
+│       ├── greeting_data.py              # 🏗️ Greeting utilities
+│       ├── animation_data.py             # 🏗️ Eye animation descriptions
+│       └── announcement_data.py          # 🏗️ Order announcement templates
 ├── coffee_voice_agent/
 │   └── voice_agent_bridge.py             # ROS2 bridge node
 └── launch/
     ├── voice_agent_bridge.launch.py      # Bridge only
     └── voice_agent_system.launch.py      # Voice agent + bridge together
 ```

+### **Refactored Architecture Benefits**
+- **🧩 Modular**: StateManager (540 lines) separate from Agent (358 lines)
+- **🔧 Clean Tools**: Programmatic function registration, no code duplication
+- **⚙️ Configuration**: Environment variables and instructions extracted
+- **🛠️ Utilities**: Reusable components for greetings, animations, announcements
+- **🧪 Testable**: Each component can be tested independently
+- **📝 Maintainable**: Easy to add/remove features, clear responsibilities
+
 ### Communication Flow
 ```
 ┌─────────────────┐ WebSocket ┌─────────────────┐ ROS2 ┌──────────────┐
@@ -76,20 +161,43 @@ source install/setup.bash

 ### 2. Run Voice Agent (Console Mode)

-**Primary Method - Direct Execution:**
+#### **🏗️ Refactored Version (Recommended)**
 ```bash
-# Run directly for full console mode with interactive controls
-./src/coffee_voice_agent/scripts/run_voice_agent.sh
+# Run refactored version with modular architecture
+./src/coffee_voice_agent/scripts/run_main.sh

 # Or after building:
-./install/coffee_voice_agent/share/coffee_voice_agent/scripts/run_voice_agent.sh
+./install/coffee_voice_agent/share/coffee_voice_agent/scripts/run_main.sh
 ```

-**Console Controls:**
+#### **📚 Original Version (Reference)**
+```bash
+# Run original monolithic version
+./src/coffee_voice_agent/scripts/run_voice_agent_original.sh
+
+# Or after building:
+./install/coffee_voice_agent/share/coffee_voice_agent/scripts/run_voice_agent_original.sh
+```
+
+**Console Controls (Both Versions):**
 - `[Ctrl+B]` - Toggle between Text/Audio mode
 - `[Q]` - Quit the application
 - Wake word: Say **"hey barista"** to activate

+**Which Version to Use?**
+- **🏗️ Use the refactored version** (`./run_main.sh`) for new development, easier maintenance, and better testing
+- **📚 Use the original version** (`./run_voice_agent_original.sh`) if you prefer single-file simplicity or need proven stability
+
+### **Quick Start Guide**
+
+```bash
+# 🚀 RECOMMENDED: Run refactored modular version (from src/coffee_voice_agent/scripts/)
+./run_main.sh
+
+# 📚 REFERENCE: Run original monolithic version
+./run_voice_agent_original.sh
+```
+
 ### 3. ROS2 Integration (Optional)

 **Bridge Only** (if voice agent running separately):
@@ -272,19 +380,46 @@ AttributeError: module 'em' has no attribute 'BUFFERED_OPT'
 ## Development

 ### Package Structure
-- **Voice Agent**: Standalone CLI application in `scripts/`
+- **🏗️ Refactored Voice Agent**: Modular structure in `scripts/` (`main.py` + subdirectories)
+- **📚 Original Voice Agent**: Monolithic implementation in `scripts/livekit_voice_agent.py`
 - **Bridge Node**: ROS2 integration in `coffee_voice_agent/`
 - **Launch Files**: System orchestration in `launch/`

 ### Adding New Features
-1. **Voice functionality**: Modify `livekit_voice_agent.py`
-2. **ROS2 integration**: Modify `voice_agent_bridge.py`
+
+#### **🏗️ Refactored Version (Recommended for Development)**
+1. **Function Tools**: Add to `tools/coffee_tools.py` and register in `agents/simple_coffee_agent.py`
+2. **State Logic**: Modify `state/state_manager.py` for conversation flow changes
+3. **Configuration**: Update `config/settings.py` or `config/instructions.py`
+4. **Utilities**: Add to appropriate `utils/*.py` file
+5. **Agent Behavior**: Modify `agents/simple_coffee_agent.py` for I/O changes
+6. **ROS2 Integration**: Modify `voice_agent_bridge.py`
+
+#### **📚 Original Version**
+1. **Voice functionality**: Modify `livekit_voice_agent.py` (search through 1109 lines)
+2. **ROS2 integration**: Modify `voice_agent_bridge.py`
 3. **System integration**: Update launch files

+### Development Benefits - Refactored Version
+- **🔍 Easy Navigation**: Find features in dedicated files instead of searching the monolith
+- **🧪 Component Testing**: Test StateManager, tools, and utilities independently
+- **🔧 Clean Changes**: Modify specific files with less risk of side effects elsewhere
+- **📝 Code Reviews**: Smaller, focused diffs instead of large file changes
+- **🏗️ Parallel Development**: Multiple developers can work on different components
+
 ### Testing Components
+
+#### **🏗️ Refactored Version**
 ```bash
-# Test voice agent directly
-./scripts/run_voice_agent.sh
+# Test refactored voice agent directly
+./scripts/run_main.sh
+
+# Test individual components (one-off Python command; run from scripts/ so the packages import)
+python3 -c "
+from state.state_manager import StateManager
+from tools.coffee_tools import get_current_time_impl
+# Test components independently
+"

 # Test bridge connection
 ros2 run coffee_voice_agent voice_agent_bridge
@@ -293,6 +428,18 @@ ros2 run coffee_voice_agent voice_agent_bridge
 ros2 launch coffee_voice_agent voice_agent_system.launch.py
 ```

+#### **📚 Original Version**
+```bash
+# Test original voice agent directly
+./scripts/run_voice_agent_original.sh
+
+# Test bridge connection
+ros2 run coffee_voice_agent voice_agent_bridge
+
+# Test complete system
+ros2 launch coffee_voice_agent voice_agent_system.launch.py
+```
+
 ## Design Principles

 - **Separation of Concerns**: Voice processing vs. system integration