# Coffee Voice Agent Messages

This package contains ROS2 message definitions for the Coffee Robot's voice agent system. These messages provide a structured interface for communication between the voice agent and other robot components, particularly the orchestrator node that coordinates animatronics based on voice agent state.

## Architecture

The voice agent publishes state information through these message types, which are consumed by an orchestrator node that synthesizes this information with other sensor inputs (face tracking, hardware status) to coordinate robot expressions and movements.

```
Voice Agent → Voice Agent Bridge → ROS2 Topics → Orchestrator → Hardware Controllers
```
| 12 | + |
## Message Types

### TtsEvent.msg

Published when the voice agent starts or stops speaking. Critical for synchronizing robot animations with speech.

```
# Event type: "started" or "finished"
string event

# Current emotion being expressed
string emotion

# Preview of the text being spoken (truncated for large text)
string text

# Source of the TTS: "manual", "llm", "announcement", etc.
string source

# Timestamp when the event occurred
builtin_interfaces/Time timestamp
```
| 35 | + |
**Topic**: `voice_agent/tts_events`

**Usage Example**:
```python
# When the agent starts speaking with excitement
{
    "event": "started",
    "emotion": "excited",
    "text": "Hello! How can I help you today?",
    "source": "manual",
    "timestamp": "..."
}

# When the agent finishes speaking
{
    "event": "finished",
    "emotion": "excited",
    "text": "Hello! How can I help you today?",
    "source": "manual",
    "timestamp": "..."
}
```
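Since the `text` field carries only a preview, the bridge must truncate long utterances before publishing. A minimal sketch of such a helper is shown below; the 100-character limit and the `preview_text` name are assumptions for illustration, not taken from the bridge code.

```python
def preview_text(text: str, limit: int = 100) -> str:
    """Truncate long TTS text for the TtsEvent `text` field.

    The 100-character limit is an assumed default, not the bridge's
    actual behavior.
    """
    if len(text) <= limit:
        return text
    # Reserve three characters for the ellipsis marker
    return text[: limit - 3] + "..."
```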
| 58 | + |
### AgentState.msg

Published when the voice agent changes conversation states. Useful for high-level behavioral coordination.

```
# Current state: "dormant", "connecting", "active", "speaking", "disconnecting"
string current_state

# Previous state for transition tracking
string previous_state

# Timestamp when the state change occurred
builtin_interfaces/Time timestamp
```

**Topic**: `voice_agent/state`

**State Transitions**:
- `dormant` → `connecting` (wake word detected)
- `connecting` → `active` (session established)
- `active` → `speaking` (agent is talking)
- `speaking` → `active` (agent finished talking)
- `active` → `disconnecting` (conversation ending)
- `disconnecting` → `dormant` (back to wake word detection)
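The transition list above can be encoded as a small lookup table, which a subscriber could use to sanity-check the `previous_state`/`current_state` pair in each message. This is an illustrative sketch that admits only the transitions listed:

```python
# Allowed transitions, taken directly from the list above.
VALID_TRANSITIONS = {
    "dormant": {"connecting"},
    "connecting": {"active"},
    "active": {"speaking", "disconnecting"},
    "speaking": {"active"},
    "disconnecting": {"dormant"},
}


def is_valid_transition(previous_state: str, current_state: str) -> bool:
    """Check an AgentState previous/current pair against the table."""
    return current_state in VALID_TRANSITIONS.get(previous_state, set())
```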
| 83 | + |
### EmotionState.msg

Published when the voice agent's emotional expression changes. Used by the orchestrator to coordinate eye expressions and ear movements.

```
# Current emotion: "friendly", "excited", "curious", "sleepy", etc.
string emotion

# Previous emotion for smooth transition tracking
string previous_emotion

# Timestamp when the emotion change occurred
builtin_interfaces/Time timestamp
```

**Topic**: `voice_agent/emotion`

**Supported Emotions**:
- `friendly` - Default warm expression
- `excited` - High energy, enthusiastic
- `curious` - Inquisitive, attentive
- `sleepy` - Low energy, tired
- `waiting` - Patient, expectant
- `excuse` - Apologetic, polite interruption
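Because the `emotion` field is a free-form string, a subscriber may want to validate it against the supported set before driving hardware. In this sketch, falling back to the default `friendly` expression for unknown strings is an assumption; a real orchestrator might instead ignore the message.

```python
# The supported set, taken from the list above.
SUPPORTED_EMOTIONS = {"friendly", "excited", "curious", "sleepy", "waiting", "excuse"}


def normalize_emotion(emotion: str) -> str:
    """Map an EmotionState string to a known expression.

    Falling back to "friendly" (the default warm expression) for
    unknown values is an assumed policy, not defined by this package.
    """
    return emotion if emotion in SUPPORTED_EMOTIONS else "friendly"
```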
| 108 | + |
### ConversationItem.msg

Published for each turn in the conversation transcript. Provides context for engagement-based animations.

```
# Speaker role: "user" or "assistant"
string role

# The spoken/generated text content
string text

# Timestamp when this conversation item was added
builtin_interfaces/Time timestamp
```

**Topic**: `voice_agent/conversation`
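For engagement-based animation, a subscriber might keep a rolling window of recent turns rather than the full transcript. The buffer class below is a hypothetical sketch; the `TranscriptWindow` name and the 10-turn default are assumptions.

```python
from collections import deque


class TranscriptWindow:
    """Rolling buffer of recent (role, text) conversation turns."""

    def __init__(self, max_turns: int = 10):
        # deque with maxlen silently discards the oldest turn when full
        self.turns = deque(maxlen=max_turns)

    def add(self, role: str, text: str) -> None:
        """Record one ConversationItem's role and text."""
        self.turns.append((role, text))

    def user_turns(self) -> list:
        """Text of the user's turns still inside the window."""
        return [text for role, text in self.turns if role == "user"]
```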
| 125 | + |
## Integration Examples

### Orchestrator Usage

```python
import rclpy
from rclpy.node import Node

from coffee_voice_agent_msgs.msg import AgentState, TtsEvent


class Orchestrator(Node):
    def __init__(self):
        super().__init__("orchestrator")
        # Subscribe to TTS events for animation coordination
        self.create_subscription(
            TtsEvent, "voice_agent/tts_events", self.tts_event_callback, 10)
        # Subscribe to state changes for behavioral modes
        self.create_subscription(
            AgentState, "voice_agent/state", self.state_change_callback, 10)

    def tts_event_callback(self, msg):
        if msg.event == "started":
            # Coordinate: ears perk up, eyes show emotion, head oriented to user
            trigger_speech_animation(msg.emotion)
        elif msg.event == "finished":
            # Return to neutral state, allow head movement
            return_to_neutral()

    def state_change_callback(self, msg):
        if msg.current_state == "dormant":
            # Enable idle animations, wake word detection ready
            enable_idle_mode()
        elif msg.current_state == "active":
            # Focus on user, ready for interaction
            enable_conversation_mode()
```
| 149 | + |
### Message Dependencies

This package depends on:
- `std_msgs` - Standard ROS message types
- `builtin_interfaces` - Time stamping

## Building

```bash
# Build this package
colcon build --packages-select coffee_voice_agent_msgs

# Source the workspace
source install/setup.bash
```
| 165 | + |
## Usage in Other Packages

To use these messages in other ROS2 packages:

1. Add a dependency in `package.xml`:
```xml
<depend>coffee_voice_agent_msgs</depend>
```

2. Add to `CMakeLists.txt`:
```cmake
find_package(coffee_voice_agent_msgs REQUIRED)
```

3. Import in Python:
```python
from coffee_voice_agent_msgs.msg import TtsEvent, AgentState, EmotionState, ConversationItem
```
| 184 | + |
## Topics Published by Voice Agent Bridge

| Topic | Message Type | Description |
|-------|-------------|-------------|
| `voice_agent/tts_events` | `TtsEvent` | Speech start/stop with emotion |
| `voice_agent/state` | `AgentState` | Conversation state changes |
| `voice_agent/emotion` | `EmotionState` | Emotional expression changes |
| `voice_agent/conversation` | `ConversationItem` | Conversation transcript |
| `voice_agent/connected` | `std_msgs/Bool` | WebSocket connection status |
## Design Principles

- **Separation of Concerns**: Voice agent publishes state only; orchestrator makes hardware decisions
- **Type Safety**: Structured messages prevent JSON parsing errors
- **Timestamps**: All events include timing for coordination
- **Transition Context**: Previous states/emotions enable smooth animations
- **Rich Context**: Emotion + text content enables intelligent coordination decisions