Commit 46debb4

Add README for voice_agent_msgs
1 parent ba5b1ba commit 46debb4

1 file changed: +201 -0 lines changed
  • coffee_ws/src/coffee_voice_agent_msgs

@@ -0,0 +1,201 @@
# Coffee Voice Agent Messages

This package contains ROS2 message definitions for the Coffee Robot's voice agent system. These messages provide a structured interface for communication between the voice agent and other robot components, particularly the orchestrator node that coordinates animatronics based on voice agent state.

## Architecture

The voice agent publishes state information through these message types, which are consumed by an orchestrator node that synthesizes this information with other sensor inputs (face tracking, hardware status) to coordinate robot expressions and movements.

```
Voice Agent → Voice Agent Bridge → ROS2 Topics → Orchestrator → Hardware Controllers
```

## Message Types

### TtsEvent.msg

Published when the voice agent starts or stops speaking. Critical for synchronizing robot animations with speech.

```
# Event type: "started" or "finished"
string event

# Current emotion being expressed
string emotion

# Preview of the text being spoken (truncated for large text)
string text

# Source of the TTS: "manual", "llm", "announcement", etc.
string source

# Timestamp when the event occurred
builtin_interfaces/Time timestamp
```

**Topic**: `voice_agent/tts_events`

**Usage Example**:
```python
# When agent starts speaking with excitement
{
    "event": "started",
    "emotion": "excited",
    "text": "Hello! How can I help you today?",
    "source": "manual",
    "timestamp": "..."
}

# When agent finishes speaking
{
    "event": "finished",
    "emotion": "excited",
    "text": "Hello! How can I help you today?",
    "source": "manual",
    "timestamp": "..."
}
```
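
Because both events carry a timestamp, a consumer can pair a `started` event with the next `finished` event to estimate how long the agent spoke. The sketch below is only an illustration under stated assumptions: `Stamp` and `Event` are hypothetical stand-ins for `builtin_interfaces/Time` (which has `sec` and `nanosec` fields) and `TtsEvent`, so the snippet runs without a ROS 2 installation, and `SpeechTimer` is an assumed helper name that does not come from this package.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Stamp:
    """Stand-in for builtin_interfaces/Time (sec + nanosec)."""
    sec: int = 0
    nanosec: int = 0


@dataclass
class Event:
    """Stand-in carrying only the TtsEvent fields used here."""
    event: str
    timestamp: Stamp


def to_seconds(stamp: Stamp) -> float:
    """Convert a sec/nanosec pair to floating-point seconds."""
    return stamp.sec + stamp.nanosec * 1e-9


class SpeechTimer:
    """Hypothetical helper: pairs "started" with the next "finished"."""

    def __init__(self) -> None:
        self._started_at: Optional[float] = None

    def on_tts_event(self, msg: Event) -> Optional[float]:
        """Return the speech duration in seconds on "finished", else None."""
        if msg.event == "started":
            self._started_at = to_seconds(msg.timestamp)
            return None
        if msg.event == "finished" and self._started_at is not None:
            duration = to_seconds(msg.timestamp) - self._started_at
            self._started_at = None
            return duration
        # Unknown event, or "finished" without a preceding "started".
        return None
```

In a real node, `on_tts_event` could be called from the `voice_agent/tts_events` subscription callback with the actual `TtsEvent` message, since it only reads `event`, `timestamp.sec`, and `timestamp.nanosec`.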

### AgentState.msg

Published when the voice agent changes conversation states. Useful for high-level behavioral coordination.

```
# Current state: "dormant", "connecting", "active", "speaking", "disconnecting"
string current_state

# Previous state for transition tracking
string previous_state

# Timestamp when the state change occurred
builtin_interfaces/Time timestamp
```

**Topic**: `voice_agent/state`

**State Transitions**:
- `dormant` → `connecting` (wake word detected)
- `connecting` → `active` (session established)
- `active` → `speaking` (agent is talking)
- `speaking` → `active` (agent finished talking)
- `active` → `disconnecting` (conversation ending)
- `disconnecting` → `dormant` (back to wake word detection)
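
A subscriber can check an incoming `previous_state`/`current_state` pair against the transitions listed above. The following is a minimal sketch, not part of the package; `ALLOWED_TRANSITIONS` and `is_expected_transition` are hypothetical names. Whether the agent can emit other transitions (for example on errors) is not stated here, so a mismatch is probably better logged than rejected.

```python
# Documented transitions, copied from the list above as (previous, current) pairs.
ALLOWED_TRANSITIONS = {
    ("dormant", "connecting"),
    ("connecting", "active"),
    ("active", "speaking"),
    ("speaking", "active"),
    ("active", "disconnecting"),
    ("disconnecting", "dormant"),
}


def is_expected_transition(previous_state: str, current_state: str) -> bool:
    """Return True if the pair matches one of the documented transitions."""
    return (previous_state, current_state) in ALLOWED_TRANSITIONS
```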

### EmotionState.msg

Published when the voice agent's emotional expression changes. Used by the orchestrator to coordinate eye expressions and ear movements.

```
# Current emotion: "friendly", "excited", "curious", "sleepy", etc.
string emotion

# Previous emotion for smooth transition tracking
string previous_emotion

# Timestamp when the emotion change occurred
builtin_interfaces/Time timestamp
```

**Topic**: `voice_agent/emotion`

**Supported Emotions**:
- `friendly` - Default warm expression
- `excited` - High energy, enthusiastic
- `curious` - Inquisitive, attentive
- `sleepy` - Low energy, tired
- `waiting` - Patient, expectant
- `excuse` - Apologetic, polite interruption
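
Since the message is published on change and carries the previous emotion, the publishing side needs to remember the last value. The sketch below shows one way that bookkeeping might look. `EmotionTracker` is a hypothetical helper, and two behaviours are assumptions rather than documented facts: the tracker starts at `friendly`, and unrecognized values fall back to `friendly`.

```python
from typing import Optional

SUPPORTED_EMOTIONS = ("friendly", "excited", "curious", "sleepy", "waiting", "excuse")
DEFAULT_EMOTION = "friendly"  # assumption: initial value and fallback for unknown values


class EmotionTracker:
    """Hypothetical helper that mirrors the EmotionState fields."""

    def __init__(self) -> None:
        self.emotion = DEFAULT_EMOTION

    def update(self, emotion: str) -> Optional[dict]:
        """Return an EmotionState-like dict if the emotion changed, else None."""
        if emotion not in SUPPORTED_EMOTIONS:
            emotion = DEFAULT_EMOTION
        if emotion == self.emotion:
            return None
        change = {"emotion": emotion, "previous_emotion": self.emotion}
        self.emotion = emotion
        return change
```

A bridge node could copy the returned fields into an `EmotionState` message and add the timestamp before publishing.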

### ConversationItem.msg

Published for each turn in the conversation transcript. Provides context for engagement-based animations.

```
# Speaker role: "user" or "assistant"
string role

# The spoken/generated text content
string text

# Timestamp when this conversation item was added
builtin_interfaces/Time timestamp
```

**Topic**: `voice_agent/conversation`
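
A consumer that wants conversational context will usually keep only the most recent turns. As a rough sketch (the class name `TranscriptWindow` and the default size of 20 are illustrative assumptions, not something this package defines), a bounded buffer could look like this:

```python
from collections import deque


class TranscriptWindow:
    """Hypothetical rolling buffer of recent conversation turns."""

    def __init__(self, max_items: int = 20) -> None:  # 20 is an arbitrary choice
        self._items = deque(maxlen=max_items)

    def add(self, role: str, text: str) -> None:
        """Store one turn; the oldest turn is dropped once the buffer is full."""
        self._items.append((role, text))

    def last_speaker(self):
        """Return the role of the most recent turn, or None if empty."""
        return self._items[-1][0] if self._items else None

    def turns_by(self, role: str) -> int:
        """Count buffered turns spoken by the given role."""
        return sum(1 for r, _ in self._items if r == role)
```

Calling `add(msg.role, msg.text)` from the `voice_agent/conversation` callback would keep the window current.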

## Integration Examples

### Orchestrator Usage

```python
# Subscribe to TTS events for animation coordination
def tts_event_callback(msg):
    if msg.event == "started":
        # Coordinate: ears perk up, eyes show emotion, head oriented to user
        trigger_speech_animation(msg.emotion)
    elif msg.event == "finished":
        # Return to neutral state, allow head movement
        return_to_neutral()

# Subscribe to state changes for behavioral modes
def state_change_callback(msg):
    if msg.current_state == "dormant":
        # Enable idle animations, wake word detection ready
        enable_idle_mode()
    elif msg.current_state == "active":
        # Focus on user, ready for interaction
        enable_conversation_mode()
```

### Message Dependencies

This package depends on:
- `std_msgs` - Standard ROS message types
- `builtin_interfaces` - Time stamping

## Building

```bash
# Build this package
colcon build --packages-select coffee_voice_agent_msgs

# Source the workspace
source install/setup.bash
```

## Usage in Other Packages

To use these messages in other ROS2 packages:

1. Add dependency in `package.xml`:
   ```xml
   <depend>coffee_voice_agent_msgs</depend>
   ```

2. Add to `CMakeLists.txt` (for packages built with CMake):
   ```cmake
   find_package(coffee_voice_agent_msgs REQUIRED)
   ```

3. Import in Python:
   ```python
   from coffee_voice_agent_msgs.msg import TtsEvent, AgentState, EmotionState, ConversationItem
   ```

## Topics Published by Voice Agent Bridge

| Topic | Message Type | Description |
|-------|-------------|-------------|
| `voice_agent/tts_events` | `TtsEvent` | Speech start/stop with emotion |
| `voice_agent/state` | `AgentState` | Conversation state changes |
| `voice_agent/emotion` | `EmotionState` | Emotional expression changes |
| `voice_agent/conversation` | `ConversationItem` | Conversation transcript |
| `voice_agent/connected` | `std_msgs/Bool` | WebSocket connection status |

## Design Principles

- **Separation of Concerns**: Voice agent publishes state only; orchestrator makes hardware decisions
- **Type Safety**: Structured messages prevent JSON parsing errors
- **Timestamps**: All four custom messages carry a timestamp for coordination
- **Transition Context**: Previous states/emotions enable smooth animations
- **Rich Context**: Emotion + text content enables intelligent coordination decisions
