Commit 94a63e7

Add README file
1 parent 9e60682 commit 94a63e7

1 file changed: 348 additions & 0 deletions
@@ -0,0 +1,348 @@
# Effector Nodes

A ROS2 package providing Text-to-Speech (TTS) effector nodes for the Coffee Buddy robot system using the ElevenLabs API.

## Overview

The `effector_nodes` package serves as the primary TTS effector in the Coffee Buddy robot ecosystem. It provides high-quality, multilingual text-to-speech synthesis through the ElevenLabs API, enabling the robot to communicate naturally with users through voice responses.

## Features

- **🎙️ High-Quality TTS**: Uses ElevenLabs API for professional-grade voice synthesis
- **🌍 Multilingual Support**: Built-in support for multiple languages via ElevenLabs multilingual models
- **🔄 Streaming Audio**: Direct audio streaming from API to speakers for low-latency response
- **⏱️ Smart Cooldown**: Prevents audio overlap with configurable cooldown periods
- **📊 State Management**: Real-time broadcasting of TTS and audio playback states
- **🔧 Configurable**: Fully configurable voice, model, and audio settings via ROS2 parameters
- **🛡️ Thread-Safe**: Concurrent audio processing with proper resource management
- **📈 Health Monitoring**: Regular status publishing for system monitoring

## Architecture

### ROS2 Interface

**Services:**
- `/system/effector/tts/tts_query` (coffee_buddy_msgs/srv/TTSQuery)
  - Request: `string text`
  - Response: `bool success`

**Topics Published:**
- `/system/effector/tts/status` (std_msgs/String) - JSON status with health info
- `tts/audio_state` (std_msgs/String) - Audio playback state ('playing', 'cooldown', 'done')

### Integration

The TTS node is the final stage of the Coffee Buddy voice pipeline:

```
VoiceIntentNode → LanguageModelProcessorNode → TTSNode → Audio Output
```

## Installation

### Prerequisites

1. **ROS2 Installation**: Requires ROS2 Humble or newer
2. **System Dependencies**:
   ```bash
   sudo apt-get install python3-pyaudio portaudio19-dev
   ```

3. **ElevenLabs API Key**: Sign up at [ElevenLabs](https://elevenlabs.io/) and get an API key

### Build Instructions

1. **Build the package from the workspace root:**
   ```bash
   cd coffee_ws
   colcon build --packages-select effector_nodes
   source install/setup.bash
   ```

2. **Install Python dependencies:**
   ```bash
   # Quote the specifiers so the shell does not treat ">" as a redirect
   pip install "elevenlabs>=1.0.0" "pyaudio>=0.2.11"
   ```

### API Key Setup

Set your ElevenLabs API key using one of these methods:

**Method 1: Environment Variable (Recommended)**
```bash
export ELEVEN_LABS_API_KEY="your_api_key_here"
```

**Method 2: ROS2 Parameter**
```bash
ros2 run effector_nodes tts_node --ros-args -p api_key:="your_api_key_here"
```

**Method 3: Launch File Parameter**
```bash
ros2 launch effector_nodes tts_node.launch.py api_key:="your_api_key_here"
```
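
The lookup order implied by these methods (an explicit `api_key` parameter first, then the `ELEVEN_LABS_API_KEY` environment variable) can be sketched as follows. This is a minimal illustration, not the node's actual code: the function name `resolve_api_key` and the exact error type are assumptions.

```python
import os


def resolve_api_key(param_value, environ=os.environ):
    """Return the API key from the ROS2 parameter, else from the environment.

    Hypothetical helper: the real node may structure this differently.
    """
    # A non-empty parameter value takes precedence over the environment.
    if param_value and param_value.strip():
        return param_value.strip()

    # Otherwise fall back to the environment variable from Method 1.
    env_value = environ.get("ELEVEN_LABS_API_KEY", "").strip()
    if env_value:
        return env_value

    # Neither source provided a key.
    raise RuntimeError("ELEVEN_LABS_API_KEY not set")
```

Passing the environment as an argument keeps the function easy to exercise without touching the real process environment, for example `resolve_api_key("", {"ELEVEN_LABS_API_KEY": "from_env"})`.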

## Usage

### Basic Usage

**Start the TTS node:**
```bash
# Using launch file (recommended)
ros2 launch effector_nodes tts_node.launch.py

# Or run directly
ros2 run effector_nodes tts_node
```

**Test TTS service:**
```bash
ros2 service call /system/effector/tts/tts_query coffee_buddy_msgs/srv/TTSQuery "{text: 'Hello, I am Coffee Buddy!'}"
```

**Monitor status:**
```bash
# Monitor general status
ros2 topic echo /system/effector/tts/status

# Monitor audio playback state
ros2 topic echo tts/audio_state
```

### Advanced Configuration

**Custom voice and model:**
```bash
ros2 launch effector_nodes tts_node.launch.py \
  voice_id:="pNInz6obpgDQGcFmaJgB" \
  model_id:="eleven_multilingual_v2" \
  cooldown_duration:=2.0
```

**Different audio format:**
```bash
ros2 launch effector_nodes tts_node.launch.py \
  output_format:="pcm_16000"
```

129+
## Configuration
130+
131+
### ROS2 Parameters
132+
133+
| Parameter | Type | Default | Description |
134+
|-----------|------|---------|-------------|
135+
| `voice_id` | string | `KTPVrSVAEUSJRClDzBw7` | ElevenLabs voice ID |
136+
| `model_id` | string | `eleven_multilingual_v2` | ElevenLabs model ID |
137+
| `api_key` | string | `""` | API key (falls back to env var) |
138+
| `cooldown_duration` | double | `1.0` | Cooldown between requests (seconds) |
139+
| `output_format` | string | `pcm_24000` | Audio format (pcm_16000, pcm_24000) |
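
To make the `output_format` choice concrete, the sketch below derives the sample rate from the format string and estimates the raw audio data rate. It assumes 16-bit mono PCM samples; that sample width and channel count, and the helper names, are assumptions for illustration and are not stated in this README.

```python
def pcm_sample_rate(output_format):
    """Extract the sample rate in Hz from a string such as 'pcm_24000'."""
    prefix, _, rate = output_format.partition("_")
    if prefix != "pcm" or not rate.isdigit():
        raise ValueError(f"unsupported output_format: {output_format!r}")
    return int(rate)


def pcm_bytes_per_second(output_format, sample_width=2, channels=1):
    """Estimate the raw data rate.

    Assumption: 2-byte (16-bit) samples and a single channel.
    """
    return pcm_sample_rate(output_format) * sample_width * channels


if __name__ == "__main__":
    for fmt in ("pcm_16000", "pcm_24000"):
        print(fmt, pcm_sample_rate(fmt), "Hz,", pcm_bytes_per_second(fmt), "bytes/s")
```

Under these assumptions `pcm_16000` carries 32000 bytes per second and `pcm_24000` carries 48000, so the lower rate moves one third less data per second of speech.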

### Voice Selection

**Popular ElevenLabs Voices:**
- `KTPVrSVAEUSJRClDzBw7` - Default Coffee Buddy voice (energetic, friendly)
- `pNInz6obpgDQGcFmaJgB` - Adam (deep, authoritative)
- `EXAVITQu4vr4xnSDxMaL` - Sarah (warm, conversational)
- `VR6AewLTigWG4xSOukaG` - Nicole (professional, clear)

Find more voices at: [ElevenLabs Voice Library](https://elevenlabs.io/app/voice-library)

### Model Options

- `eleven_multilingual_v2` - Best quality, supports 29+ languages
- `eleven_multilingual_v1` - Good quality, faster processing
- `eleven_monolingual_v1` - English only, fastest processing

## API Reference

### TTSQuery Service

**Service Type:** `coffee_buddy_msgs/srv/TTSQuery`

**Request:**
```
string text  # Text to synthesize
```

**Response:**
```
bool success  # True if TTS request accepted, False if busy/error
```

**Behavior:**
- Returns `success: true` immediately if the request is accepted
- Returns `success: false` if audio is already playing or the node is in cooldown
- Audio plays asynchronously in a background thread
- Publishes state updates to the `tts/audio_state` topic
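
The accept/reject behavior listed above can be modeled as a small thread-safe gate. This is a sketch under assumptions, not the node's implementation: the class name `TTSGate`, its methods, and the idea that the cooldown starts when playback ends are all hypothetical.

```python
import threading
import time


class TTSGate:
    """Hypothetical gate mirroring the documented accept/reject rules."""

    def __init__(self, cooldown_duration=1.0, clock=time.monotonic):
        self.cooldown_duration = cooldown_duration
        self._clock = clock
        self._lock = threading.Lock()
        self._playing = False
        self._cooldown_until = 0.0

    def _state(self):
        # Caller must hold the lock.
        if self._playing:
            return "playing"
        if self._clock() < self._cooldown_until:
            return "cooldown"
        return "done"

    def state(self):
        with self._lock:
            return self._state()

    def try_accept(self):
        """Return True (success) only when idle; reject while busy."""
        with self._lock:
            if self._state() != "done":
                return False
            self._playing = True
            return True

    def playback_finished(self):
        """Called from the playback thread; assumed to start the cooldown."""
        with self._lock:
            self._playing = False
            self._cooldown_until = self._clock() + self.cooldown_duration
```

A service callback would call `try_accept()` and return its result as `success`, then start playback in a background thread that calls `playback_finished()` when the audio ends. The injectable `clock` exists only to make the timing logic testable.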

### Status Topic

**Topic:** `/system/effector/tts/status`
**Type:** `std_msgs/String` (JSON format)

**Status JSON Format:**
```json
{
  "health": "ok",
  "voice_id": "KTPVrSVAEUSJRClDzBw7",
  "model_id": "eleven_multilingual_v2",
  "is_playing": false,
  "in_cooldown": false
}
```
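
A subscriber can decode this payload with the standard `json` module. The sketch below uses only the fields shown in the format above; the helper name `parse_status` and the notion of "ready" (healthy, not playing, not in cooldown) are assumptions for illustration.

```python
import json


def parse_status(payload):
    """Decode a status message body and report whether the node looks idle.

    Hypothetical helper: 'ready' is derived here, it is not a published field.
    """
    status = json.loads(payload)
    ready = (
        status.get("health") == "ok"
        and not status.get("is_playing", False)
        and not status.get("in_cooldown", False)
    )
    return status, ready


if __name__ == "__main__":
    sample = '{"health": "ok", "is_playing": false, "in_cooldown": false}'
    print(parse_status(sample))
```

In a ROS2 subscription callback the payload would be the `data` field of the received `std_msgs/String` message.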

### Audio State Topic

**Topic:** `tts/audio_state`
**Type:** `std_msgs/String`

**Possible Values:**
- `"playing"` - Audio is currently being synthesized/played
- `"cooldown"` - In cooldown period, rejecting new requests
- `"done"` - Ready for new requests
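
Because the service call returns before playback ends, a client that needs to know when an utterance has finished can track these values. The sketch below is one possible approach; the class name `SpeechTracker` is hypothetical, and it assumes a `"playing"` message is always published before the matching `"done"`.

```python
class SpeechTracker:
    """Hypothetical helper that detects when one utterance has finished."""

    def __init__(self):
        self._started = False
        self.finished = False

    def update(self, state):
        """Feed each value received on tts/audio_state.

        Returns True once a "done" has been seen after a "playing".
        """
        if state == "playing":
            # A new utterance began; reset the completion flag.
            self._started = True
            self.finished = False
        elif state == "done" and self._started:
            self.finished = True
        return self.finished
```

Ignoring a `"done"` that arrives before any `"playing"` avoids treating the node's idle state as the completion of a request that has not started yet.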

## Integration Examples

### Python Client Example

```python
import rclpy
from rclpy.node import Node
from coffee_buddy_msgs.srv import TTSQuery
from std_msgs.msg import String


class TTSClient(Node):
    def __init__(self):
        super().__init__('tts_client')
        self.client = self.create_client(TTSQuery, '/system/effector/tts/tts_query')

        # Subscribe to audio state for synchronization
        self.create_subscription(String, 'tts/audio_state', self.audio_state_callback, 10)

    def speak(self, text):
        # Send the request without blocking; the caller waits on the returned future
        request = TTSQuery.Request()
        request.text = text
        return self.client.call_async(request)

    def audio_state_callback(self, msg):
        self.get_logger().info(f'Audio state: {msg.data}')


# Usage
rclpy.init()
client = TTSClient()
client.client.wait_for_service()  # Block until the TTS service is available
future = client.speak("Hello from Coffee Buddy!")
rclpy.spin_until_future_complete(client, future)
client.get_logger().info(f'Request accepted: {future.result().success}')
client.destroy_node()
rclpy.shutdown()
```

### C++ Client Example

```cpp
#include <memory>
#include <string>

#include <rclcpp/rclcpp.hpp>
#include <coffee_buddy_msgs/srv/tts_query.hpp>

class TTSClient : public rclcpp::Node {
public:
  TTSClient() : Node("tts_client") {
    client_ = this->create_client<coffee_buddy_msgs::srv::TTSQuery>(
      "/system/effector/tts/tts_query");
  }

  // Sends a TTS request; the service replies as soon as the request is accepted or rejected
  void speak(const std::string& text) {
    auto request = std::make_shared<coffee_buddy_msgs::srv::TTSQuery::Request>();
    request->text = text;

    auto future = client_->async_send_request(request);
    // Handle response...
  }

private:
  rclcpp::Client<coffee_buddy_msgs::srv::TTSQuery>::SharedPtr client_;
};
```

## Troubleshooting

### Common Issues

**1. "ELEVEN_LABS_API_KEY not set" Error**
- Ensure the API key is set as an environment variable or a ROS2 parameter
- Verify the API key is valid and has sufficient credits

**2. "PyAudio not found" Error**
```bash
sudo apt-get install python3-pyaudio portaudio19-dev
pip install pyaudio
```

**3. Audio not playing**
- Check audio device permissions
- Verify PulseAudio/ALSA configuration
- Test with: `pactl list short sinks`

**4. High latency**
- Use the `pcm_16000` format; the lower sample rate means less audio data to stream
- Reduce the `cooldown_duration` parameter
- Check the network connection to the ElevenLabs API

**5. Service call timeouts**
- The TTS service responds immediately (non-blocking)
- Monitor the `tts/audio_state` topic for actual completion
- Don't wait for audio completion in the service call

### Debug Commands

```bash
# Check if node is running
ros2 node list | grep tts_node

# Monitor all TTS topics
ros2 topic list | grep tts

# Check service availability
ros2 service list | grep tts_query

# View node parameters
ros2 param list /tts_node

# Test with minimal text
ros2 service call /system/effector/tts/tts_query coffee_buddy_msgs/srv/TTSQuery "{text: 'test'}"
```

## Development

### Building from Source

```bash
cd coffee_ws
colcon build --packages-select effector_nodes --cmake-args -DCMAKE_BUILD_TYPE=Debug
```

### Running Tests

```bash
colcon test --packages-select effector_nodes
colcon test-result --verbose
```

### Code Style

This package follows ROS2 and PEP8 coding standards:

```bash
# Check code style
ament_flake8 src/
ament_pep257 src/

# Run all tests
colcon test --packages-select effector_nodes
```

## Contributing

1. Fork the repository
2. Create a feature branch
3. Make changes following ROS2 conventions
4. Add tests for new functionality
5. Submit a pull request
