| 1 | +# Effector Nodes |
| 2 | + |
| 3 | +A ROS2 package providing Text-to-Speech (TTS) effector nodes for the Coffee Buddy robot system using the ElevenLabs API. |
| 4 | + |
| 5 | +## Overview |
| 6 | + |
| 7 | +The `effector_nodes` package serves as the primary TTS effector in the Coffee Buddy robot ecosystem. It provides high-quality, multilingual text-to-speech synthesis through the ElevenLabs API, enabling the robot to communicate naturally with users through voice responses. |
| 8 | + |
| 9 | +## Features |
| 10 | + |
| 11 | +- **🎙️ High-Quality TTS**: Uses ElevenLabs API for professional-grade voice synthesis |
| 12 | +- **🌍 Multilingual Support**: Built-in support for multiple languages via ElevenLabs multilingual models |
| 13 | +- **🔄 Streaming Audio**: Direct audio streaming from API to speakers for low-latency response |
| 14 | +- **⏱️ Smart Cooldown**: Prevents audio overlap with configurable cooldown periods |
| 15 | +- **📊 State Management**: Real-time broadcasting of TTS and audio playback states |
| 16 | +- **🔧 Configurable**: Fully configurable voice, model, and audio settings via ROS2 parameters |
| 17 | +- **🛡️ Thread-Safe**: Concurrent audio processing with proper resource management |
| 18 | +- **📈 Health Monitoring**: Regular status publishing for system monitoring |
| 19 | + |
| 20 | +## Architecture |
| 21 | + |
| 22 | +### ROS2 Interface |
| 23 | + |
| 24 | +**Services:** |
| 25 | +- `/system/effector/tts/tts_query` (coffee_buddy_msgs/TTSQuery) |
| 26 | + - Request: `string text` |
| 27 | + - Response: `bool success` |
| 28 | + |
| 29 | +**Topics Published:** |
| 30 | +- `/system/effector/tts/status` (std_msgs/String) - JSON status with health info |
| 31 | +- `tts/audio_state` (std_msgs/String) - Audio playback state ('playing', 'cooldown', 'done') |
| 32 | + |
| 33 | +### Integration |
| 34 | + |
| 35 | +The TTS node is the final stage of the Coffee Buddy voice pipeline: |
| 36 | + |
| 37 | +``` |
| 38 | +VoiceIntentNode → LanguageModelProcessorNode → TTSNode → Audio Output |
| 39 | +``` |
| 40 | + |
| 41 | +## Installation |
| 42 | + |
| 43 | +### Prerequisites |
| 44 | + |
| 45 | +1. **ROS2 Installation**: Requires ROS2 Humble or newer |
| 46 | +2. **System Dependencies**: |
| 47 | + ```bash |
| 48 | + sudo apt-get install python3-pyaudio portaudio19-dev |
| 49 | + ``` |
| 50 | + |
| 51 | +3. **ElevenLabs API Key**: Sign up at [ElevenLabs](https://elevenlabs.io/) and get an API key |
| 52 | + |
| 53 | +### Build Instructions |
| 54 | + |
| 55 | +1. **Clone and build the package:** |
| 56 | + ```bash |
| 57 | + cd coffee_ws |
| 58 | + colcon build --packages-select effector_nodes |
| 59 | + source install/setup.bash |
| 60 | + ``` |
| 61 | + |
| 62 | +2. **Install Python dependencies:** |
| 63 | + ```bash |
| 64 | + pip install "elevenlabs>=1.0.0" "pyaudio>=0.2.11" |
| 65 | + ``` |
| 66 | + |
| 67 | +### API Key Setup |
| 68 | + |
| 69 | +Set your ElevenLabs API key using one of these methods: |
| 70 | + |
| 71 | +**Method 1: Environment Variable (Recommended)** |
| 72 | +```bash |
| 73 | +export ELEVEN_LABS_API_KEY="your_api_key_here" |
| 74 | +``` |
| 75 | + |
| 76 | +**Method 2: ROS2 Parameter** |
| 77 | +```bash |
| 78 | +ros2 run effector_nodes tts_node --ros-args -p api_key:="your_api_key_here" |
| 79 | +``` |
| 80 | + |
| 81 | +**Method 3: Launch File Parameter** |
| 82 | +```bash |
| 83 | +ros2 launch effector_nodes tts_node.launch.py api_key:="your_api_key_here" |
| 84 | +``` |
| 85 | + |
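The lookup order behind these methods can be sketched as follows. This is a minimal illustration, not the node's actual code: the helper name `resolve_api_key` is hypothetical, and the precedence (a non-empty `api_key` parameter first, then the `ELEVEN_LABS_API_KEY` environment variable) is an assumption based on the parameter being documented as falling back to the environment variable.

```python
import os


def resolve_api_key(param_value: str = "", env=None) -> str:
    """Return the API key from the parameter, else from the environment.

    Hypothetical helper: the precedence shown here is an assumption,
    not confirmed behavior of the TTS node.
    """
    env = os.environ if env is None else env
    # A non-empty parameter value wins; otherwise use the environment variable.
    key = param_value.strip() or env.get("ELEVEN_LABS_API_KEY", "").strip()
    if not key:
        raise RuntimeError("ELEVEN_LABS_API_KEY not set")
    return key
```

Passing `env` explicitly keeps the function easy to test without touching the real process environment.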
| 86 | +## Usage |
| 87 | + |
| 88 | +### Basic Usage |
| 89 | + |
| 90 | +**Start the TTS node:** |
| 91 | +```bash |
| 92 | +# Using launch file (recommended) |
| 93 | +ros2 launch effector_nodes tts_node.launch.py |
| 94 | + |
| 95 | +# Or run directly |
| 96 | +ros2 run effector_nodes tts_node |
| 97 | +``` |
| 98 | + |
| 99 | +**Test TTS service:** |
| 100 | +```bash |
| 101 | +ros2 service call /system/effector/tts/tts_query coffee_buddy_msgs/srv/TTSQuery "{text: 'Hello, I am Coffee Buddy!'}" |
| 102 | +``` |
| 103 | + |
| 104 | +**Monitor status:** |
| 105 | +```bash |
| 106 | +# Monitor general status |
| 107 | +ros2 topic echo /system/effector/tts/status |
| 108 | + |
| 109 | +# Monitor audio playback state |
| 110 | +ros2 topic echo tts/audio_state |
| 111 | +``` |
| 112 | + |
| 113 | +### Advanced Configuration |
| 114 | + |
| 115 | +**Custom voice and model:** |
| 116 | +```bash |
| 117 | +ros2 launch effector_nodes tts_node.launch.py \ |
| 118 | + voice_id:="pNInz6obpgDQGcFmaJgB" \ |
| 119 | + model_id:="eleven_multilingual_v2" \ |
| 120 | + cooldown_duration:=2.0 |
| 121 | +``` |
| 122 | + |
| 123 | +**Different audio format:** |
| 124 | +```bash |
| 125 | +ros2 launch effector_nodes tts_node.launch.py \ |
| 126 | + output_format:="pcm_16000" |
| 127 | +``` |
| 128 | + |
| 129 | +## Configuration |
| 130 | + |
| 131 | +### ROS2 Parameters |
| 132 | + |
| 133 | +| Parameter | Type | Default | Description | |
| 134 | +|-----------|------|---------|-------------| |
| 135 | +| `voice_id` | string | `KTPVrSVAEUSJRClDzBw7` | ElevenLabs voice ID | |
| 136 | +| `model_id` | string | `eleven_multilingual_v2` | ElevenLabs model ID | |
| 137 | +| `api_key` | string | `""` | API key (falls back to env var) | |
| 138 | +| `cooldown_duration` | double | `1.0` | Cooldown between requests (seconds) | |
| 139 | +| `output_format` | string | `pcm_24000` | Audio format (pcm_16000, pcm_24000) | |
| 140 | + |
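The `output_format` value determines how the raw audio stream must be opened for playback. The sketch below maps a `pcm_<rate>` string to playback settings. It assumes 16-bit mono PCM, which is how ElevenLabs documents its `pcm_*` formats; the function name `pcm_stream_params` is hypothetical and not part of this package.

```python
def pcm_stream_params(output_format: str) -> dict:
    """Map an output_format such as 'pcm_24000' to raw playback settings.

    Assumption: pcm_* formats are 16-bit, single-channel PCM.
    """
    prefix, _, rate_text = output_format.partition("_")
    if prefix != "pcm" or not rate_text.isdigit():
        raise ValueError(f"unsupported output_format: {output_format!r}")
    sample_rate = int(rate_text)
    sample_width = 2  # bytes per sample (16-bit)
    channels = 1      # mono
    return {
        "rate": sample_rate,
        "channels": channels,
        "sample_width": sample_width,
        "bytes_per_second": sample_rate * channels * sample_width,
    }
```

With PyAudio, these values would typically be passed as `rate` and `channels` to `PyAudio().open(...)` together with `format=pyaudio.paInt16`. The lower data rate of `pcm_16000` (32,000 bytes per second versus 48,000 for `pcm_24000`) is why it transfers faster.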
| 141 | +### Voice Selection |
| 142 | + |
| 143 | +**Popular ElevenLabs Voices:** |
| 144 | +- `KTPVrSVAEUSJRClDzBw7` - Default Coffee Buddy voice (energetic, friendly) |
| 145 | +- `pNInz6obpgDQGcFmaJgB` - Adam (deep, authoritative) |
| 146 | +- `EXAVITQu4vr4xnSDxMaL` - Sarah (warm, conversational) |
| 147 | +- `VR6AewLTigWG4xSOukaG` - Nicole (professional, clear) |
| 148 | + |
| 149 | +Find more voices at: [ElevenLabs Voice Library](https://elevenlabs.io/app/voice-library) |
| 150 | + |
| 151 | +### Model Options |
| 152 | + |
| 153 | +- `eleven_multilingual_v2` - Best quality, supports 29+ languages |
| 154 | +- `eleven_multilingual_v1` - Good quality, faster processing |
| 155 | +- `eleven_monolingual_v1` - English only, fastest processing |
| 156 | + |
| 157 | +## API Reference |
| 158 | + |
| 159 | +### TTSQuery Service |
| 160 | + |
| 161 | +**Service Type:** `coffee_buddy_msgs/srv/TTSQuery` |
| 162 | + |
| 163 | +**Request:** |
| 164 | +``` |
| 165 | +string text # Text to synthesize |
| 166 | +``` |
| 167 | + |
| 168 | +**Response:** |
| 169 | +``` |
| 170 | +bool success # True if TTS request accepted, False if busy/error |
| 171 | +``` |
| 172 | + |
| 173 | +**Behavior:** |
| 174 | +- Returns `success: true` immediately if request accepted |
| 175 | +- Returns `success: false` if already playing audio or in cooldown |
| 176 | +- Audio plays asynchronously in background thread |
| 177 | +- Publishes state updates to `tts/audio_state` topic |
| 178 | + |
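The accept-or-reject behavior described above can be modeled with a small thread-safe gate. This is a sketch under stated assumptions, not the node's implementation: the class `TTSGate`, its `request` method, and the `on_state` callback are hypothetical names, and the state order `playing`, `cooldown`, `done` is inferred from the documented state values.

```python
import threading
import time


class TTSGate:
    """Accept one request at a time; reject while playing or cooling down."""

    def __init__(self, cooldown_duration=1.0, on_state=print):
        self._lock = threading.Lock()
        self._state = "done"
        self._cooldown = cooldown_duration
        self._on_state = on_state  # stands in for publishing to tts/audio_state

    def _set_state(self, state):
        with self._lock:
            self._state = state
        self._on_state(state)

    def request(self, play):
        """Return True at once if accepted; `play` runs in a background thread."""
        with self._lock:
            if self._state != "done":
                return False  # busy: already playing or in cooldown
            self._state = "playing"
        self._on_state("playing")
        threading.Thread(target=self._run, args=(play,), daemon=True).start()
        return True

    def _run(self, play):
        try:
            play()
        finally:
            # Enter cooldown even if playback raised, then reopen the gate.
            self._set_state("cooldown")
            time.sleep(self._cooldown)
            self._set_state("done")
```

Checking and updating the state under one lock is what prevents two concurrent requests from both being accepted.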
| 179 | +### Status Topic |
| 180 | + |
| 181 | +**Topic:** `/system/effector/tts/status` |
| 182 | +**Type:** `std_msgs/String` (JSON format) |
| 183 | + |
| 184 | +**Status JSON Format:** |
| 185 | +```json |
| 186 | +{ |
| 187 | + "health": "ok", |
| 188 | + "voice_id": "KTPVrSVAEUSJRClDzBw7", |
| 189 | + "model_id": "eleven_multilingual_v2", |
| 190 | + "is_playing": false, |
| 191 | + "in_cooldown": false |
| 192 | +} |
| 193 | +``` |
| 194 | + |
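A subscriber can decode this payload with the standard `json` module. The helper below is a minimal sketch: `is_ready` is a hypothetical name, and treating missing fields or any `health` value other than `"ok"` as "not ready" is an assumption, since only the `"ok"` value appears in the example above.

```python
import json


def is_ready(status_json: str) -> bool:
    """True when the status payload reports a healthy, idle node."""
    try:
        status = json.loads(status_json)
    except json.JSONDecodeError:
        return False
    if not isinstance(status, dict):
        return False
    # Missing fields are treated conservatively as "busy".
    return (
        status.get("health") == "ok"
        and not status.get("is_playing", True)
        and not status.get("in_cooldown", True)
    )
```

In a ROS2 subscriber the argument would be the `data` field of the received `std_msgs/String` message.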
| 195 | +### Audio State Topic |
| 196 | + |
| 197 | +**Topic:** `tts/audio_state` |
| 198 | +**Type:** `std_msgs/String` |
| 199 | + |
| 200 | +**Possible Values:** |
| 201 | +- `"playing"` - Audio is currently being synthesized/played |
| 202 | +- `"cooldown"` - In cooldown period, rejecting new requests |
| 203 | +- `"done"` - Ready for new requests |
| 204 | + |
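Because the service call returns before playback ends, a client that needs to know when an utterance has finished has to follow these state messages. The tracker below is a sketch of one way to do that; `PlaybackTracker` is a hypothetical helper, and it assumes the node publishes `playing` before `done` for each accepted request.

```python
class PlaybackTracker:
    """Track audio_state messages and report when one utterance has finished."""

    VALID = {"playing", "cooldown", "done"}

    def __init__(self):
        self.started = False
        self.finished = False

    def update(self, state: str) -> bool:
        """Feed one message; return True once a started playback reaches 'done'."""
        if state not in self.VALID:
            raise ValueError(f"unexpected audio state: {state!r}")
        if state == "playing":
            self.started = True
            self.finished = False
        elif state == "done" and self.started:
            self.finished = True
        return self.finished
```

Ignoring a `done` message that arrives before any `playing` message avoids mistaking a stale state for the completion of the current request. In a subscription callback, `update(msg.data)` would be called for every message on `tts/audio_state`.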
| 205 | +## Integration Examples |
| 206 | + |
| 207 | +### Python Client Example |
| 208 | + |
| 209 | +```python |
| 210 | +import rclpy |
| 211 | +from rclpy.node import Node |
| 212 | +from coffee_buddy_msgs.srv import TTSQuery |
| 213 | +from std_msgs.msg import String |
| 214 | + |
| 215 | +class TTSClient(Node): |
| 216 | + def __init__(self): |
| 217 | + super().__init__('tts_client') |
| 218 | + self.client = self.create_client(TTSQuery, '/system/effector/tts/tts_query') |
| 219 | + |
| 220 | + # Subscribe to audio state for synchronization |
| 221 | + self.create_subscription(String, 'tts/audio_state', self.audio_state_callback, 10) |
| 222 | + |
| 223 | + def speak(self, text): |
| 224 | + request = TTSQuery.Request() |
| 225 | + request.text = text |
| 226 | + future = self.client.call_async(request) |
| 227 | + return future |
| 228 | + |
| 229 | + def audio_state_callback(self, msg): |
| 230 | + self.get_logger().info(f'Audio state: {msg.data}') |
| 231 | + |
| 232 | +# Usage |
| 233 | +rclpy.init() |
| 234 | +client = TTSClient() |
| 235 | +future = client.speak("Hello from Coffee Buddy!") |
| 236 | +rclpy.spin_until_future_complete(client, future) |
| 237 | +``` |
| 238 | + |
| 239 | +### C++ Client Example |
| 240 | + |
| 241 | +```cpp |
| 242 | +#include <rclcpp/rclcpp.hpp> |
| 243 | +#include <coffee_buddy_msgs/srv/tts_query.hpp> |
| 244 | + |
| 245 | +class TTSClient : public rclcpp::Node { |
| 246 | +public: |
| 247 | + TTSClient() : Node("tts_client") { |
| 248 | + client_ = this->create_client<coffee_buddy_msgs::srv::TTSQuery>( |
| 249 | + "/system/effector/tts/tts_query"); |
| 250 | + } |
| 251 | + |
| 252 | + void speak(const std::string& text) { |
| 253 | + auto request = std::make_shared<coffee_buddy_msgs::srv::TTSQuery::Request>(); |
| 254 | + request->text = text; |
| 255 | + |
| 256 | + auto future = client_->async_send_request(request); |
| 257 | + // Handle response... |
| 258 | + } |
| 259 | + |
| 260 | +private: |
| 261 | + rclcpp::Client<coffee_buddy_msgs::srv::TTSQuery>::SharedPtr client_; |
| 262 | +}; |
| 263 | +``` |
| 264 | + |
| 265 | +## Troubleshooting |
| 266 | + |
| 267 | +### Common Issues |
| 268 | + |
| 269 | +**1. "ELEVEN_LABS_API_KEY not set" Error** |
| 270 | +- Ensure API key is set as environment variable or ROS2 parameter |
| 271 | +- Verify API key is valid and has sufficient credits |
| 272 | + |
| 273 | +**2. "PyAudio not found" Error** |
| 274 | +```bash |
| 275 | +sudo apt-get install python3-pyaudio portaudio19-dev |
| 276 | +pip install pyaudio |
| 277 | +``` |
| 278 | + |
| 279 | +**3. Audio not playing** |
| 280 | +- Check audio device permissions |
| 281 | +- Verify PulseAudio/ALSA configuration |
| 282 | +- Test with: `pactl list short sinks` |
| 283 | + |
| 284 | +**4. High latency** |
| 285 | +- Use `pcm_16000` format for faster processing |
| 286 | +- Reduce the `cooldown_duration` parameter |
| 287 | +- Check network connection to ElevenLabs API |
| 288 | + |
| 289 | +**5. Service call timeouts** |
| 290 | +- TTS service responds immediately (non-blocking) |
| 291 | +- Monitor `tts/audio_state` topic for actual completion |
| 292 | +- Do not expect the service call to block until the audio finishes |
| 293 | + |
| 294 | +### Debug Commands |
| 295 | + |
| 296 | +```bash |
| 297 | +# Check if node is running |
| 298 | +ros2 node list | grep tts_node |
| 299 | + |
| 300 | +# Monitor all TTS topics |
| 301 | +ros2 topic list | grep tts |
| 302 | + |
| 303 | +# Check service availability |
| 304 | +ros2 service list | grep tts_query |
| 305 | + |
| 306 | +# View node parameters |
| 307 | +ros2 param list /tts_node |
| 308 | + |
| 309 | +# Test with minimal text |
| 310 | +ros2 service call /system/effector/tts/tts_query coffee_buddy_msgs/srv/TTSQuery "{text: 'test'}" |
| 311 | +``` |
| 312 | + |
| 313 | +## Development |
| 314 | + |
| 315 | +### Building from Source |
| 316 | + |
| 317 | +```bash |
| 318 | +cd coffee_ws |
| 319 | +colcon build --packages-select effector_nodes --cmake-args -DCMAKE_BUILD_TYPE=Debug |
| 320 | +``` |
| 321 | + |
| 322 | +### Running Tests |
| 323 | + |
| 324 | +```bash |
| 325 | +colcon test --packages-select effector_nodes |
| 326 | +colcon test-result --verbose |
| 327 | +``` |
| 328 | + |
| 329 | +### Code Style |
| 330 | + |
| 331 | +This package follows ROS2 and PEP8 coding standards: |
| 332 | + |
| 333 | +```bash |
| 334 | +# Check code style |
| 335 | +ament_flake8 src/ |
| 336 | +ament_pep257 src/ |
| 337 | + |
| 338 | +# Run all tests |
| 339 | +colcon test --packages-select effector_nodes |
| 340 | +``` |
| 341 | + |
| 342 | +## Contributing |
| 343 | + |
| 344 | +1. Fork the repository |
| 345 | +2. Create a feature branch |
| 346 | +3. Make changes following ROS2 conventions |
| 347 | +4. Add tests for new functionality |
| 348 | +5. Submit a pull request |