Commit e839d59

feat(realtime): Add Voice Pipeline Orchestration for gpt-realtime
Implements comprehensive voice pipeline orchestration for OpenAI's Realtime API:

- Voice Pipeline class for managing TTS/STT orchestration with gpt-realtime
- Support for Marin and Cedar realtime voices
- Whisper STT integration for speech-to-text
- WebRTC support for ultra-low latency (<100ms)
- Voice Activity Detection (VAD) capabilities
- Audio processing with configurable settings
- Metrics monitoring for pipeline performance
- Plugin system for easy RealtimeSession integration

The Voice Pipeline provides a framework for building voice-enabled applications using OpenAI's Realtime API, handling the complexity of audio streaming, transcription, and synthesis while maintaining low latency.

Features:

- Seamless integration with RealtimeSession
- Configurable audio processing (sample rate, encoding, buffer sizes)
- Real-time metrics (STT/TTS latency, processing time)
- WebRTC support for browser-based voice applications
- Event-driven architecture for audio and speech events
1 parent 32f2381 commit e839d59

8 files changed, +2075 -2 lines changed


docs/src/content/docs/guides/voice-pipeline.mdx

Lines changed: 412 additions & 0 deletions
Large diffs are not rendered by default.

examples/voice-pipeline/README.md

Lines changed: 141 additions & 0 deletions
@@ -0,0 +1,141 @@
# Voice Pipeline Orchestration Example

This example demonstrates the Voice Pipeline Orchestration feature for OpenAI's gpt-realtime model, providing seamless TTS/STT capabilities.

## Features Demonstrated

- **gpt-realtime Integration**: Native support for OpenAI's realtime model
- **Realtime Voices**: Marin and Cedar voice options
- **Whisper STT**: High-quality speech recognition
- **WebRTC Support**: Ultra-low latency (<100ms) voice streaming
- **Voice Activity Detection**: Automatic speech detection
- **Audio Enhancement**: Echo/noise suppression and gain control
- **Metrics Monitoring**: Track pipeline performance

## Prerequisites

1. OpenAI API key with access to:
   - gpt-realtime model
   - Whisper (speech-to-text)
   - Realtime voices (Marin, Cedar)

## Setup

```bash
# Install dependencies
pnpm install

# Set environment variables
export OPENAI_API_KEY="your-api-key"
```

## Running the Example

```bash
# Run the example
pnpm start

# Run in development mode with auto-reload
pnpm dev
```

## What It Does

1. **Initializes Voice Pipeline**: Sets up gpt-realtime with Whisper STT
2. **Demonstrates Voice Switching**: Shows switching between Marin and Cedar voices
3. **Simulates Conversation**: Processes sample voice interactions
4. **Shows Tool Usage**: Weather, calculator, and timer tools
5. **Monitors Metrics**: Displays latency and performance metrics
6. **WebRTC Mode**: Optional ultra-low latency configuration

## Key Components

### gpt-realtime Model

The cutting-edge realtime model providing natural voice interactions with minimal latency.

### Realtime Voices

- **Marin**: Optimized for clarity and professional tone
- **Cedar**: Warm and friendly for conversational interactions

### Whisper STT

OpenAI's state-of-the-art speech recognition for accurate transcription.

### WebRTC Integration

Enables ultra-low latency (<100ms) for real-time conversations.

## Architecture
71+
72+
```
73+
User Audio → Whisper STT → gpt-realtime → Realtime Voice → Audio Output
74+
↑ ↓
75+
└─── Voice Activity ←────────┘
76+
Detection (VAD)
77+
```
78+
## Configuration Options

### Audio Settings

- Sample Rate: 24kHz (optimized for realtime)
- Encoding: PCM16 or Opus (for WebRTC)
- Channels: Mono

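To make the audio settings above concrete, the following sketch works out the buffer size and raw bitrate they imply for uncompressed PCM16. The `AudioFormat` type and both helper functions are hypothetical names used only for illustration and are not part of the SDK; the 20 ms chunk length is an assumed example value, not a documented default.

```typescript
// Hypothetical helper types for illustration; not part of @openai/agents.
interface AudioFormat {
  sampleRateHz: number;   // samples per second, per channel
  bytesPerSample: number; // PCM16 uses 2 bytes per sample
  channels: number;       // mono = 1
}

// The settings listed above: 24 kHz, PCM16, mono.
const PCM16_MONO_24K: AudioFormat = {
  sampleRateHz: 24000,
  bytesPerSample: 2,
  channels: 1,
};

// Size in bytes of one uncompressed audio chunk of the given duration.
function bytesPerChunk(format: AudioFormat, chunkMs: number): number {
  const samples = Math.round((format.sampleRateHz * chunkMs) / 1000);
  return samples * format.bytesPerSample * format.channels;
}

// Raw (uncompressed) bitrate in kilobits per second.
function rawBitrateKbps(format: AudioFormat): number {
  return (format.sampleRateHz * format.bytesPerSample * 8 * format.channels) / 1000;
}

console.log(bytesPerChunk(PCM16_MONO_24K, 20)); // bytes in an assumed 20 ms chunk
console.log(rawBitrateKbps(PCM16_MONO_24K));    // raw PCM16 bitrate in kbps
```

The raw PCM16 figure is one reason a compressed codec such as Opus may be preferable over WebRTC when bandwidth is constrained.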
### Voice Activity Detection

- Threshold: 0.5 (adjustable sensitivity)
- Max Silence: 2000ms
- Debounce: 300ms

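The README does not spell out how these three parameters interact, so the sketch below is one plausible reading rather than the pipeline's actual implementation. It assumes each audio frame carries a normalized level between 0 and 1, that speech starts once the level has stayed at or above the threshold for the debounce period, and that speech ends after the maximum silence period has elapsed. `VadConfig`, `SimpleVad`, and the event names are hypothetical.

```typescript
// Hypothetical VAD sketch; names and semantics are assumptions, not the SDK's API.
interface VadConfig {
  threshold: number;    // normalized level (0..1) that counts as speech
  maxSilenceMs: number; // silence needed to end an utterance
  debounceMs: number;   // sustained speech needed to start an utterance
}

type VadEvent = "speech_start" | "speech_end" | null;

class SimpleVad {
  private config: VadConfig;
  private speaking = false;
  private loudMs = 0;   // consecutive time at or above the threshold
  private silentMs = 0; // consecutive time below the threshold

  constructor(config: VadConfig) {
    this.config = config;
  }

  // Feed one frame; returns an event only when the speaking state changes.
  process(level: number, frameMs: number): VadEvent {
    if (level >= this.config.threshold) {
      this.silentMs = 0;
      this.loudMs += frameMs;
      if (!this.speaking && this.loudMs >= this.config.debounceMs) {
        this.speaking = true;
        return "speech_start";
      }
    } else {
      this.loudMs = 0;
      this.silentMs += frameMs;
      if (this.speaking && this.silentMs >= this.config.maxSilenceMs) {
        this.speaking = false;
        return "speech_end";
      }
    }
    return null;
  }
}

// Values taken from the list above.
const defaultVadConfig: VadConfig = { threshold: 0.5, maxSilenceMs: 2000, debounceMs: 300 };
```

With 100 ms frames and these values, a start event fires on the third consecutive loud frame and an end event on the twentieth consecutive quiet frame.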
### WebRTC Settings

- ICE Servers: STUN for NAT traversal
- Audio Constraints: Echo/noise suppression
- Target Latency: <100ms

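As a rough illustration of the WebRTC settings above, the sketch below builds two plain objects shaped like the standard browser arguments: one for `new RTCPeerConnection(...)` and one for `navigator.mediaDevices.getUserMedia(...)`. The `buildWebRtcSettings` function, the local interfaces, and the STUN URL `stun:stun.example.org:3478` are placeholders introduced here, not values from the example. The sketch does not cover signaling with the Realtime API.

```typescript
// Local structural types so the sketch also type-checks outside a browser.
interface IceServer {
  urls: string | string[];
}

interface WebRtcSettings {
  rtcConfig: { iceServers: IceServer[] };
  mediaConstraints: {
    audio: {
      echoCancellation: boolean;
      noiseSuppression: boolean;
      autoGainControl: boolean;
      channelCount: number;
    };
    video: boolean;
  };
  targetLatencyMs: number; // a goal to measure against, not something WebRTC enforces
}

// Hypothetical builder; the default STUN URL is a placeholder, not a real server.
function buildWebRtcSettings(stunUrl: string = "stun:stun.example.org:3478"): WebRtcSettings {
  return {
    rtcConfig: { iceServers: [{ urls: stunUrl }] }, // STUN for NAT traversal
    mediaConstraints: {
      audio: {
        echoCancellation: true,
        noiseSuppression: true,
        autoGainControl: true,
        channelCount: 1, // mono, matching the audio settings
      },
      video: false,
    },
    targetLatencyMs: 100,
  };
}

const settings = buildWebRtcSettings();
console.log(JSON.stringify(settings.rtcConfig));
```

In a browser, `settings.rtcConfig` could be passed to the `RTCPeerConnection` constructor and `settings.mediaConstraints` to `getUserMedia`. Browsers treat these audio constraints as requests and may not honor all of them.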
## Customization

Edit `voice-pipeline-example.ts` to:

- Adjust voice settings (Marin/Cedar)
- Modify VAD parameters
- Add custom tools
- Change audio configuration
- Enable/disable WebRTC mode

## Production Considerations

1. **API Keys**: Store securely, never commit to version control
2. **Error Handling**: Implement robust error recovery
3. **Latency**: Use WebRTC for lowest latency requirements
4. **Audio Quality**: Balance quality vs bandwidth based on use case
5. **Rate Limiting**: Monitor API usage and implement appropriate limits

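The error handling and rate limiting items above are often addressed with retries and exponential backoff. The following is a minimal sketch of that general technique, not the recovery logic shipped with the example. `backoffDelayMs` and `withRetry` are hypothetical helpers, and the base delay, cap, and attempt count are assumed values.

```typescript
// Hypothetical helpers; the delays and attempt counts are assumed defaults.

// Delay before the next retry: doubles on each attempt, capped at maxMs.
function backoffDelayMs(attempt: number, baseMs: number = 250, maxMs: number = 8000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Runs an async operation, retrying on failure with exponential backoff.
// Rethrows the last error once maxAttempts is exhausted.
async function withRetry<T>(
  operation: () => Promise<T>,
  maxAttempts: number = 4,
  baseMs: number = 250,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      if (attempt < maxAttempts - 1) {
        const delay = backoffDelayMs(attempt, baseMs);
        await new Promise<void>((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  throw lastError;
}
```

A caller might wrap a connection attempt as `await withRetry(() => connect())`, where `connect` stands in for whatever function opens the session. Production code would usually add random jitter and retry only on errors known to be transient.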
## Troubleshooting

### High Latency

- Enable WebRTC mode for ultra-low latency
- Check network connection quality
- Optimize audio buffer sizes

### Audio Quality Issues

- Adjust VAD threshold for your environment
- Enable noise suppression
- Check microphone quality

### Connection Issues

- Verify API key has necessary permissions
- Check firewall settings for WebRTC
- Ensure stable internet connection

## Related Resources

- [Voice Agents Guide](../../docs/src/content/docs/guides/voice-agents)
- [Realtime API Documentation](https://platform.openai.com/docs/guides/realtime)
- [OpenAI Agents SDK Documentation](../../docs)
Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
{
  "name": "voice-pipeline-example",
  "version": "0.1.0",
  "description": "Voice Pipeline Orchestration example for OpenAI Agents SDK",
  "main": "voice-pipeline-example.ts",
  "scripts": {
    "start": "tsx voice-pipeline-example.ts",
    "dev": "tsx watch voice-pipeline-example.ts"
  },
  "dependencies": {
    "@openai/agents": "workspace:*",
    "openai": "^4.79.1"
  },
  "devDependencies": {
    "@types/node": "^22.10.5",
    "tsx": "^4.19.2",
    "typescript": "^5.7.2"
  }
}
