@arikbd123

Adds a direct speech-to-speech voice agent built on OpenAI's Realtime API (gpt-4o-realtime-preview), with the OpenAI client, session orchestration, and Twilio server split into separate modules.

Key features:

  • Direct speech-to-speech (no chained STT → GPT → TTS)
  • WebSocket-based architecture with Twilio MediaStreams
  • Native g711_ulaw (μ-law) audio format support
  • Intelligent interruption/barge-in handling
  • Automatic reconnection and error handling
  • Clean modular structure in /src directory
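
To illustrate the g711_ulaw and barge-in items above, here is a minimal sketch, not the code in this PR. It assumes the beta Realtime API's `session.update`, `input_audio_buffer.speech_started`, and `response.cancel` events and Twilio's `clear` message; the helper names `buildSessionUpdate` and `handleBargeIn` and the `state` shape are hypothetical, and event names may differ in other API versions.

```javascript
// Hypothetical helpers for illustration; not taken from the PR's source.

// Build the session.update event asking the Realtime API to accept and emit
// 8 kHz mu-law audio, so Twilio payloads can pass through without transcoding.
function buildSessionUpdate(instructions) {
  return {
    type: 'session.update',
    session: {
      modalities: ['text', 'audio'],
      input_audio_format: 'g711_ulaw',
      output_audio_format: 'g711_ulaw',
      turn_detection: { type: 'server_vad' }, // server detects caller speech
      instructions,
    },
  };
}

// Decide what to send when the caller starts talking over the assistant.
// Returns the messages for each side instead of sending them, so the logic
// can be exercised without open sockets.
// Assumed state shape: { streamSid: string, assistantSpeaking: boolean }.
function handleBargeIn(openaiEvent, state) {
  const nothing = { toTwilio: [], toOpenAI: [] };
  if (openaiEvent.type !== 'input_audio_buffer.speech_started') return nothing;
  if (!state.assistantSpeaking) return nothing;

  state.assistantSpeaking = false;
  return {
    // Drop audio Twilio has buffered but not yet played to the caller.
    toTwilio: [{ event: 'clear', streamSid: state.streamSid }],
    // Stop the model response that is still being generated.
    toOpenAI: [{ type: 'response.cancel' }],
  };
}
```

Returning the outgoing messages rather than writing to sockets directly is a design choice for this sketch: the caller serializes each entry with `JSON.stringify` and sends it on the matching WebSocket.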

Architecture:

  • src/openai/realtimeClient.js - OpenAI Realtime WebSocket client
  • src/core/callSession.js - Session orchestrator
  • src/twilio/twilioServer.js - Twilio MediaStream server
  • src/index.js - Main entry point
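
As a rough picture of what a session orchestrator sitting between the two sockets might do, the sketch below translates messages in each direction. It is an assumption about the general shape, not the contents of callSession.js: the function names are hypothetical, and the message fields follow Twilio Media Streams (`media`, `streamSid`) and the beta Realtime API (`input_audio_buffer.append`, `response.audio.delta`).

```javascript
// Hypothetical translation helpers; not taken from the PR's source.

// Twilio -> OpenAI: forward a caller audio frame unchanged. Both sides carry
// base64-encoded g711_ulaw, so only the envelope changes.
// Returns null for Twilio events that carry no audio (start, stop, mark, ...).
function twilioToOpenAI(twilioMsg) {
  if (twilioMsg.event !== 'media' || !twilioMsg.media) return null;
  return { type: 'input_audio_buffer.append', audio: twilioMsg.media.payload };
}

// OpenAI -> Twilio: wrap a model audio chunk in a Twilio media message
// addressed to the active stream. Returns null for non-audio events.
function openAIToTwilio(openaiEvent, streamSid) {
  if (openaiEvent.type !== 'response.audio.delta') return null;
  return { event: 'media', streamSid, media: { payload: openaiEvent.delta } };
}

// Example round trip with placeholder payloads (not real audio).
const toModel = twilioToOpenAI({ event: 'media', media: { payload: 'AAAA' } });
const toCaller = openAIToTwilio({ type: 'response.audio.delta', delta: 'BBBB' }, 'MZ123');
console.log(JSON.stringify(toModel));
console.log(JSON.stringify(toCaller));
```

In a layout like the one listed above, the Twilio server module would parse each incoming WebSocket frame and hand it to the first function, while the Realtime client would pass server events to the second.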

Added:

  • New npm scripts: 'realtime' and 'realtime:dev'
  • Comprehensive README.realtime.md with setup guide
  • .env.realtime.example for configuration
  • ws package dependency for WebSocket client

The old chained architecture (app.js) remains intact for backward compatibility.

Run with: npm run realtime

Contributing to Twilio

All third-party contributors acknowledge that any contributions they provide will be made under the same open-source license that the open-source project is provided under.

  • I acknowledge that all my contributions will be made under the project's license.
