
Commit 4a06f14

twilio
1 parent 4d1ce04 commit 4a06f14

6 files changed: 298 additions & 11 deletions


docs/scripts/generate_ref_files.py

Lines changed: 2 additions & 0 deletions
@@ -31,6 +31,7 @@ def md_target(py_path: Path) -> Path:
     rel = py_path.relative_to(SRC_ROOT).with_suffix(".md")
     return DOCS_ROOT / rel
 
+
 def pretty_title(last_segment: str) -> str:
     """
     Convert a module/file segment like 'tool_context' to 'Tool Context'.
@@ -39,6 +40,7 @@ def pretty_title(last_segment: str) -> str:
     cleaned = last_segment.replace("_", " ").replace("-", " ")
     return capwords(cleaned)
 
+
 # ---- Main ------------------------------------------------------------
 
 

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
# OpenAI API Key - Required for Realtime API access
OPENAI_API_KEY=your_openai_api_key_here

# Port for the server (optional, defaults to 8000)
PORT=8000

# Your ngrok or public URL (update the server.py TwiML response with this)
# Example: https://abc123.ngrok.io
PUBLIC_URL=your_public_url_here

examples/realtime/twilio/README.md

Lines changed: 91 additions & 0 deletions
@@ -0,0 +1,91 @@
# Realtime Twilio Integration

This example demonstrates how to connect the OpenAI Realtime API to a phone call using Twilio's Media Streams. The server handles incoming phone calls and streams audio between Twilio and the OpenAI Realtime API, enabling real-time voice conversations with an AI agent over the phone.

## Prerequisites

- Python 3.9+
- OpenAI API key with Realtime API access
- Twilio account with a phone number
- A tunneling service like ngrok to expose your local server

## Setup

1. **Install dependencies:**
   ```bash
   cd examples/realtime/twilio
   pip install -r requirements.txt
   ```

2. **Set up environment variables:**
   ```bash
   cp .env.example .env
   # Edit .env and add your OpenAI API key
   ```

3. **Start the server:**
   ```bash
   python server.py
   ```
   The server will start on port 8000 by default.

4. **Expose your server publicly:**
   ```bash
   ngrok http 8000
   ```
   Note the public URL (e.g., `https://abc123.ngrok.io`).

5. **Update the TwiML response:**
   Edit `server.py` and replace `your-ngrok-url.ngrok.io` in the `incoming_call()` function with your actual ngrok host (e.g., `abc123.ngrok.io`). Keep the `wss://` scheme and the `/media-stream` path.

6. **Configure your Twilio phone number:**
   - Log into your Twilio Console
   - Go to Phone Numbers → Manage → Active numbers
   - Click on your phone number
   - Set the webhook URL for incoming calls to: `https://your-ngrok-url.ngrok.io/incoming-call`
   - Set the HTTP method to POST (or GET, both are supported)

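Step 5 hardcodes the public host in `server.py`. If you would rather derive it from the `PUBLIC_URL` value in `.env`, a small builder like the one below could produce equivalent TwiML. This is a sketch under stated assumptions: `build_twiml` is a hypothetical helper that is not part of `server.py`, and it assumes `PUBLIC_URL` is an absolute `https://` URL.

```python
from urllib.parse import urlsplit
from xml.sax.saxutils import escape, quoteattr


def build_twiml(public_url: str, greeting: str) -> str:
    """Return TwiML that greets the caller, then opens a Media Stream to /media-stream.

    Hypothetical helper (not part of server.py); public_url is assumed to be an
    absolute https:// URL such as the PUBLIC_URL value from .env.
    """
    host = urlsplit(public_url).netloc
    if not host:
        raise ValueError(f"expected an absolute URL like https://abc123.ngrok.io, got {public_url!r}")
    # Media Streams connect over a secure WebSocket, so reuse the host with wss://.
    stream_url = f"wss://{host}/media-stream"
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        "<Response>\n"
        f"    <Say>{escape(greeting)}</Say>\n"
        "    <Connect>\n"
        f"        <Stream url={quoteattr(stream_url)} />\n"
        "    </Connect>\n"
        "</Response>"
    )


if __name__ == "__main__":
    print(build_twiml("https://abc123.ngrok.io", "Hello! You're now connected to an AI assistant."))
```

To use it, `incoming_call()` could return `PlainTextResponse(content=build_twiml(os.environ["PUBLIC_URL"], greeting), media_type="text/xml")`. Escaping the greeting and quoting the attribute keep the XML well-formed even if either contains `&`, `<`, or quotes.
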
## Usage

1. Call your Twilio phone number.
2. You'll hear: "Hello! You're now connected to an AI assistant. You can start talking!"
3. Start speaking; the AI responds in real time.
4. The assistant has two example tools: `get_weather` (a stub that always reports sunny weather) and `get_current_time`.

## How It Works

1. **Incoming Call**: When someone calls your Twilio number, Twilio makes a request to `/incoming-call`.
2. **TwiML Response**: The server returns TwiML that:
   - Plays a greeting message
   - Connects the call to a WebSocket stream at `/media-stream`
3. **WebSocket Connection**: Twilio establishes a WebSocket connection for bidirectional audio streaming.
4. **Audio Processing**:
   - Audio from the caller is base64-decoded and sent to the OpenAI Realtime API
   - Audio responses from OpenAI are base64-encoded and sent back to Twilio
   - Twilio plays the audio to the caller

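Steps 3 and 4 boil down to a small amount of JSON and base64 handling. The sketch below isolates it as two pure functions so it can be tested without Twilio or OpenAI. The function names are hypothetical; the message fields (`event`, `streamSid`, `media.payload`) are the ones `server.py` reads and writes, and the assumption that the initial `connected` message carries no `streamSid` follows Twilio's Media Streams message format.

```python
import base64
import json
from typing import Optional, Tuple


def parse_twilio_message(raw: str) -> Tuple[str, Optional[str], Optional[bytes]]:
    """Return (event, streamSid, decoded audio) for one inbound Media Streams message."""
    message = json.loads(raw)
    event = message.get("event", "")
    # The initial "connected" message has no streamSid; later messages carry one.
    stream_sid = message.get("streamSid")
    audio = None
    if event == "media":
        # The payload is base64 text; decoding yields the raw audio bytes.
        audio = base64.b64decode(message["media"]["payload"])
    return event, stream_sid, audio


def build_media_message(stream_sid: str, audio: bytes) -> str:
    """Wrap outbound audio in the JSON shape server.py sends back to Twilio."""
    payload = base64.b64encode(audio).decode("ascii")
    return json.dumps({"event": "media", "streamSid": stream_sid, "media": {"payload": payload}})
```

Keeping the framing separate from the WebSocket loop makes the `streamSid` handling easy to check: outbound `media` messages must echo the SID of the stream they belong to.
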
## Configuration

- **Port**: Set the `PORT` environment variable (default: 8000)
- **OpenAI API Key**: Set the `OPENAI_API_KEY` environment variable
- **Agent Instructions**: Modify the `RealtimeAgent` configuration in `server.py`
- **Tools**: Add or modify function tools in `server.py`

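A minimal sketch of reading these settings with the same defaults `server.py` uses, failing fast when the API key is missing instead of erroring later during the first call. `Settings` and `load_settings` are hypothetical names, not part of the example.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    openai_api_key: str
    port: int


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read OPENAI_API_KEY and PORT, raising early if the key is missing."""
    env = os.environ if env is None else env
    api_key = env.get("OPENAI_API_KEY", "")
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is not set; copy .env.example to .env and fill it in")
    # PORT is optional and falls back to 8000, matching server.py.
    port = int(env.get("PORT", "8000"))
    return Settings(openai_api_key=api_key, port=port)
```

Passing `env` explicitly keeps the function testable without touching the real process environment.
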
## Troubleshooting

- **WebSocket connection issues**: Ensure your ngrok URL is correct and publicly accessible
- **Audio quality**: Twilio streams audio as G.711 μ-law at 8 kHz, which limits fidelity. `server.py` forwards these bytes without conversion, so the Realtime session's audio format must match them.
- **Latency**: Network latency between Twilio, your server, and OpenAI affects response time
- **Logs**: Check the console output for detailed connection and error logs

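For context on the audio-quality note: each Twilio `media` payload decodes to 8-bit G.711 μ-law samples at 8 kHz. If you ever need linear 16-bit PCM (for example, before resampling), the standard μ-law expansion is short enough to write in pure Python, which avoids the `audioop` module (deprecated since Python 3.11 and removed in 3.13). `ulaw_to_pcm16` is a hypothetical helper; it does not resample, and `server.py` does not perform this conversion.

```python
import struct


def ulaw_to_pcm16(data: bytes) -> bytes:
    """Decode G.711 mu-law bytes to 16-bit little-endian linear PCM (same sample rate)."""
    samples = []
    for byte in data:
        u = ~byte & 0xFF  # mu-law bytes are stored bit-inverted
        sign = u & 0x80
        exponent = (u >> 4) & 0x07
        mantissa = u & 0x0F
        # Rebuild the biased magnitude, then remove the 0x84 (132) bias.
        magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
        samples.append(-magnitude if sign else magnitude)
    return struct.pack(f"<{len(samples)}h", *samples)
```

The output is twice the input size: 160 μ-law bytes (20 ms at 8 kHz) decode to 320 bytes of PCM16, one little-endian 16-bit sample per input byte.
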
## Architecture

```
Phone Call → Twilio → WebSocket → Your Server → OpenAI Realtime API

RealtimeAgent with Tools

Audio Response → Twilio → Phone Call
```

The server acts as a bridge between Twilio's Media Streams and OpenAI's Realtime API, handling the protocol differences: it unwraps base64 audio from Twilio's JSON `media` messages for the Realtime session and wraps the session's audio events back into `media` messages for Twilio.

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
fastapi
uvicorn[standard]
websockets
python-dotenv

examples/realtime/twilio/server.py

Lines changed: 179 additions & 0 deletions
@@ -0,0 +1,179 @@
import asyncio
import base64
import json
import logging
import os
from typing import Any

from dotenv import load_dotenv
from fastapi import FastAPI, WebSocket, WebSocketDisconnect
from fastapi.responses import PlainTextResponse

from agents import function_tool
from agents.realtime import RealtimeAgent, RealtimeRunner, RealtimeSession

# Load OPENAI_API_KEY and PORT from the .env file created during setup.
load_dotenv()

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


@function_tool
def get_weather(city: str) -> str:
    """Get the weather in a city."""
    return f"The weather in {city} is sunny."


@function_tool
def get_current_time() -> str:
    """Get the current time."""
    from datetime import datetime

    return f"The current time is {datetime.now().strftime('%H:%M:%S')}"


agent = RealtimeAgent(
    name="Twilio Assistant",
    instructions=(
        "You are a helpful assistant that starts every conversation with a creative greeting. "
        "Keep responses concise and friendly since this is a phone conversation."
    ),
    tools=[get_weather, get_current_time],
)


class TwilioWebSocketManager:
    """Tracks one Realtime session per Twilio media stream, keyed by streamSid."""

    def __init__(self):
        self.active_sessions: dict[str, RealtimeSession] = {}
        self.session_contexts: dict[str, Any] = {}
        self.websockets: dict[str, WebSocket] = {}
        # Keep references to background tasks so they are not garbage-collected.
        self.event_tasks: dict[str, asyncio.Task] = {}

    async def connect(self, websocket: WebSocket, call_sid: str):
        # The endpoint has already called websocket.accept(); accepting a second
        # time raises a RuntimeError, so only register the socket here.
        self.websockets[call_sid] = websocket
        logger.info(f"WebSocket connection registered for call {call_sid}")

        runner = RealtimeRunner(agent)
        session_context = await runner.run()
        session = await session_context.__aenter__()
        self.active_sessions[call_sid] = session
        self.session_contexts[call_sid] = session_context

        # Start event processing task
        self.event_tasks[call_sid] = asyncio.create_task(self._process_events(call_sid))

    async def disconnect(self, call_sid: str):
        logger.info(f"Disconnecting call {call_sid}")
        task = self.event_tasks.pop(call_sid, None)
        if task is not None:
            task.cancel()
        if call_sid in self.session_contexts:
            await self.session_contexts[call_sid].__aexit__(None, None, None)
            del self.session_contexts[call_sid]
        self.active_sessions.pop(call_sid, None)
        self.websockets.pop(call_sid, None)

    async def handle_twilio_message(self, call_sid: str, message: dict[str, Any]):
        """Handle incoming Twilio WebSocket messages"""
        event = message.get("event")

        if event == "connected":
            logger.info(f"Twilio media stream connected for call {call_sid}")
        elif event == "start":
            logger.info(f"Media stream started for call {call_sid}")
        elif event == "media":
            # Handle audio data from Twilio
            payload = message.get("media", {})
            audio_data = payload.get("payload", "")
            if audio_data and call_sid in self.active_sessions:
                # Twilio sends base64-encoded 8 kHz G.711 mu-law audio. The decoded
                # bytes are forwarded as-is; no format conversion happens here.
                try:
                    audio_bytes = base64.b64decode(audio_data)
                    await self.active_sessions[call_sid].send_audio(audio_bytes)
                except Exception as e:
                    logger.error(f"Error processing audio for call {call_sid}: {e}")
        elif event == "stop":
            logger.info(f"Media stream stopped for call {call_sid}")

    async def send_audio_to_twilio(self, call_sid: str, audio_bytes: bytes):
        """Send audio back to Twilio"""
        if call_sid in self.websockets:
            websocket = self.websockets[call_sid]
            # Encode audio as base64 for Twilio
            audio_base64 = base64.b64encode(audio_bytes).decode("utf-8")

            message = {"event": "media", "streamSid": call_sid, "media": {"payload": audio_base64}}

            try:
                await websocket.send_text(json.dumps(message))
            except Exception as e:
                logger.error(f"Error sending audio to Twilio for call {call_sid}: {e}")

    async def _process_events(self, call_sid: str):
        """Process events from OpenAI Realtime API and send audio to Twilio"""
        try:
            session = self.active_sessions[call_sid]

            async for event in session:
                if event.type == "audio":
                    # Send audio back to Twilio
                    await self.send_audio_to_twilio(call_sid, event.audio.data)
                elif event.type == "error":
                    logger.error(f"OpenAI Realtime API error for call {call_sid}: {event}")

        except Exception as e:
            logger.error(f"Error processing events for call {call_sid}: {e}")


manager = TwilioWebSocketManager()

app = FastAPI()


@app.get("/")
async def root():
    return {"message": "Twilio Media Stream Server is running!"}


@app.post("/incoming-call")
@app.get("/incoming-call")
async def incoming_call():
    """Handle incoming Twilio phone calls"""
    # Replace your-ngrok-url.ngrok.io with your public host; keep the wss:// scheme.
    twiml_response = """<?xml version="1.0" encoding="UTF-8"?>
<Response>
    <Say>Hello! You're now connected to an AI assistant. You can start talking!</Say>
    <Connect>
        <Stream url="wss://your-ngrok-url.ngrok.io/media-stream" />
    </Connect>
</Response>"""
    return PlainTextResponse(content=twiml_response, media_type="text/xml")


@app.websocket("/media-stream")
async def media_stream_endpoint(websocket: WebSocket):
    """WebSocket endpoint for Twilio Media Streams"""
    call_sid = None

    try:
        await websocket.accept()
        logger.info("WebSocket connection accepted")

        while True:
            data = await websocket.receive_text()
            message = json.loads(data)

            # Register the stream once its SID is known. Twilio's first message
            # ("connected") has no streamSid; the "start" message carries it, and
            # outbound media messages must echo it back.
            if call_sid is None:
                if "streamSid" not in message:
                    logger.info(f"Received {message.get('event')!r} event before stream start")
                    continue
                call_sid = message["streamSid"]
                await manager.connect(websocket, call_sid)

            await manager.handle_twilio_message(call_sid, message)

    except WebSocketDisconnect:
        logger.info("WebSocket disconnected")
    except Exception as e:
        logger.error(f"WebSocket error: {e}")
    finally:
        if call_sid:
            await manager.disconnect(call_sid)


if __name__ == "__main__":
    import uvicorn

    port = int(os.getenv("PORT", 8000))
    uvicorn.run(app, host="0.0.0.0", port=port)

src/agents/models/chatcmpl_stream_handler.py

Lines changed: 13 additions & 11 deletions
@@ -288,10 +288,11 @@ async def handle_stream(
 function_call = state.function_calls[tc_delta.index]
 
 # Start streaming as soon as we have function name and call_id
-if (not state.function_call_streaming[tc_delta.index] and
-    function_call.name and
-    function_call.call_id):
-
+if (
+    not state.function_call_streaming[tc_delta.index]
+    and function_call.name
+    and function_call.call_id
+):
     # Calculate the output index for this function call
     function_call_starting_index = 0
     if state.reasoning_content_index_and_output:
@@ -308,9 +309,9 @@ async def handle_stream(
 
     # Mark this function call as streaming and store its output index
     state.function_call_streaming[tc_delta.index] = True
-    state.function_call_output_idx[
-        tc_delta.index
-    ] = function_call_starting_index
+    state.function_call_output_idx[tc_delta.index] = (
+        function_call_starting_index
+    )
 
     # Send initial function call added event
     yield ResponseOutputItemAddedEvent(
@@ -327,10 +328,11 @@ async def handle_stream(
     )
 
 # Stream arguments if we've started streaming this function call
-if (state.function_call_streaming.get(tc_delta.index, False) and
-    tc_function and
-    tc_function.arguments):
-
+if (
+    state.function_call_streaming.get(tc_delta.index, False)
+    and tc_function
+    and tc_function.arguments
+):
     output_index = state.function_call_output_idx[tc_delta.index]
     yield ResponseFunctionCallArgumentsDeltaEvent(
         delta=tc_function.arguments,
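
The hunks above only reformat two conditions, but those conditions carry the streaming rule: a tool call is announced once both its name and call_id are known, and only then are its argument fragments forwarded. The following is a simplified, dependency-free sketch of that rule. `ToolCallDelta`, `CallState`, `tool_call_events`, and the tuple events are hypothetical stand-ins for the SDK's state object and Responses event classes, and the output-index calculation is omitted.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Optional, Tuple


@dataclass
class ToolCallDelta:
    """Hypothetical stand-in for one streamed Chat Completions tool-call fragment."""

    index: int
    call_id: Optional[str] = None
    name: Optional[str] = None
    arguments: Optional[str] = None


@dataclass
class CallState:
    """Accumulated state for one tool call, keyed by its index in the stream."""

    call_id: str = ""
    name: str = ""
    arguments: str = ""
    streaming: bool = False


def tool_call_events(deltas: Iterable[ToolCallDelta]) -> Iterator[Tuple[str, int, str]]:
    """Yield ("added", index, name) once a call is identifiable, then ("args", index, chunk)."""
    calls: Dict[int, CallState] = {}
    for delta in deltas:
        state = calls.setdefault(delta.index, CallState())
        if delta.call_id:
            state.call_id = delta.call_id
        if delta.name:
            state.name += delta.name
        if delta.arguments:
            state.arguments += delta.arguments

        # Start streaming as soon as we have both the function name and call_id.
        if not state.streaming and state.name and state.call_id:
            state.streaming = True
            yield ("added", delta.index, state.name)

        # Forward argument fragments only for calls that have started streaming.
        # Fragments received earlier stay buffered in state.arguments.
        if state.streaming and delta.arguments:
            yield ("args", delta.index, delta.arguments)
```

Because the second check runs in the same iteration as the first, a fragment that arrives together with the name and call_id is forwarded immediately after the "added" event.
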
