|
30 | 30 | "\n",
|
31 | 31 | "\n",
|
32 | 32 | "*Notes:*\n",
|
33 |    | - "> 1. GPT-4o-Realtime supports a 128k token context window, though in certain use cases, you may notice performance degrade as you stuff more tokens into the context window.\n",
   | 33 | + "> 1. gpt-realtime supports a 32k token context window, though in certain use cases, you may notice performance degrade as you stuff more tokens into the context window.\n",
34 | 34 | "> 2. Token window = all tokens (words and audio tokens) the model currently keeps in memory for the session.\n",
|
35 | 35 | "\n",
|
36 | 36 | "### One‑liner install (run in a fresh cell)"
|
|
48 | 48 | },
|
49 | 49 | {
|
50 | 50 | "cell_type": "code",
|
51 |    | - "execution_count": 4,
   | 51 | + "execution_count": 1,
52 | 52 | "metadata": {},
|
53 | 53 | "outputs": [],
|
54 | 54 | "source": [
|
|
74 | 74 | },
|
75 | 75 | {
|
76 | 76 | "cell_type": "code",
|
77 |    | - "execution_count": 5,
   | 77 | + "execution_count": 2,
78 | 78 | "metadata": {},
|
79 | 79 | "outputs": [],
|
80 | 80 | "source": [
|
|
96 | 96 | "In practice you’ll often see **≈ 10 ×** more tokens for the *same* sentence in audio versus text.\n",
|
97 | 97 | "\n",
|
98 | 98 | "\n",
|
99 |    | - "* GPT-4o realtime accepts up to **128k tokens** and as the token size increases, instruction adherence can drift.\n",
   | 99 | + "* gpt-realtime accepts up to **32k tokens** and as the token size increases, instruction adherence can drift.\n",
100 | 100 | "* Every user/assistant turn consumes tokens → the window **only grows**.\n",
|
101 | 101 | "* **Strategy**: Summarise older turns into a single assistant message, keep the last few verbatim turns, and continue.\n",
|
102 | 102 | "\n",
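The pruning strategy described in the cell above can be sketched in plain Python. This is a minimal illustration, not the notebook's implementation: `prune_history` and its `summarise` callable are hypothetical names, and `summarise` here just concatenates text where the real notebook would make a model call.

```python
def prune_history(history, keep_last_turns=2,
                  summarise=lambda turns: " ".join(t["text"] for t in turns)):
    """Collapse all but the last `keep_last_turns` turns into a single
    assistant summary message so the context window stops growing."""
    if len(history) <= keep_last_turns:
        return history                      # nothing old enough to compress
    older, recent = history[:-keep_last_turns], history[-keep_last_turns:]
    summary_msg = {"role": "assistant",
                   "text": "Summary of earlier conversation: " + summarise(older)}
    return [summary_msg] + recent
```

After pruning locally, the superseded items also have to be removed server-side (e.g. via `conversation.item.delete` events) so the two views of the conversation stay in sync.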
|
|
128 | 128 | },
|
129 | 129 | {
|
130 | 130 | "cell_type": "code",
|
131 |    | - "execution_count": 6,
   | 131 | + "execution_count": 3,
132 | 132 | "metadata": {},
|
133 | 133 | "outputs": [],
|
134 | 134 | "source": [
|
|
159 | 159 | },
|
160 | 160 | {
|
161 | 161 | "cell_type": "code",
|
162 |    | - "execution_count": 7,
   | 162 | + "execution_count": 4,
163 | 163 | "metadata": {},
|
164 | 164 | "outputs": [],
|
165 | 165 | "source": [
|
|
196 | 196 | },
|
197 | 197 | {
|
198 | 198 | "cell_type": "code",
|
199 |    | - "execution_count": 8,
   | 199 | + "execution_count": 5,
200 | 200 | "metadata": {},
|
201 | 201 | "outputs": [],
|
202 | 202 | "source": [
|
|
248 | 248 | },
|
249 | 249 | {
|
250 | 250 | "cell_type": "code",
|
251 |    | - "execution_count": 9,
   | 251 | + "execution_count": 6,
252 | 252 | "metadata": {},
|
253 | 253 | "outputs": [],
|
254 | 254 | "source": [
|
|
297 | 297 | "metadata": {},
|
298 | 298 | "source": [
|
299 | 299 | "### 3.3 Detect When to Summarise\n",
|
300 |    | - "The Realtime model keeps a **large 128 k‑token window**, but quality can drift long before that limit as you stuff more context into the model.\n",
   | 300 | + "The Realtime model keeps a **large 32 k‑token window**, but quality can drift long before that limit as you stuff more context into the model.\n",
301 | 301 | "\n",
|
302 | 302 | "Our goal: **auto‑summarise** once the running window nears a safe threshold (default **2 000 tokens** for the notebook), then prune the superseded turns both locally *and* server‑side.\n",
|
303 | 303 | "\n",
|
304 |    | - "We monitor latest_tokens returned in `response.done`. When it exceeds SUMMARY_TRIGGER and we have more than KEEP_LAST_TURNS, we spin up a background summarisation coroutine.\n",
   | 304 | + "We monitor `latest_tokens` returned in `response.done`. When it exceeds `SUMMARY_TRIGGER` and we have more than `KEEP_LAST_TURNS` turns, we spin up a background summarization coroutine.\n",
305 | 305 | "\n",
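That trigger check can be sketched as follows. This is a hedged illustration rather than the notebook's own code: `SUMMARY_TRIGGER` and `KEEP_LAST_TURNS` mirror the names used above, but the exact shape of the `response.done` payload and the `summarise_older_turns` coroutine are assumptions for the sketch.

```python
import asyncio

SUMMARY_TRIGGER = 2_000   # token threshold used in this notebook
KEEP_LAST_TURNS = 2       # most recent turns kept verbatim

async def summarise_older_turns(turns):
    """Placeholder: compress turns[:-KEEP_LAST_TURNS] with a model call."""
    await asyncio.sleep(0)  # stand-in for the real summarisation request

def maybe_start_summary(event, turns, background_tasks):
    """On a `response.done` event, read the running token count and, if it
    crosses the trigger with enough history, schedule summarisation."""
    usage = event.get("response", {}).get("usage", {})   # payload shape assumed
    latest_tokens = usage.get("total_tokens", 0)
    if latest_tokens > SUMMARY_TRIGGER and len(turns) > KEEP_LAST_TURNS:
        task = asyncio.ensure_future(summarise_older_turns(turns))
        background_tasks.add(task)                 # keep a strong reference
        task.add_done_callback(background_tasks.discard)
        return True
    return False
```

Running the summary as a background task keeps the event loop free to process incoming audio while the compression request is in flight.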
|
306 | 306 | "We compress everything except the last 2 turns into a single French paragraph, then:\n",
|
307 | 307 | "\n",
|
|
314 | 314 | },
|
315 | 315 | {
|
316 | 316 | "cell_type": "code",
|
317 |    | - "execution_count": 10,
   | 317 | + "execution_count": 7,
318 | 318 | "metadata": {},
|
319 | 319 | "outputs": [],
|
320 | 320 | "source": [
|
|
343 | 343 | },
|
344 | 344 | {
|
345 | 345 | "cell_type": "code",
|
346 |    | - "execution_count": 11,
   | 346 | + "execution_count": 8,
347 | 347 | "metadata": {},
|
348 | 348 | "outputs": [],
|
349 | 349 | "source": [
|
|
401 | 401 | },
|
402 | 402 | {
|
403 | 403 | "cell_type": "code",
|
404 |    | - "execution_count": 12,
   | 404 | + "execution_count": 9,
405 | 405 | "metadata": {},
|
406 | 406 | "outputs": [],
|
407 | 407 | "source": [
|
|
451 | 451 | },
|
452 | 452 | {
|
453 | 453 | "cell_type": "code",
|
454 |    | - "execution_count": 13,
   | 454 | + "execution_count": 10,
455 | 455 | "metadata": {},
|
456 | 456 | "outputs": [],
|
457 | 457 | "source": [
|
|
466 | 466 | },
|
467 | 467 | {
|
468 | 468 | "cell_type": "code",
|
469 |    | - "execution_count": 14,
   | 469 | + "execution_count": 11,
470 | 470 | "metadata": {},
|
471 | 471 | "outputs": [],
|
472 | 472 | "source": [
|
473 | 473 | "# --------------------------------------------------------------------------- #\n",
|
474 |    | - "# 🎤 Realtime session #\n",
   | 474 | + "# Realtime session #\n",
475 | 475 | "# --------------------------------------------------------------------------- #\n",
|
476 |    | - "async def realtime_session(model=\"gpt-4o-realtime-preview\", voice=\"shimmer\", enable_playback=True):\n",
   | 476 | + "async def realtime_session(model=\"gpt-realtime\", voice=\"shimmer\", enable_playback=True):\n",
477 | 477 | " \"\"\"\n",
|
478 | 478 | " Main coroutine: connects to the Realtime endpoint, spawns helper tasks,\n",
|
479 | 479 | " and processes incoming events in a big async‑for loop.\n",
|
|
487 | 487 | " # Open the WebSocket connection to the Realtime API #\n",
|
488 | 488 | " # ----------------------------------------------------------------------- #\n",
|
489 | 489 | " url = f\"wss://api.openai.com/v1/realtime?model={model}\"\n",
|
490 |    | - "    headers = {\"Authorization\": f\"Bearer {openai.api_key}\", \"OpenAI-Beta\": \"realtime=v1\"}\n",
   | 490 | + "    headers = {\"Authorization\": f\"Bearer {openai.api_key}\"}\n",
491 | 491 | "\n",
|
492 | 492 | " async with websockets.connect(url, extra_headers=headers, max_size=1 << 24) as ws:\n",
|
493 | 493 | " # ------------------------------------------------------------------- #\n",
|
|
503 | 503 | " await ws.send(json.dumps({\n",
|
504 | 504 | " \"type\": \"session.update\",\n",
|
505 | 505 | " \"session\": {\n",
|
   | 506 | + "        \"type\": \"realtime\",\n",
   | 507 | + "        \"model\": \"gpt-realtime\",\n",
506 | 508 | " \"voice\": voice,\n",
|
507 | 509 | " \"modalities\": [\"audio\", \"text\"],\n",
|
508 | 510 | " \"input_audio_format\": \"pcm16\",\n",
|
|