
Commit 405b180

updated to realtime api
1 parent cfaa1da commit 405b180

1 file changed: +20 -18 lines changed

examples/Context_summarization_with_realtime_api.ipynb

Lines changed: 20 additions & 18 deletions
@@ -30,7 +30,7 @@
     "\n",
     "\n",
     "*Notes:*\n",
-    "> 1. GPT-4o-Realtime supports a 128k token context window, though in certain use cases, you may notice performance degrade as you stuff more tokens into the context window.\n",
+    "> 1. gpt-realtime supports a 32k token context window, though in certain use cases, you may notice performance degrade as you stuff more tokens into the context window.\n",
     "> 2. Token window = all tokens (words and audio tokens) the model currently keeps in memory for the session.\n",
     "\n",
     "### One‑liner install (run in a fresh cell)"
@@ -48,7 +48,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 4,
+   "execution_count": 1,
    "metadata": {},
    "outputs": [],
    "source": [
@@ -74,7 +74,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 5,
+   "execution_count": 2,
    "metadata": {},
    "outputs": [],
    "source": [
@@ -96,7 +96,7 @@
     "In practice you’ll often see **≈ 10 ×** more tokens for the *same* sentence in audio versus text.\n",
     "\n",
     "\n",
-    "* GPT-4o realtime accepts up to **128k tokens** and as the token size increases, instruction adherence can drift.\n",
+    "* gpt-realtime accepts up to **32k tokens** and as the token size increases, instruction adherence can drift.\n",
     "* Every user/assistant turn consumes tokens → the window **only grows**.\n",
     "* **Strategy**: Summarise older turns into a single assistant message, keep the last few verbatim turns, and continue.\n",
     "\n",
@@ -128,7 +128,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 6,
+   "execution_count": 3,
    "metadata": {},
    "outputs": [],
    "source": [
@@ -159,7 +159,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 7,
+   "execution_count": 4,
    "metadata": {},
    "outputs": [],
    "source": [
@@ -196,7 +196,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 8,
+   "execution_count": 5,
    "metadata": {},
    "outputs": [],
    "source": [
@@ -248,7 +248,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 9,
+   "execution_count": 6,
    "metadata": {},
    "outputs": [],
    "source": [
@@ -297,11 +297,11 @@
    "metadata": {},
    "source": [
     "### 3.3 Detect When to Summarise\n",
-    "The Realtime model keeps a **large 128 k‑token window**, but quality can drift long before that limit as you stuff more context into the model.\n",
+    "The Realtime model keeps a **large 32 k‑token window**, but quality can drift long before that limit as you stuff more context into the model.\n",
     "\n",
     "Our goal: **auto‑summarise** once the running window nears a safe threshold (default **2 000 tokens** for the notebook), then prune the superseded turns both locally *and* server‑side.\n",
     "\n",
-    "We monitor latest_tokens returned in `response.done`. When it exceeds SUMMARY_TRIGGER and we have more than KEEP_LAST_TURNS, we spin up a background summarisation coroutine.\n",
+    "We monitor latest_tokens returned in `response.done`. When it exceeds SUMMARY_TRIGGER and we have more than KEEP_LAST_TURNS, we spin up a background summarization coroutine.\n",
     "\n",
     "We compress everything except the last 2 turns into a single French paragraph, then:\n",
     "\n",
@@ -314,7 +314,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 10,
+   "execution_count": 7,
    "metadata": {},
    "outputs": [],
    "source": [
@@ -343,7 +343,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 11,
+   "execution_count": 8,
    "metadata": {},
    "outputs": [],
    "source": [
@@ -401,7 +401,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 12,
+   "execution_count": 9,
    "metadata": {},
    "outputs": [],
    "source": [
@@ -451,7 +451,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 13,
+   "execution_count": 10,
    "metadata": {},
    "outputs": [],
    "source": [
@@ -466,14 +466,14 @@
466466
},
467467
{
468468
"cell_type": "code",
469-
"execution_count": 14,
469+
"execution_count": 11,
470470
"metadata": {},
471471
"outputs": [],
472472
"source": [
473473
"# --------------------------------------------------------------------------- #\n",
474-
"# 🎤 Realtime session #\n",
474+
"# Realtime session #\n",
475475
"# --------------------------------------------------------------------------- #\n",
476-
"async def realtime_session(model=\"gpt-4o-realtime-preview\", voice=\"shimmer\", enable_playback=True):\n",
476+
"async def realtime_session(model=\"gpt-realtime\", voice=\"shimmer\", enable_playback=True):\n",
477477
" \"\"\"\n",
478478
" Main coroutine: connects to the Realtime endpoint, spawns helper tasks,\n",
479479
" and processes incoming events in a big async‑for loop.\n",
@@ -487,7 +487,7 @@
     "    # Open the WebSocket connection to the Realtime API #\n",
     "    # ----------------------------------------------------------------------- #\n",
     "    url = f\"wss://api.openai.com/v1/realtime?model={model}\"\n",
-    "    headers = {\"Authorization\": f\"Bearer {openai.api_key}\", \"OpenAI-Beta\": \"realtime=v1\"}\n",
+    "    headers = {\"Authorization\": f\"Bearer {openai.api_key}\"}\n",
     "\n",
     "    async with websockets.connect(url, extra_headers=headers, max_size=1 << 24) as ws:\n",
     "        # ------------------------------------------------------------------- #\n",
@@ -503,6 +503,8 @@
     "        await ws.send(json.dumps({\n",
     "            \"type\": \"session.update\",\n",
     "            \"session\": {\n",
+    "                \"type\": \"realtime\",\n",
+    "                \"model\": \"gpt-realtime\",\n",
     "                \"voice\": voice,\n",
     "                \"modalities\": [\"audio\", \"text\"],\n",
     "                \"input_audio_format\": \"pcm16\",\n",
