|
11 | 11 | "1. **Live microphone streaming** → OpenAI *Realtime* (voice‑to‑voice) endpoint.\n",
|
12 | 12 | "2. **Instant transcripts & speech playback** on every turn.\n",
|
13 | 13 | "3. **Conversation state container** that stores **every** user/assistant message.\n",
|
14 | | - "4. **Automatic “context trim”** – when the token window nears 32 k, older turns are compressed into a summary.\n",
| 14 | + "4. **Automatic “context trim”** – when the token window becomes very large (configurable), older turns are compressed into a summary.\n",
15 | 15 | "5. **Extensible design** you can adapt to support customer‑support bots, kiosks, or multilingual assistants.\n",
|
16 | 16 | "\n",
|
17 | 17 | "\n",
|
|
40 | 40 | "\n",
|
41 | 41 | "\n",
|
42 | 42 | "*Notes:*\n",
|
43 | | - "> 1. Why 32 k? OpenAI’s public guidance notes that quality begins to decline well before the full 128 k token limit; 32 k is a conservative threshold observed in practice.\n",
| 43 | + "> 1. GPT-4o-Realtime supports a 128k token context window, though in certain use cases, you may notice performance degrade as you stuff more tokens into the context window.\n",
44 | 44 | "> 2. Token window = all tokens (words and audio tokens) the model currently keeps in memory for the session.\n",
|
45 | 45 | "\n",
|
46 | 46 | "### 🚀 One‑liner install (run in a fresh cell)"
|
|
136 | 136 | "### 2.3 Token Context Windows\n",
|
137 | 137 | "\n",
|
138 | 138 | "* GPT‑4o Realtime accepts **up to 128 K tokens** in theory. \n",
|
139 | | - "* In practice, answer quality starts to drift around **≈ 32 K tokens**. \n",
| 139 | + "* In practice, answer quality starts to drift as you increase **input token size**. \n",
140 | 140 | "* Every user/assistant turn consumes tokens → the window **only grows**.\n",
|
141 | 141 | "* **Strategy**: Summarise older turns into a single assistant message, keep the last few verbatim turns, and continue.\n",
|
142 | 142 | "\n",
|
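*Editor's note:* to make the §2.3 strategy concrete, here is a minimal sketch. It assumes the history is a plain list of role/content dicts and takes a `summarise` callable as a stand-in for whatever model call produces the summary; neither is the notebook's actual implementation.

```python
# Illustrative sketch only: the message shape and the `summarise` helper
# are assumptions, not the notebook's actual API.
KEEP_LAST_TURNS = 4  # how many recent turns to keep verbatim (illustrative)

def trim_context(history: list[dict], summarise) -> list[dict]:
    """Compress all but the last few turns into a single summary message."""
    if len(history) <= KEEP_LAST_TURNS:
        return history                      # window still small enough
    old, recent = history[:-KEEP_LAST_TURNS], history[-KEEP_LAST_TURNS:]
    summary = summarise(old)                # e.g. one cheap text-model call
    summary_msg = {"role": "assistant",
                   "content": f"Summary of earlier conversation: {summary}"}
    return [summary_msg] + recent           # summary replaces the pruned turns
```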
|
204 | 204 | "source": [
|
205 | 205 | "## 3 · Token Utilisation – Text vs Voice\n",
|
206 | 206 | "\n",
|
207 | | - "Large‑token windows are precious: every extra token you burn costs latency + money. \n",
208 | | - "For **audio** the bill climbs much faster than for plain text because amplitude, timing, and other acoustic details must be represented.\n",
| 207 | + "Large‑token windows are precious: every extra token you use costs latency + money. \n",
| 208 | + "For **audio** the input token window increases much faster than for plain text because amplitude, timing, and other acoustic details must be represented.\n",
209 | 209 | "\n",
|
210 | | - "*Rule of thumb*: **1 word of text ≈ 1 token**, but **1 second of 24‑kHz PCM‑16 ≈ ~150 audio tokens**. \n",
211 | | - "In practice you’ll often see **≈ 10 ×** more tokens for the *same* sentence spoken aloud than typed.\n",
| 210 | + "In practice you’ll often see **≈ 10 ×** more tokens for the *same* sentence in audio versus text.\n",
212 | 211 | "\n",
|
213 | 212 | "### 3.1 Hands‑on comparison 📊\n",
|
214 | 213 | "\n",
|
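*Editor's note:* a rough illustration of the text/audio gap. The text side can be counted locally with `tiktoken`; audio token counts only surface in the `usage` block of a Realtime `response.done` event. Treat the exact field nesting below as an assumption to verify against the Realtime docs.

```python
import tiktoken

sentence = "What is the weather like in Paris today?"
enc = tiktoken.encoding_for_model("gpt-4o")          # o200k_base tokenizer
print("text tokens:", len(enc.encode(sentence)))     # on the order of ~10

def audio_tokens_used(event: dict) -> int:
    """Pull audio token usage out of a `response.done` event payload (assumed nesting)."""
    usage = event.get("response", {}).get("usage", {})
    details = usage.get("output_token_details", {})  # audio vs text breakdown
    return details.get("audio_tokens", 0)
```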
|
472 | 471 | "source": [
|
473 | 472 | "## 5 · Dynamic Context Management & Summarisation\n",
|
474 | 473 | "\n",
|
475 | | - "The Realtime model keeps a **gargantuan 128 k‑token window**, but quality drifts long before that. \n",
476 | | - "Our goal: **auto‑summarise** once the running window nears a safe threshold (default **4 000 tokens**), then prune the superseded turns both locally *and* server‑side.\n",
| 474 | + "The Realtime model keeps a **large 128 k‑token window**, but quality can drift long before that as you stuff more context into the model.\n",
| 475 | + "Our goal: **auto‑summarise** once the running window nears a safe threshold (default **2 000 tokens** for the notebook), then prune the superseded turns both locally *and* server‑side.\n",
477 | 476 | "\n",
|
478 | 477 | "### 5.1 Detect When to Summarise\n",
|
479 | 478 | "We monitor `latest_tokens` returned in `response.done`. When it exceeds `SUMMARY_TRIGGER` and we have more than `KEEP_LAST_TURNS` turns, we spin up a background summarisation coroutine.\n",
|
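*Editor's note:* a sketch of the §5.1 trigger logic, assuming an asyncio loop and a `run_summary` coroutine defined elsewhere in the notebook; the constant values and the exact path to the token count inside the `response.done` payload are illustrative.

```python
import asyncio

SUMMARY_TRIGGER = 2_000   # tokens; matches the notebook default mentioned above
KEEP_LAST_TURNS = 2       # recent turns kept verbatim (illustrative value)

summarising = False       # guard: at most one summariser at a time

async def on_response_done(event: dict, history: list, run_summary) -> None:
    """Kick off background summarisation once the window grows past the trigger."""
    global summarising
    usage = event.get("response", {}).get("usage", {})
    latest_tokens = usage.get("total_tokens", 0)
    if (latest_tokens > SUMMARY_TRIGGER
            and len(history) > KEEP_LAST_TURNS
            and not summarising):
        summarising = True
        # fire-and-forget so the live conversation is never blocked;
        # run_summary is expected to clear the flag when it finishes
        asyncio.create_task(run_summary(history[:-KEEP_LAST_TURNS]))
```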
|