|
11 | 11 | "1. **Live microphone streaming** → OpenAI *Realtime* (voice‑to‑voice) endpoint.\n", |
12 | 12 | "2. **Instant transcripts & speech playback** on every turn.\n", |
13 | 13 | "3. **Conversation state container** that stores **every** user/assistant message.\n", |
14 | | - "4. **Automatic “context trim”** – when the token window nears 32 k, older turns are compressed into a summary.\n", |
| 14 | + "4. **Automatic “context trim”** – when the token window becomes very large (configurable), older turns are compressed into a summary.\n", |
15 | 15 | "5. **Extensible design** you can adapt to support customer‑support bots, kiosks, or multilingual assistants.\n", |
16 | 16 | "\n", |
17 | 17 | "\n", |
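Feature 3 above (the conversation state container) can be as simple as a list of turn dicts plus a slot for the running summary. Below is a minimal, hypothetical sketch — `ConversationState`, `add_turn`, and the field names are illustrative assumptions, not the notebook's actual code:

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical sketch — the notebook's real container may differ.
@dataclass
class ConversationState:
    """Stores every user/assistant turn plus a running summary."""
    turns: list = field(default_factory=list)  # each: {"role", "text", "item_id"}
    summary: str = ""                          # filled in by the context-trim step
    latest_tokens: int = 0                     # last total reported by response.done

    def add_turn(self, role: str, text: str, item_id: Optional[str] = None) -> None:
        self.turns.append({"role": role, "text": text, "item_id": item_id})

state = ConversationState()
state.add_turn("user", "Hello there!")
```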
|
40 | 40 | "\n", |
41 | 41 | "\n", |
42 | 42 | "*Notes:*\n", |
43 | | - "> 1. Why 32 k? OpenAI’s public guidance notes that quality begins to decline well before the full 128 k token limit; 32 k is a conservative threshold observed in practice.\n", |
| 43 | + "> 1. GPT-4o-Realtime supports a 128k token context window, though in certain use cases, you may notice performance degrade as you stuff more tokens into the context window.\n", |
44 | 44 | "> 2. Token window = all tokens (words and audio tokens) the model currently keeps in memory for the session.x\n", |
45 | 45 | "\n", |
46 | 46 | "### 🚀 One‑liner install (run in a fresh cell)" |
|
136 | 136 | "### 2.3 Token Context Windows\n", |
137 | 137 | "\n", |
138 | 138 | "* GPT‑4o Realtime accepts **up to 128 K tokens** in theory. \n", |
139 | | - "* In practice, answer quality starts to drift around **≈ 32 K tokens**. \n", |
| 139 | + "* In practice, answer quality starts to drift as you increase **input token size**. \n", |
140 | 140 | "* Every user/assistant turn consumes tokens → the window **only grows**.\n", |
141 | 141 | "* **Strategy**: Summarise older turns into a single assistant message, keep the last few verbatim turns, and continue.\n", |
142 | 142 | "\n", |
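To make the strategy above concrete, here is a minimal sketch of the trim step. `trim_context`, `summarise_text`, and the `KEEP_LAST_TURNS` default are placeholder assumptions standing in for whatever the notebook actually defines:

```python
KEEP_LAST_TURNS = 4  # illustrative default: verbatim tail to preserve

def trim_context(turns: list, summarise_text) -> list:
    """Compress all but the newest turns into a single assistant summary."""
    if len(turns) <= KEEP_LAST_TURNS:
        return turns  # nothing old enough to compress yet
    old, tail = turns[:-KEEP_LAST_TURNS], turns[-KEEP_LAST_TURNS:]
    transcript = "\n".join(f"{t['role']}: {t['text']}" for t in old)
    summary_turn = {"role": "assistant", "text": summarise_text(transcript)}
    return [summary_turn, *tail]

# e.g. trim_context(state.turns, summarise_text=lambda s: s[:200])
```

Replacing many old turns with one summary turn keeps the window bounded while the last few exchanges stay verbatim.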
|
204 | 204 | "source": [ |
205 | 205 | "## 3 · Token Utilisation – Text vs Voice\n", |
206 | 206 | "\n", |
207 | | - "Large‑token windows are precious: every extra token you burn costs latency + money. \n", |
208 | | - "For **audio** the bill climbs much faster than for plain text because amplitude, timing, and other acoustic details must be represented.\n", |
| 207 | + "Large‑token windows are precious: every extra token you use costs latency + money. \n", |
| 208 | + "For **audio** the input token window increases much faster than for plain text because amplitude, timing, and other acoustic details must be represented.\n", |
209 | 209 | "\n", |
210 | | - "*Rule of thumb*: **1 word of text ≈ 1 token**, but **1 second of 24‑kHz PCM‑16 ≈ ~150 audio tokens**. \n", |
211 | | - "In practice you’ll often see **≈ 10 ×** more tokens for the *same* sentence spoken aloud than typed.\n", |
| 210 | + "In practice you’ll often see **≈ 10 ×** more tokens for the *same* sentence in audio versus text.\n", |
212 | 211 | "\n", |
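One way to observe this split yourself is to inspect the `usage` block of a `response.done` event. The sketch below assumes the usage payload shape documented for the Realtime API (`input_token_details` with separate `text_tokens` and `audio_tokens` counts); adjust the field names if your server events differ:

```python
def log_token_split(event: dict) -> None:
    """Print the text/audio token breakdown from a `response.done` event."""
    usage = event["response"]["usage"]
    inp = usage.get("input_token_details", {})
    print(
        f"total={usage.get('total_tokens', 0)}  "
        f"input text={inp.get('text_tokens', 0)}  "
        f"input audio={inp.get('audio_tokens', 0)}"
    )
```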
213 | 212 | "### 3.1 Hands‑on comparison 📊\n", |
214 | 213 | "\n", |
|
472 | 471 | "source": [ |
473 | 472 | "## 5 · Dynamic Context Management & Summarisation\n", |
474 | 473 | "\n", |
475 | | - "The Realtime model keeps a **gargantuan 128 k‑token window**, but quality drifts long before that. \n", |
476 | | - "Our goal: **auto‑summarise** once the running window nears a safe threshold (default **4 000 tokens**), then prune the superseded turns both locally *and* server‑side.\n", |
| 474 | + "The Realtime model keeps a **large 128 k‑token window**, but quality can drift long before that as you stuff more context into the model.\n", |
| 475 | + "Our goal: **auto‑summarise** once the running window nears a safe threshold (default **2 000 tokens** for the notebook), then prune the superseded turns both locally *and* server‑side.\n", |
477 | 476 | "\n", |
478 | 477 | "### 5.1 Detect When to Summarise\n", |
479 | 478 | "We monitor latest_tokens returned in response.done. When it exceeds SUMMARY_TRIGGER and we have more than KEEP_LAST_TURNS, we spin up a background summarisation coroutine.\n", |
|