
Commit 48fd242

Updated explanations to reflect accurate limits
1 parent ea275d0 commit 48fd242

File tree

1 file changed: +8 −9 lines changed


examples/Context_summarization_with_realtime_api.ipynb

Lines changed: 8 additions & 9 deletions
@@ -11,7 +11,7 @@
 "1. **Live microphone streaming** → OpenAI *Realtime* (voice‑to‑voice) endpoint.\n",
 "2. **Instant transcripts & speech playback** on every turn.\n",
 "3. **Conversation state container** that stores **every** user/assistant message.\n",
-"4. **Automatic “context trim”** – when the token window nears 32 k, older turns are compressed into a summary.\n",
+"4. **Automatic “context trim”** – when the token window becomes very large (configurable), older turns are compressed into a summary.\n",
 "5. **Extensible design** you can adapt to support customer‑support bots, kiosks, or multilingual assistants.\n",
 "\n",
 "\n",
@@ -40,7 +40,7 @@
 "\n",
 "\n",
 "*Notes:*\n",
-"> 1. Why 32 k? OpenAI’s public guidance notes that quality begins to decline well before the full 128 k token limit; 32 k is a conservative threshold observed in practice.\n",
+"> 1. GPT-4o-Realtime supports a 128k token context window, though in certain use cases, you may notice performance degrade as you stuff more tokens into the context window.\n",
 "> 2. Token window = all tokens (words and audio tokens) the model currently keeps in memory for the session.\n",
 "\n",
 "### 🚀 One‑liner install (run in a fresh cell)"
@@ -136,7 +136,7 @@
 "### 2.3 Token Context Windows\n",
 "\n",
 "* GPT‑4o Realtime accepts **up to 128 K tokens** in theory. \n",
-"* In practice, answer quality starts to drift around **≈ 32 K tokens**. \n",
+"* In practice, answer quality starts to drift as you increase **input token size**. \n",
 "* Every user/assistant turn consumes tokens → the window **only grows**.\n",
 "* **Strategy**: Summarise older turns into a single assistant message, keep the last few verbatim turns, and continue.\n",
 "\n",
@@ -204,11 +204,10 @@
 "source": [
 "## 3 · Token Utilisation – Text vs Voice\n",
 "\n",
-"Large‑token windows are precious: every extra token you burn costs latency + money. \n",
-"For **audio** the bill climbs much faster than for plain text because amplitude, timing, and other acoustic details must be represented.\n",
+"Large‑token windows are precious: every extra token you use costs latency + money. \n",
+"For **audio** the input token window increases much faster than for plain text because amplitude, timing, and other acoustic details must be represented.\n",
 "\n",
-"*Rule of thumb*: **1 word of text ≈ 1 token**, but **1 second of 24‑kHz PCM‑16 ≈ ~150 audio tokens**. \n",
-"In practice you’ll often see **≈ 10 ×** more tokens for the *same* sentence spoken aloud than typed.\n",
+"In practice you’ll often see **≈ 10 ×** more tokens for the *same* sentence in audio versus text.\n",
 "\n",
 "### 3.1 Hands‑on comparison 📊\n",
 "\n",
@@ -472,8 +471,8 @@
 "source": [
 "## 5 · Dynamic Context Management & Summarisation\n",
 "\n",
-"The Realtime model keeps a **gargantuan 128 k‑token window**, but quality drifts long before that. \n",
-"Our goal: **auto‑summarise** once the running window nears a safe threshold (default **4 000 tokens**), then prune the superseded turns both locally *and* server‑side.\n",
+"The Realtime model keeps a **large 128 k‑token window**, but quality can drift long before that as you stuff more context into the model.\n",
+"Our goal: **auto‑summarise** once the running window nears a safe threshold (default **2 000 tokens** for the notebook), then prune the superseded turns both locally *and* server‑side.\n",
 "\n",
 "### 5.1 Detect When to Summarise\n",
 "We monitor latest_tokens returned in response.done. When it exceeds SUMMARY_TRIGGER and we have more than KEEP_LAST_TURNS, we spin up a background summarisation coroutine.\n",
