
Commit 7672920

committed
fixed typos
1 parent 20b0741 commit 7672920

File tree

1 file changed: +15 −11 lines changed


examples/Context_summarization_with_realtime_api.ipynb

Lines changed: 15 additions & 11 deletions
@@ -96,7 +96,7 @@
 "In practice you’ll often see **≈ 10 ×** more tokens for the *same* sentence in audio versus text.\n",
 "\n",
 "\n",
-"* GPT-4o realtime accepts up to **128k tokens** and as the token size increases instruction adherence can drifts.\n",
+"* GPT-4o realtime accepts up to **128k tokens** and as the token size increases, instruction adherence can drift.\n",
 "* Every user/assistant turn consumes tokens → the window **only grows**.\n",
 "* **Strategy**: Summarise older turns into a single assistant message, keep the last few verbatim turns, and continue.\n",
 "\n",
@@ -252,6 +252,7 @@
 "metadata": {},
 "outputs": [],
 "source": [
+"# Helper function to encode audio chunks in base64\n",
 "b64 = lambda blob: base64.b64encode(blob).decode()\n",
 "\n",
 "async def queue_to_websocket(pcm_queue: asyncio.Queue[bytes], ws):\n",
@@ -278,26 +279,29 @@
 "* Playing incremental audio back to the user \n",
 "* Keeping an accurate [`Conversation State`](https://platform.openai.com/docs/api-reference/realtime-server-events/conversation/created) so context trimming works later \n",
 "\n",
-"| Event type | Typical timing | What you should do with it |\n",
-"|------------|----------------|----------------------------|\n",
-"| **`session.created`** | Immediately after connection | Verify the handshake; stash the `session_id` if you need it for server logs. |\n",
-"| **`conversation.item.created`** (user) | Right after the user stops talking | Place a *placeholder* `Turn` in `state.history`. Transcript may still be `null`. |\n",
-"| **`conversation.item.retrieved`** | A few hundred ms later | Fill in any missing user transcript once STT completes. |\n",
-"| **`response.audio.delta`** | Streaming chunks while the assistant speaks | Append bytes to a local buffer, play them (low‑latency) as they arrive. |\n",
-"| **`response.done`** | After final assistant token | Add assistant text + usage stats, update `state.latest_tokens`. |\n",
-"| **`conversation.item.deleted`** | Whenever you prune old turns | Remove superseded items from `conversation.item`. |\n"
+"\n",
+"| Event type | When it arrives | Why it matters | Typical handler logic |\n",
+"|------------|-----------------|---------------|-----------------------|\n",
+"| **`session.created`** | Immediately after the WebSocket handshake | Confirms the session is open and provides the `session.id`. | Log the ID for traceability and verify the connection. |\n",
+"| **`session.updated`** | After you send a `session.update` call | Acknowledges that the server applied new session settings. | Inspect the echoed settings and update any local cache. |\n",
+"| **`conversation.item.created`** (user) | A few ms after the user stops speaking (client VAD fires) | Reserves a timeline slot; transcript may still be **`null`**. | Insert a *placeholder* user turn in `state.history` marked “pending transcript”. |\n",
+"| **`conversation.item.retrieved`** | ~100 – 300 ms later, once audio transcription is complete | Supplies the final user transcript (with timing). | Replace the placeholder with the transcript and print it if desired. |\n",
+"| **`response.audio.delta`** | Every 20 – 60 ms while the assistant is speaking | Streams PCM‑16 audio chunks (and optional incremental text). | Buffer each chunk and play it; optionally show partial text in the console. |\n",
+"| **`response.done`** | After the assistant’s last token | Signals both audio & text are complete; includes usage stats. | Finalize the assistant turn, update `state.latest_tokens`, and log usage. |\n",
+"| **`conversation.item.deleted`** | Whenever you prune with `conversation.item.delete` | Confirms a turn was removed, freeing tokens on the server. | Mirror the deletion locally so your context window matches the server’s. |\n",
+"\n"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
 "### 3.3 Detect When to Summarise\n",
-"The Realtime model keeps a **large 128 k‑token window**, but quality can drift long before that as you stuff more context into the model.\n",
+"The Realtime model keeps a **large 128 k‑token window**, but quality can drift long before that limit as you stuff more context into the model.\n",
 "\n",
 "Our goal: **auto‑summarise** once the running window nears a safe threshold (default **2 000 tokens** for the notebook), then prune the superseded turns both locally *and* server‑side.\n",
 "\n",
-"We monitor latest_tokens returned in response.done. When it exceeds SUMMARY_TRIGGER and we have more than KEEP_LAST_TURNS, we spin up a background summarisation coroutine.\n",
+"We monitor latest_tokens returned in `response.done`. When it exceeds SUMMARY_TRIGGER and we have more than KEEP_LAST_TURNS, we spin up a background summarisation coroutine.\n",
 "\n",
 "We compress everything except the last 2 turns into a single French paragraph, then:\n",
 "\n",
