
Commit 9bc564c

Merge branch 'main' into patch-1
2 parents 3c21364 + 5eedb60 commit 9bc564c

6 files changed: +233 -59 lines changed

authors.yaml

Lines changed: 10 additions & 0 deletions
@@ -58,6 +58,16 @@ theophile-oai:
   website: "https://www.linkedin.com/in/theophilesautory"
   avatar: "https://avatars.githubusercontent.com/u/206768658?v=4"
 
+bfioca-openai:
+  name: "Brian Fioca"
+  website: "https://www.linkedin.com/in/brian-fioca/"
+  avatar: "https://avatars.githubusercontent.com/u/206814564?v=4"
+
+carter-oai:
+  name: "Carter Mcclellan"
+  website: "https://www.linkedin.com/in/carter-mcclellan/"
+  avatar: "https://avatars.githubusercontent.com/u/219906258?v=4"
+
 robert-tinn:
   name: "Robert Tinn"
   website: "https://www.linkedin.com/in/robert-tinn/"

examples/Context_summarization_with_realtime_api.ipynb

Lines changed: 20 additions & 18 deletions
@@ -30,7 +30,7 @@
 "\n",
 "\n",
 "*Notes:*\n",
-"> 1. GPT-4o-Realtime supports a 128k token context window, though in certain use cases, you may notice performance degrade as you stuff more tokens into the context window.\n",
+"> 1. gpt-realtime supports a 32k token context window, though in certain use cases, you may notice performance degrade as you stuff more tokens into the context window.\n",
 "> 2. Token window = all tokens (words and audio tokens) the model currently keeps in memory for the session.x\n",
 "\n",
 "### One‑liner install (run in a fresh cell)"
@@ -48,7 +48,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 4,
+"execution_count": 1,
 "metadata": {},
 "outputs": [],
 "source": [
@@ -74,7 +74,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 5,
+"execution_count": 2,
 "metadata": {},
 "outputs": [],
 "source": [
@@ -96,7 +96,7 @@
 "In practice you’ll often see **≈ 10 ×** more tokens for the *same* sentence in audio versus text.\n",
 "\n",
 "\n",
-"* GPT-4o realtime accepts up to **128k tokens** and as the token size increases, instruction adherence can drift.\n",
+"* gpt-realtime accepts up to **32k tokens** and as the token size increases, instruction adherence can drift.\n",
 "* Every user/assistant turn consumes tokens → the window **only grows**.\n",
 "* **Strategy**: Summarise older turns into a single assistant message, keep the last few verbatim turns, and continue.\n",
 "\n",
@@ -128,7 +128,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 6,
+"execution_count": 3,
 "metadata": {},
 "outputs": [],
 "source": [
@@ -159,7 +159,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 7,
+"execution_count": 4,
 "metadata": {},
 "outputs": [],
 "source": [
@@ -196,7 +196,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 8,
+"execution_count": 5,
 "metadata": {},
 "outputs": [],
 "source": [
@@ -248,7 +248,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 9,
+"execution_count": 6,
 "metadata": {},
 "outputs": [],
 "source": [
@@ -297,11 +297,11 @@
 "metadata": {},
 "source": [
 "### 3.3 Detect When to Summarise\n",
-"The Realtime model keeps a **large 128 k‑token window**, but quality can drift long before that limit as you stuff more context into the model.\n",
+"The Realtime model keeps a **large 32 k‑token window**, but quality can drift long before that limit as you stuff more context into the model.\n",
 "\n",
 "Our goal: **auto‑summarise** once the running window nears a safe threshold (default **2 000 tokens** for the notebook), then prune the superseded turns both locally *and* server‑side.\n",
 "\n",
-"We monitor latest_tokens returned in `response.done`. When it exceeds SUMMARY_TRIGGER and we have more than KEEP_LAST_TURNS, we spin up a background summarisation coroutine.\n",
+"We monitor latest_tokens returned in `response.done`. When it exceeds SUMMARY_TRIGGER and we have more than KEEP_LAST_TURNS, we spin up a background summarization coroutine.\n",
 "\n",
 "We compress everything except the last 2 turns into a single French paragraph, then:\n",
 "\n",
@@ -314,7 +314,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 10,
+"execution_count": 7,
 "metadata": {},
 "outputs": [],
 "source": [
@@ -343,7 +343,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 11,
+"execution_count": 8,
 "metadata": {},
 "outputs": [],
 "source": [
@@ -401,7 +401,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 12,
+"execution_count": 9,
 "metadata": {},
 "outputs": [],
 "source": [
@@ -451,7 +451,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 13,
+"execution_count": 10,
 "metadata": {},
 "outputs": [],
 "source": [
@@ -466,14 +466,14 @@
 },
 {
 "cell_type": "code",
-"execution_count": 14,
+"execution_count": 11,
 "metadata": {},
 "outputs": [],
 "source": [
 "# --------------------------------------------------------------------------- #\n",
-"# 🎤 Realtime session #\n",
+"# Realtime session #\n",
 "# --------------------------------------------------------------------------- #\n",
-"async def realtime_session(model=\"gpt-4o-realtime-preview\", voice=\"shimmer\", enable_playback=True):\n",
+"async def realtime_session(model=\"gpt-realtime\", voice=\"shimmer\", enable_playback=True):\n",
 " \"\"\"\n",
 " Main coroutine: connects to the Realtime endpoint, spawns helper tasks,\n",
 " and processes incoming events in a big async‑for loop.\n",
@@ -487,7 +487,7 @@
 " # Open the WebSocket connection to the Realtime API #\n",
 " # ----------------------------------------------------------------------- #\n",
 " url = f\"wss://api.openai.com/v1/realtime?model={model}\"\n",
-" headers = {\"Authorization\": f\"Bearer {openai.api_key}\", \"OpenAI-Beta\": \"realtime=v1\"}\n",
+" headers = {\"Authorization\": f\"Bearer {openai.api_key}\"}\n",
 "\n",
 " async with websockets.connect(url, extra_headers=headers, max_size=1 << 24) as ws:\n",
 " # ------------------------------------------------------------------- #\n",
@@ -503,6 +503,8 @@
 " await ws.send(json.dumps({\n",
 " \"type\": \"session.update\",\n",
 " \"session\": {\n",
+" \"type\": \"realtime\",\n",
+" \"model\": \"gpt-realtime\",\n",
 " \"voice\": voice,\n",
 " \"modalities\": [\"audio\", \"text\"],\n",
 " \"input_audio_format\": \"pcm16\",\n",

examples/Realtime_prompting_guide.ipynb

Lines changed: 2 additions & 34 deletions
@@ -9,7 +9,7 @@
 "\n",
 "<img\n",
 " src=\"../images/realtime_prompting_guide.png\"\n",
-" style=\"width:450px; height:auto;\"\n",
+" style=\"width:450px; height:450px;\"\n",
 "/>\n",
 "\n",
 "\n",
@@ -20,39 +20,7 @@
 "\n",
 "The new gpt-realtime model delivers stronger instruction following, more reliable tool calling, noticeably better voice quality, and an overall smoother feel. These gains make it practical to move from chained approaches to true realtime experiences, cutting latency and producing responses that sound more natural and expressive.\n",
 "\n",
-"Realtime model benefits from different prompting techniques that wouldn't directly apply to text based models. This prompting guide starts with a suggested prompt skeleton, then walks through each part with practical tips, small patterns you can copy, and examples you can adapt to your use case.\n",
-"\n",
-"# Table of Contents\n",
-"\n",
-"- [Realtime Prompting Guide](#realtime-prompting-guide)\n",
-"- [General Tips](#general-tips)\n",
-"- [Prompt Structure](#prompt-structure)\n",
-"- [Role and Objective](#role-and-objective)\n",
-"- [Personality and Tone](#personality-and-tone)\n",
-" - [Speed Instructions](#speed-instructions)\n",
-" - [Language Constraint](#language-constraint)\n",
-" - [Reduce Repetition](#reduce-repetition)\n",
-"- [Reference Pronunciations](#reference-pronunciations)\n",
-" - [Alphanumeric Pronunciations](#alphanumeric-pronunciations)\n",
-"- [Instructions](#instructions)\n",
-" - [Instruction Following](#instruction-following)\n",
-" - [No audio or unclear audio](#no-audio-or-unclear-audio)\n",
-"- [Tools](#tools)\n",
-" - [Tool Selection](#tool-selection)\n",
-" - [Tool Call Preambles](#tool-call-preambles)\n",
-" - [Tool Call Preambles + Sample Phrases](#tool-call-preambles-sample-phrases)\n",
-" - [Tool Calls without Confirmation](#tool-calls-without-confirmation)\n",
-" - [Tool Call Performance](#tool-call-performance)\n",
-" - [Tool Level Behavior](#tool-level-behavior)\n",
-" - [Rephrase Supervisor Tool (Responder-Thinker Architecture)](#rephrase-supervisor-tool-responder-thinker-architecture)\n",
-" - [Common Tools](#common-tools)\n",
-"- [Conversation flow](#conversation-flow)\n",
-" - [Sample phrases](#sample-phrases)\n",
-" - [Conversation flow + Sample Phrases](#conversation-flow-sample-phrases)\n",
-" - [Advanced Conversation Flow](#advanced-conversation-flow)\n",
-" - [Conversation Flow as State Machine](#conversation-flow-as-state-machine)\n",
-" - [Dynamic Conversation Flow](#dynamic-conversation-flow)\n",
-"- [Safety & Escalation](#safety-escalation)"
+"Realtime model benefits from different prompting techniques that wouldn't directly apply to text based models. This prompting guide starts with a suggested prompt skeleton, then walks through each part with practical tips, small patterns you can copy, and examples you can adapt to your use case."
 ]
 },
 {

examples/gpt-5/gpt-5_new_params_and_tools.ipynb

Lines changed: 9 additions & 7 deletions
@@ -968,16 +968,18 @@
 "| Rules (lower) | Combine terminals; cannot influence how text is tokenised. |\n",
 "| Greedy lexer | Never try to “shape” free text across multiple terminals – you’ll lose control. |\n",
 "\n",
-"** Correct vs Incorrect Pattern Design\n",
+"**Correct vs Incorrect Pattern Design** \n",
 "\n",
 "✅ **One bounded terminal handles free‑text between anchors** \n",
-"start: SENTENCE \n",
-"SENTENCE: /[A-Za-z, ]*(the hero|a dragon)[A-Za-z, ]*(fought|saved)[A-Za-z, ]*(a treasure|the kingdom)[A-Za-z, ]*\\./ \n",
-"\n",
+"```\n",
+"start: SENTENCE\n",
+"SENTENCE: /[A-Za-z, ]*(the hero|a dragon)[A-Za-z, ]*(fought|saved)[A-Za-z, ]*(a treasure|the kingdom)[A-Za-z, ]*\\./\n",
+"```\n",
 "❌ **Don’t split free‑text across multiple terminals/rules** \n",
-"start: sentence \n",
-"sentence: /[A-Za-z, ]+/ subject /[A-Za-z, ]+/ verb /[A-Za-z, ]+/ object /[A-Za-z, ]+/ \n",
-"\n",
+"```\n",
+"start: sentence\n",
+"sentence: /[A-Za-z, ]+/ subject /[A-Za-z, ]+/ verb /[A-Za-z, ]+/ object /[A-Za-z, ]+/\n",
+"```\n",
 "\n",
 "### 3.3 Example - SQL Dialect — MS SQL vs PostgreSQL\n",
 "\n",
