Commit 1ab8c6b: Merge branch 'main' into main
2 parents a1fef53 + 5eedb60

12 files changed: +1917 additions, -102 deletions

authors.yaml

Lines changed: 15 additions & 0 deletions
@@ -38,6 +38,11 @@ rajpathak-openai:
   website: "https://www.linkedin.com/in/rajpathakopenai/"
   avatar: "https://avatars.githubusercontent.com/u/208723614?s=400&u=c852eed3be082f7fbd402b5a45e9b89a0bfed1b8&v=4"
 
+emreokcular:
+  name: "Emre Okcular"
+  website: "https://www.linkedin.com/in/emreokcular/"
+  avatar: "https://avatars.githubusercontent.com/u/26163154?v=4"
+
 chelseahu-openai:
   name: "Chelsea Hu"
   website: "https://www.linkedin.com/in/chelsea-tsaiszuhu/"
@@ -53,6 +58,16 @@ theophile-oai:
   website: "https://www.linkedin.com/in/theophilesautory"
   avatar: "https://avatars.githubusercontent.com/u/206768658?v=4"
 
+bfioca-openai:
+  name: "Brian Fioca"
+  website: "https://www.linkedin.com/in/brian-fioca/"
+  avatar: "https://avatars.githubusercontent.com/u/206814564?v=4"
+
+carter-oai:
+  name: "Carter Mcclellan"
+  website: "https://www.linkedin.com/in/carter-mcclellan/"
+  avatar: "https://avatars.githubusercontent.com/u/219906258?v=4"
+
 robert-tinn:
   name: "Robert Tinn"
   website: "https://www.linkedin.com/in/robert-tinn/"
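Every entry added by these hunks carries the same three fields. A minimal sketch (a hypothetical validator, not part of the repo) that flags author entries missing any of those fields, with the two entries from the second hunk inlined as already-parsed data:

```python
# Hypothetical helper: check that each authors.yaml entry has the three
# fields the diff adds (name, website, avatar).
REQUIRED_FIELDS = {"name", "website", "avatar"}

def validate_authors(authors: dict) -> list[str]:
    """Return the keys of entries missing any required field."""
    return [key for key, entry in authors.items()
            if REQUIRED_FIELDS - set(entry)]

# The two records added in the second hunk, as a YAML parser would yield them:
authors = {
    "bfioca-openai": {
        "name": "Brian Fioca",
        "website": "https://www.linkedin.com/in/brian-fioca/",
        "avatar": "https://avatars.githubusercontent.com/u/206814564?v=4",
    },
    "carter-oai": {
        "name": "Carter Mcclellan",
        "website": "https://www.linkedin.com/in/carter-mcclellan/",
        "avatar": "https://avatars.githubusercontent.com/u/219906258?v=4",
    },
}
print(validate_authors(authors))  # → []
```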

examples/Context_summarization_with_realtime_api.ipynb

Lines changed: 20 additions & 18 deletions
@@ -30,7 +30,7 @@
 "\n",
 "\n",
 "*Notes:*\n",
-"> 1. GPT-4o-Realtime supports a 128k token context window, though in certain use cases, you may notice performance degrade as you stuff more tokens into the context window.\n",
+"> 1. gpt-realtime supports a 32k token context window, though in certain use cases, you may notice performance degrade as you stuff more tokens into the context window.\n",
 "> 2. Token window = all tokens (words and audio tokens) the model currently keeps in memory for the session.\n",
 "\n",
 "### One‑liner install (run in a fresh cell)"
@@ -48,7 +48,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 4,
+"execution_count": 1,
 "metadata": {},
 "outputs": [],
 "source": [
@@ -74,7 +74,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 5,
+"execution_count": 2,
 "metadata": {},
 "outputs": [],
 "source": [
@@ -96,7 +96,7 @@
 "In practice you’ll often see **≈ 10 ×** more tokens for the *same* sentence in audio versus text.\n",
 "\n",
 "\n",
-"* GPT-4o realtime accepts up to **128k tokens** and as the token size increases, instruction adherence can drift.\n",
+"* gpt-realtime accepts up to **32k tokens** and as the token size increases, instruction adherence can drift.\n",
 "* Every user/assistant turn consumes tokens → the window **only grows**.\n",
 "* **Strategy**: Summarise older turns into a single assistant message, keep the last few verbatim turns, and continue.\n",
 "\n",
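The "window only grows" point in the hunk above can be sketched with back-of-envelope numbers. The ~4 characters per text token and the ×10 audio-to-text ratio are illustrative assumptions (the ratio is the notebook's own rule of thumb, not an API measurement):

```python
# Rough estimate of how fast an audio conversation fills the context window.
AUDIO_TO_TEXT_RATIO = 10     # the notebook's "≈ 10 ×" audio-vs-text heuristic
CONTEXT_LIMIT = 32_000       # gpt-realtime context window, per the diff

def estimate_audio_tokens(sentence: str) -> int:
    """Crude audio-token estimate: ~4 chars per text token, times the ratio."""
    text_tokens = max(1, len(sentence) // 4)
    return text_tokens * AUDIO_TO_TEXT_RATIO

def turns_until_full(tokens_per_turn: int) -> int:
    """How many turns of that size fit before the window is exhausted."""
    return CONTEXT_LIMIT // tokens_per_turn

per_turn = estimate_audio_tokens("Tell me about my last three transactions.")
print(per_turn, turns_until_full(per_turn))  # → 100 320
```

With these assumptions a short spoken sentence costs ~100 tokens, so a few hundred turns exhaust the window, which is why the notebook summarizes long before the hard limit.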
@@ -128,7 +128,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 6,
+"execution_count": 3,
 "metadata": {},
 "outputs": [],
 "source": [
@@ -159,7 +159,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 7,
+"execution_count": 4,
 "metadata": {},
 "outputs": [],
 "source": [
@@ -196,7 +196,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 8,
+"execution_count": 5,
 "metadata": {},
 "outputs": [],
 "source": [
@@ -248,7 +248,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 9,
+"execution_count": 6,
 "metadata": {},
 "outputs": [],
 "source": [
@@ -297,11 +297,11 @@
 "metadata": {},
 "source": [
 "### 3.3 Detect When to Summarise\n",
-"The Realtime model keeps a **large 128 k‑token window**, but quality can drift long before that limit as you stuff more context into the model.\n",
+"The Realtime model keeps a **large 32 k‑token window**, but quality can drift long before that limit as you stuff more context into the model.\n",
 "\n",
 "Our goal: **auto‑summarise** once the running window nears a safe threshold (default **2 000 tokens** for the notebook), then prune the superseded turns both locally *and* server‑side.\n",
 "\n",
-"We monitor latest_tokens returned in `response.done`. When it exceeds SUMMARY_TRIGGER and we have more than KEEP_LAST_TURNS, we spin up a background summarisation coroutine.\n",
+"We monitor latest_tokens returned in `response.done`. When it exceeds SUMMARY_TRIGGER and we have more than KEEP_LAST_TURNS, we spin up a background summarization coroutine.\n",
 "\n",
 "We compress everything except the last 2 turns into a single French paragraph, then:\n",
 "\n",
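The trigger described in that markdown cell is pure bookkeeping, so it can be sketched without any API calls. `SUMMARY_TRIGGER` and `KEEP_LAST_TURNS` mirror the notebook's names; the values below are the notebook's 2 000-token demo threshold and a two-turn verbatim tail, and `should_summarise`/`split_for_summary` are illustrative helpers, not the notebook's exact code:

```python
# Pure-logic sketch of the auto-summarise trigger described above.
SUMMARY_TRIGGER = 2_000   # notebook's demo threshold (tokens)
KEEP_LAST_TURNS = 2       # turns kept verbatim after summarisation

def should_summarise(latest_tokens: int, turns: list[str]) -> bool:
    """True once the window passes the threshold and there is something
    older than the verbatim tail left to compress."""
    return latest_tokens > SUMMARY_TRIGGER and len(turns) > KEEP_LAST_TURNS

def split_for_summary(turns: list[str]) -> tuple[list[str], list[str]]:
    """Older turns to fold into one summary message, plus the kept tail."""
    return turns[:-KEEP_LAST_TURNS], turns[-KEEP_LAST_TURNS:]

turns = ["t1", "t2", "t3", "t4"]
print(should_summarise(2_500, turns))   # → True
print(split_for_summary(turns))         # → (['t1', 't2'], ['t3', 't4'])
```

In the notebook the `latest_tokens` argument comes from the `response.done` event, and the older turns are also deleted server-side after the summary replaces them.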
@@ -314,7 +314,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 10,
+"execution_count": 7,
 "metadata": {},
 "outputs": [],
 "source": [
@@ -343,7 +343,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 11,
+"execution_count": 8,
 "metadata": {},
 "outputs": [],
 "source": [
@@ -401,7 +401,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 12,
+"execution_count": 9,
 "metadata": {},
 "outputs": [],
 "source": [
@@ -451,7 +451,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 13,
+"execution_count": 10,
 "metadata": {},
 "outputs": [],
 "source": [
@@ -466,14 +466,14 @@
 },
 {
 "cell_type": "code",
-"execution_count": 14,
+"execution_count": 11,
 "metadata": {},
 "outputs": [],
 "source": [
 "# --------------------------------------------------------------------------- #\n",
-"# 🎤 Realtime session #\n",
+"# Realtime session #\n",
 "# --------------------------------------------------------------------------- #\n",
-"async def realtime_session(model=\"gpt-4o-realtime-preview\", voice=\"shimmer\", enable_playback=True):\n",
+"async def realtime_session(model=\"gpt-realtime\", voice=\"shimmer\", enable_playback=True):\n",
 " \"\"\"\n",
 " Main coroutine: connects to the Realtime endpoint, spawns helper tasks,\n",
 " and processes incoming events in a big async‑for loop.\n",
@@ -487,7 +487,7 @@
 " # Open the WebSocket connection to the Realtime API #\n",
 " # ----------------------------------------------------------------------- #\n",
 " url = f\"wss://api.openai.com/v1/realtime?model={model}\"\n",
-" headers = {\"Authorization\": f\"Bearer {openai.api_key}\", \"OpenAI-Beta\": \"realtime=v1\"}\n",
+" headers = {\"Authorization\": f\"Bearer {openai.api_key}\"}\n",
 "\n",
 " async with websockets.connect(url, extra_headers=headers, max_size=1 << 24) as ws:\n",
 " # ------------------------------------------------------------------- #\n",
@@ -503,6 +503,8 @@
 " await ws.send(json.dumps({\n",
 " \"type\": \"session.update\",\n",
 " \"session\": {\n",
+" \"type\": \"realtime\",\n",
+" \"model\": \"gpt-realtime\",\n",
 " \"voice\": voice,\n",
 " \"modalities\": [\"audio\", \"text\"],\n",
 " \"input_audio_format\": \"pcm16\",\n",
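The `session.update` message assembled in the hunk above is plain JSON, so its shape can be shown standalone. `build_session_update` is a hypothetical helper that mirrors only the fields visible in the diff; it does not claim to cover the full session schema:

```python
import json

def build_session_update(model: str = "gpt-realtime",
                         voice: str = "shimmer") -> str:
    """Serialize the session.update message shown in the hunk above.
    Note the model key must be the string "model", not a bare name."""
    return json.dumps({
        "type": "session.update",
        "session": {
            "type": "realtime",
            "model": model,
            "voice": voice,
            "modalities": ["audio", "text"],
            "input_audio_format": "pcm16",
        },
    })

msg = json.loads(build_session_update())
print(msg["session"]["model"], msg["session"]["voice"])  # → gpt-realtime shimmer
```

In the notebook this string is sent over the already-open WebSocket with `await ws.send(...)` right after the connection is established.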

examples/Realtime_prompting_guide.ipynb

Lines changed: 2 additions & 34 deletions
@@ -9,7 +9,7 @@
 "\n",
 "<img\n",
 " src=\"../images/realtime_prompting_guide.png\"\n",
-" style=\"width:450px; height:auto;\"\n",
+" style=\"width:450px; height:450px;\"\n",
 "/>\n",
 "\n",
 "\n",
@@ -20,39 +20,7 @@
 "\n",
 "The new gpt-realtime model delivers stronger instruction following, more reliable tool calling, noticeably better voice quality, and an overall smoother feel. These gains make it practical to move from chained approaches to true realtime experiences, cutting latency and producing responses that sound more natural and expressive.\n",
 "\n",
-"Realtime model benefits from different prompting techniques that wouldn't directly apply to text based models. This prompting guide starts with a suggested prompt skeleton, then walks through each part with practical tips, small patterns you can copy, and examples you can adapt to your use case.\n",
-"\n",
-"# Table of Contents\n",
-"\n",
-"- [Realtime Prompting Guide](#realtime-prompting-guide)\n",
-"- [General Tips](#general-tips)\n",
-"- [Prompt Structure](#prompt-structure)\n",
-"- [Role and Objective](#role-and-objective)\n",
-"- [Personality and Tone](#personality-and-tone)\n",
-" - [Speed Instructions](#speed-instructions)\n",
-" - [Language Constraint](#language-constraint)\n",
-" - [Reduce Repetition](#reduce-repetition)\n",
-"- [Reference Pronunciations](#reference-pronunciations)\n",
-" - [Alphanumeric Pronunciations](#alphanumeric-pronunciations)\n",
-"- [Instructions](#instructions)\n",
-" - [Instruction Following](#instruction-following)\n",
-" - [No audio or unclear audio](#no-audio-or-unclear-audio)\n",
-"- [Tools](#tools)\n",
-" - [Tool Selection](#tool-selection)\n",
-" - [Tool Call Preambles](#tool-call-preambles)\n",
-" - [Tool Call Preambles + Sample Phrases](#tool-call-preambles-sample-phrases)\n",
-" - [Tool Calls without Confirmation](#tool-calls-without-confirmation)\n",
-" - [Tool Call Performance](#tool-call-performance)\n",
-" - [Tool Level Behavior](#tool-level-behavior)\n",
-" - [Rephrase Supervisor Tool (Responder-Thinker Architecture)](#rephrase-supervisor-tool-responder-thinker-architecture)\n",
-" - [Common Tools](#common-tools)\n",
-"- [Conversation flow](#conversation-flow)\n",
-" - [Sample phrases](#sample-phrases)\n",
-" - [Conversation flow + Sample Phrases](#conversation-flow-sample-phrases)\n",
-" - [Advanced Conversation Flow](#advanced-conversation-flow)\n",
-" - [Conversation Flow as State Machine](#conversation-flow-as-state-machine)\n",
-" - [Dynamic Conversation Flow](#dynamic-conversation-flow)\n",
-"- [Safety & Escalation](#safety-escalation)"
+"Realtime model benefits from different prompting techniques that wouldn't directly apply to text based models. This prompting guide starts with a suggested prompt skeleton, then walks through each part with practical tips, small patterns you can copy, and examples you can adapt to your use case."
 ]
 },
 {
