Merged
20 changes: 11 additions & 9 deletions examples/Realtime_out_of_band_transcription.ipynb
@@ -38,9 +38,10 @@
},
{
"cell_type": "markdown",
+"id": "1c0f46ad",
"metadata": {},
"source": [
-"## 1. Why use out-of-band transcription?\n",
+"# 1. Why use out-of-band transcription?\n",
"\n",
"The Realtime API offers built-in user input transcription, but this relies on a **separate ASR model** (e.g., gpt-4o-transcribe). Using different models for transcription and response generation can lead to discrepancies. For example:\n",
"\n",
@@ -100,9 +101,10 @@
},
{
"cell_type": "markdown",
+"id": "63ccae3d",
"metadata": {},
"source": [
-"## 2. Requirements & Setup\n",
+"# 2. Requirements & Setup\n",
"\n",
"Ensure your environment meets these requirements:\n",
"\n",
@@ -144,7 +146,7 @@
"id": "d7d60089",
"metadata": {},
"source": [
-"## 3. Prompts\n",
+"# 3. Prompts\n",
"\n",
"We use **two distinct prompts**:\n",
"\n",
@@ -201,7 +203,7 @@
"id": "4ddbd683",
"metadata": {},
"source": [
-"## 4. Core configuration\n",
+"# 4. Core configuration\n",
"\n",
"We define:\n",
"\n",
@@ -291,7 +293,7 @@
"id": "a905ec16",
"metadata": {},
"source": [
-"## 5. Building the Realtime session & the out‑of‑band request\n",
+"# 5. Building the Realtime session & the out‑of‑band request\n",
"\n",
"The Realtime session (`session.update`) configures:\n",
"\n",
@@ -394,7 +396,7 @@
"id": "9afe7911",
"metadata": {},
"source": [
-"## 6. Audio streaming: mic → Realtime → speakers\n",
+"# 6. Audio streaming: mic → Realtime → speakers\n",
"\n",
"We now define:\n",
"\n",
@@ -506,7 +508,7 @@
"id": "d02cc1bd",
"metadata": {},
"source": [
-"## 7. Extracting and comparing transcripts\n",
+"# 7. Extracting and comparing transcripts\n",
"\n",
"The function below enables us to generate **two transcripts** for each user turn:\n",
"\n",
@@ -556,7 +558,7 @@
"id": "6025bbf6",
"metadata": {},
"source": [
-"## 8. Listening for Realtime events\n",
+"# 8. Listening for Realtime events\n",
"\n",
"`listen_for_events` drives the session:\n",
"\n",
@@ -739,7 +741,7 @@
"id": "10c69ded",
"metadata": {},
"source": [
-"## 9. Run Script\n",
+"# 9. Run Script\n",
"\n",
"In this step, we run the code, which lets us compare the Realtime model's transcription with the transcription model's output. The code does the following:\n",
"\n",