@@ -7,6 +7,8 @@ typed server events (including audio) for responsive, interruptible conversation

  > **Status:** Preview. APIs are subject to change.

+ > **Important:** As of version 1.0.0b5, this SDK is **async-only**. The synchronous API has been removed to focus exclusively on async patterns. All examples and samples use `async`/`await` syntax.
+
  ---

  Getting started
@@ -25,21 +27,14 @@ Getting started
  # Base install (core client only)
  python -m pip install azure-ai-voicelive

- # For synchronous streaming (uses websockets)
- python -m pip install "azure-ai-voicelive[websockets]"
-
  # For asynchronous streaming (uses aiohttp)
  python -m pip install "azure-ai-voicelive[aiohttp]"

- # For both sync + async scenarios (recommended if unsure)
- python -m pip install "azure-ai-voicelive[all-websockets]" pyaudio python-dotenv
+ # For voice samples (includes audio processing)
+ python -m pip install azure-ai-voicelive[aiohttp] pyaudio python-dotenv
  ```

- WebSocket streaming features require additional dependencies.
- Install them with:
- pip install "azure-ai-voicelive[websockets]" # for sync
- pip install "azure-ai-voicelive[aiohttp]" # for async
- pip install "azure-ai-voicelive[all-websockets]" # for both
+ The SDK now exclusively provides async-only WebSocket connections using `aiohttp`.

  ### Authenticate

@@ -58,50 +53,65 @@ AZURE_VOICELIVE_ENDPOINT="your-endpoint"
  Then, use the key in your code:

  ```python
+ import asyncio
  from azure.core.credentials import AzureKeyCredential
  from azure.ai.voicelive import connect

- connection = connect(
-     endpoint="your-endpoint",
-     credential=AzureKeyCredential("your-api-key"),
-     model="gpt-4o-realtime-preview"
- )
+ async def main():
+     async with connect(
+         endpoint="your-endpoint",
+         credential=AzureKeyCredential("your-api-key"),
+         model="gpt-4o-realtime-preview"
+     ) as connection:
+         # Your async code here
+         pass
+
+ asyncio.run(main())
  ```

  #### AAD Token Authentication

  For production applications, AAD authentication is recommended:

  ```python
- from azure.identity import DefaultAzureCredential
+ import asyncio
+ from azure.identity.aio import DefaultAzureCredential
  from azure.ai.voicelive import connect

- credential = DefaultAzureCredential()
+ async def main():
+     credential = DefaultAzureCredential()
+
+     async with connect(
+         endpoint="your-endpoint",
+         credential=credential,
+         model="gpt-4o-realtime-preview"
+     ) as connection:
+         # Your async code here
+         pass

- connection = connect(
-     endpoint="your-endpoint",
-     credential=credential,
-     model="gpt-4o-realtime-preview"
- )
+ asyncio.run(main())
  ```

  ---

  Key concepts
  ------------

- - **VoiceLiveConnection** – Manages an active WebSocket connection to the service
+ - **VoiceLiveConnection** – Manages an active async WebSocket connection to the service
  - **Session Management** – Configure conversation parameters:
-   - **SessionResource** – Update session parameters (voice, formats, VAD)
+   - **SessionResource** – Update session parameters (voice, formats, VAD) with async methods
    - **RequestSession** – Strongly-typed session configuration
    - **ServerVad** – Configure voice activity detection
    - **AzureStandardVoice** – Configure voice settings
  - **Audio Handling**:
-   - **InputAudioBufferResource** – Manage audio input to the service
-   - **OutputAudioBufferResource** – Control audio output from the service
+   - **InputAudioBufferResource** – Manage audio input to the service with async methods
+   - **OutputAudioBufferResource** – Control audio output from the service with async methods
  - **Conversation Management**:
-   - **ResponseResource** – Create or cancel model responses
-   - **ConversationResource** – Manage conversation items
+   - **ResponseResource** – Create or cancel model responses with async methods
+   - **ConversationResource** – Manage conversation items with async methods
+ - **Error Handling**:
+   - **ConnectionError** – Base exception for WebSocket connection errors
+   - **ConnectionClosed** – Raised when WebSocket connection is closed
  - **Strongly-Typed Events** – Process service events with type safety:
    - `SESSION_UPDATED`, `RESPONSE_AUDIO_DELTA`, `RESPONSE_DONE`
    - `INPUT_AUDIO_BUFFER_SPEECH_STARTED`, `INPUT_AUDIO_BUFFER_SPEECH_STOPPED`
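
The new Error Handling entries suggest a two-level exception hierarchy. The sketch below shows one way a caller might distinguish a closed socket from other connection failures while consuming events. It does not import the SDK: `StubConnectionError`, `StubConnectionClosed`, `stub_events`, and `consume` are hypothetical stand-ins, and the subclass relationship is an assumption drawn from the "base exception" wording above.

```python
import asyncio


class StubConnectionError(Exception):
    """Hypothetical stand-in for the base connection exception (not the SDK class)."""


class StubConnectionClosed(StubConnectionError):
    """Hypothetical stand-in for the 'connection closed' exception (assumed subclass)."""


async def stub_events():
    # Placeholder event stream: one event, then the socket closes.
    yield "session.updated"
    raise StubConnectionClosed("socket closed by peer")


async def consume(events):
    """Collect events and report how the stream ended."""
    received = []
    try:
        async for event_type in events:
            received.append(event_type)
    except StubConnectionClosed:
        # Catch the more specific exception first.
        status = "closed"
    except StubConnectionError:
        status = "error"
    else:
        status = "finished"
    return received, status


result = asyncio.run(consume(stub_events()))
print(result)
```

Python already has a built-in `ConnectionError`, so importing a library class with the same name under an alias avoids shadowing the built-in.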
@@ -112,25 +122,25 @@ Key concepts
  Examples
  --------

- ### Basic async Voice Assistant (Featured Sample)
+ ### Basic Voice Assistant (Featured Sample)

- The Basic async Voice Assistant sample demonstrates full-featured voice interaction with:
+ The Basic Voice Assistant sample demonstrates full-featured voice interaction with:

  - Real-time speech streaming
- - Server-side voice activity detection
+ - Server-side voice activity detection
  - Interruption handling
  - High-quality audio processing

  ```bash
  # Run the basic voice assistant sample
- # Requires [aiohttp] for async (easiest: [all-websockets])
+ # Requires [aiohttp] for async
  python samples/basic_voice_assistant_async.py

  # With custom parameters
  python samples/basic_voice_assistant_async.py --model gpt-4o-realtime-preview --voice alloy --instructions "You're a helpful assistant"
  ```

- ### Minimal async example
+ ### Minimal example

  ```python
  import asyncio
@@ -172,44 +182,6 @@ async def main():
  asyncio.run(main())
  ```

- ### Minimal sync example
-
- ```python
- from azure.core.credentials import AzureKeyCredential
- from azure.ai.voicelive import connect
- from azure.ai.voicelive.models import (
-     RequestSession, Modality, InputAudioFormat, OutputAudioFormat, ServerVad, ServerEventType
- )
-
- API_KEY = "your-api-key"
- ENDPOINT = "your-endpoint"
- MODEL = "gpt-4o-realtime-preview"
-
- with connect(
-     endpoint=ENDPOINT,
-     credential=AzureKeyCredential(API_KEY),
-     model=MODEL
- ) as conn:
-     session = RequestSession(
-         modalities=[Modality.TEXT, Modality.AUDIO],
-         instructions="You are a helpful assistant.",
-         input_audio_format=InputAudioFormat.PCM16,
-         output_audio_format=OutputAudioFormat.PCM16,
-         turn_detection=ServerVad(
-             threshold=0.5,
-             prefix_padding_ms=300,
-             silence_duration_ms=500
-         ),
-     )
-     conn.session.update(session=session)
-
-     # Process events
-     for evt in conn:
-         print(f"Event: {evt.type}")
-         if evt.type == ServerEventType.RESPONSE_DONE:
-             break
- ```
-
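
With the synchronous example removed, the `with` / `for` loop above maps onto `async with` / `async for`. The sketch below shows only that control-flow shape. `StubConnection` and `collect_until_done` are hypothetical stand-ins so the snippet runs without the SDK or a network connection, the event names are placeholder strings, and it assumes the real connection is consumed with `async for`.

```python
import asyncio


class StubConnection:
    """Hypothetical stand-in for a connection object; not the SDK's class."""

    def __init__(self, event_types):
        self._event_types = list(event_types)

    async def __aenter__(self):
        return self

    async def __aexit__(self, exc_type, exc, tb):
        return False  # do not swallow exceptions

    def __aiter__(self):
        return self._events()

    async def _events(self):
        for event_type in self._event_types:
            await asyncio.sleep(0)  # yield to the event loop, as real I/O would
            yield event_type


async def collect_until_done(conn):
    """Async counterpart of the removed `for evt in conn` loop."""
    seen = []
    async with conn:
        async for event_type in conn:
            seen.append(event_type)
            if event_type == "response.done":
                break
    return seen


events = asyncio.run(
    collect_until_done(StubConnection(["session.updated", "response.done", "ignored"]))
)
print(events)
```

The loop stops at the first "done" event, mirroring the `break` on `RESPONSE_DONE` in the removed example.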
  Available Voice Options
  -----------------------

@@ -279,12 +251,8 @@ Troubleshooting
  Verify `AZURE_VOICELIVE_ENDPOINT`, network rules, and that your credential has access.

  - **Missing WebSocket dependencies:**
-   If you see:
-   WebSocket streaming features require additional dependencies.
-   Install them with:
-   pip install "azure-ai-voicelive[websockets]" # for sync
-   pip install "azure-ai-voicelive[aiohttp]" # for async
-   pip install "azure-ai-voicelive[all-websockets]" # for both
+   If you see import errors, make sure you have installed the package:
+   pip install azure-ai-voicelive[aiohttp]

  - **Auth failures:**
    For API key, double-check `AZURE_VOICELIVE_API_KEY`. For AAD, ensure the identity is authorized.
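
For the import-error case in this troubleshooting hunk, a quick check from Python can confirm whether the optional dependency is visible to the interpreter in use. This is a minimal sketch using only the standard library. `missing_modules` is a hypothetical helper, and `aiohttp` is assumed to be the import name provided by the `[aiohttp]` extra.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that the current interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# An empty list means the dependency is importable in this environment.
print(missing_modules(["aiohttp"]))
```

If the list is non-empty, the package was likely installed into a different environment than the one running the script.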