
Commit eec115c

Authored by Xiting Zhang (xitzhang) and Copilot
[VoiceLive] Remove sync APIs, enhanced semantic detection and video background support (#43131)
* [VoiceLive] Add async function-calling agent sample
* add phrase list
* fix typo
* Update sdk/ai/azure-ai-voicelive/samples/async_function_calling_sample.py (Co-authored-by: Copilot <[email protected]>)
* Update sdk/ai/azure-ai-voicelive/samples/async_function_calling_sample.py (Co-authored-by: Copilot <[email protected]>)
* update
* fix typo
* update changelog
* update
* remove breaking change section
* update changelog
* fix change log
* revert changelog I lost
* update version and change log
* enable type verification
* update
* [VoiceLive] Release 1.0.0b4
* [VoiceLive] Enhanced semantic detection, video background support, and model consistency improvements
* Remove sync API
* Rename `AudioInputTranscriptionSettings` to `AudioInputTranscriptionOptions`
* typo
* remove indentation
* fix diarization
* update
* update
* Update sdk/ai/azure-ai-voicelive/azure/ai/voicelive/models/_models.py (Co-authored-by: Copilot <[email protected]>)
* Update sdk/ai/azure-ai-voicelive/azure/ai/voicelive/models/_models.py (Co-authored-by: Copilot <[email protected]>)
* update models
* update
* add unit tests
* remove useless reference
* new line for pylint check

Co-authored-by: Xiting Zhang <[email protected]>
Co-authored-by: Copilot <[email protected]>
1 parent e8918b0 commit eec115c

27 files changed: +4085 −2062 lines

sdk/ai/azure-ai-voicelive/CHANGELOG.md

Lines changed: 94 additions & 0 deletions
@@ -1,5 +1,99 @@
 # Release History

+## 1.0.0b5 (Unreleased)
+
+### Features Added
+
+- **Enhanced Semantic Detection Type Safety**: Added new `EouThresholdLevel` enum for better type safety in end-of-utterance detection:
+  - `LOW` for low sensitivity threshold level
+  - `MEDIUM` for medium sensitivity threshold level
+  - `HIGH` for high sensitivity threshold level
+  - `DEFAULT` for default sensitivity threshold level
+- **Improved Semantic Detection Configuration**: Enhanced semantic detection classes with better type annotations:
+  - The `threshold_level` parameter now supports both string values and the `EouThresholdLevel` enum
+  - Cleaner type definitions for `AzureSemanticDetection`, `AzureSemanticDetectionEn`, and `AzureSemanticDetectionMultilingual`
+  - Improved documentation for threshold level parameters
+- **Comprehensive Unit Test Suite**: Added extensive unit test coverage with 200+ test cases covering:
+  - All enum types and their functionality
+  - Model creation, validation, and serialization
+  - Async connection functionality with proper mocking
+  - Client event handling and workflows
+  - Voice configuration across all supported types
+  - Message handling with content part hierarchy
+  - Integration scenarios and real-world usage patterns
+  - Recent changes validation and backwards compatibility
+- **API Version Update**: Updated to API version `2025-10-01` (from `2025-05-01-preview`)
+- **Enhanced Type Safety**: Added new `AzureVoiceType` enum with values for better Azure voice type categorization:
+  - `AZURE_CUSTOM` for custom voice configurations
+  - `AZURE_STANDARD` for standard voice configurations
+  - `AZURE_PERSONAL` for personal voice configurations
+- **Improved Message Handling**: Added `MessageRole` enum for better role type safety in message items
+- **Enhanced Model Documentation**: Comprehensive documentation improvements across all models:
+  - Added detailed docstrings for model classes and their parameters
+  - Enhanced enum value documentation with descriptions
+  - Improved type annotations and parameter descriptions
+- **Enhanced Semantic Detection**: Added improved configuration options for all semantic detection classes:
+  - Added `threshold_level` parameter with options `"low"`, `"medium"`, `"high"`, `"default"` (recommended over the deprecated `threshold`)
+  - Added `timeout_ms` parameter for timeout configuration in milliseconds (recommended over the deprecated `timeout`)
+- **Video Background Support**: Added new `Background` model for video background customization:
+  - Support for solid color backgrounds in hex format (e.g., `#00FF00FF`)
+  - Support for image URL backgrounds
+  - Mutually exclusive color and image URL options
+- **Enhanced Video Parameters**: Extended the `VideoParams` model with:
+  - `background` parameter for configuring video backgrounds using the new `Background` model
+  - `gop_size` parameter for Group of Pictures (GOP) size control, affecting compression efficiency and seeking performance
+- **Improved Type Safety**: Added `TurnDetectionType` enum for better type safety and IntelliSense support
+- **Package Structure Modernization**: Simplified package initialization with namespace package support
+- **Enhanced Error Handling**: Added `ConnectionError` and `ConnectionClosed` exception classes to the async API for better WebSocket error management
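The `EouThresholdLevel` values and the string-or-enum `threshold_level` parameter described above can be sketched as follows. This is an illustrative stand-in, not the SDK's implementation; the lowercase wire values are assumed from the documented `"low"`/`"medium"`/`"high"`/`"default"` options.

```python
from enum import Enum

class EouThresholdLevel(str, Enum):
    """Local sketch of the enum described in the changelog; the string
    values are assumed to match the documented option names."""
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    DEFAULT = "default"

def normalize_threshold_level(level):
    """Accept either a plain string or an EouThresholdLevel member,
    mirroring how the changelog says `threshold_level` behaves."""
    return EouThresholdLevel(level).value

# Both spellings resolve to the same wire value:
print(normalize_threshold_level("high"))                  # high
print(normalize_threshold_level(EouThresholdLevel.HIGH))  # high
```

Because the enum mixes in `str`, code that compares against the literal strings keeps working while enum members gain IDE completion.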
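The mutual-exclusivity rule on the new `Background` model can be sketched with a local dataclass; the field names `color` and `image_url` are assumptions taken from the bullet points above, not the real SDK type.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Background:
    """Illustrative stand-in for the `Background` model: a solid color
    (hex, e.g. "#00FF00FF") or an image URL, never both."""
    color: Optional[str] = None
    image_url: Optional[str] = None

    def __post_init__(self):
        # Enforce the documented "mutually exclusive" constraint eagerly,
        # so invalid configurations fail at construction time.
        if self.color is not None and self.image_url is not None:
            raise ValueError("color and image_url are mutually exclusive")

green = Background(color="#00FF00FF")                           # solid color
pic = Background(image_url="https://example.com/backdrop.png")  # image URL
```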
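The new `ConnectionError`/`ConnectionClosed` pair enables retry logic like the sketch below. The classes here are local stand-ins (the real exceptions' constructor signatures are not shown in this commit), and note that the SDK's `ConnectionError` deliberately shares its name with Python's builtin.

```python
import asyncio

class ConnectionError(Exception):
    """Sketch of the base WebSocket error type named in the changelog
    (shadows the builtin, as the SDK's own class does in its namespace)."""

class ConnectionClosed(ConnectionError):
    """Sketch of the 'connection closed' error; attributes are assumed."""
    def __init__(self, code: int, reason: str):
        super().__init__(f"closed: {code} {reason}")
        self.code = code
        self.reason = reason

async def run_with_reconnect(open_session, attempts: int = 3) -> str:
    """Retry an async session when the peer closes the connection."""
    for attempt in range(attempts):
        try:
            return await open_session()
        except ConnectionClosed:
            if attempt == attempts - 1:
                raise
            await asyncio.sleep(0)  # placeholder backoff; real code would wait
    raise ConnectionError("unreachable")

# Simulated session that fails once, then succeeds:
calls = {"n": 0}
async def flaky():
    calls["n"] += 1
    if calls["n"] == 1:
        raise ConnectionClosed(1006, "abnormal closure")
    return "ok"

print(asyncio.run(run_with_reconnect(flaky)))  # ok
```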
+### Breaking Changes
+
+- **Cross-Language Package Identity Update**: Updated the package ID from `VoiceLive` to `VoiceLive.WebSocket` for better cross-language consistency
+- **Model Refactoring**:
+  - Renamed `UserContentPart` to `MessageContentPart` for a clearer content part hierarchy
+  - All message items now require a `content` field with a list of `MessageContentPart` objects
+  - `OutputTextContentPart` now inherits from `MessageContentPart` instead of being standalone
+- **Enhanced Type Safety**:
+  - Azure voice classes now use `AzureVoiceType` enum discriminators instead of string literals
+  - Message role discriminators now use `MessageRole` enum values for better type safety
+- **Removed Deprecated Parameters**: Completely removed deprecated parameters from the semantic detection classes:
+  - Removed the `threshold` parameter from all semantic detection classes (`AzureSemanticDetection`, `AzureSemanticDetectionEn`, `AzureSemanticDetectionMultilingual`)
+  - Removed the `timeout` parameter from all semantic detection classes
+  - Users must now use the `threshold_level` and `timeout_ms` parameters respectively
+- **Removed Synchronous API**: Completely removed synchronous WebSocket operations to focus exclusively on async patterns:
+  - Removed the sync `connect()` function and sync `VoiceLiveConnection` class from the main patch implementation
+  - Removed the sync `basic_voice_assistant.py` sample (only the async version remains)
+  - Simplified the sync patch to a minimal structure with empty exports
+  - All functionality is now available only through async patterns
+- **Updated Dependencies**: Modified package dependencies to reflect the async-only architecture:
+  - Moved `aiohttp>=3.9.0,<4.0.0` from optional to required dependency
+  - Removed the `websockets` optional dependency, as the sync API no longer exists
+  - Removed the optional dependency groups `websockets`, `aiohttp`, and `all-websockets`
+- **Model Renames**:
+  - Renamed `AudioInputTranscriptionSettings` to `AudioInputTranscriptionOptions` for consistency with naming conventions
+  - Renamed `AzureMultilingualSemanticVad` to `AzureSemanticVadMultilingual` for naming consistency with other multilingual variants
+- **Enhanced Type Safety**: Turn detection discriminator types now use enum values instead of string literals for better type safety
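Because the deprecated keywords are now removed rather than merely discouraged, old call sites fail at the parameter level. A hypothetical guard illustrates the migration surface; `semantic_detection_config` and the `type` discriminator string are inventions for this sketch, not SDK API.

```python
from typing import Optional

def semantic_detection_config(*, threshold_level: str = "default",
                              timeout_ms: Optional[int] = None, **deprecated):
    """Hypothetical builder mirroring the 1.0.0b5 parameter surface:
    `threshold`/`timeout` are gone, so reject them with a pointer to
    their replacements."""
    for old, new in (("threshold", "threshold_level"), ("timeout", "timeout_ms")):
        if old in deprecated:
            raise TypeError(f"'{old}' was removed in 1.0.0b5; use '{new}' instead")
    config = {"type": "azure_semantic_detection",  # discriminator value assumed
              "threshold_level": threshold_level}
    if timeout_ms is not None:
        config["timeout_ms"] = timeout_ms
    return config

# New-style call succeeds; an old-style `threshold=0.5` call raises TypeError.
print(semantic_detection_config(threshold_level="high", timeout_ms=1500))
```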
+### Bug Fixes
+
+- **Serialization Improvements**: Fixed a type casting issue in the serialization utilities for better enum handling and type safety
+
+### Other Changes
+
+- **Testing Infrastructure**: Added a comprehensive unit test suite with extensive coverage:
+  - 8 main test files with 200+ individual test methods
+  - Tests for all enums, models, async operations, client events, voice configurations, and message handling
+  - Integration tests covering real-world scenarios and recent changes
+  - Proper mocking for async WebSocket connections
+  - Backwards compatibility validation
+  - Test coverage for all recent changes and enhancements
+- **API Documentation**: Updated API view properties to reflect model structure changes, new enums, and the cross-language package identity
+- **Documentation Updates**: Comprehensive updates to all markdown documentation:
+  - Updated README.md to reflect the async-only nature, with updated examples and installation instructions
+  - Updated the samples README.md to remove sync sample references
+  - Enhanced BASIC_VOICE_ASSISTANT.md with a comprehensive async implementation guide
+  - Added MIGRATION_GUIDE.md for users upgrading from previous versions
 ## 1.0.0b4 (2025-09-19)

 ### Features Added

sdk/ai/azure-ai-voicelive/README.md

Lines changed: 45 additions & 77 deletions
@@ -7,6 +7,8 @@ typed server events (including audio) for responsive, interruptible conversation

 > **Status:** Preview. APIs are subject to change.

+> **Important:** As of version 1.0.0b5, this SDK is **async-only**. The synchronous API has been removed to focus exclusively on async patterns. All examples and samples use `async`/`await` syntax.
+
 ---

 Getting started
@@ -25,21 +27,14 @@ Getting started
 # Base install (core client only)
 python -m pip install azure-ai-voicelive

-# For synchronous streaming (uses websockets)
-python -m pip install "azure-ai-voicelive[websockets]"
-
 # For asynchronous streaming (uses aiohttp)
 python -m pip install "azure-ai-voicelive[aiohttp]"

-# For both sync + async scenarios (recommended if unsure)
-python -m pip install "azure-ai-voicelive[all-websockets]" pyaudio python-dotenv
+# For voice samples (includes audio processing)
+python -m pip install "azure-ai-voicelive[aiohttp]" pyaudio python-dotenv
 ```

-WebSocket streaming features require additional dependencies.
-Install them with:
-pip install "azure-ai-voicelive[websockets]"      # for sync
-pip install "azure-ai-voicelive[aiohttp]"         # for async
-pip install "azure-ai-voicelive[all-websockets]"  # for both
+The SDK now exclusively provides async-only WebSocket connections using `aiohttp`.

 ### Authenticate
@@ -58,50 +53,65 @@ AZURE_VOICELIVE_ENDPOINT="your-endpoint"
 Then, use the key in your code:

 ```python
+import asyncio
 from azure.core.credentials import AzureKeyCredential
 from azure.ai.voicelive import connect

-connection = connect(
-    endpoint="your-endpoint",
-    credential=AzureKeyCredential("your-api-key"),
-    model="gpt-4o-realtime-preview"
-)
+async def main():
+    async with connect(
+        endpoint="your-endpoint",
+        credential=AzureKeyCredential("your-api-key"),
+        model="gpt-4o-realtime-preview"
+    ) as connection:
+        # Your async code here
+        pass
+
+asyncio.run(main())
 ```

 #### AAD Token Authentication

 For production applications, AAD authentication is recommended:

 ```python
-from azure.identity import DefaultAzureCredential
+import asyncio
+from azure.identity.aio import DefaultAzureCredential
 from azure.ai.voicelive import connect

-credential = DefaultAzureCredential()
+async def main():
+    credential = DefaultAzureCredential()
+
+    async with connect(
+        endpoint="your-endpoint",
+        credential=credential,
+        model="gpt-4o-realtime-preview"
+    ) as connection:
+        # Your async code here
+        pass

-connection = connect(
-    endpoint="your-endpoint",
-    credential=credential,
-    model="gpt-4o-realtime-preview"
-)
+asyncio.run(main())
 ```

 ---

 Key concepts
 ------------

-- **VoiceLiveConnection** – Manages an active WebSocket connection to the service
+- **VoiceLiveConnection** – Manages an active async WebSocket connection to the service
 - **Session Management** – Configure conversation parameters:
-  - **SessionResource** – Update session parameters (voice, formats, VAD)
+  - **SessionResource** – Update session parameters (voice, formats, VAD) with async methods
   - **RequestSession** – Strongly-typed session configuration
   - **ServerVad** – Configure voice activity detection
   - **AzureStandardVoice** – Configure voice settings
 - **Audio Handling**:
-  - **InputAudioBufferResource** – Manage audio input to the service
-  - **OutputAudioBufferResource** – Control audio output from the service
+  - **InputAudioBufferResource** – Manage audio input to the service with async methods
+  - **OutputAudioBufferResource** – Control audio output from the service with async methods
 - **Conversation Management**:
-  - **ResponseResource** – Create or cancel model responses
-  - **ConversationResource** – Manage conversation items
+  - **ResponseResource** – Create or cancel model responses with async methods
+  - **ConversationResource** – Manage conversation items with async methods
+- **Error Handling**:
+  - **ConnectionError** – Base exception for WebSocket connection errors
+  - **ConnectionClosed** – Raised when the WebSocket connection is closed
 - **Strongly-Typed Events** – Process service events with type safety:
   - `SESSION_UPDATED`, `RESPONSE_AUDIO_DELTA`, `RESPONSE_DONE`
   - `INPUT_AUDIO_BUFFER_SPEECH_STARTED`, `INPUT_AUDIO_BUFFER_SPEECH_STOPPED`
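The typed events listed above are consumed by iterating the connection. The sketch below mimics that shape with a stub async generator standing in for a live `VoiceLiveConnection`, so only the iteration pattern, not the real client, is shown.

```python
import asyncio
from dataclasses import dataclass

# Event type names are taken from the list above; the stub connection is a
# stand-in that yields them the way a real async connection would.
@dataclass
class ServerEvent:
    type: str

async def fake_connection():
    for name in ("SESSION_UPDATED", "RESPONSE_AUDIO_DELTA", "RESPONSE_DONE"):
        yield ServerEvent(type=name)

async def main():
    seen = []
    async for event in fake_connection():
        seen.append(event.type)
        if event.type == "RESPONSE_DONE":  # stop once the response completes
            break
    return seen

print(asyncio.run(main()))  # ['SESSION_UPDATED', 'RESPONSE_AUDIO_DELTA', 'RESPONSE_DONE']
```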
@@ -112,25 +122,25 @@ Key concepts
 Examples
 --------

-### Basic async Voice Assistant (Featured Sample)
+### Basic Voice Assistant (Featured Sample)

-The Basic async Voice Assistant sample demonstrates full-featured voice interaction with:
+The Basic Voice Assistant sample demonstrates full-featured voice interaction with:

 - Real-time speech streaming
 - Server-side voice activity detection
 - Interruption handling
 - High-quality audio processing

 ```bash
 # Run the basic voice assistant sample
-# Requires [aiohttp] for async (easiest: [all-websockets])
+# Requires [aiohttp] for async
 python samples/basic_voice_assistant_async.py

 # With custom parameters
 python samples/basic_voice_assistant_async.py --model gpt-4o-realtime-preview --voice alloy --instructions "You're a helpful assistant"
 ```

-### Minimal async example
+### Minimal example

 ```python
 import asyncio
@@ -172,44 +182,6 @@ async def main():
 asyncio.run(main())
 ```

-### Minimal sync example
-
-```python
-from azure.core.credentials import AzureKeyCredential
-from azure.ai.voicelive import connect
-from azure.ai.voicelive.models import (
-    RequestSession, Modality, InputAudioFormat, OutputAudioFormat, ServerVad, ServerEventType
-)
-
-API_KEY = "your-api-key"
-ENDPOINT = "your-endpoint"
-MODEL = "gpt-4o-realtime-preview"
-
-with connect(
-    endpoint=ENDPOINT,
-    credential=AzureKeyCredential(API_KEY),
-    model=MODEL
-) as conn:
-    session = RequestSession(
-        modalities=[Modality.TEXT, Modality.AUDIO],
-        instructions="You are a helpful assistant.",
-        input_audio_format=InputAudioFormat.PCM16,
-        output_audio_format=OutputAudioFormat.PCM16,
-        turn_detection=ServerVad(
-            threshold=0.5,
-            prefix_padding_ms=300,
-            silence_duration_ms=500
-        ),
-    )
-    conn.session.update(session=session)
-
-    # Process events
-    for evt in conn:
-        print(f"Event: {evt.type}")
-        if evt.type == ServerEventType.RESPONSE_DONE:
-            break
-```

 Available Voice Options
 -----------------------
@@ -279,12 +251,8 @@ Troubleshooting
 Verify `AZURE_VOICELIVE_ENDPOINT`, network rules, and that your credential has access.

 - **Missing WebSocket dependencies:**
-  If you see:
-  WebSocket streaming features require additional dependencies.
-  Install them with:
-  pip install "azure-ai-voicelive[websockets]"      # for sync
-  pip install "azure-ai-voicelive[aiohttp]"         # for async
-  pip install "azure-ai-voicelive[all-websockets]"  # for both
+  If you see import errors, make sure you have installed the package:
+  pip install "azure-ai-voicelive[aiohttp]"

 - **Auth failures:**
   For API key, double-check `AZURE_VOICELIVE_API_KEY`. For AAD, ensure the identity is authorized.
Lines changed: 1 addition & 1 deletion
@@ -1,3 +1,3 @@
 {
-  "apiVersion": "2025-05-01-preview"
+  "apiVersion": "2025-10-01"
 }

sdk/ai/azure-ai-voicelive/apiview-properties.json

Lines changed: 12 additions & 8 deletions
@@ -1,27 +1,28 @@
 {
-  "CrossLanguagePackageId": "VoiceLive",
+  "CrossLanguagePackageId": "VoiceLive.WebSocket",
   "CrossLanguageDefinitionId": {
     "azure.ai.voicelive.models.AgentConfig": "VoiceLive.AgentConfig",
     "azure.ai.voicelive.models.Animation": "VoiceLive.Animation",
     "azure.ai.voicelive.models.ConversationRequestItem": "VoiceLive.ConversationRequestItem",
     "azure.ai.voicelive.models.MessageItem": "VoiceLive.MessageItem",
     "azure.ai.voicelive.models.AssistantMessageItem": "VoiceLive.AssistantMessageItem",
     "azure.ai.voicelive.models.AudioEchoCancellation": "VoiceLive.AudioEchoCancellation",
-    "azure.ai.voicelive.models.AudioInputTranscriptionSettings": "VoiceLive.AudioInputTranscriptionSettings",
+    "azure.ai.voicelive.models.AudioInputTranscriptionOptions": "VoiceLive.AudioInputTranscriptionOptions",
     "azure.ai.voicelive.models.AudioNoiseReduction": "VoiceLive.AudioNoiseReduction",
     "azure.ai.voicelive.models.AvatarConfig": "VoiceLive.AvatarConfig",
     "azure.ai.voicelive.models.AzureVoice": "VoiceLive.AzureVoice",
     "azure.ai.voicelive.models.AzureCustomVoice": "VoiceLive.AzureCustomVoice",
-    "azure.ai.voicelive.models.TurnDetection": "VoiceLive.TurnDetection",
-    "azure.ai.voicelive.models.AzureMultilingualSemanticVad": "VoiceLive.AzureMultilingualSemanticVad",
     "azure.ai.voicelive.models.AzurePersonalVoice": "VoiceLive.AzurePersonalVoice",
     "azure.ai.voicelive.models.EOUDetection": "VoiceLive.EOUDetection",
     "azure.ai.voicelive.models.AzureSemanticDetection": "VoiceLive.AzureSemanticDetection",
     "azure.ai.voicelive.models.AzureSemanticDetectionEn": "VoiceLive.AzureSemanticDetectionEn",
     "azure.ai.voicelive.models.AzureSemanticDetectionMultilingual": "VoiceLive.AzureSemanticDetectionMultilingual",
+    "azure.ai.voicelive.models.TurnDetection": "VoiceLive.TurnDetection",
     "azure.ai.voicelive.models.AzureSemanticVad": "VoiceLive.AzureSemanticVad",
     "azure.ai.voicelive.models.AzureSemanticVadEn": "VoiceLive.AzureSemanticVadEn",
+    "azure.ai.voicelive.models.AzureSemanticVadMultilingual": "VoiceLive.AzureSemanticVadMultilingual",
     "azure.ai.voicelive.models.AzureStandardVoice": "VoiceLive.AzureStandardVoice",
+    "azure.ai.voicelive.models.Background": "VoiceLive.Background",
     "azure.ai.voicelive.models.CachedTokenDetails": "VoiceLive.CachedTokenDetails",
     "azure.ai.voicelive.models.ClientEvent": "VoiceLive.ClientEvent",
     "azure.ai.voicelive.models.ClientEventConversationItemCreate": "VoiceLive.ClientEventConversationItemCreate",
@@ -48,7 +49,7 @@
     "azure.ai.voicelive.models.Tool": "VoiceLive.Tool",
     "azure.ai.voicelive.models.FunctionTool": "VoiceLive.FunctionTool",
     "azure.ai.voicelive.models.IceServer": "VoiceLive.IceServer",
-    "azure.ai.voicelive.models.UserContentPart": "VoiceLive.UserContentPart",
+    "azure.ai.voicelive.models.MessageContentPart": "VoiceLive.MessageContentPart",
     "azure.ai.voicelive.models.InputAudioContentPart": "VoiceLive.InputAudioContentPart",
     "azure.ai.voicelive.models.InputTextContentPart": "VoiceLive.InputTextContentPart",
     "azure.ai.voicelive.models.InputTokenDetails": "VoiceLive.InputTokenDetails",
@@ -123,19 +124,22 @@
     "azure.ai.voicelive.models.ClientEventType": "VoiceLive.ClientEventType",
     "azure.ai.voicelive.models.ItemType": "VoiceLive.ItemType",
     "azure.ai.voicelive.models.ItemParamStatus": "VoiceLive.ItemParamStatus",
+    "azure.ai.voicelive.models.MessageRole": "VoiceLive.MessageRole",
     "azure.ai.voicelive.models.Modality": "VoiceLive.Modality",
     "azure.ai.voicelive.models.OAIVoice": "VoiceLive.OAIVoice",
+    "azure.ai.voicelive.models.AzureVoiceType": "VoiceLive.AzureVoiceType",
     "azure.ai.voicelive.models.PersonalVoiceModels": "VoiceLive.PersonalVoiceModels",
     "azure.ai.voicelive.models.OutputAudioFormat": "VoiceLive.OutputAudioFormat",
     "azure.ai.voicelive.models.ToolType": "VoiceLive.ToolType",
     "azure.ai.voicelive.models.AnimationOutputType": "VoiceLive.AnimationOutputType",
     "azure.ai.voicelive.models.InputAudioFormat": "VoiceLive.InputAudioFormat",
+    "azure.ai.voicelive.models.TurnDetectionType": "VoiceLive.TurnDetectionType",
+    "azure.ai.voicelive.models.EouThresholdLevel": "VoiceLive.EouThresholdLevel",
     "azure.ai.voicelive.models.AudioTimestampType": "VoiceLive.AudioTimestampType",
     "azure.ai.voicelive.models.ToolChoiceLiteral": "VoiceLive.ToolChoiceLiteral",
-    "azure.ai.voicelive.models.ServerEventType": "VoiceLive.ServerEventType",
+    "azure.ai.voicelive.models.ResponseStatus": "VoiceLive.ResponseStatus",
     "azure.ai.voicelive.models.ResponseItemStatus": "VoiceLive.ResponseItemStatus",
-    "azure.ai.voicelive.models.MessageRole": "VoiceLive.MessageRole",
     "azure.ai.voicelive.models.ContentPartType": "VoiceLive.ContentPartType",
-    "azure.ai.voicelive.models.ResponseStatus": "VoiceLive.ResponseStatus"
+    "azure.ai.voicelive.models.ServerEventType": "VoiceLive.ServerEventType"
   }
 }
