Commit 9afe0f0

xitzhang, Xiting Zhang, and Copilot authored
[VoiceLive] Update service models for v1.0.0b4 (#43062)
* [VoiceLive] Add async function-calling agent sample
* add phrase list
* fix typo
* Update sdk/ai/azure-ai-voicelive/samples/async_function_calling_sample.py
  Co-authored-by: Copilot <[email protected]>
* Update sdk/ai/azure-ai-voicelive/samples/async_function_calling_sample.py
  Co-authored-by: Copilot <[email protected]>
* update
* fix typo
* update changelog
* update
* remove breaking change section
* update changelog
* fix change log
* revert changelog I lost
* update version and change log
* enable type verification
* update
* [VoiceLive] Update service models for v1.0.0b4
* fix typo
* Update AudioFormat and env names
* update models
* fix pylint
* fix pylint

---------

Co-authored-by: Xiting Zhang <[email protected]>
Co-authored-by: Copilot <[email protected]>
1 parent 18e186c commit 9afe0f0

18 files changed, with 440 additions and 758 deletions

sdk/ai/azure-ai-voicelive/.env.template

Lines changed: 3 additions & 3 deletions
@@ -6,9 +6,9 @@ AZURE_VOICELIVE_API_KEY=your-voicelive-api-key
 AZURE_VOICELIVE_ENDPOINT=wss://api.voicelive.com/v1
 
 # Optional configuration
-VOICELIVE_MODEL=gpt-4o-realtime-preview
-VOICELIVE_VOICE=alloy
-VOICELIVE_INSTRUCTIONS=You are a helpful assistant. Keep your responses concise.
+AZURE_VOICELIVE_MODEL=gpt-4o-realtime-preview
+AZURE_VOICELIVE_VOICE=alloy
+AZURE_VOICELIVE_INSTRUCTIONS=You are a helpful assistant. Keep your responses concise.
 
 # For audio samples
 AUDIO_FILE=path/to/your/test_audio.wav
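
The `.env.template` change adds an `AZURE_` prefix to the three optional settings. A script that still has to run against an older `.env` file could read the new name first and fall back to the old one. The following is a minimal sketch of that idea; the helper name `read_setting` and the fallback behavior are assumptions for illustration and are not part of the SDK or its samples.

```python
import os
from typing import Mapping, Optional


def read_setting(
    name: str,
    default: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Return AZURE_VOICELIVE_<name>, else the pre-rename VOICELIVE_<name>.

    Hypothetical helper: the SDK does not provide this function.
    """
    source = os.environ if env is None else env
    for key in (f"AZURE_VOICELIVE_{name}", f"VOICELIVE_{name}"):
        value = source.get(key)
        if value:  # unset and empty values are treated the same way
            return value
    return default


# Example with an explicit mapping instead of the real process environment.
legacy_env = {"VOICELIVE_VOICE": "alloy"}
voice = read_setting("VOICE", env=legacy_env)
model = read_setting("MODEL", default="gpt-4o-realtime-preview", env=legacy_env)
```

Passing `env` explicitly keeps the helper easy to test; omitting it reads from `os.environ`.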

sdk/ai/azure-ai-voicelive/CHANGELOG.md

Lines changed: 33 additions & 0 deletions
@@ -1,5 +1,38 @@
 # Release History
 
+## 1.0.0b4 (Unreleased)
+
+### Features Added
+
+- **Personal Voice Models**: Added `PersonalVoiceModels` enum with support for `DragonLatestNeural`, `PhoenixLatestNeural`, and `PhoenixV2Neural` models
+- **Enhanced Animation Support**: Added comprehensive server event classes for animation blendshapes and viseme handling:
+  - `ServerEventResponseAnimationBlendshapeDelta` and `ServerEventResponseAnimationBlendshapeDone`
+  - `ServerEventResponseAnimationVisemeDelta` and `ServerEventResponseAnimationVisemeDone`
+- **Audio Timestamp Events**: Added `ServerEventResponseAudioTimestampDelta` and `ServerEventResponseAudioTimestampDone` for better audio timing control
+- **Improved Error Handling**: Added `ErrorResponse` class for better error management
+- **Enhanced Base Classes**: Added `ConversationItemBase` and `SessionBase` for better code organization and inheritance
+- **Token Usage Improvements**: Renamed `Usage` to `TokenUsage` for better clarity
+- **Audio Format Improvements**: Reorganized audio format enums with separate `InputAudioFormat` and `OutputAudioFormat` enums for better clarity
+- **Enhanced Output Audio Format Support**: Added more granular output audio format options including specific sampling rates (8kHz, 16kHz) for PCM16
+
+### Breaking Changes
+
+- **Model Cleanup**: Removed experimental classes `AzurePlatformVoice`, `LLMVoice`, `AzureSemanticVadServer`, `InputAudio`, `NoTurnDetection`, and `ToolChoiceFunctionObjectFunction`
+- **Class Rename**: Renamed `Usage` class to `TokenUsage` for better clarity
+- **Enum Reorganization**:
+  - Replaced `AudioFormat` enum with separate `InputAudioFormat` and `OutputAudioFormat` enums
+  - Removed `Phi4mmVoice` enum
+  - Removed `EMOTION` value from `AnimationOutputType` enum
+  - Removed `IN_PROGRESS` value from `ItemParamStatus` enum
+- **Server Events**: Removed `RESPONSE_EMOTION_HYPOTHESIS` from `ServerEventType` enum
+
+### Other Changes
+
+- **Package Structure**: Simplified package initialization with namespace package support
+- **Sample Updates**: Improved basic voice assistant samples
+- **Code Optimization**: Streamlined model definitions with significant code reduction
+- **API Configuration**: Updated API view properties for better tooling support
+
 ## 1.0.0b3 (2025-09-17)
 
 ### Features Added
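
The Breaking Changes list in this changelog names several classes and enums that no longer exist in 1.0.0b4. One way to find affected call sites before upgrading is a plain-text scan for those names. The sketch below is only an illustration: the function `find_removed_names` and the mapping `REMOVED_NAMES` are hypothetical, and the name list is copied from the changelog entries above rather than from the package itself.

```python
import re
from typing import Dict, List, Optional

# Names removed or renamed in 1.0.0b4, per the Breaking Changes list.
# The value is the replacement the changelog names, or None if it names none.
REMOVED_NAMES: Dict[str, Optional[str]] = {
    "AzurePlatformVoice": None,
    "LLMVoice": None,
    "AzureSemanticVadServer": None,
    "InputAudio": None,
    "NoTurnDetection": None,
    "ToolChoiceFunctionObjectFunction": None,
    "Phi4mmVoice": None,
    "Usage": "TokenUsage",
    "AudioFormat": "InputAudioFormat / OutputAudioFormat",
}


def find_removed_names(source: str) -> List[str]:
    """Report each line of `source` that mentions a removed name as a whole word."""
    findings: List[str] = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for name, replacement in REMOVED_NAMES.items():
            # \b keeps "AudioFormat" from matching inside "InputAudioFormat".
            if re.search(rf"\b{re.escape(name)}\b", line):
                hint = f"use {replacement}" if replacement else "no replacement listed"
                findings.append(f"line {lineno}: {name} ({hint})")
    return findings


sample = (
    "from azure.ai.voicelive.models import AudioFormat, InputAudioFormat\n"
    "fmt = AudioFormat.PCM16\n"
)
for finding in find_removed_names(sample):
    print(finding)
```

Because this is a text match rather than a parse, it can also flag comments or unrelated identifiers such as a local variable named `Usage`; treat the output as a list of places to review.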

sdk/ai/azure-ai-voicelive/README.md

Lines changed: 6 additions & 6 deletions
@@ -137,7 +137,7 @@ import asyncio
 from azure.core.credentials import AzureKeyCredential
 from azure.ai.voicelive.aio import connect
 from azure.ai.voicelive.models import (
-    RequestSession, Modality, AudioFormat, ServerVad, ServerEventType
+    RequestSession, Modality, InputAudioFormat, OutputAudioFormat, ServerVad, ServerEventType
 )
 
 API_KEY = "your-api-key"
@@ -153,8 +153,8 @@ async def main():
         session = RequestSession(
             modalities=[Modality.TEXT, Modality.AUDIO],
             instructions="You are a helpful assistant.",
-            input_audio_format=AudioFormat.PCM16,
-            output_audio_format=AudioFormat.PCM16,
+            input_audio_format=InputAudioFormat.PCM16,
+            output_audio_format=OutputAudioFormat.PCM16,
             turn_detection=ServerVad(
                 threshold=0.5,
                 prefix_padding_ms=300,
@@ -178,7 +178,7 @@ asyncio.run(main())
 from azure.core.credentials import AzureKeyCredential
 from azure.ai.voicelive import connect
 from azure.ai.voicelive.models import (
-    RequestSession, Modality, AudioFormat, ServerVad, ServerEventType
+    RequestSession, Modality, InputAudioFormat, OutputAudioFormat, ServerVad, ServerEventType
 )
 
 API_KEY = "your-api-key"
@@ -193,8 +193,8 @@ with connect(
     session = RequestSession(
         modalities=[Modality.TEXT, Modality.AUDIO],
         instructions="You are a helpful assistant.",
-        input_audio_format=AudioFormat.PCM16,
-        output_audio_format=AudioFormat.PCM16,
+        input_audio_format=InputAudioFormat.PCM16,
+        output_audio_format=OutputAudioFormat.PCM16,
         turn_detection=ServerVad(
             threshold=0.5,
             prefix_padding_ms=300,

sdk/ai/azure-ai-voicelive/apiview-properties.json

Lines changed: 23 additions & 19 deletions
@@ -15,14 +15,12 @@
         "azure.ai.voicelive.models.TurnDetection": "VoiceLive.TurnDetection",
         "azure.ai.voicelive.models.AzureMultilingualSemanticVad": "VoiceLive.AzureMultilingualSemanticVad",
         "azure.ai.voicelive.models.AzurePersonalVoice": "VoiceLive.AzurePersonalVoice",
-        "azure.ai.voicelive.models.AzurePlatformVoice": "VoiceLive.AzurePlatformVoice",
         "azure.ai.voicelive.models.EOUDetection": "VoiceLive.EOUDetection",
         "azure.ai.voicelive.models.AzureSemanticDetection": "VoiceLive.AzureSemanticDetection",
         "azure.ai.voicelive.models.AzureSemanticDetectionEn": "VoiceLive.AzureSemanticDetectionEn",
         "azure.ai.voicelive.models.AzureSemanticDetectionMultilingual": "VoiceLive.AzureSemanticDetectionMultilingual",
         "azure.ai.voicelive.models.AzureSemanticVad": "VoiceLive.AzureSemanticVad",
         "azure.ai.voicelive.models.AzureSemanticVadEn": "VoiceLive.AzureSemanticVadEn",
-        "azure.ai.voicelive.models.AzureSemanticVadServer": "VoiceLive.AzureSemanticVadServer",
         "azure.ai.voicelive.models.AzureStandardVoice": "VoiceLive.AzureStandardVoice",
         "azure.ai.voicelive.models.CachedTokenDetails": "VoiceLive.CachedTokenDetails",
         "azure.ai.voicelive.models.ClientEvent": "VoiceLive.ClientEvent",
@@ -43,19 +41,18 @@
         "azure.ai.voicelive.models.ClientEventSessionAvatarConnect": "VoiceLive.ClientEventSessionAvatarConnect",
         "azure.ai.voicelive.models.ClientEventSessionUpdate": "VoiceLive.ClientEventSessionUpdate",
         "azure.ai.voicelive.models.ContentPart": "VoiceLive.ContentPart",
+        "azure.ai.voicelive.models.ConversationItemBase": "VoiceLive.ConversationItemBase",
+        "azure.ai.voicelive.models.ErrorResponse": "VoiceLive.ErrorResponse",
         "azure.ai.voicelive.models.FunctionCallItem": "VoiceLive.FunctionCallItem",
         "azure.ai.voicelive.models.FunctionCallOutputItem": "VoiceLive.FunctionCallOutputItem",
         "azure.ai.voicelive.models.Tool": "VoiceLive.Tool",
         "azure.ai.voicelive.models.FunctionTool": "VoiceLive.FunctionTool",
         "azure.ai.voicelive.models.IceServer": "VoiceLive.IceServer",
-        "azure.ai.voicelive.models.InputAudio": "VoiceLive.InputAudio",
         "azure.ai.voicelive.models.UserContentPart": "VoiceLive.UserContentPart",
         "azure.ai.voicelive.models.InputAudioContentPart": "VoiceLive.InputAudioContentPart",
         "azure.ai.voicelive.models.InputTextContentPart": "VoiceLive.InputTextContentPart",
         "azure.ai.voicelive.models.InputTokenDetails": "VoiceLive.InputTokenDetails",
-        "azure.ai.voicelive.models.LLMVoice": "VoiceLive.LLMVoice",
         "azure.ai.voicelive.models.LogProbProperties": "VoiceLive.LogProbProperties",
-        "azure.ai.voicelive.models.NoTurnDetection": "VoiceLive.NoTurnDetection",
         "azure.ai.voicelive.models.OpenAIVoice": "VoiceLive.OpenAIVoice",
         "azure.ai.voicelive.models.OutputTextContentPart": "VoiceLive.OutputTextContentPart",
         "azure.ai.voicelive.models.OutputTokenDetails": "VoiceLive.OutputTokenDetails",
@@ -89,15 +86,21 @@
         "azure.ai.voicelive.models.ServerEventInputAudioBufferCommitted": "VoiceLive.ServerEventInputAudioBufferCommitted",
         "azure.ai.voicelive.models.ServerEventInputAudioBufferSpeechStarted": "VoiceLive.ServerEventInputAudioBufferSpeechStarted",
         "azure.ai.voicelive.models.ServerEventInputAudioBufferSpeechStopped": "VoiceLive.ServerEventInputAudioBufferSpeechStopped",
+        "azure.ai.voicelive.models.ServerEventResponseAnimationBlendshapeDelta": "VoiceLive.ServerEventResponseAnimationBlendshapeDelta",
+        "azure.ai.voicelive.models.ServerEventResponseAnimationBlendshapeDone": "VoiceLive.ServerEventResponseAnimationBlendshapeDone",
+        "azure.ai.voicelive.models.ServerEventResponseAnimationVisemeDelta": "VoiceLive.ServerEventResponseAnimationVisemeDelta",
+        "azure.ai.voicelive.models.ServerEventResponseAnimationVisemeDone": "VoiceLive.ServerEventResponseAnimationVisemeDone",
         "azure.ai.voicelive.models.ServerEventResponseAudioDelta": "VoiceLive.ServerEventResponseAudioDelta",
         "azure.ai.voicelive.models.ServerEventResponseAudioDone": "VoiceLive.ServerEventResponseAudioDone",
+        "azure.ai.voicelive.models.ServerEventResponseAudioTimestampDelta": "VoiceLive.ServerEventResponseAudioTimestampDelta",
+        "azure.ai.voicelive.models.ServerEventResponseAudioTimestampDone": "VoiceLive.ServerEventResponseAudioTimestampDone",
         "azure.ai.voicelive.models.ServerEventResponseAudioTranscriptDelta": "VoiceLive.ServerEventResponseAudioTranscriptDelta",
         "azure.ai.voicelive.models.ServerEventResponseAudioTranscriptDone": "VoiceLive.ServerEventResponseAudioTranscriptDone",
         "azure.ai.voicelive.models.ServerEventResponseContentPartAdded": "VoiceLive.ServerEventResponseContentPartAdded",
         "azure.ai.voicelive.models.ServerEventResponseContentPartDone": "VoiceLive.ServerEventResponseContentPartDone",
         "azure.ai.voicelive.models.ServerEventResponseCreated": "VoiceLive.ServerEventResponseCreated",
         "azure.ai.voicelive.models.ServerEventResponseDone": "VoiceLive.ServerEventResponseDone",
-        "azure.ai.voicelive.models.ServerEventResponseFunctionCallArgumentsDelta": "VoiceLive.ServerEventResponseFunctionCallArgumentsDelta",
+        "azure.ai.voicelive.models.ServerEventResponseFunctionCallArgumentsDelta": "VoiceLive.ServerEventResponseFunctionCallArgumentsDelta",
         "azure.ai.voicelive.models.ServerEventResponseFunctionCallArgumentsDone": "VoiceLive.ServerEventResponseFunctionCallArgumentsDone",
         "azure.ai.voicelive.models.ServerEventResponseOutputItemAdded": "VoiceLive.ServerEventResponseOutputItemAdded",
         "azure.ai.voicelive.models.ServerEventResponseOutputItemDone": "VoiceLive.ServerEventResponseOutputItemDone",
@@ -107,31 +110,32 @@
         "azure.ai.voicelive.models.ServerEventSessionCreated": "VoiceLive.ServerEventSessionCreated",
         "azure.ai.voicelive.models.ServerEventSessionUpdated": "VoiceLive.ServerEventSessionUpdated",
         "azure.ai.voicelive.models.ServerVad": "VoiceLive.ServerVad",
+        "azure.ai.voicelive.models.SessionBase": "VoiceLive.SessionBase",
         "azure.ai.voicelive.models.SystemMessageItem": "VoiceLive.SystemMessageItem",
+        "azure.ai.voicelive.models.TokenUsage": "VoiceLive.TokenUsage",
         "azure.ai.voicelive.models.ToolChoiceObject": "VoiceLive.ToolChoiceObject",
         "azure.ai.voicelive.models.ToolChoiceFunctionObject": "VoiceLive.ToolChoiceFunctionObject",
-        "azure.ai.voicelive.models.ToolChoiceFunctionObjectFunction": "VoiceLive.ToolChoiceFunctionObject.function.anonymous",
-        "azure.ai.voicelive.models.Usage": "VoiceLive.Usage",
         "azure.ai.voicelive.models.UserMessageItem": "VoiceLive.UserMessageItem",
         "azure.ai.voicelive.models.VideoCrop": "VoiceLive.VideoCrop",
         "azure.ai.voicelive.models.VideoParams": "VoiceLive.VideoParams",
         "azure.ai.voicelive.models.VideoResolution": "VoiceLive.VideoResolution",
         "azure.ai.voicelive.models.VoiceLiveErrorDetails": "VoiceLive.VoiceLiveErrorDetails",
-        "azure.ai.voicelive.models.ServerEventType": "VoiceLive.ServerEventType",
+        "azure.ai.voicelive.models.ClientEventType": "VoiceLive.ClientEventType",
         "azure.ai.voicelive.models.ItemType": "VoiceLive.ItemType",
-        "azure.ai.voicelive.models.ResponseItemStatus": "VoiceLive.ResponseItemStatus",
-        "azure.ai.voicelive.models.MessageRole": "VoiceLive.MessageRole",
-        "azure.ai.voicelive.models.ContentPartType": "VoiceLive.ContentPartType",
-        "azure.ai.voicelive.models.ResponseStatus": "VoiceLive.ResponseStatus",
-        "azure.ai.voicelive.models.OAIVoice": "VoiceLive.OAIVoice",
-        "azure.ai.voicelive.models.Phi4mmVoice": "VoiceLive.Phi4mmVoice",
-        "azure.ai.voicelive.models.AudioFormat": "VoiceLive.AudioFormat",
+        "azure.ai.voicelive.models.ItemParamStatus": "VoiceLive.ItemParamStatus",
         "azure.ai.voicelive.models.Modality": "VoiceLive.Modality",
+        "azure.ai.voicelive.models.OAIVoice": "VoiceLive.OAIVoice",
+        "azure.ai.voicelive.models.PersonalVoiceModels": "VoiceLive.PersonalVoiceModels",
+        "azure.ai.voicelive.models.OutputAudioFormat": "VoiceLive.OutputAudioFormat",
+        "azure.ai.voicelive.models.ToolType": "VoiceLive.ToolType",
         "azure.ai.voicelive.models.AnimationOutputType": "VoiceLive.AnimationOutputType",
+        "azure.ai.voicelive.models.InputAudioFormat": "VoiceLive.InputAudioFormat",
         "azure.ai.voicelive.models.AudioTimestampType": "VoiceLive.AudioTimestampType",
-        "azure.ai.voicelive.models.ToolType": "VoiceLive.ToolType",
         "azure.ai.voicelive.models.ToolChoiceLiteral": "VoiceLive.ToolChoiceLiteral",
-        "azure.ai.voicelive.models.ClientEventType": "VoiceLive.ClientEventType",
-        "azure.ai.voicelive.models.ItemParamStatus": "VoiceLive.ItemParamStatus"
+        "azure.ai.voicelive.models.ServerEventType": "VoiceLive.ServerEventType",
+        "azure.ai.voicelive.models.ResponseItemStatus": "VoiceLive.ResponseItemStatus",
+        "azure.ai.voicelive.models.MessageRole": "VoiceLive.MessageRole",
+        "azure.ai.voicelive.models.ContentPartType": "VoiceLive.ContentPartType",
+        "azure.ai.voicelive.models.ResponseStatus": "VoiceLive.ResponseStatus"
     }
 }

sdk/ai/azure-ai-voicelive/azure/ai/voicelive/_patch.py

Lines changed: 12 additions & 7 deletions
@@ -636,8 +636,8 @@ def __init__(
         *,
         credential: Union[AzureKeyCredential, TokenCredential],
         endpoint: str,
-        model: str,
         api_version: str,
+        model: Optional[str] = None,
         extra_query: Optional[Mapping[str, Any]] = None,
         extra_headers: Optional[Mapping[str, Any]] = None,
         connection_options: Optional[WebsocketConnectionOptions] = None,
@@ -646,8 +646,8 @@ def __init__(
         self._credential = credential
         self._endpoint = endpoint
         self.__credential_scopes = kwargs.pop("credential_scopes", "https://cognitiveservices.azure.com/.default")
-        self.__model = model
         self.__api_version = api_version
+        self.__model = model
         self.__connection: Optional[VoiceLiveConnection] = None
         self.__extra_query = extra_query
         self.__extra_headers = extra_headers
@@ -731,7 +731,9 @@ def _prepare_url(self) -> str:
         parsed = urlparse(self._endpoint)
         scheme = "wss" if parsed.scheme == "https" else ("ws" if parsed.scheme == "http" else parsed.scheme)
 
-        params: dict[str, str] = {"model": self.__model, "api-version": self.__api_version}
+        params: dict[str, Any] = {"api-version": self.__api_version}
+        if self.__model is not None:
+            params["model"] = self.__model
         extra_query: Mapping[str, Any] = self.__extra_query or {}
         for k, v in extra_query.items():
             params[str(k)] = str(v)
@@ -750,8 +752,8 @@ def connect(
     *,
     endpoint: str,
     credential: Union[AzureKeyCredential, TokenCredential],
-    model: str,
     api_version: str = "2025-05-01-preview",
+    model: Optional[str] = None,
     query: Optional[Mapping[str, Any]] = None,
     headers: Optional[Mapping[str, Any]] = None,
     connection_options: Optional[WebsocketConnectionOptions] = None,
@@ -777,10 +779,13 @@ def connect(
     :paramtype endpoint: str
     :keyword credential: Credential used to authenticate the WebSocket connection.
     :paramtype credential: ~azure.core.credentials.AzureKeyCredential or ~azure.core.credentials.TokenCredential
-    :keyword model: Model identifier to use for the session.
-    :paramtype model: str
     :keyword api_version: API version to use. Defaults to ``"2025-05-01-preview"``.
     :paramtype api_version: str
+    :keyword model: Model identifier to use for the session.
+     In most scenarios, this parameter is required.
+     It may be omitted only when connecting through an **Agent** scenario,
+     in which case the service will use the model associated with the Agent.
+    :paramtype model: str
     :keyword query: Optional query parameters to include in the WebSocket URL.
     :paramtype query: Mapping[str, Any] or None
     :keyword headers: Optional headers to include in the WebSocket handshake.
@@ -796,8 +801,8 @@ def connect(
     return _VoiceLiveConnectionManager(
         credential=credential,
         endpoint=endpoint,
-        model=model,
         api_version=api_version,
+        model=model,
         extra_query=query or {},
         extra_headers=headers or {},
         connection_options=connection_options,
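
The `_prepare_url` hunk in this file makes `model` an optional query parameter: `api-version` is always sent, and `model` is added only when it is not `None`. The standalone sketch below reproduces just that logic with the standard library so the effect on the final URL is easy to see. The function name `build_ws_url` and the placeholder endpoint are assumptions, and the sketch leaves the endpoint path untouched and overwrites any existing query string, which may differ from what the SDK does in lines not shown in the diff.

```python
from typing import Any, Dict, Mapping, Optional
from urllib.parse import urlencode, urlparse, urlunparse


def build_ws_url(
    endpoint: str,
    api_version: str,
    model: Optional[str] = None,
    extra_query: Optional[Mapping[str, Any]] = None,
) -> str:
    """Hypothetical stand-in for the query handling shown in the diff."""
    parsed = urlparse(endpoint)
    # Same scheme mapping as the diff: https -> wss, http -> ws, else unchanged.
    scheme = "wss" if parsed.scheme == "https" else ("ws" if parsed.scheme == "http" else parsed.scheme)

    params: Dict[str, str] = {"api-version": api_version}
    if model is not None:  # omitted entirely when no model is given
        params["model"] = model
    for key, value in (extra_query or {}).items():
        params[str(key)] = str(value)

    return urlunparse(parsed._replace(scheme=scheme, query=urlencode(params)))


# Placeholder endpoint for illustration only.
without_model = build_ws_url("https://example.invalid/voice", "2025-05-01-preview")
with_model = build_ws_url(
    "https://example.invalid/voice",
    "2025-05-01-preview",
    model="gpt-4o-realtime-preview",
)
```

With this change a caller in an Agent scenario can leave `model` out, and the request then carries no `model` key at all rather than an empty or `None` value.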

sdk/ai/azure-ai-voicelive/azure/ai/voicelive/_types.py

Lines changed: 1 addition & 3 deletions
@@ -10,7 +10,5 @@
 
 if TYPE_CHECKING:
     from . import models as _models
-Voice = Union[
-    str, "_models.OAIVoice", "_models.OpenAIVoice", "_models.AzureVoice", str, "_models.Phi4mmVoice", "_models.LLMVoice"
-]
+Voice = Union[str, "_models.OAIVoice", "_models.OpenAIVoice", "_models.AzureVoice"]
 ToolChoice = Union[str, "_models.ToolChoiceLiteral", "_models.ToolChoiceObject"]

sdk/ai/azure-ai-voicelive/azure/ai/voicelive/_utils/model_base.py

Lines changed: 1 addition & 0 deletions
@@ -1,3 +1,4 @@
+# pylint: disable=line-too-long,useless-suppression,too-many-lines
 # coding=utf-8
 # --------------------------------------------------------------------------
 # Copyright (c) Microsoft Corporation. All rights reserved.

sdk/ai/azure-ai-voicelive/azure/ai/voicelive/_version.py

Lines changed: 1 addition & 1 deletion
@@ -6,4 +6,4 @@
 # Changes may cause incorrect behavior and will be lost if the code is regenerated.
 # --------------------------------------------------------------------------
 
-VERSION = "1.0.0b3"
+VERSION = "1.0.0b4"
