Commit 133f5f4

xitzhang, Xiting Zhang, and Copilot authored
[VoiceLive] Update models for GA version (#43202)
* [VoiceLive] Add async function-calling agent sample
* add phrase list
* fix typo
* Update sdk/ai/azure-ai-voicelive/samples/async_function_calling_sample.py
  Co-authored-by: Copilot <[email protected]>
* Update sdk/ai/azure-ai-voicelive/samples/async_function_calling_sample.py
  Co-authored-by: Copilot <[email protected]>
* update
* fix typo
* update changelog
* update
* remove breaking change section
* update changelog
* fix change log
* revert changelog I lost
* update version and change log
* enable type verification
* update
* [VoiceLive] Release 1.0.0b4
* Update EouDetection
* update websocket option
* Separate ResponseSession and RequestSession
* make AgentConfig internal
* update test for AgentConfig
* update
* fix pylint
* update status
* rename OAIVoice and ToolChoiceObject
* update

---------

Co-authored-by: Xiting Zhang <[email protected]>
Co-authored-by: Copilot <[email protected]>
1 parent d07fc33 commit 133f5f4

20 files changed, +386 -265 lines changed

sdk/ai/azure-ai-voicelive/CHANGELOG.md

Lines changed: 46 additions & 0 deletions
@@ -1,5 +1,51 @@
 # Release History
 
+## 1.0.0 (Unreleased)
+
+### Features Added
+
+- **Enhanced WebSocket Connection Options**: Significantly improved WebSocket connection configuration with transport-agnostic design:
+  - Added new timeout configuration options: `receive_timeout`, `close_timeout`, and `handshake_timeout` for fine-grained control
+  - Enhanced `compression` parameter to support both boolean and integer types for advanced zlib window configuration
+  - Added `vendor_options` parameter for implementation-specific options passthrough (escape hatch for advanced users)
+  - Improved documentation with clearer descriptions for all connection parameters
+  - Better support for common aliases from other WebSocket ecosystems (`max_size`, `ping_interval`, etc.)
+  - More robust option mapping with proper type conversion and safety checks
+- **Enhanced Type Safety**: Improved type safety for content parts with proper enum usage:
+  - `InputAudioContentPart`, `InputTextContentPart`, and `OutputTextContentPart` now use `ContentPartType` enum values instead of string literals
+  - Better IntelliSense support and compile-time type checking for content part discriminators
+
+### Breaking Changes
+
+- **Improved Naming Conventions**: Updated model and enum names for better clarity and consistency:
+  - `OAIVoice` enum renamed to `OpenAIVoiceName` for more descriptive naming
+  - `ToolChoiceObject` model renamed to `ToolChoiceSelection` for better semantic meaning
+  - `ToolChoiceFunctionObject` model renamed to `ToolChoiceFunctionSelection` for consistency
+  - Updated type unions and imports to reflect the new naming conventions
+  - Cross-language package mappings updated to maintain compatibility across SDKs
+- **Session Model Architecture**: Separated `ResponseSession` and `RequestSession` models for better design clarity:
+  - `ResponseSession` no longer inherits from `RequestSession` and now inherits directly from `_Model`
+  - All session configuration fields are now explicitly defined in `ResponseSession` instead of being inherited
+  - This provides clearer separation of concerns between request and response session configurations
+  - May affect type checking and code that relied on the previous inheritance relationship
+- **Model Cleanup**: Removed unused `AgentConfig` model and related fields from the public API:
+  - `AgentConfig` class has been completely removed from imports and exports
+  - `agent` field removed from `ResponseSession` model (including constructor parameter)
+  - Updated cross-language package mappings to reflect the removal
+- **Model Naming Convention Update**: Renamed `EOUDetection` to `EouDetection` for better naming consistency:
+  - Class name changed from `EOUDetection` to `EouDetection`
+  - All inheritance relationships updated: `AzureSemanticDetection`, `AzureSemanticDetectionEn`, and `AzureSemanticDetectionMultilingual` now inherit from `EouDetection`
+  - Type annotations updated in `AzureSemanticVad`, `AzureSemanticVadEn`, `AzureSemanticVadMultilingual`, and `ServerVad` classes
+  - Import statements and exports updated to reflect the new naming
+- **Enhanced Content Part Type Safety**: Content part discriminators now use enum values instead of string literals:
+  - `InputAudioContentPart.type` now uses `ContentPartType.INPUT_AUDIO` instead of `"input_audio"`
+  - `InputTextContentPart.type` now uses `ContentPartType.INPUT_TEXT` instead of `"input_text"`
+  - `OutputTextContentPart.type` now uses `ContentPartType.TEXT` instead of `"text"`
+
+### Other Changes
+
+- Initial GA release
+
 ## 1.0.0b5 (2025-09-26)
 
 ### Features Added
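
The content-part entries in this changelog hunk replace string discriminators with `ContentPartType` members. The sketch below shows why such a change can stay compatible with code that still compares against the old strings. It uses a local stand-in enum, not the SDK class, and it assumes the real enum mixes in `str`, which this diff does not confirm. Only the three members and values named in the changelog are included.

```python
from enum import Enum


class ContentPartType(str, Enum):
    """Local stand-in for the SDK enum (assumption: the real one is str-based)."""

    INPUT_AUDIO = "input_audio"
    INPUT_TEXT = "input_text"
    TEXT = "text"


def is_audio_part(part_type) -> bool:
    """Return True for the audio discriminator, given either the enum member or the legacy string."""
    # A str-mixin enum member compares equal to its underlying string value.
    return part_type == ContentPartType.INPUT_AUDIO


# Both the legacy string and the enum member are recognized.
legacy_ok = is_audio_part("input_audio")
enum_ok = is_audio_part(ContentPartType.INPUT_AUDIO)

# Looking up a member by value converts a wire string into the enum.
parsed = ContentPartType("text")
```

If the real enum is not `str`-based, the string comparison would fail and callers would need to compare against enum members only.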

sdk/ai/azure-ai-voicelive/apiview-properties.json

Lines changed: 6 additions & 7 deletions
@@ -1,7 +1,6 @@
 {
-  "CrossLanguagePackageId": "VoiceLive.WebSocket",
+  "CrossLanguagePackageId": "VoiceLive",
   "CrossLanguageDefinitionId": {
-    "azure.ai.voicelive.models.AgentConfig": "VoiceLive.AgentConfig",
     "azure.ai.voicelive.models.Animation": "VoiceLive.Animation",
     "azure.ai.voicelive.models.ConversationRequestItem": "VoiceLive.ConversationRequestItem",
     "azure.ai.voicelive.models.MessageItem": "VoiceLive.MessageItem",
@@ -13,7 +12,7 @@
     "azure.ai.voicelive.models.AzureVoice": "VoiceLive.AzureVoice",
     "azure.ai.voicelive.models.AzureCustomVoice": "VoiceLive.AzureCustomVoice",
     "azure.ai.voicelive.models.AzurePersonalVoice": "VoiceLive.AzurePersonalVoice",
-    "azure.ai.voicelive.models.EOUDetection": "VoiceLive.EOUDetection",
+    "azure.ai.voicelive.models.EouDetection": "VoiceLive.EouDetection",
     "azure.ai.voicelive.models.AzureSemanticDetection": "VoiceLive.AzureSemanticDetection",
     "azure.ai.voicelive.models.AzureSemanticDetectionEn": "VoiceLive.AzureSemanticDetectionEn",
     "azure.ai.voicelive.models.AzureSemanticDetectionMultilingual": "VoiceLive.AzureSemanticDetectionMultilingual",
@@ -114,8 +113,8 @@
     "azure.ai.voicelive.models.SessionBase": "VoiceLive.SessionBase",
     "azure.ai.voicelive.models.SystemMessageItem": "VoiceLive.SystemMessageItem",
     "azure.ai.voicelive.models.TokenUsage": "VoiceLive.TokenUsage",
-    "azure.ai.voicelive.models.ToolChoiceObject": "VoiceLive.ToolChoiceObject",
-    "azure.ai.voicelive.models.ToolChoiceFunctionObject": "VoiceLive.ToolChoiceFunctionObject",
+    "azure.ai.voicelive.models.ToolChoiceSelection": "VoiceLive.ToolChoiceObject",
+    "azure.ai.voicelive.models.ToolChoiceFunctionSelection": "VoiceLive.ToolChoiceFunctionObject",
     "azure.ai.voicelive.models.UserMessageItem": "VoiceLive.UserMessageItem",
     "azure.ai.voicelive.models.VideoCrop": "VoiceLive.VideoCrop",
     "azure.ai.voicelive.models.VideoParams": "VoiceLive.VideoParams",
@@ -125,8 +124,9 @@
     "azure.ai.voicelive.models.ItemType": "VoiceLive.ItemType",
     "azure.ai.voicelive.models.ItemParamStatus": "VoiceLive.ItemParamStatus",
     "azure.ai.voicelive.models.MessageRole": "VoiceLive.MessageRole",
+    "azure.ai.voicelive.models.ContentPartType": "VoiceLive.ContentPartType",
     "azure.ai.voicelive.models.Modality": "VoiceLive.Modality",
-    "azure.ai.voicelive.models.OAIVoice": "VoiceLive.OAIVoice",
+    "azure.ai.voicelive.models.OpenAIVoiceName": "VoiceLive.OAIVoice",
     "azure.ai.voicelive.models.AzureVoiceType": "VoiceLive.AzureVoiceType",
     "azure.ai.voicelive.models.PersonalVoiceModels": "VoiceLive.PersonalVoiceModels",
     "azure.ai.voicelive.models.OutputAudioFormat": "VoiceLive.OutputAudioFormat",
@@ -139,7 +139,6 @@
     "azure.ai.voicelive.models.ToolChoiceLiteral": "VoiceLive.ToolChoiceLiteral",
     "azure.ai.voicelive.models.ResponseStatus": "VoiceLive.ResponseStatus",
     "azure.ai.voicelive.models.ResponseItemStatus": "VoiceLive.ResponseItemStatus",
-    "azure.ai.voicelive.models.ContentPartType": "VoiceLive.ContentPartType",
     "azure.ai.voicelive.models.ServerEventType": "VoiceLive.ServerEventType"
   }
 }
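
In this mapping, the renamed Python classes keep their previous cross-language IDs (for example, `ToolChoiceSelection` still maps to `VoiceLive.ToolChoiceObject`). A minimal sketch for spotting such entries follows. The helper name `find_name_mismatches` is hypothetical, and the sample dictionary copies only four entries from the diff.

```python
from typing import Dict


def find_name_mismatches(definitions: Dict[str, str]) -> Dict[str, str]:
    """Return {python_class_name: cross_language_name} where the two names differ."""
    mismatches: Dict[str, str] = {}
    for python_path, cross_id in definitions.items():
        # Compare only the last dotted segment of each identifier.
        python_name = python_path.rsplit(".", 1)[-1]
        cross_name = cross_id.rsplit(".", 1)[-1]
        if python_name != cross_name:
            mismatches[python_name] = cross_name
    return mismatches


# Entries copied from the new side of the diff.
sample = {
    "azure.ai.voicelive.models.EouDetection": "VoiceLive.EouDetection",
    "azure.ai.voicelive.models.ToolChoiceSelection": "VoiceLive.ToolChoiceObject",
    "azure.ai.voicelive.models.ToolChoiceFunctionSelection": "VoiceLive.ToolChoiceFunctionObject",
    "azure.ai.voicelive.models.OpenAIVoiceName": "VoiceLive.OAIVoice",
}

renamed = find_name_mismatches(sample)
```

Here `EouDetection` does not appear in the result because its cross-language ID was renamed together with the Python class.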

sdk/ai/azure-ai-voicelive/azure/ai/voicelive/_types.py

Lines changed: 2 additions & 2 deletions
@@ -10,5 +10,5 @@
 
 if TYPE_CHECKING:
     from . import models as _models
-Voice = Union[str, "_models.OAIVoice", "_models.OpenAIVoice", "_models.AzureVoice"]
-ToolChoice = Union[str, "_models.ToolChoiceLiteral", "_models.ToolChoiceObject"]
+Voice = Union[str, "_models.OpenAIVoiceName", "_models.OpenAIVoice", "_models.AzureVoice"]
+ToolChoice = Union[str, "_models.ToolChoiceLiteral", "_models.ToolChoiceSelection"]

sdk/ai/azure-ai-voicelive/azure/ai/voicelive/_version.py

Lines changed: 1 addition & 1 deletion
@@ -6,4 +6,4 @@
 # Changes may cause incorrect behavior and will be lost if the code is regenerated.
 # --------------------------------------------------------------------------
 
-VERSION = "1.0.0b5"
+VERSION = "1.0.0"

sdk/ai/azure-ai-voicelive/azure/ai/voicelive/aio/_patch.py

Lines changed: 100 additions & 57 deletions
@@ -575,47 +575,56 @@ async def close(self, *, code: int = 1000, reason: str = "") -> None:
 
 class WebsocketConnectionOptions(TypedDict, total=False):
     """
-    Advanced WebSocket connection options for VoiceLive API connections.
+    Transport-agnostic WebSocket connection options for VoiceLive.
 
-    These options correspond to parameters accepted by :mod:`aiohttp`'s
-    `ws_connect` method and control low-level WebSocket behavior.
-    All keys are optional — if omitted, default values will be applied.
+    These control common WS behaviors (compression, message size limits,
+    timeouts, ping/pong handling). Unless specified, defaults are determined
+    by the underlying WebSocket library.
 
-    :keyword compression: Enable per-message compression.
-        - ``True`` enables compression.
-        - ``False`` disables compression.
-        If omitted, defaults to the aiohttp default.
-    :type compression: bool
+    :keyword compression: Enable per-message compression. Use ``True`` to enable,
+        ``False`` to disable. Advanced users may pass an ``int`` to select a zlib
+        window value if supported by the transport.
+    :type compression: bool | int
 
-    :keyword max_msg_size: Maximum message size in bytes.
-        Messages larger than this limit will cause the connection to close.
-        If omitted, defaults to 10 MiB (10 * 1024 * 1024).
+    :keyword max_msg_size: Maximum message size in bytes before the client closes
+        the connection.
     :type max_msg_size: int
 
-    :keyword timeout: Close timeout in seconds.
-        Maximum time to wait for the connection to close gracefully.
-        If omitted, defaults to aiohttp's internal default.
-    :type timeout: float
-
-    :keyword heartbeat: Interval in seconds for sending ping frames to keep
-        the connection alive. If omitted, defaults to 30 seconds.
+    :keyword heartbeat: Interval in seconds between keep-alive pings.
     :type heartbeat: float
 
-    :keyword autoclose: Automatically close the connection when a close frame
-        is received. Defaults to True if omitted.
+    :keyword autoclose: Automatically close when a close frame is received.
     :type autoclose: bool
 
     :keyword autoping: Automatically respond to ping frames with pong frames.
-        Defaults to True if omitted.
     :type autoping: bool
+
+    :keyword receive_timeout: Max seconds to wait for a single incoming message
+        on an established WebSocket.
+    :type receive_timeout: float
+
+    :keyword close_timeout: Max seconds to wait for a graceful close handshake.
+    :type close_timeout: float
+
+    :keyword handshake_timeout: Max seconds for connection establishment
+        (DNS/TCP/TLS + WS upgrade). Note: with aiohttp this is applied on the
+        ClientSession (not a ws_connect kwarg), so must be handled by the caller.
+    :type handshake_timeout: float
+
+    :keyword vendor_options: Optional implementation-specific options passed
+        through as-is to the underlying library (not part of the stable API).
+    :type vendor_options: Mapping[str, Any]
     """
 
-    compression: NotRequired[bool]
+    compression: NotRequired[Union[bool, int]]
     max_msg_size: NotRequired[int]
-    timeout: NotRequired[float]
     heartbeat: NotRequired[float]
     autoclose: NotRequired[bool]
     autoping: NotRequired[bool]
+    receive_timeout: NotRequired[float]
+    close_timeout: NotRequired[float]
+    handshake_timeout: NotRequired[float]
+    vendor_options: NotRequired[Mapping[str, Any]]
 
 
 class _VoiceLiveConnectionManager(AbstractAsyncContextManager["VoiceLiveConnection"]):
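
The hunk above drops the `timeout` key in favor of `close_timeout`, and the mapping function changed later in this commit no longer translates the aliases `max_size` and `ping_interval`. Callers that still pass the old keys may therefore want to rename them before connecting. The sketch below is one way to do that. `migrate_legacy_options` and `LEGACY_KEYS` are hypothetical names and not part of the SDK. The key pairs are taken from the old docstring and the removed alias handling.

```python
from typing import Any, Dict, Mapping

# Old key -> new neutral key, per the removed docstring and alias handling.
LEGACY_KEYS = {
    "timeout": "close_timeout",    # old docstring: "Close timeout in seconds"
    "max_size": "max_msg_size",    # alias formerly accepted by the mapper
    "ping_interval": "heartbeat",  # alias formerly accepted by the mapper
}


def migrate_legacy_options(options: Mapping[str, Any]) -> Dict[str, Any]:
    """Return a copy of ``options`` with legacy keys renamed to the new ones."""
    migrated: Dict[str, Any] = {}
    for key, value in options.items():
        new_key = LEGACY_KEYS.get(key, key)
        # If the caller also supplied the new-style key, let that one win.
        if new_key != key and new_key in options:
            continue
        migrated[new_key] = value
    return migrated


old_style = {"timeout": 5.0, "ping_interval": 20, "compression": True}
new_style = migrate_legacy_options(old_style)
```

Letting an explicit new-style key win over its legacy alias is a design choice made for this sketch. Raising an error on conflicting keys would be an equally reasonable policy.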
@@ -624,9 +633,9 @@ class _VoiceLiveConnectionManager(AbstractAsyncContextManager["VoiceLiveConnecti
     def __init__(
         self,
         *,
-        credential: Union[AzureKeyCredential, AsyncTokenCredential],
+        credential: Union["AzureKeyCredential", "AsyncTokenCredential"],
         endpoint: str,
-        api_version: str = "2025-05-01-preview",
+        api_version: str = "2025-10-01",
         model: Optional[str] = None,
         extra_query: Mapping[str, Any],
         extra_headers: Mapping[str, Any],
@@ -635,48 +644,82 @@ def __init__(
     ) -> None:
         self._credential = credential
         self._endpoint = endpoint
-        raw_scopes = kwargs.pop(
-            "credential_scopes",
-            ["https://ai.azure.com/.default"],
-        )
-        if isinstance(raw_scopes, str):
-            self.__credential_scopes = [raw_scopes]
-        else:
-            self.__credential_scopes = list(raw_scopes)
+        raw_scopes = kwargs.pop("credential_scopes", ["https://ai.azure.com/.default"])
+        self.__credential_scopes = [raw_scopes] if isinstance(raw_scopes, str) else list(raw_scopes)
         self.__api_version = api_version
         self.__model = model
 
-        self.__connection: Optional[VoiceLiveConnection] = None
+        self.__connection: Optional["VoiceLiveConnection"] = None
         self.__extra_query = extra_query
         self.__extra_headers = extra_headers
-        self.__connection_options = self._map_websocket_options(connection_options or {})
+        self.__connection_options = self._map_to_aiohttp_ws_options(connection_options or {})
         self.__proxy_policy = kwargs.get("proxy_policy") or policies.ProxyPolicy(**kwargs)
 
-    def _map_websocket_options(self, options: WebsocketConnectionOptions) -> dict[str, Any]:
+    def _map_to_aiohttp_ws_options(self, options: WebsocketConnectionOptions) -> dict[str, Any]:
         """
-        Map user options to :mod:`aiohttp` ``ws_connect`` kwargs (accept both TypedDict keys and common aliases).
+        Map neutral WebSocket options to :mod:`aiohttp` ``ClientSession.ws_connect`` kwargs.
 
-        :param options: The user-provided WebSocket options.
+        NOTE:
+        - ``receive_timeout`` and ``close_timeout`` are mapped into a single
+          ``aiohttp.ClientWSTimeout`` instance passed as the ``timeout=`` kwarg.
+        - ``handshake_timeout`` is NOT an ``ws_connect`` kwarg in aiohttp; it must be
+          applied via ``aiohttp.ClientTimeout`` on the session by the caller.
+
+        :param options: User-provided WebSocket options.
         :type options: ~azure.ai.voicelive.aio.WebsocketConnectionOptions
-        :return: Mapped options suitable for ``aiohttp.ClientSession.ws_connect``.
+        :return: Options suitable for ``aiohttp.ClientSession.ws_connect``.
         :rtype: dict[str, Any]
         """
-        # copy to a plain dict so we can safely check/pop alias keys without mypy complaints
         src: dict[str, Any] = dict(options)
         mapped: dict[str, Any] = {}
-        # aliases commonly used by other libs
-        if "max_size" in src:
-            mapped["max_msg_size"] = src.pop("max_size")
-        if "close_timeout" in src:
-            mapped["timeout"] = src.pop("close_timeout")
-        if "ping_interval" in src:
-            mapped["heartbeat"] = src.pop("ping_interval")
-        if "compression" in src:
-            mapped["compress"] = src.pop("compression")
-        # pass through supported aiohttp-style keys from our TypedDict
-        for key in ("max_msg_size", "timeout", "heartbeat", "autoclose", "autoping"):
-            if key in src:
-                mapped[key] = src[key]
+
+        # --- Neutral -> aiohttp mapping ---
+
+        # compression (neutral) -> compress (aiohttp expects int or None)
+        comp = src.pop("compression", None)
+        if comp is True:
+            mapped["compress"] = -1  # enable compression with default window bits
+        elif comp is False:
+            mapped["compress"] = None  # disable compression
+        elif isinstance(comp, int):
+            mapped["compress"] = comp  # power user provided zlib window value
+
+        # max message size
+        if "max_msg_size" in src:
+            mapped["max_msg_size"] = int(src.pop("max_msg_size"))
+
+        # ping interval
+        if "heartbeat" in src:
+            mapped["heartbeat"] = float(src.pop("heartbeat"))
+
+        # autoclose / autoping
+        if "autoclose" in src:
+            mapped["autoclose"] = bool(src.pop("autoclose"))
+        if "autoping" in src:
+            mapped["autoping"] = bool(src.pop("autoping"))
+
+        # Build ClientWSTimeout for receive/close timeouts
+        ws_timeout_kwargs: dict[str, float] = {}
+        recv_to = src.pop("receive_timeout", None)
+        if recv_to is not None:
+            ws_timeout_kwargs["ws_receive"] = float(recv_to)
+        close_to = src.pop("close_timeout", None)
+        if close_to is not None:
+            ws_timeout_kwargs["ws_close"] = float(close_to)
+        if ws_timeout_kwargs:
+            mapped["timeout"] = aiohttp.ClientWSTimeout(**ws_timeout_kwargs)
+
+        # handshake_timeout is not a ws_connect kwarg; caller must apply it on session
+        _ = src.pop("handshake_timeout", None)  # intentionally ignored here
+
+        # --- Vendor-specific passthrough (escape hatch) ---
+        vendor = src.pop("vendor_options", None)
+        if isinstance(vendor, Mapping):
+            for k, v in vendor.items():
+                mapped.setdefault(k, v)
+
+        # Any leftover keys in `src` are intentionally ignored to avoid leaking
+        # transport-specific names into our public surface.
         return mapped
 
     async def __aenter__(self) -> VoiceLiveConnection:
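
Two details of the new mapping are easy to miss. The `is True` / `is False` checks must come before `isinstance(comp, int)` because `bool` is a subclass of `int`. Also, `setdefault` means `vendor_options` cannot override a key the mapper already set. The sketch below reproduces only those parts so the behavior can be checked without installing aiohttp. `WSTimeout` is a local stand-in for `aiohttp.ClientWSTimeout`, and `map_options` is a hypothetical free function, not the SDK method.

```python
from collections.abc import Mapping
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class WSTimeout:
    """Stand-in for aiohttp.ClientWSTimeout (field names as used in the diff)."""

    ws_receive: Optional[float] = None
    ws_close: Optional[float] = None


def map_options(options: Mapping) -> Dict[str, Any]:
    """Simplified re-creation of the compression, timeout and vendor logic."""
    src = dict(options)
    mapped: Dict[str, Any] = {}

    comp = src.pop("compression", None)
    if comp is True:  # identity checks first: isinstance(True, int) is also True
        mapped["compress"] = -1
    elif comp is False:
        mapped["compress"] = None
    elif isinstance(comp, int):
        mapped["compress"] = comp

    timeouts: Dict[str, float] = {}
    for neutral, target in (("receive_timeout", "ws_receive"), ("close_timeout", "ws_close")):
        value = src.pop(neutral, None)
        if value is not None:
            timeouts[target] = float(value)
    if timeouts:
        mapped["timeout"] = WSTimeout(**timeouts)

    vendor = src.pop("vendor_options", None)
    if isinstance(vendor, Mapping):
        for key, value in vendor.items():
            mapped.setdefault(key, value)  # never overrides an already mapped key

    return mapped  # leftover keys such as "max_size" are dropped


result = map_options(
    {
        "compression": True,
        "close_timeout": 5,
        "max_size": 1,  # legacy alias: ignored by this logic
        "vendor_options": {"compress": 15, "proxy": "http://localhost:8080"},
    }
)
```

In this run `result["compress"]` stays `-1` even though `vendor_options` also supplies `compress`, and the legacy `max_size` key does not appear in the output.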
@@ -690,7 +733,7 @@ async def __aenter__(self) -> VoiceLiveConnection:
         url = self._prepare_url()
         log.debug("Connecting to %s", url)
 
-        self.__connection_options.setdefault("max_msg_size", 10 * 1024 * 1024)
+        self.__connection_options.setdefault("max_msg_size", 4 * 1024 * 1024)
         self.__connection_options.setdefault("heartbeat", 30)
 
         if self.__proxy_policy:
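
The hunk above lowers the fallback `max_msg_size` from 10 MiB to 4 MiB. Because the value is applied with `setdefault`, a caller who sets `max_msg_size` explicitly should keep that value. The sketch below mirrors this with a plain dictionary. `apply_defaults` and the two constants are hypothetical names used only for illustration.

```python
from typing import Any, Dict, Mapping

DEFAULT_MAX_MSG_SIZE = 4 * 1024 * 1024    # fallback after this commit
PREVIOUS_MAX_MSG_SIZE = 10 * 1024 * 1024  # fallback before this commit


def apply_defaults(options: Mapping[str, Any]) -> Dict[str, Any]:
    """Fill in the two fallbacks shown in the hunk without overriding caller values."""
    merged = dict(options)
    merged.setdefault("max_msg_size", DEFAULT_MAX_MSG_SIZE)
    merged.setdefault("heartbeat", 30)
    return merged


# No explicit size: the new 4 MiB fallback applies.
implicit = apply_defaults({})

# Explicit size: the caller keeps the previous 10 MiB limit.
explicit = apply_defaults({"max_msg_size": PREVIOUS_MAX_MSG_SIZE})
```

Applications that receive messages between 4 MiB and 10 MiB may want to pass `max_msg_size` explicitly after upgrading.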
@@ -778,7 +821,7 @@ def connect(
     *,
     credential: Union[AzureKeyCredential, AsyncTokenCredential],
     endpoint: str,
-    api_version: str = "2025-05-01-preview",
+    api_version: str = "2025-10-01",
     model: Optional[str] = None,
     query: Optional[Mapping[str, Any]] = None,
     headers: Optional[Mapping[str, Any]] = None,
@@ -798,7 +841,7 @@ def connect(
     :paramtype type credential: ~azure.core.credentials.AzureKeyCredential or ~azure.core.credentials.AsyncTokenCredential
     :keyword endpoint: Service endpoint, e.g., ``https://<region>.api.cognitive.microsoft.com``.
     :paramtype type endpoint: str
-    :keyword api_version: The API version to use. Defaults to ``"2025-05-01-preview"``.
+    :keyword api_version: The API version to use. Defaults to ``"2025-10-01"``.
     :paramtype type api_version: str
     :keyword model: Model identifier to use for the session.
         In most scenarios, this parameter is required.
