Commit 3dbf3ac

VoiceLive SDK Updates (Azure#52870)
* Generator updates & other fixes
* Latest generator & tsp
* Generated updates
* Changelog update
* Remove debug file
* More generator updates
* Changelog update
* PS script updates
* Latest generator
* PR feedback
* Add headers collection, simplify serialization for a few objects
* Test cleanup
* Update export API
1 parent 6b6bad8 commit 3dbf3ac

186 files changed

+4968
-3299
lines changed


sdk/ai/Azure.AI.VoiceLive/CHANGELOG.md

Lines changed: 203 additions & 0 deletions
@@ -4,12 +4,215 @@
44

55
### Features Added
66

7+
- Added `VideoBackground` class to support video background customization with `Color` and `ImageUrl` properties
8+
- Added new properties to `VideoParams`:
9+
- `Background` (VideoBackground): Configure video background settings
10+
- `GopSize` (int?): Configure Group of Pictures size
11+
- Added new properties to turn detection classes for enhanced control:
12+
- `CreateResponse` (bool?): Added to `ServerVadTurnDetection`, `AzureSemanticVadTurnDetection`, and related classes
13+
- `InterruptResponse` (bool?): Added to the same turn detection classes
14+
- Added string constructor overloads for message item classes:
15+
- `AssistantMessageItem(string assistantMessageText)`
16+
- `SystemMessageItem(string systemMessageText)`
17+
- `UserMessageItem(string userMessageText)`
18+
- Enhanced README with new code examples for function calling and user message handling
19+
- Added a `Headers` dictionary to `VoiceLiveClientOptions` for specifying additional headers to send when connecting.
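
A minimal sketch of how the new `Headers` option might be used. It assumes `Headers` is a string-to-string dictionary with an indexer, which the entry above implies but does not spell out; the header name and value are hypothetical.

```C#
using Azure.AI.VoiceLive;

VoiceLiveClientOptions options =
    new VoiceLiveClientOptions(VoiceLiveClientOptions.ServiceVersion.V2025_10_01);

// Assumption: Headers behaves like IDictionary<string, string>.
// "x-example-correlation-id" is a made-up header name used only for illustration.
options.Headers["x-example-correlation-id"] = "my-correlation-id";
```

Pass `options` to the `VoiceLiveClient` constructor as in the README's `CreateVoiceLiveClientForSpecificApiVersion` snippet.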
20+
721
### Breaking Changes
822

23+
### Type Changes from Enums to Extensible Enums
24+
25+
Several enum types have been converted to extensible enums (struct-based) for better extensibility:
26+
27+
### AnimationOutputType
28+
- **Before**: `enum AnimationOutputType`
29+
- **After**: `readonly partial struct AnimationOutputType`
30+
- **Impact**: The type is now an extensible enum. Existing code using the enum values will continue to work due to implicit conversions.
31+
32+
### AudioNoiseReductionType
33+
- **Before**: `enum AudioNoiseReductionType`
34+
- **After**: `readonly partial struct AudioNoiseReductionType`
35+
- **Impact**: The type is now an extensible enum. Existing code using the enum values will continue to work due to implicit conversions.
36+
37+
### ItemParamStatus
38+
- **Before**: `enum ItemParamStatus`
39+
- **After**: `readonly partial struct ItemParamStatus`
40+
- **Impact**: The type is now an extensible enum. Existing code using the enum values will continue to work due to implicit conversions.
41+
42+
### ResponseCancelledDetailsReason
43+
- **Before**: `enum ResponseCancelledDetailsReason`
44+
- **After**: `readonly partial struct ResponseCancelledDetailsReason`
45+
- **Impact**: The type is now an extensible enum. Existing code using the enum values will continue to work due to implicit conversions.
46+
47+
### ResponseIncompleteDetailsReason
48+
- **Before**: `enum ResponseIncompleteDetailsReason`
49+
- **After**: `readonly partial struct ResponseIncompleteDetailsReason`
50+
- **Impact**: The type is now an extensible enum. Existing code using the enum values will continue to work due to implicit conversions.
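
For readers unfamiliar with the pattern, the sketch below shows roughly what an extensible enum looks like in C#. `SampleLevel` and its values are hypothetical stand-ins, not the generated SDK source; the real types may differ in detail (for example, in how they compare or convert values).

```C#
using System;

// Hypothetical illustration of the extensible-enum pattern; not an SDK type.
public readonly struct SampleLevel : IEquatable<SampleLevel>
{
    private readonly string _value;

    public SampleLevel(string value)
    {
        _value = value ?? throw new ArgumentNullException(nameof(value));
    }

    // Known values are static properties, so `SampleLevel.Low` still reads like an enum member.
    public static SampleLevel Low { get; } = new SampleLevel("low");
    public static SampleLevel High { get; } = new SampleLevel("high");

    // Values not known at compile time can still be represented.
    public static implicit operator SampleLevel(string value) => new SampleLevel(value);

    public bool Equals(SampleLevel other) =>
        string.Equals(_value, other._value, StringComparison.OrdinalIgnoreCase);

    public override bool Equals(object obj) => obj is SampleLevel other && Equals(other);

    public override int GetHashCode() =>
        _value != null ? StringComparer.OrdinalIgnoreCase.GetHashCode(_value) : 0;

    public static bool operator ==(SampleLevel left, SampleLevel right) => left.Equals(right);

    public static bool operator !=(SampleLevel left, SampleLevel right) => !left.Equals(right);

    public override string ToString() => _value;
}
```

With a type shaped like this, the known values are static properties rather than compile-time constants, so a `switch` with `case SampleLevel.Low:` labels would not compile, while `==` comparisons would. Whether that applies to the types listed above depends on the generated code, so it is worth checking any `switch` statements during migration.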
51+
52+
### Class and Property Renames
53+
54+
### AudioInputTranscriptionSettings → AudioInputTranscriptionOptions
55+
- **Type renamed**: `AudioInputTranscriptionSettings` is now `AudioInputTranscriptionOptions`
56+
- **Model property renamed**: `AudioInputTranscriptionSettingsModel` is now `AudioInputTranscriptionOptionsModel`
57+
- **Impact**: Update all references to use the new type name.
58+
59+
### InputModality → InteractionModality
60+
- **Type renamed**: `InputModality` is now `InteractionModality`
61+
- **Impact**: Update all references from `InputModality` to `InteractionModality` throughout your code.
62+
63+
### ResponseMaxOutputTokensOption → MaxResponseOutputTokensOption
64+
- **Type renamed**: `ResponseMaxOutputTokensOption` is now `MaxResponseOutputTokensOption`
65+
- **Impact**: Update all type references to use the new name.
66+
67+
### Removed Types
68+
69+
### UserContentPart (abstract base class)
70+
- **Removed**: The abstract `UserContentPart` class has been removed.
71+
- **Replacement**: Use the new `MessageContentPart` abstract base class instead.
72+
- **Impact**: Update inheritance hierarchies to use `MessageContentPart`.
73+
74+
### AzureSemanticEnEouDetection
75+
- **Removed**: This class has been removed.
76+
- **Replacement**: Use `AzureSemanticEouDetectionEn` instead.
77+
78+
### AzureSemanticMultilingualEouDetection
79+
- **Removed**: This class has been removed.
80+
- **Replacement**: Use `AzureSemanticEouDetectionMultilingual` instead.
81+
82+
### AzureSemanticVadEnTurnDetection
83+
- **Removed**: This class has been removed.
84+
- **Replacement**: Use `AzureSemanticVadTurnDetectionEn` instead.
85+
86+
### AzureSemanticVadMultilingualTurnDetection
87+
- **Removed**: This class has been removed.
88+
- **Replacement**: Use `AzureSemanticVadTurnDetectionMultilingual` instead.
89+
90+
### Property Removals
91+
92+
### AnimationOptions
93+
- **Removed properties**:
94+
- `EmotionDetectionInterval`
95+
- `EmotionDetectionIntervalMs`
96+
- **Impact**: Remove any code that sets or reads these properties.
97+
98+
### AzureSemanticEouDetection family
99+
- **Changed property**: `Threshold` (float) has been replaced with `ThresholdLevel` using new threshold level types:
100+
- `AzureSemanticDetectionThresholdLevel` for `AzureSemanticEouDetection`
101+
- `AzureSemanticDetectionEnThresholdLevel` for `AzureSemanticEouDetectionEn`
102+
- `AzureSemanticDetectionMultilingualThresholdLevel` for `AzureSemanticEouDetectionMultilingual`
103+
- **Impact**: Update code to use the new `ThresholdLevel` property with appropriate enum values (Default, Low, Medium, High).
104+
105+
### Constructor and Method Signature Changes
106+
107+
### MessageItem
108+
- **Constructor changed**:
109+
- **Before**: `MessageItem(string role)`
110+
- **After**: `MessageItem(ResponseMessageRole role, IEnumerable<MessageContentPart> content)`
111+
- **Impact**: Update all MessageItem instantiations to provide both role and content parameters.
112+
113+
### AssistantMessageItem
114+
- **Constructor changed**:
115+
- **Before**: Accepted `OutputTextContentPart` or `IEnumerable<OutputTextContentPart>`
116+
- **After**: Accepts `MessageContentPart`, `IEnumerable<MessageContentPart>`, or a string
117+
- **Property changed**: `Content` is now `IList<MessageContentPart>` instead of `IList<OutputTextContentPart>`
118+
119+
### SystemMessageItem
120+
- **Constructor changed**:
121+
- **Before**: Accepted `InputTextContentPart` or `IEnumerable<InputTextContentPart>`
122+
- **After**: Accepts `InputTextContentPart`, `IEnumerable<MessageContentPart>`, or a string
123+
- **Property changed**: `Content` is now part of base `MessageItem` as `IList<MessageContentPart>`
124+
125+
### UserMessageItem
126+
- **Constructor changed**:
127+
- **Before**: Accepted `UserContentPart` or `IEnumerable<UserContentPart>`
128+
- **After**: Accepts `InputTextContentPart`, `IEnumerable<MessageContentPart>`, or a string
129+
- **Property changed**: `Content` is now part of base `MessageItem` as `IList<MessageContentPart>`
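
As a rough illustration of the string constructor overloads listed under Features Added, the snippet below builds one item of each kind. The message texts are made up, and `session` is assumed to be an already-started session as in the README snippets.

```C#
using Azure.AI.VoiceLive;

// String overloads: each wraps the text in the appropriate content part.
var system = new SystemMessageItem("You are a concise support assistant.");
var user = new UserMessageItem("Hello, can you help me with my account?");
var assistant = new AssistantMessageItem("Of course. What do you need?");

// Assumption: `session` is in scope and connected, as in the README examples.
await session.AddItemAsync(user).ConfigureAwait(false);
await session.StartResponseAsync().ConfigureAwait(false);
```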
130+
131+
### ToolChoiceOption
132+
- **Constructor parameter renamed**:
133+
- **Before**: `ToolChoiceOption(string stringValue)`
134+
- **After**: `ToolChoiceOption(string functionName)`
135+
- **Impact**: The parameter name has changed, but functionality remains the same.
136+
137+
### Service Version Changes
138+
139+
### VoiceLiveClientOptions
140+
- **Default service version changed**:
141+
- **Before**: `ServiceVersion.V2025_05_01_Preview`
142+
- **After**: `ServiceVersion.V2025_10_01`
143+
- **Impact**: The client now defaults to a newer, non-preview API version.
144+
145+
### Class Inheritance Changes
146+
147+
### VoiceLiveClientOptions
148+
- **Before**: Inherited from `Azure.Core.ClientOptions`
149+
- **After**: No longer inherits from `ClientOptions`, but provides a `DiagnosticsOptions` property
150+
- **Impact**: Some properties previously available through inheritance may need to be accessed differently.
151+
152+
### Content Part Classes
153+
- `InputAudioContentPart`: Now inherits from `MessageContentPart` instead of `UserContentPart`
154+
- `InputTextContentPart`: Now inherits from `MessageContentPart` instead of `UserContentPart`
155+
- `OutputTextContentPart`: Now inherits from `MessageContentPart` instead of being standalone
156+
157+
### New Required Properties
158+
159+
### Turn Detection Classes
160+
Several turn detection classes have new properties that should be considered:
161+
- `CreateResponse` (bool?): Added to `ServerVadTurnDetection`, `AzureSemanticVadTurnDetection`, and related classes
162+
- `InterruptResponse` (bool?): Added to the same turn detection classes
163+
164+
### VideoParams
165+
- New optional properties:
166+
- `Background` (VideoBackground): Configure video background settings
167+
- `GopSize` (int?): Configure Group of Pictures size
168+
169+
### Property Access Changes
170+
171+
### VoiceLiveClient
172+
- **Removed**: `Pipeline` property is no longer publicly accessible
173+
- **Impact**: If you were accessing the HTTP pipeline directly, you'll need to find alternative approaches.
174+
175+
### VoiceLiveResponse
176+
- **Property changed**: `Modalities` is now `ModalitiesInternal` and returns `IList<InteractionModality>` instead of `SessionUpdateModality`
177+
178+
### Authentication Changes
179+
180+
### VoiceLiveClient Authentication Scope
181+
- **Authentication scope changed**: The default authentication scope has been updated from `https://cognitiveservices.azure.com/.default` to `https://ai.azure.com/.default`
182+
- **Impact**: This change should be transparent for most users, but custom authentication implementations may need adjustment.
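
One way to check that an identity can obtain a token for the new scope is to request one directly with `Azure.Identity`. This is a diagnostic sketch only, assuming `DefaultAzureCredential` is the credential in use; the client requests its own tokens when connecting.

```C#
using System;
using System.Threading;
using Azure.Core;
using Azure.Identity;

// Request a token for the new default scope named in this changelog entry.
var credential = new DefaultAzureCredential();
var context = new TokenRequestContext(new[] { "https://ai.azure.com/.default" });

AccessToken token = await credential.GetTokenAsync(context, CancellationToken.None);
Console.WriteLine($"Token acquired; expires {token.ExpiresOn:u}");
```

If this call fails while the old `https://cognitiveservices.azure.com/.default` scope succeeds, the identity's configuration is a likely place to look.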
183+
184+
### Class Inheritance and Interface Implementation Changes
185+
186+
### AzureSemanticEouDetectionEn and AzureSemanticEouDetectionMultilingual
187+
- **Before**: These classes were incomplete and did not properly inherit from base classes
188+
- **After**: Both classes now properly inherit from `EouDetection` and implement full serialization interfaces
189+
- **Impact**: These classes are now fully functional and consistent with the API pattern
190+
191+
### Migration Guide
192+
193+
1. **Update enum usage**: While the conversion to extensible enums maintains backward compatibility through implicit conversions, consider updating to use the new struct-based pattern for future-proofing.
194+
195+
2. **Rename types**: Find and replace all occurrences of renamed types:
196+
- `AudioInputTranscriptionSettings` → `AudioInputTranscriptionOptions`
197+
- `InputModality` → `InteractionModality`
198+
- `ResponseMaxOutputTokensOption` → `MaxResponseOutputTokensOption`
199+
200+
3. **Update inheritance**: Replace `UserContentPart` with `MessageContentPart` in any custom implementations.
201+
202+
4. **Update constructors**: Review and update all MessageItem-derived class instantiations to use the new constructor signatures.
203+
204+
5. **Update threshold properties**: Replace float `Threshold` properties with appropriate `ThresholdLevel` properties using the new enum values.
205+
206+
6. **Consider new properties**: Review the new `CreateResponse` and `InterruptResponse` properties in turn detection configurations to see if they benefit your use case.
207+
9208
### Bugs Fixed
10209

11210
### Other Changes
12211

212+
- Updated README examples to use new `InteractionModality` instead of `InputModality`
213+
- Updated default service version examples to use `V2025_10_01`
214+
- Enhanced documentation with additional code snippets for function response handling and user message creation
215+
13216
## 1.0.0-beta.2 (2025-09-22)
14217

15218
### Features Added

sdk/ai/Azure.AI.VoiceLive/README.md

Lines changed: 41 additions & 8 deletions
@@ -70,7 +70,7 @@ You have the flexibility to explicitly select a supported service API version wh
7070
```C# Snippet:CreateVoiceLiveClientForSpecificApiVersion
7171
Uri endpoint = new Uri("https://your-resource.cognitiveservices.azure.com");
7272
DefaultAzureCredential credential = new DefaultAzureCredential();
73-
VoiceLiveClientOptions options = new VoiceLiveClientOptions(VoiceLiveClientOptions.ServiceVersion.V2025_05_01_Preview);
73+
VoiceLiveClientOptions options = new VoiceLiveClientOptions(VoiceLiveClientOptions.ServiceVersion.V2025_10_01);
7474
VoiceLiveClient client = new VoiceLiveClient(endpoint, credential, options);
7575
```
7676

@@ -163,8 +163,8 @@ VoiceLiveSessionOptions sessionOptions = new()
163163

164164
// Ensure modalities include audio
165165
sessionOptions.Modalities.Clear();
166-
sessionOptions.Modalities.Add(InputModality.Text);
167-
sessionOptions.Modalities.Add(InputModality.Audio);
166+
sessionOptions.Modalities.Add(InteractionModality.Text);
167+
sessionOptions.Modalities.Add(InteractionModality.Audio);
168168

169169
await session.ConfigureSessionAsync(sessionOptions).ConfigureAwait(false);
170170

@@ -206,8 +206,8 @@ VoiceLiveSessionOptions sessionOptions = new()
206206

207207
// Ensure modalities include audio
208208
sessionOptions.Modalities.Clear();
209-
sessionOptions.Modalities.Add(InputModality.Text);
210-
sessionOptions.Modalities.Add(InputModality.Audio);
209+
sessionOptions.Modalities.Add(InteractionModality.Text);
210+
sessionOptions.Modalities.Add(InteractionModality.Audio);
211211

212212
await session.ConfigureSessionAsync(sessionOptions).ConfigureAwait(false);
213213
```
@@ -232,7 +232,6 @@ var getCurrentWeatherFunction = new VoiceLiveFunctionDefinition("get_current_wea
232232
}
233233
""")
234234
};
235-
236235
VoiceLiveSessionOptions sessionOptions = new()
237236
{
238237
Model = model,
@@ -247,11 +246,45 @@ sessionOptions.Tools.Add(getCurrentWeatherFunction);
247246

248247
// Ensure modalities include audio
249248
sessionOptions.Modalities.Clear();
250-
sessionOptions.Modalities.Add(InputModality.Text);
251-
sessionOptions.Modalities.Add(InputModality.Audio);
249+
sessionOptions.Modalities.Add(InteractionModality.Text);
250+
sessionOptions.Modalities.Add(InteractionModality.Audio);
252251

253252
await session.ConfigureSessionAsync(sessionOptions).ConfigureAwait(false);
254253
```
254+
### Function Response Handling
255+
```C# Snippet:FunctionCallResponseExample
256+
// Process events from the session
257+
await foreach (SessionUpdate serverEvent in session.GetUpdatesAsync().ConfigureAwait(false))
258+
{
259+
if (serverEvent is SessionUpdateResponseFunctionCallArgumentsDone functionCall)
260+
{
261+
if (functionCall.Name == "get_current_weather")
262+
{
263+
// Extract parameters from the function call
264+
var parametersString = functionCall.Arguments;
265+
var parameters = System.Text.Json.JsonSerializer.Deserialize<Dictionary<string, string>>(parametersString);
266+
267+
string location = parameters != null ? parameters["location"] : string.Empty;
268+
269+
// Call your external weather service here and get the result
270+
string weatherInfo = $"The current weather in {location} is sunny with a temperature of 75°F.";
271+
272+
// Send the function response back to the session
273+
await session.AddItemAsync(new FunctionCallOutputItem(functionCall.CallId, weatherInfo)).ConfigureAwait(false);
274+
275+
// Start the next response.
276+
await session.StartResponseAsync().ConfigureAwait(false);
277+
}
278+
}
279+
}
280+
```
281+
### Adding a user text message
282+
```C# Snippet:AddUserMessageExample
283+
// Add a user message to the session
284+
await session.AddItemAsync(new UserMessageItem("Hello, can you help me with my account?")).ConfigureAwait(false);
285+
// Start the response from the assistant
286+
await session.StartResponseAsync().ConfigureAwait(false);
287+
```
255288

256289
## Troubleshooting
257290
