Commit 2688759

Phase 4
1 parent 8488be1 commit 2688759

13 files changed: +2345 -0 lines changed
Lines changed: 195 additions & 0 deletions
@@ -0,0 +1,195 @@
# Azure.AI.VoiceLive SDK - Streaming and Processing Implementation

## Overview
This implementation provides streaming and processing functionality for the Azure.AI.VoiceLive SDK. It follows Azure SDK design patterns and is modeled on the OpenAI Realtime API.

## Components Implemented

### 1. Core Update Infrastructure

#### `VoiceLiveUpdateKind` (`/src/Updates/VoiceLiveUpdateKind.cs`)
- Enumeration of all possible update types from the VoiceLive service
- Maps server event types to client-side update kinds
- Includes session, input audio, response, animation, and error events
- Provides conversion methods from server event type strings
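As a rough illustration of the conversion described above, the following sketch maps server event type strings to an enum and falls back to an `Unknown` value. The names `UpdateKindSketch` and `FromServerEventType`, and the wire strings themselves (written in the OpenAI Realtime naming style), are assumptions for illustration only and are not taken from the SDK source.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical subset of update kinds; the real enum covers many more events.
public enum UpdateKindSketch
{
    Unknown,
    SessionStarted,
    InputAudioSpeechStarted,
    ResponseTextDelta,
    ResponseAudioDelta,
    Error,
}

public static class UpdateKindSketchExtensions
{
    // Assumed wire names; check the service contract for the real strings.
    private static readonly Dictionary<string, UpdateKindSketch> s_byEventType =
        new Dictionary<string, UpdateKindSketch>(StringComparer.Ordinal)
        {
            ["session.created"] = UpdateKindSketch.SessionStarted,
            ["input_audio_buffer.speech_started"] = UpdateKindSketch.InputAudioSpeechStarted,
            ["response.text.delta"] = UpdateKindSketch.ResponseTextDelta,
            ["response.audio.delta"] = UpdateKindSketch.ResponseAudioDelta,
            ["error"] = UpdateKindSketch.Error,
        };

    // Unrecognized or missing event types map to Unknown instead of throwing.
    public static UpdateKindSketch FromServerEventType(string eventType) =>
        eventType != null && s_byEventType.TryGetValue(eventType, out UpdateKindSketch kind)
            ? kind
            : UpdateKindSketch.Unknown;
}
```

Returning `Unknown` rather than throwing keeps a client working when the service introduces event types that an older client does not recognize.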

#### `VoiceLiveUpdate` (`/src/Updates/VoiceLiveUpdate.cs`)
- Abstract base class for all updates received from the service
- Implements `IJsonModel<VoiceLiveUpdate>` and `IPersistableModel<VoiceLiveUpdate>`
- Provides serialization/deserialization support
- Includes factory method to create updates from server events

### 2. Specific Update Types

#### `SessionStartedUpdate` (`/src/Updates/SessionStartedUpdate.cs`)
- Represents session initialization completion
- Provides access to session details (ID, configuration)
- Created when the service confirms session establishment

#### `InputAudioUpdate` (`/src/Updates/InputAudioUpdate.cs`)
- Handles all input audio-related events
- Covers speech detection (start/stop), audio buffer management, transcription events
- Provides typed access to audio timing, transcription text, errors
- Boolean properties for easy event type checking

#### `OutputDeltaUpdate` (`/src/Updates/OutputDeltaUpdate.cs`)
- Represents streaming/incremental content from the service
- Handles text deltas, audio chunks, animation data, timestamps
- Provides typed access to different content types (text, audio, animations)
- Supports real-time content streaming scenarios

#### `OutputStreamingUpdate` (`/src/Updates/OutputStreamingUpdate.cs`)
- Represents completion events and response lifecycle updates
- Handles response start/completion, item creation/completion, content part events
- Provides access to final content, usage statistics, response status
- Boolean properties for different completion states

#### `ErrorUpdate` (`/src/Updates/ErrorUpdate.cs`)
- Represents error conditions from the service
- Provides detailed error information (type, code, message, parameters)
- Includes helpful string representation for debugging
- Maps service error events to client-side error objects

### 3. Factory Pattern

#### `VoiceLiveUpdateFactory` (`/src/VoiceLiveUpdateFactory.cs`)
- Factory class for creating appropriate update instances from server events
- Maps server event types to corresponding update classes
- Handles JSON deserialization and type conversion
- Supports both direct server event conversion and JSON element parsing
- Includes generic update handling for unknown event types
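The factory behavior listed above can be sketched roughly as follows. This is not the SDK's `VoiceLiveUpdateFactory`: every type name here (`UpdateSketch`, `TextDeltaSketch`, `ErrorSketch`, `UnknownSketch`, `UpdateFactorySketch`) is hypothetical, and the payload field names (`type`, `delta`, `message`) are assumed for illustration. Only the `System.Text.Json` calls are real library APIs.

```csharp
using System;
using System.Text.Json;

// Hypothetical stand-ins for the SDK's update classes.
public abstract class UpdateSketch
{
    protected UpdateSketch(string eventType) => EventType = eventType;
    public string EventType { get; }
}

public sealed class TextDeltaSketch : UpdateSketch
{
    public TextDeltaSketch(string delta) : base("response.text.delta") => Delta = delta;
    public string Delta { get; }
}

public sealed class ErrorSketch : UpdateSketch
{
    public ErrorSketch(string message) : base("error") => Message = message;
    public string Message { get; }
}

public sealed class UnknownSketch : UpdateSketch
{
    public UnknownSketch(string eventType, string rawJson) : base(eventType) => RawJson = rawJson;
    public string RawJson { get; }
}

public static class UpdateFactorySketch
{
    // Picks an update class from the "type" field; anything unrecognized
    // or unparseable becomes an UnknownSketch that keeps the raw payload.
    public static UpdateSketch FromJson(string json)
    {
        try
        {
            using JsonDocument doc = JsonDocument.Parse(json);
            JsonElement root = doc.RootElement;
            string type = root.ValueKind == JsonValueKind.Object
                ? GetString(root, "type", "unknown")
                : "unknown";

            switch (type)
            {
                case "response.text.delta":
                    return new TextDeltaSketch(GetString(root, "delta", string.Empty));
                case "error":
                    return new ErrorSketch(GetString(root, "message", string.Empty));
                default:
                    return new UnknownSketch(type, json);
            }
        }
        catch (JsonException)
        {
            return new UnknownSketch("unparseable", json);
        }
    }

    // Reads a string property, returning the fallback when it is absent or not a string.
    private static string GetString(JsonElement obj, string name, string fallback) =>
        obj.TryGetProperty(name, out JsonElement value) && value.ValueKind == JsonValueKind.String
            ? value.GetString() ?? fallback
            : fallback;
}
```

Keeping the raw JSON on the unknown case lets callers log or inspect events the client does not yet model, instead of dropping them.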

### 4. Session Extension for Streaming

#### `VoiceLiveSession.Updates.cs` (`/src/VoiceLiveSession.Updates.cs`)
- Partial class extension providing streaming functionality
- Implements `IAsyncEnumerable<VoiceLiveUpdate>` pattern
- Provides multiple convenience methods for filtered streaming

**Main Methods:**
- `GetUpdatesAsync()` - Get all updates as async enumerable
- `GetUpdates()` - Synchronous version for blocking scenarios
- `GetUpdatesAsync<T>()` - Filter by specific update type
- `GetUpdatesAsync(kinds...)` - Filter by update kinds
- `WaitForUpdateAsync<T>()` - Wait for next update of specific type
- `WaitForUpdateAsync(kind)` - Wait for next update of specific kind

**Convenience Methods:**
- `GetDeltaUpdatesAsync()` - Only streaming content updates
- `GetStreamingUpdatesAsync()` - Only completion/lifecycle updates
- `GetInputAudioUpdatesAsync()` - Only input audio processing updates
- `GetErrorUpdatesAsync()` - Only error updates
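One plausible way to layer type-filtered streaming and "wait for the next match" on top of a single update stream is with C# async iterators, sketched below. This shows the general technique only. The class `UpdateStreamSketch` and its members are hypothetical, and a plain `IAsyncEnumerable<object>` stands in for the session's update stream.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;

public static class UpdateStreamSketch
{
    // Yields only the items of type T from a mixed stream.
    public static async IAsyncEnumerable<T> OfTypeAsync<T>(
        IAsyncEnumerable<object> source,
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        await foreach (object item in source.WithCancellation(cancellationToken).ConfigureAwait(false))
        {
            if (item is T match)
            {
                yield return match;
            }
        }
    }

    // Completes with the first item of type T, then stops reading the stream.
    public static async Task<T> WaitForAsync<T>(
        IAsyncEnumerable<object> source,
        CancellationToken cancellationToken = default)
    {
        await foreach (T match in OfTypeAsync<T>(source, cancellationToken).ConfigureAwait(false))
        {
            return match;
        }

        throw new InvalidOperationException("The stream ended before a matching update arrived.");
    }

    // Demo source standing in for a live session: emits the given items asynchronously.
    public static async IAsyncEnumerable<object> FromItemsAsync(params object[] items)
    {
        foreach (object item in items)
        {
            await Task.Yield();
            yield return item;
        }
    }
}
```

For example, `await UpdateStreamSketch.WaitForAsync<string>(UpdateStreamSketch.FromItemsAsync(1, "ready", 2))` skips the integer and completes with the first string in the stream.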

### 5. WebSocket Message Handling

#### Core Features:
- **Fragmentation Handling**: Properly handles WebSocket message fragmentation using existing `AsyncVoiceLiveMessageCollectionResult`
- **Thread Safety**: Uses existing locking mechanism to prevent multiple readers
- **Message Processing**: Converts WebSocket messages to server events and then to updates
- **Error Recovery**: Handles JSON parsing failures gracefully with unknown update types
- **Connection Management**: Integrates with existing connection lifecycle
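To make the fragmentation point concrete, here is a minimal sketch of reassembling one logical message from several WebSocket frames. It does not reproduce `AsyncVoiceLiveMessageCollectionResult`. The class `WebSocketReadSketch` is hypothetical, and the receive call is passed in as a delegate (an assumption made so the loop can be exercised without a live connection). `WebSocketReceiveResult`, `EndOfMessage`, and `WebSocketMessageType` are the standard `System.Net.WebSockets` types.

```csharp
using System;
using System.IO;
using System.Net.WebSockets;
using System.Threading;
using System.Threading.Tasks;

public static class WebSocketReadSketch
{
    // Calls the receive delegate until EndOfMessage is set and returns the
    // reassembled payload, or null when the peer sends a close frame.
    public static async Task<byte[]> ReceiveFullMessageAsync(
        Func<ArraySegment<byte>, CancellationToken, Task<WebSocketReceiveResult>> receive,
        CancellationToken cancellationToken = default)
    {
        byte[] buffer = new byte[4096];
        using var message = new MemoryStream();

        while (true)
        {
            WebSocketReceiveResult result = await receive(
                new ArraySegment<byte>(buffer), cancellationToken).ConfigureAwait(false);

            if (result.MessageType == WebSocketMessageType.Close)
            {
                return null;
            }

            // Append this fragment; a single message may span many frames.
            message.Write(buffer, 0, result.Count);

            if (result.EndOfMessage)
            {
                return message.ToArray();
            }
        }
    }
}
```

With a connected `ClientWebSocket` named `socket`, the method group `socket.ReceiveAsync` can be passed as the delegate. Only one reader should run this loop at a time, which is consistent with the locking noted in the list above.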

#### `AsyncEnumerableExtensions` (`/src/WebSocketHelpers/AsyncEnumerableExtensions.cs`)
- Utility for converting `IAsyncEnumerable` to blocking enumerable
- Supports synchronous usage scenarios
- Proper cancellation and disposal handling
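A conversion of this kind can be sketched as follows, assuming the simplest approach of blocking on each `MoveNextAsync` call. `BlockingEnumerableSketch` and its members are hypothetical names, and the SDK's own helper may differ in its details.

```csharp
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class BlockingEnumerableSketch
{
    // Wraps an async stream in a synchronous iterator that blocks on each step.
    public static IEnumerable<T> ToBlocking<T>(
        IAsyncEnumerable<T> source, CancellationToken cancellationToken = default)
    {
        IAsyncEnumerator<T> enumerator = source.GetAsyncEnumerator(cancellationToken);
        try
        {
            while (enumerator.MoveNextAsync().AsTask().GetAwaiter().GetResult())
            {
                yield return enumerator.Current;
            }
        }
        finally
        {
            // Runs on normal completion and when the caller stops iterating early.
            enumerator.DisposeAsync().AsTask().GetAwaiter().GetResult();
        }
    }

    // Demo source: emits 0..count-1 asynchronously.
    public static async IAsyncEnumerable<int> RangeAsync(int count)
    {
        for (int i = 0; i < count; i++)
        {
            await Task.Yield();
            yield return i;
        }
    }
}
```

Blocking on async work can deadlock when called from a thread with a single-threaded synchronization context, such as a UI thread, so a wrapper like this suits console or background-thread callers. On .NET 7 and later, the built-in `ToBlockingEnumerable` extension covers the same need.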

## Key Features

### 1. Comprehensive Event Coverage
- Supports all VoiceLive server event types
- Maps to appropriate client-side update classes
- Handles both streaming (delta) and completion events

### 2. Type Safety
- Strongly typed update classes with appropriate properties
- Generic filtering methods for compile-time type safety
- Boolean properties for easy event type checking

### 3. Flexible Consumption Patterns
- Async enumerable for efficient streaming
- Synchronous enumerable for blocking scenarios
- Filtered streaming by type or kind
- Wait methods for specific events
- Convenience methods for common scenarios

### 4. WebSocket Integration
- Builds on existing WebSocket infrastructure
- Handles message fragmentation automatically
- Thread-safe message processing
- Proper connection state management

### 5. Error Handling
- Comprehensive error update support
- Graceful handling of parsing failures
- Proper exception propagation
- Unknown event type handling

### 6. Azure SDK Compliance
- Follows Azure SDK design guidelines
- Implements required interfaces (`IJsonModel`, `IPersistableModel`)
- Uses Azure SDK naming conventions
- Integrates with existing patterns

## Usage Patterns

### Basic Streaming
```csharp
await foreach (VoiceLiveUpdate update in session.GetUpdatesAsync())
{
    switch (update)
    {
        case OutputDeltaUpdate delta when delta.IsTextDelta:
            Console.Write(delta.TextDelta);
            break;
        case ErrorUpdate error:
            Console.WriteLine($"Error: {error.ErrorMessage}");
            break;
    }
}
```

### Filtered Streaming
```csharp
await foreach (OutputDeltaUpdate delta in session.GetUpdatesAsync<OutputDeltaUpdate>())
{
    ProcessDelta(delta);
}
```

### Wait for Specific Events
```csharp
SessionStartedUpdate started = await session.WaitForUpdateAsync<SessionStartedUpdate>();
Console.WriteLine($"Session {started.SessionId} ready");
```

## Implementation Quality

### Strengths
1. **Complete Implementation**: Covers all major VoiceLive event types
2. **Type Safety**: Strong typing with appropriate inheritance hierarchy
3. **Flexible API**: Multiple consumption patterns for different scenarios
4. **Integration**: Builds on existing WebSocket infrastructure
5. **Error Handling**: Comprehensive error scenarios covered
6. **Documentation**: Extensive inline documentation and usage examples

### Architecture Benefits
1. **Extensible**: Easy to add new update types as service evolves
2. **Performant**: Efficient streaming with minimal allocations
3. **Testable**: Clean separation of concerns enables thorough testing
4. **Maintainable**: Clear code organization and consistent patterns

## Files Created
- `/src/Updates/VoiceLiveUpdateKind.cs` - Update type enumeration
- `/src/Updates/VoiceLiveUpdate.cs` - Base update class
- `/src/Updates/SessionStartedUpdate.cs` - Session events
- `/src/Updates/InputAudioUpdate.cs` - Input audio processing
- `/src/Updates/OutputDeltaUpdate.cs` - Streaming content
- `/src/Updates/OutputStreamingUpdate.cs` - Completion events
- `/src/Updates/ErrorUpdate.cs` - Error handling
- `/src/VoiceLiveUpdateFactory.cs` - Factory pattern implementation
- `/src/VoiceLiveSession.Updates.cs` - Session streaming extension
- `/src/WebSocketHelpers/AsyncEnumerableExtensions.cs` - Utility methods
- `/src/STREAMING_USAGE_EXAMPLES.cs` - Comprehensive usage examples

This implementation provides a complete, production-ready streaming and processing system for the Azure.AI.VoiceLive SDK that follows best practices and integrates seamlessly with the existing codebase.
Lines changed: 192 additions & 0 deletions
@@ -0,0 +1,192 @@
// Copyright (c) Microsoft Corporation. All rights reserved.
// Licensed under the MIT License.

/*
# VoiceLive SDK Streaming Updates Usage Examples

This file demonstrates how to use the VoiceLive SDK's streaming update functionality.

## Basic Update Streaming

```csharp
using Azure.AI.VoiceLive;

// Create and connect to a VoiceLive session
var client = new VoiceLiveClient(endpoint, credential);
var session = await client.CreateSessionAsync(sessionOptions);

// Get all updates as they arrive
await foreach (VoiceLiveUpdate update in session.GetUpdatesAsync())
{
    Console.WriteLine($"Received update: {update.Kind}");

    // Handle specific update types
    switch (update)
    {
        case SessionStartedUpdate sessionStarted:
            Console.WriteLine($"Session started: {sessionStarted.SessionId}");
            break;

        case OutputDeltaUpdate deltaUpdate:
            if (deltaUpdate.IsTextDelta)
            {
                Console.Write(deltaUpdate.TextDelta); // Stream text as it arrives
            }
            else if (deltaUpdate.IsAudioDelta)
            {
                ProcessAudioData(deltaUpdate.AudioDelta); // Process audio chunks
            }
            break;

        case InputAudioUpdate inputUpdate:
            if (inputUpdate.IsSpeechStarted)
            {
                Console.WriteLine("User started speaking");
            }
            else if (inputUpdate.IsTranscriptionDelta)
            {
                Console.WriteLine($"Transcription delta: {inputUpdate.TranscriptionDelta}");
            }
            break;

        case ErrorUpdate errorUpdate:
            Console.WriteLine($"Error: {errorUpdate.ErrorMessage}");
            break;
    }
}
```

## Filtered Update Streaming

```csharp
// Get only specific types of updates
await foreach (OutputDeltaUpdate delta in session.GetUpdatesAsync<OutputDeltaUpdate>())
{
    if (delta.IsTextDelta)
    {
        Console.Write(delta.TextDelta);
    }
}

// Get updates of specific kinds
await foreach (VoiceLiveUpdate update in session.GetUpdatesAsync(
    cancellationToken,
    VoiceLiveUpdateKind.ResponseTextDelta,
    VoiceLiveUpdateKind.ResponseAudioDelta))
{
    // Process only text and audio deltas
}
```

## Synchronous Usage

```csharp
// For scenarios where you need synchronous processing
foreach (VoiceLiveUpdate update in session.GetUpdates())
{
    ProcessUpdate(update);
}
```

## Convenience Methods

```csharp
// Wait for a specific update type
SessionStartedUpdate sessionStarted = await session.WaitForUpdateAsync<SessionStartedUpdate>();
Console.WriteLine($"Session {sessionStarted.SessionId} is ready");

// Wait for a specific update kind
VoiceLiveUpdate errorUpdate = await session.WaitForUpdateAsync(VoiceLiveUpdateKind.Error);

// Get only delta updates (streaming content)
await foreach (OutputDeltaUpdate delta in session.GetDeltaUpdatesAsync())
{
    ProcessDelta(delta);
}

// Get only streaming updates (completion events)
await foreach (OutputStreamingUpdate streaming in session.GetStreamingUpdatesAsync())
{
    ProcessStreamingUpdate(streaming);
}

// Get only input audio updates
await foreach (InputAudioUpdate inputAudio in session.GetInputAudioUpdatesAsync())
{
    ProcessInputAudio(inputAudio);
}

// Get only error updates
await foreach (ErrorUpdate error in session.GetErrorUpdatesAsync())
{
    HandleError(error);
}
```

## WebSocket Message Handling

The SDK automatically handles:
- WebSocket message fragmentation and reassembly
- JSON deserialization of server events
- Conversion from server events to typed update objects
- Connection lifecycle management
- Thread-safe message processing

## Update Types

### VoiceLiveUpdateKind Enumeration

- **Session Events**: SessionStarted, SessionUpdated, SessionAvatarConnecting
- **Input Audio Events**: InputAudioBufferCommitted, InputAudioBufferCleared, InputAudioSpeechStarted, InputAudioSpeechStopped
- **Input Transcription Events**: InputAudioTranscriptionCompleted, InputAudioTranscriptionDelta, InputAudioTranscriptionFailed
- **Response Events**: ResponseStarted, ResponseCompleted
- **Response Streaming Events**: ResponseOutputItemAdded, ResponseOutputItemDone, ResponseContentPartAdded, ResponseContentPartDone
- **Response Delta Events**: ResponseTextDelta, ResponseAudioDelta, ResponseAudioTranscriptDelta
- **Animation Events**: ResponseAnimationBlendshapesDelta, ResponseAnimationVisemeDelta, ResponseAudioTimestampDelta
- **Error Events**: Error

### Update Classes

- **VoiceLiveUpdate**: Base class for all updates
- **SessionStartedUpdate**: Session initialization complete
- **InputAudioUpdate**: Input audio processing events (speech detection, transcription)
- **OutputDeltaUpdate**: Streaming content updates (text, audio, animations)
- **OutputStreamingUpdate**: Completion events and response lifecycle
- **ErrorUpdate**: Error conditions and failures

## Error Handling

```csharp
try
{
    await foreach (VoiceLiveUpdate update in session.GetUpdatesAsync())
    {
        if (update is ErrorUpdate errorUpdate)
        {
            Console.WriteLine($"Service error: {errorUpdate.ErrorMessage}");
            // Handle the error appropriately
            break;
        }
        // Process other updates
    }
}
catch (OperationCanceledException)
{
    Console.WriteLine("Update streaming was cancelled");
}
catch (Exception ex)
{
    Console.WriteLine($"Unexpected error: {ex.Message}");
}
```

## Best Practices

1. **Use appropriate update filtering** to reduce processing overhead
2. **Handle cancellation** properly with CancellationToken
3. **Process delta updates quickly** to avoid buffer overrun
4. **Monitor for error updates** to handle service issues
5. **Use async enumeration** for better resource utilization
6. **Implement proper cleanup** when done with the session

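As an illustration of the cancellation practice listed above, the sketch below bounds a streaming loop with a timeout-based `CancellationTokenSource`. `CancellationSketch` and `TicksAsync` are hypothetical: `TicksAsync` is a stand-in for a session update stream, so nothing here reflects the actual SDK surface.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;

public static class CancellationSketch
{
    // Stand-in for an update stream: emits a counter every 10 ms until cancelled.
    public static async IAsyncEnumerable<int> TicksAsync(
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        for (int i = 0; ; i++)
        {
            await Task.Delay(10, cancellationToken).ConfigureAwait(false);
            yield return i;
        }
    }

    // Reads until the timeout elapses and reports how many items were seen.
    public static async Task<int> ReadForAsync(TimeSpan timeout)
    {
        using var cts = new CancellationTokenSource(timeout);
        int seen = 0;
        try
        {
            await foreach (int tick in TicksAsync(cts.Token).ConfigureAwait(false))
            {
                seen++;
            }
        }
        catch (OperationCanceledException)
        {
            // Expected: the timeout ended the otherwise endless stream.
        }

        return seen;
    }
}
```

Disposing the `CancellationTokenSource` with `using` releases its timer once the loop exits, which is one small part of cleaning up after a session.
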
*/