Align live transcription response type with OpenAI Realtime ConversationItem pattern#561

Merged
kunal-vaishnavi merged 3 commits into ruiren/audio-streaming-support-sdk from ruiren/audio-streaming-support-realtime-response
Mar 28, 2026

Conversation

Contributor

@rui-ren rui-ren commented Mar 27, 2026

Description

Redesigns LiveAudioTranscriptionResponse to follow the OpenAI Realtime API's ConversationItem shape, enabling forward compatibility with a future WebSocket-based architecture.

Motivation:

  • Customers using OpenAI's Realtime API access transcription via result.content[0].transcript
  • By adopting this pattern now, customers who write result.Content[0].Text won't need to change their code when we migrate to WebSocket transport
  • Aligns with the team's plan to move toward OpenAI Realtime API compatibility

Before:

// Extended AudioCreateTranscriptionResponse from Betalgo
await foreach (var result in session.GetTranscriptionStream())
{
    Console.Write(result.Text);           // inherited from base
    bool final = result.IsFinal;          // custom field
    var segments = result.Segments;       // inherited from base
}

After:

// Own type shaped like OpenAI Realtime ConversationItem
await foreach (var result in session.GetTranscriptionStream())
{
    Console.Write(result.Content[0].Text);       // ConversationItem pattern
    Console.Write(result.Content[0].Transcript); // alias for Text (Realtime compat)
    bool final = result.IsFinal;
    double? start = result.StartTime;
}

Changes:

  • LiveAudioTranscriptionTypes.cs: Removed the AudioCreateTranscriptionResponse inheritance; new standalone LiveAudioTranscriptionResponse with a Content list plus a new TranscriptionContentPart type
  • LiveAudioTranscriptionClient.cs: Updated text checks from .Text to .Content?[0]?.Text
  • JsonSerializationContext.cs: Registered TranscriptionContentPart; removed AudioCreateTranscriptionResponse.Segment
  • LiveAudioTranscriptionTests.cs: Updated assertions to match the new type shape
  • Program.cs (sample): Updated result reading to result.Content?[0]?.Text
  • README.md: Updated the docs and the output type table

Key design decisions:

  • TranscriptionContentPart has both Text and Transcript (set to the same value) for maximum compatibility with both Whisper and Realtime API patterns
  • StartTime/EndTime are top-level on the response (not nested in Segments) — simpler access, maps to Realtime's audio_start_ms/audio_end_ms
  • No dependency on Betalgo's ConversationItem — we own the type to avoid carrying unused chat/tool-calling fields
  • LiveAudioTranscriptionRaw (Core JSON deserialization) is unchanged — this is purely an SDK presentation change, no Core/neutron-server impact

No breaking changes to: Core API, native interop, audio pipeline, session lifecycle
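
Given these decisions, the new types presumably look roughly like the following sketch. This is an illustration based on the description above, not the merged code; attribute names and exact property sets beyond those mentioned in this PR are assumptions.

```csharp
using System.Collections.Generic;
using System.Text.Json.Serialization;

/// <summary>One content part of a transcription item (Realtime ConversationItem style).</summary>
public record TranscriptionContentPart
{
    [JsonPropertyName("text")]
    public string? Text { get; init; }

    // Alias carrying the same value as Text, matching the Realtime API's
    // result.content[0].transcript access pattern.
    [JsonPropertyName("transcript")]
    public string? Transcript { get; init; }
}

/// <summary>Standalone live transcription response (no Betalgo inheritance).</summary>
public record LiveAudioTranscriptionResponse
{
    [JsonPropertyName("content")]
    public List<TranscriptionContentPart> Content { get; init; } = new();

    [JsonPropertyName("is_final")]
    public bool IsFinal { get; init; }

    // Top-level offsets in seconds, mapping to Realtime's audio_start_ms/audio_end_ms.
    [JsonPropertyName("start_time")]
    public double? StartTime { get; init; }

    [JsonPropertyName("end_time")]
    public double? EndTime { get; init; }
}
```

With both Text and Transcript populated, callers can use either the Whisper-style or the Realtime-style accessor without a migration later.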

ruiren_microsoft added 2 commits March 27, 2026 14:24


/// <summary>Start time offset of this segment in the audio stream (seconds).</summary>
[JsonPropertyName("start_time")]
public double? StartTime { get; init; }
Contributor

We're tracking this?

Contributor Author


Yes,

/// <summary>
/// Transcription result sent back to SDK via callback during streaming.
/// Must match the SDK's AudioStreamingTranscriptionResult type.
/// </summary>
public record AudioStreamingTranscriptionResult
{
    [JsonPropertyName("is_final")]
    public bool IsFinal { get; init; }

    [JsonPropertyName("text")]
    public string Text { get; init; } = string.Empty;

    [JsonPropertyName("start_time")]
    public double? StartTime { get; init; }

    [JsonPropertyName("end_time")]
    public double? EndTime { get; init; }
}

This is inside our Core code; as you can see, start_time is already part of the JSON response.

It is useful for callers that need timestamp display, subtitle generation, etc.
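
As a concrete example of the subtitle use case: the top-level seconds offsets make it easy to format SRT-style timestamps. A minimal sketch (the helper name is hypothetical, not part of the SDK):

```csharp
using System;

public static class SubtitleFormat
{
    // Format a seconds offset as an SRT-style timestamp (HH:MM:SS,mmm).
    public static string ToSrtTimestamp(double seconds)
    {
        var t = TimeSpan.FromSeconds(seconds);
        return $"{(int)t.TotalHours:D2}:{t.Minutes:D2}:{t.Seconds:D2},{t.Milliseconds:D3}";
    }
}

// Example: a final segment with StartTime = 1.5 and EndTime = 3.25 renders as
// SubtitleFormat.ToSrtTimestamp(1.5)  -> "00:00:01,500"
// SubtitleFormat.ToSrtTimestamp(3.25) -> "00:00:03,250"
```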

@kunal-vaishnavi kunal-vaishnavi merged commit 6869123 into ruiren/audio-streaming-support-sdk Mar 28, 2026
11 of 12 checks passed
@kunal-vaishnavi kunal-vaishnavi deleted the ruiren/audio-streaming-support-realtime-response branch March 28, 2026 03:05