Align live transcription response type with OpenAI Realtime ConversationItem pattern#561

Merged
kunal-vaishnavi merged 3 commits into ruiren/audio-streaming-support-sdk from ruiren/audio-streaming-support-realtime-response
Mar 28, 2026

Conversation

Contributor

@rui-ren rui-ren commented Mar 27, 2026

Description

Redesigns LiveAudioTranscriptionResponse to follow the OpenAI Realtime API's ConversationItem shape, enabling forward compatibility with a future WebSocket-based architecture.

Motivation:

  • Customers using OpenAI's Realtime API access transcription via result.content[0].transcript
  • By adopting this pattern now, customers who write result.Content[0].Text won't need to change their code when we migrate to WebSocket transport
  • Aligns with the team's plan to move toward OpenAI Realtime API compatibility

Before:

// Extended AudioCreateTranscriptionResponse from Betalgo
await foreach (var result in session.GetTranscriptionStream())
{
    Console.Write(result.Text);           // inherited from base
    bool final = result.IsFinal;          // custom field
    var segments = result.Segments;       // inherited from base
}

After:

// Own type shaped like OpenAI Realtime ConversationItem
await foreach (var result in session.GetTranscriptionStream())
{
    Console.Write(result.Content[0].Text);       // ConversationItem pattern
    Console.Write(result.Content[0].Transcript); // alias for Text (Realtime compat)
    bool final = result.IsFinal;
    double? start = result.StartTime;
}

Changes:

  • LiveAudioTranscriptionTypes.cs: Removed the AudioCreateTranscriptionResponse inheritance; new standalone LiveAudioTranscriptionResponse with a Content list plus a new TranscriptionContentPart type
  • LiveAudioTranscriptionClient.cs: Updated text checks from .Text to .Content?[0]?.Text
  • JsonSerializationContext.cs: Registered TranscriptionContentPart; removed AudioCreateTranscriptionResponse.Segment
  • LiveAudioTranscriptionTests.cs: Updated assertions to match the new type shape
  • Program.cs (sample): Updated result reading to result.Content?[0]?.Text
  • README.md: Updated the docs and the output type table

Key design decisions:

  • TranscriptionContentPart has both Text and Transcript (set to the same value) for maximum compatibility with both Whisper and Realtime API patterns
  • StartTime/EndTime are top-level on the response (not nested in Segments) — simpler access, maps to Realtime's audio_start_ms/audio_end_ms
  • No dependency on Betalgo's ConversationItem — we own the type to avoid carrying unused chat/tool-calling fields
  • LiveAudioTranscriptionRaw (Core JSON deserialization) is unchanged — this is purely an SDK presentation change, no Core/neutron-server impact

No breaking changes to: Core API, native interop, audio pipeline, session lifecycle
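
Given these decisions, the new types presumably look roughly like the following sketch. This is an illustration based on the description above, not the merged code; attribute names and exact property sets beyond those mentioned in this PR are assumptions.

```csharp
using System.Collections.Generic;
using System.Text.Json.Serialization;

/// <summary>One content part of a transcription item (Realtime ConversationItem style).</summary>
public record TranscriptionContentPart
{
    [JsonPropertyName("text")]
    public string? Text { get; init; }

    // Alias carrying the same value as Text, matching the Realtime API's
    // result.content[0].transcript access pattern.
    [JsonPropertyName("transcript")]
    public string? Transcript { get; init; }
}

/// <summary>Standalone live transcription response (no Betalgo inheritance).</summary>
public record LiveAudioTranscriptionResponse
{
    [JsonPropertyName("content")]
    public List<TranscriptionContentPart> Content { get; init; } = new();

    [JsonPropertyName("is_final")]
    public bool IsFinal { get; init; }

    // Top-level offsets in seconds, mapping to Realtime's audio_start_ms/audio_end_ms.
    [JsonPropertyName("start_time")]
    public double? StartTime { get; init; }

    [JsonPropertyName("end_time")]
    public double? EndTime { get; init; }
}
```

With both Text and Transcript populated, callers can use either the Whisper-style or the Realtime-style accessor without a migration later.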

ruiren_microsoft added 2 commits March 27, 2026 14:24


/// <summary>Start time offset of this segment in the audio stream (seconds).</summary>
[JsonPropertyName("start_time")]
public double? StartTime { get; init; }
Contributor

We're tracking this?

Contributor Author


Yes,

/// <summary>
/// Transcription result sent back to SDK via callback during streaming.
/// Must match the SDK's AudioStreamingTranscriptionResult type.
/// </summary>
public record AudioStreamingTranscriptionResult
{
    [JsonPropertyName("is_final")]
    public bool IsFinal { get; init; }

    [JsonPropertyName("text")]
    public string Text { get; init; } = string.Empty;

    [JsonPropertyName("start_time")]
    public double? StartTime { get; init; }

    [JsonPropertyName("end_time")]
    public double? EndTime { get; init; }
}

This is inside our Core code; as you can see, start_time is already part of the JSON response.

It is useful for callers that need timestamp display, subtitle generation, etc.
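
As a concrete example of the subtitle use case: the top-level seconds offsets make it easy to format SRT-style timestamps. A minimal sketch (the helper name is hypothetical, not part of the SDK):

```csharp
using System;

public static class SubtitleFormat
{
    // Format a seconds offset as an SRT-style timestamp (HH:MM:SS,mmm).
    public static string ToSrtTimestamp(double seconds)
    {
        var t = TimeSpan.FromSeconds(seconds);
        return $"{(int)t.TotalHours:D2}:{t.Minutes:D2}:{t.Seconds:D2},{t.Milliseconds:D3}";
    }
}

// Example: a final segment with StartTime = 1.5 and EndTime = 3.25 renders as
// SubtitleFormat.ToSrtTimestamp(1.5)  -> "00:00:01,500"
// SubtitleFormat.ToSrtTimestamp(3.25) -> "00:00:03,250"
```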

@kunal-vaishnavi kunal-vaishnavi merged commit 6869123 into ruiren/audio-streaming-support-sdk Mar 28, 2026
11 of 12 checks passed
@kunal-vaishnavi kunal-vaishnavi deleted the ruiren/audio-streaming-support-realtime-response branch March 28, 2026 03:05