Align live transcription response type with OpenAI Realtime ConversationItem pattern#561
Merged
kunal-vaishnavi merged 3 commits into ruiren/audio-streaming-support-sdk · Mar 28, 2026
Conversation
added 2 commits · March 27, 2026 14:24
nenad1002 reviewed · Mar 27, 2026
/// <summary>Start time offset of this segment in the audio stream (seconds).</summary>
[JsonPropertyName("start_time")]
public double? StartTime { get; init; }
Contributor (Author):
Yes,
/// <summary>
/// Transcription result sent back to SDK via callback during streaming.
/// Must match the SDK's AudioStreamingTranscriptionResult type.
/// </summary>
public record AudioStreamingTranscriptionResult
{
    [JsonPropertyName("is_final")]
    public bool IsFinal { get; init; }

    [JsonPropertyName("text")]
    public string Text { get; init; } = string.Empty;

    [JsonPropertyName("start_time")]
    public double? StartTime { get; init; }

    [JsonPropertyName("end_time")]
    public double? EndTime { get; init; }
}
This is inside our Core code; as you can see, we include start_time in the JSON response.
It is useful for callers for timestamp display, subtitle generation, etc.
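As a self-contained sketch of the point above, here is how a caller might deserialize that payload and use start_time/end_time for subtitle display. The record is repeated so the snippet compiles on its own; the sample JSON and the formatting are illustrative, not captured from the actual service.

```csharp
using System;
using System.Text.Json;
using System.Text.Json.Serialization;

public record AudioStreamingTranscriptionResult
{
    [JsonPropertyName("is_final")]
    public bool IsFinal { get; init; }

    [JsonPropertyName("text")]
    public string Text { get; init; } = string.Empty;

    [JsonPropertyName("start_time")]
    public double? StartTime { get; init; }

    [JsonPropertyName("end_time")]
    public double? EndTime { get; init; }
}

public static class SubtitleDemo
{
    // Illustrative helper: formats one transcript segment as a subtitle line.
    public static string FormatSubtitle(AudioStreamingTranscriptionResult r) =>
        $"[{r.StartTime}s -> {r.EndTime}s] {r.Text}";

    public static void Main()
    {
        // Hypothetical callback payload matching the record's JSON property names.
        const string json =
            "{\"is_final\":true,\"text\":\"hello world\",\"start_time\":1.5,\"end_time\":2.75}";

        var result = JsonSerializer.Deserialize<AudioStreamingTranscriptionResult>(json)!;
        Console.WriteLine(SubtitleDemo.FormatSubtitle(result));
    }
}
```

Because the snake_case wire names are mapped via [JsonPropertyName], the C# properties can stay PascalCase without a custom naming policy.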
kunal-vaishnavi approved these changes · Mar 28, 2026
Merged commit 6869123 into ruiren/audio-streaming-support-sdk. 11 of 12 checks passed.
Description
Redesigns LiveAudioTranscriptionResponse to follow the OpenAI Realtime API's ConversationItem shape, enabling forward compatibility with a future WebSocket-based architecture.

Motivation: callers that read result.Content[0].Text (the SDK analogue of the Realtime API's result.content[0].transcript) won't need to change their code when we migrate to WebSocket transport.

Before:
After:
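The Before/After snippets did not survive the page extraction. A minimal sketch of the access-pattern change they described, using hypothetical type names inferred from this PR description (the real SDK types may differ):

```csharp
using System;
using System.Collections.Generic;

// Hypothetical ConversationItem-style content part (inferred, not copied from the SDK).
public record TranscriptionContentPart(string? Text);

// Hypothetical new standalone response with a Content list.
public record LiveAudioTranscriptionResponse(List<TranscriptionContentPart>? Content);

public static class AccessPatternDemo
{
    public static void Main()
    {
        var response = new LiveAudioTranscriptionResponse(
            new List<TranscriptionContentPart> { new("hello world") });

        // Before (flat, Whisper-style shape): response.Text
        // After (ConversationItem-style shape):
        string? text = response.Content?[0]?.Text;
        Console.WriteLine(text);
    }
}
```

The null-conditional chain means a caller reads null rather than throwing when the content list is absent.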
Changes:
- Removed the AudioCreateTranscriptionResponse inheritance. New standalone LiveAudioTranscriptionResponse with a Content list + a new TranscriptionContentPart type.
- .Text → .Content?[0]?.Text
- New TranscriptionContentPart type; removed AudioCreateTranscriptionResponse.Segment
- Callers now read result.Content?[0]?.Text

Key design decisions:
- TranscriptionContentPart has both Text and Transcript (set to the same value) for maximum compatibility with both Whisper and Realtime API patterns
- StartTime/EndTime are top-level on the response (not nested in Segments): simpler access, and maps to Realtime's audio_start_ms/audio_end_ms
- We define our own type rather than reusing ConversationItem, to avoid carrying unused chat/tool-calling fields
- LiveAudioTranscriptionRaw (Core JSON deserialization) is unchanged; this is purely an SDK presentation change, with no Core/neutron-server impact
- No breaking changes to: Core API, native interop, audio pipeline, session lifecycle
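The decisions above can be sketched as a pair of types. This is a hypothetical mock based only on this description (property and factory names are assumptions, not the SDK source); it shows Text/Transcript mirroring each other and the top-level timing fields:

```csharp
using System;
using System.Collections.Generic;

public record TranscriptionContentPart
{
    public string? Text { get; init; }

    // Mirrors Text so Realtime-style readers (content[0].transcript) also work.
    public string? Transcript { get; init; }

    // Assumed helper that keeps the two properties in sync.
    public static TranscriptionContentPart From(string value) =>
        new() { Text = value, Transcript = value };
}

public record LiveAudioTranscriptionResponse
{
    public IReadOnlyList<TranscriptionContentPart>? Content { get; init; }

    // Top-level timing in seconds, not nested in segments.
    public double? StartTime { get; init; }
    public double? EndTime { get; init; }
}

public static class DesignDemo
{
    public static void Main()
    {
        var response = new LiveAudioTranscriptionResponse
        {
            Content = new[] { TranscriptionContentPart.From("hello") },
            StartTime = 0.0,
            EndTime = 1.2,
        };

        // Whisper-style and Realtime-style reads return the same value:
        Console.WriteLine(response.Content?[0]?.Text);
        Console.WriteLine(response.Content?[0]?.Transcript);
    }
}
```

Owning the type this way keeps the Realtime-compatible surface without dragging in ConversationItem's chat/tool-calling fields.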