Commit a14660f

chore(api): Minor docs and type updates for realtime
1 parent d934689

27 files changed: +2448 -1767 lines

.stats.yml

Lines changed: 2 additions & 2 deletions
@@ -1,4 +1,4 @@
 configured_endpoints: 118
-openapi_spec_url: https://storage.googleapis.com/stainless-sdk-openapi-specs/openai%2Fopenai-16cb18bed32bae8c5840fb39a1bf664026cc40463ad0c487dcb0df1bd3d72db0.yml
-openapi_spec_hash: 4cb51b22f98dee1a90bc7add82d1d132
+openapi_spec_url: https://storage.googleapis.com/stainless-sdk-openapi-specs/openai%2Fopenai-c829f9e7f51d4946dae7b02eb37eb857b538a464cf54c7ced5eff1b1c93e07db.yml
+openapi_spec_hash: 1b2eaba46b264bcec8831bc496543649
 config_hash: 930dac3aa861344867e4ac84f037b5df

lib/openai/models/realtime/input_audio_buffer_timeout_triggered.rb

Lines changed: 25 additions & 5 deletions
@@ -5,13 +5,15 @@ module Models
     module Realtime
       class InputAudioBufferTimeoutTriggered < OpenAI::Internal::Type::BaseModel
         # @!attribute audio_end_ms
-        #   Millisecond offset where speech ended within the buffered audio.
+        #   Millisecond offset of audio written to the input audio buffer at the time the
+        #   timeout was triggered.
         #
         #   @return [Integer]
         required :audio_end_ms, Integer

         # @!attribute audio_start_ms
-        #   Millisecond offset where speech started within the buffered audio.
+        #   Millisecond offset of audio written to the input audio buffer that was after the
+        #   playback time of the last model response.
         #
         #   @return [Integer]
         required :audio_start_ms, Integer
@@ -35,11 +37,29 @@ class InputAudioBufferTimeoutTriggered < OpenAI::Internal::Type::BaseModel
         required :type, const: :"input_audio_buffer.timeout_triggered"

         # @!method initialize(audio_end_ms:, audio_start_ms:, event_id:, item_id:, type: :"input_audio_buffer.timeout_triggered")
-        #   Returned when the server VAD timeout is triggered for the input audio buffer.
+        #   Some parameter documentations has been truncated, see
+        #   {OpenAI::Models::Realtime::InputAudioBufferTimeoutTriggered} for more details.
         #
-        #   @param audio_end_ms [Integer] Millisecond offset where speech ended within the buffered audio.
+        #   Returned when the Server VAD timeout is triggered for the input audio buffer.
+        #   This is configured with `idle_timeout_ms` in the `turn_detection` settings of
+        #   the session, and it indicates that there hasn't been any speech detected for the
+        #   configured duration.
         #
-        #   @param audio_start_ms [Integer] Millisecond offset where speech started within the buffered audio.
+        #   The `audio_start_ms` and `audio_end_ms` fields indicate the segment of audio
+        #   after the last model response up to the triggering time, as an offset from the
+        #   beginning of audio written to the input audio buffer. This means it demarcates
+        #   the segment of audio that was silent and the difference between the start and
+        #   end values will roughly match the configured timeout.
+        #
+        #   The empty audio will be committed to the conversation as an `input_audio` item
+        #   (there will be a `input_audio_buffer.committed` event) and a model response will
+        #   be generated. There may be speech that didn't trigger VAD but is still detected
+        #   by the model, so the model may respond with something relevant to the
+        #   conversation or a prompt to continue speaking.
+        #
+        #   @param audio_end_ms [Integer] Millisecond offset of audio written to the input audio buffer at the time the ti
+        #
+        #   @param audio_start_ms [Integer] Millisecond offset of audio written to the input audio buffer that was after the
         #
         #   @param event_id [String] The unique ID of the server event.
         #
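The new doc comment above says the `audio_start_ms`/`audio_end_ms` difference demarcates the silent segment and roughly matches the configured timeout. A minimal Ruby sketch of that arithmetic, assuming a hypothetical event payload shaped like the model above (all concrete values are illustrative, not from the commit):

```ruby
# Hypothetical server event shaped like `input_audio_buffer.timeout_triggered`
# (field names from the model above; the concrete values are illustrative).
event = {
  type: "input_audio_buffer.timeout_triggered",
  event_id: "event_123",
  item_id: "item_456",
  audio_start_ms: 4_500, # offset of audio written after the last model response
  audio_end_ms: 7_500    # offset at the moment the timeout fired
}

# Per the doc comment, the start/end difference is the silent segment and
# should roughly match the session's `idle_timeout_ms` (assumed 3000 ms here).
silent_ms = event[:audio_end_ms] - event[:audio_start_ms]
puts "silent for #{silent_ms} ms before timeout" # => silent for 3000 ms before timeout
```

Handlers can use this difference as a sanity check before reacting to the committed empty `input_audio` item.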

lib/openai/models/realtime/realtime_audio_config_input.rb

Lines changed: 14 additions & 11 deletions
@@ -36,17 +36,20 @@ class RealtimeAudioConfigInput < OpenAI::Internal::Type::BaseModel
         # @!attribute turn_detection
         #   Configuration for turn detection, ether Server VAD or Semantic VAD. This can be
         #   set to `null` to turn off, in which case the client must manually trigger model
-        #   response. Server VAD means that the model will detect the start and end of
-        #   speech based on audio volume and respond at the end of user speech. Semantic VAD
-        #   is more advanced and uses a turn detection model (in conjunction with VAD) to
-        #   semantically estimate whether the user has finished speaking, then dynamically
-        #   sets a timeout based on this probability. For example, if user audio trails off
-        #   with "uhhm", the model will score a low probability of turn end and wait longer
-        #   for the user to continue speaking. This can be useful for more natural
-        #   conversations, but may have a higher latency.
+        #   response.
         #
-        #   @return [OpenAI::Models::Realtime::RealtimeAudioInputTurnDetection, nil]
-        optional :turn_detection, -> { OpenAI::Realtime::RealtimeAudioInputTurnDetection }
+        #   Server VAD means that the model will detect the start and end of speech based on
+        #   audio volume and respond at the end of user speech.
+        #
+        #   Semantic VAD is more advanced and uses a turn detection model (in conjunction
+        #   with VAD) to semantically estimate whether the user has finished speaking, then
+        #   dynamically sets a timeout based on this probability. For example, if user audio
+        #   trails off with "uhhm", the model will score a low probability of turn end and
+        #   wait longer for the user to continue speaking. This can be useful for more
+        #   natural conversations, but may have a higher latency.
+        #
+        #   @return [OpenAI::Models::Realtime::RealtimeAudioInputTurnDetection::ServerVad, OpenAI::Models::Realtime::RealtimeAudioInputTurnDetection::SemanticVad, nil]
+        optional :turn_detection, union: -> { OpenAI::Realtime::RealtimeAudioInputTurnDetection }, nil?: true

         # @!method initialize(format_: nil, noise_reduction: nil, transcription: nil, turn_detection: nil)
         #   Some parameter documentations has been truncated, see
@@ -58,7 +61,7 @@ class RealtimeAudioConfigInput < OpenAI::Internal::Type::BaseModel
         #
         #   @param transcription [OpenAI::Models::Realtime::AudioTranscription] Configuration for input audio transcription, defaults to off and can be set to `
         #
-        #   @param turn_detection [OpenAI::Models::Realtime::RealtimeAudioInputTurnDetection] Configuration for turn detection, ether Server VAD or Semantic VAD. This can be
+        #   @param turn_detection [OpenAI::Models::Realtime::RealtimeAudioInputTurnDetection::ServerVad, OpenAI::Models::Realtime::RealtimeAudioInputTurnDetection::SemanticVad, nil] Configuration for turn detection, ether Server VAD or Semantic VAD. This can be

         # @see OpenAI::Models::Realtime::RealtimeAudioConfigInput#noise_reduction
         class NoiseReduction < OpenAI::Internal::Type::BaseModel
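The diff above retypes `turn_detection` as a nullable union of `ServerVad` and `SemanticVad`. As a hedged sketch of what the three accepted shapes might look like as plain hashes, assuming key names from the Realtime API's `turn_detection` settings (`silence_duration_ms`, `idle_timeout_ms`, `eagerness` are assumptions here; check them against the API reference before relying on them):

```ruby
# Illustrative turn_detection settings, one per union variant. Key names
# are assumed from the Realtime API docs; values are examples only.
server_vad = {
  type: "server_vad",
  silence_duration_ms: 500, # how long a pause ends the user's turn
  idle_timeout_ms: 5_000    # fires input_audio_buffer.timeout_triggered on silence
}

semantic_vad = {
  type: "semantic_vad",
  eagerness: "auto"         # how aggressively the model decides the turn is over
}

# `nil` disables turn detection entirely; the client must then trigger
# model responses manually (this is what the `nil?: true` change permits).
configs = [server_vad, semantic_vad, nil]
labels = configs.map { |td| td.nil? ? "manual" : td[:type] }
# labels => ["server_vad", "semantic_vad", "manual"]
```

Modeling the field as a tagged union means callers discriminate on `type` rather than a single catch-all class, which is what the narrower `@return` type in the diff reflects.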
