
@seratch
Member

@seratch seratch commented Jun 12, 2025

  • inputAudioTranscription
  • turnDetection

@changeset-bot

changeset-bot bot commented Jun 12, 2025

🦋 Changeset detected

Latest commit: 7d248d3

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 3 packages
Name Type
@openai/agents-realtime Patch
@openai/agents Patch
@openai/agents-extensions Patch


turnDetection: {
type: 'semantic_vad',
eagerness: 'medium',
create_response: true,
Member Author

changed the example but either works!
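The comment suggests the turnDetection example works whether the flag is spelled create_response or createResponse. A minimal sketch of a config type that accepts both spellings, using hypothetical type names loosely modeled on the diff (these are not the SDK's exported definitions):

```ts
// Hypothetical types loosely modeled on the diff; not the SDK's actual exports.
// Assumed set of eagerness values for semantic_vad.
type Eagerness = 'auto' | 'low' | 'medium' | 'high';

// camelCase spelling, which the SDK would convert before calling the API.
type TurnDetectionCamelCase = {
  type?: 'semantic_vad';
  eagerness?: Eagerness;
  createResponse?: boolean;
};

// snake_case spelling, matching the Realtime API's own keys.
type TurnDetectionSnakeCase = {
  type?: 'semantic_vad';
  eagerness?: Eagerness;
  create_response?: boolean;
};

// A union lets callers use either spelling.
type TurnDetectionConfig = TurnDetectionCamelCase | TurnDetectionSnakeCase;

const camel: TurnDetectionConfig = {
  type: 'semantic_vad',
  eagerness: 'medium',
  createResponse: true,
};

const snake: TurnDetectionConfig = {
  type: 'semantic_vad',
  eagerness: 'medium',
  create_response: true,
};
```

A plain union like this does not stop a caller from setting both keys at once, which is why the conversion step needs a precedence rule.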


export type RealtimeInputAudioTranscriptionConfig = {
language?: string;
model?: 'gpt-4o-transcribe' | 'gpt-4o-mini-transcribe' | 'whisper-1' | string;
Member Author

allowed this property to accept any other string, since we may release new models in the future (plus alpha users may use different ones)

Collaborator

We should use string & {} instead, for better type autocomplete

Member Author

@dkundel-openai thanks, updated!
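TypeScript reduces a union such as `'whisper-1' | string` to plain `string`, so editors stop suggesting the known literals; intersecting with an empty object type, `string & {}`, keeps a distinct member that still accepts any string. A minimal sketch, with hypothetical names modeled on the `RealtimeInputAudioTranscriptionConfig` type in the diff:

```ts
// Hypothetical names modeled on the diff's RealtimeInputAudioTranscriptionConfig.
// `string & {}` still accepts any string, but unlike a bare `string` it is not
// merged with the literals, so editors keep suggesting the known model names.
type TranscriptionModel =
  | 'gpt-4o-transcribe'
  | 'gpt-4o-mini-transcribe'
  | 'whisper-1'
  | (string & {});

type InputAudioTranscriptionConfig = {
  language?: string;
  model?: TranscriptionModel;
};

// A known model name: offered by editor autocomplete.
const known: InputAudioTranscriptionConfig = {
  model: 'gpt-4o-transcribe',
  language: 'en',
};

// A hypothetical, not-yet-listed model name is still accepted.
const custom: InputAudioTranscriptionConfig = { model: 'my-experimental-model' };
```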

return {
type: c.type,
create_response:
'createResponse' in c
Member Author

if both snake_case and camelCase are given, camelCase is preferred; happy to change this order
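The order question is about which key wins when a caller supplies both spellings. A minimal sketch of a camelCase-first lookup for a single field, with hypothetical names (`TurnDetectionInput`, `toWireTurnDetection`) rather than the SDK's own:

```ts
// Hypothetical input type in which either spelling of the flag may appear.
type TurnDetectionInput = {
  type?: 'semantic_vad';
  createResponse?: boolean;
  create_response?: boolean;
};

// Shape sent on the wire: the Realtime API expects snake_case keys.
type TurnDetectionWire = {
  type?: 'semantic_vad';
  create_response?: boolean;
};

function toWireTurnDetection(c: TurnDetectionInput): TurnDetectionWire {
  return {
    type: c.type,
    // camelCase wins whenever its key is present; otherwise use snake_case.
    create_response:
      'createResponse' in c ? c.createResponse : c.create_response,
  };
}
```

Swapping the ternary's branches would make snake_case win instead. Because the check uses `in`, an explicit `createResponse: undefined` still counts as present; `c.createResponse ?? c.create_response` would instead fall back to the snake_case value.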

@seratch seratch force-pushed the voice-agent-config-types branch from ed4b4de to f86d0fc Compare June 13, 2025 06:35
@seratch seratch changed the title from "Improve the types of RealtimeAgent configuration" to "Improve the types of RealtimeSession configuration" Jun 13, 2025

// The Realtime API accepts snake_case keys, so when this type is used, the SDK converts the keys to snake_case before passing them to the API
export type RealtimeTurnDetectionConfigCamelCase = {
type?: 'semantic_vad';
Collaborator

Type can also be server_vad, so we should support both.
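Supporting both modes amounts to widening the `type` literal to a union. A minimal sketch with a hypothetical camelCase type; the eagerness values are assumed, and the numeric settings are illustrative, not recommended defaults:

```ts
// Hypothetical: both VAD modes accepted by the `type` field.
type TurnDetectionType = 'semantic_vad' | 'server_vad';

type TurnDetectionConfigCamelCase = {
  type?: TurnDetectionType;
  // Assumed set of eagerness values for semantic_vad.
  eagerness?: 'auto' | 'low' | 'medium' | 'high';
  createResponse?: boolean;
  prefixPaddingMs?: number;
  silenceDurationMs?: number;
  threshold?: number;
};

// server_vad is tuned by threshold and timing (illustrative numbers).
const serverVad: TurnDetectionConfigCamelCase = {
  type: 'server_vad',
  threshold: 0.5,
  prefixPaddingMs: 300,
  silenceDurationMs: 500,
};

// semantic_vad is tuned by eagerness instead.
const semanticVad: TurnDetectionConfigCamelCase = {
  type: 'semantic_vad',
  eagerness: 'medium',
};
```

A discriminated union keyed on `type` could go further and reject, say, `eagerness` on a `server_vad` config; the flat, all-optional shape above mirrors the one in the diff.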

prefixPaddingMs?: number;
silenceDurationMs?: number;
threshold?: number;
};
Collaborator

Can we make it possible for these two settings to also accept other properties? I'm thinking of how, theoretically, you could roll your own Realtime Transport Layer right now with other session config. But it's also fine to guide people to providerData for that and have them override this entire property.

Member Author

thanks, updated
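The thread does not show how the types were updated. One way to let these settings carry extra properties is an index signature, with unrecognized keys forwarded as-is; a minimal sketch under that assumption, with hypothetical names (`TurnDetectionConfig`, `toWirePassthrough`):

```ts
// Hypothetical: typed known keys plus an index signature so that extra,
// untyped keys are allowed (e.g. for a custom transport layer).
type TurnDetectionConfig = {
  type?: 'semantic_vad' | 'server_vad';
  createResponse?: boolean;
  threshold?: number;
  [key: string]: unknown;
};

function toWirePassthrough(c: TurnDetectionConfig): Record<string, unknown> {
  // Pull out the one key that needs renaming; everything else stays in `rest`.
  const { createResponse, ...rest } = c;
  // Unknown keys in `rest` are forwarded unchanged.
  return createResponse === undefined
    ? { ...rest }
    : { ...rest, create_response: createResponse };
}
```

The trade-off is weaker checking: a misspelled key such as `treshold` is accepted silently instead of being flagged. Routing extra settings through providerData keeps the typed config strict.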

Collaborator

@dkundel-openai dkundel-openai left a comment

See open comments

@seratch seratch force-pushed the voice-agent-config-types branch from f86d0fc to 7d248d3 Compare June 17, 2025 00:58
prefixPaddingMs?: number;
silenceDurationMs?: number;
threshold?: number;
};
Member Author

thanks, updated

threshold,
...rest,
};
// Remove undefined values from the config
Member Author

When I verified the behavior, undefined values could affect connection establishment, so I added this logic. But if my observation is wrong or I'm missing something, please feel free to adjust this part.
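The comment reports, from the author's own testing, that undefined values in the config could affect connection establishment. A minimal sketch of a shallow cleanup step, with a hypothetical helper name `removeUndefined` (the PR's actual code is not shown in this thread):

```ts
// Hypothetical helper: returns a shallow copy of `obj` without the keys
// whose value is `undefined`.
function removeUndefined<T extends object>(obj: T): Partial<T> {
  const result: Partial<T> = {};
  for (const key of Object.keys(obj) as Array<keyof T>) {
    if (obj[key] !== undefined) {
      result[key] = obj[key];
    }
  }
  return result;
}
```

The copy is shallow: nested objects keep any `undefined` fields they contain.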

const item = event.response.output[event.response.output.length - 1];
const textOutput = getLastTextFromAudioOutputMessage(item) ?? '';
- const itemId = item.id ?? '';
+ const itemId = item?.id ?? '';
Member Author

This is an unrelated existing bug I found while testing.
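The fix matters when `event.response.output` is empty: indexing past the end yields `undefined`, so `item.id` throws a `TypeError`, while `item?.id` evaluates to `undefined` and the `?? ''` fallback applies. A reduced sketch with hypothetical names (`OutputItem`, `lastItemId`):

```ts
// Hypothetical minimal shape of a response output item.
type OutputItem = { id?: string };

function lastItemId(output: OutputItem[]): string {
  // For an empty array this index yields undefined at runtime.
  const item: OutputItem | undefined = output[output.length - 1];
  // `item.id` would throw here; `item?.id` short-circuits to undefined.
  return item?.id ?? '';
}
```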

@dkundel-openai dkundel-openai merged commit 49bfe25 into main Jun 18, 2025
5 checks passed
@dkundel-openai dkundel-openai deleted the voice-agent-config-types branch June 18, 2025 04:30
3 participants