Conversation

Contributor

@rlundeen2 rlundeen2 commented Jan 10, 2026

Title says it all! Supporting audio for gpt-audio and also tool calls.

Tests:

  • Added unit tests and integration tests
  • All integration tests running



@dataclass
class OpenAICompletionsAudioConfig:
Contributor

Optional nit: do we want to consider renaming to OpenAIChatAudioConfig, to correspond to our OpenAIChatTarget and make it clear that it's not OpenAICompletionTarget?

Contributor

I disagree, this is neither "nit" nor optional 😆


# Voices supported by OpenAI Chat Completions API audio output.
# See: https://platform.openai.com/docs/guides/text-to-speech#voice-options
CompletionsAudioVoice = Literal["alloy", "ash", "ballad", "coral", "echo", "sage", "shimmer", "verse", "marin", "cedar"]
Contributor

curious why this isn't exactly the same as the list on the platform.openai.com webpage that's linked above? (missing fable, nova, onyx)

Contributor Author

I'll add them!
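For reference, a sketch of what the expanded `Literal` might look like once the three voices the reviewer flagged (fable, nova, onyx) are added; the exact supported set should still be verified against the current OpenAI docs page linked above:

```python
from typing import Literal, get_args

# Sketch of the expanded voice list, adding fable, nova, and onyx from the
# linked text-to-speech voice-options page. Verify against current OpenAI docs.
CompletionsAudioVoice = Literal[
    "alloy", "ash", "ballad", "coral", "echo", "fable", "nova",
    "onyx", "sage", "shimmer", "verse", "marin", "cedar",
]

# The three previously missing voices are now present.
assert {"fable", "nova", "onyx"} <= set(get_args(CompletionsAudioVoice))
```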

extension=extension,
)

if audio_format == "pcm16":
Contributor

might be missing something, but is there a unit test for pcm16 specifically?
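For context on why pcm16 gets its own branch (and is worth a dedicated unit test): unlike wav or mp3, pcm16 is raw, headerless 16-bit PCM, so it typically has to be wrapped in a container before being saved with a normal audio extension. A minimal stdlib-only sketch, assuming mono 24 kHz output (the sample rate is an assumption here and should be checked against OpenAI's docs):

```python
import io
import wave

def pcm16_to_wav(pcm: bytes, sample_rate: int = 24000) -> bytes:
    """Wrap raw 16-bit mono little-endian PCM in a WAV container."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)        # mono
        w.setsampwidth(2)        # 16-bit samples
        w.setframerate(sample_rate)
        w.writeframes(pcm)
    return buf.getvalue()

# What a pcm16-specific unit test could assert: the result is a valid WAV.
wav_bytes = pcm16_to_wav(b"\x00\x00" * 480)  # 20 ms of silence at 24 kHz
assert wav_bytes[:4] == b"RIFF" and wav_bytes[8:12] == b"WAVE"
```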


# Skip audio for assistant messages - OpenAI only allows audio in user messages.
# For assistant responses, the transcript text piece should already be included.
if role == "assistant":
Contributor

is assistant the only other option besides user?
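For the record, Chat Completions messages are not limited to user and assistant; a hedged sketch of the role set per OpenAI's API reference at the time of writing (worth double-checking against current docs), showing why a role check should not assume a two-way split:

```python
# Roles the Chat Completions messages array accepts, per OpenAI's API
# reference; "user" and "assistant" are not the only options.
CHAT_COMPLETION_ROLES = frozenset({"system", "developer", "user", "assistant", "tool"})

def may_carry_input_audio(role: str) -> bool:
    # Only user messages may contain input_audio content parts.
    return role == "user"

assert may_carry_input_audio("user")
assert not may_carry_input_audio("tool")
```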

if not pieces:
raise EmptyResponseException(message="Failed to extract any response content.")

return Message(message_pieces=pieces)
Contributor

So right now you wouldn't be able to tell what is transcript text and what is plain text content; hypothetically, there could be no transcript but there could be text content, with no distinction between the two. Do you see a distinction being useful? (I'm not sure whether the content/value of text content vs. transcript makes it obvious which is which, so being more explicit may be unnecessary.)
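One hypothetical way to make that distinction explicit would be to tag the transcript piece with metadata when it is constructed. The names below are illustrative only, not PyRIT's actual API:

```python
from dataclasses import dataclass, field

# Illustrative sketch only: a piece-level metadata tag lets downstream code
# tell an audio transcript apart from ordinary text content.
@dataclass
class TextPiece:
    value: str
    metadata: dict = field(default_factory=dict)

def is_transcript(piece: TextPiece) -> bool:
    return piece.metadata.get("source") == "audio_transcript"

transcript = TextPiece("hello", metadata={"source": "audio_transcript"})
plain = TextPiece("hello")
assert is_transcript(transcript) and not is_transcript(plain)
```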

content.append(entry)
elif message_piece.converted_value_data_type == "audio_path":
ext = DataTypeSerializer.get_extension(message_piece.converted_value)
if not ext or ext.lower() not in [".wav", ".mp3"]:
Contributor

https://platform.openai.com/docs/guides/speech-to-text says "mp3, mp4, mpeg, mpga, m4a, wav, and webm", so is this just that PyRIT + OpenAI chat completions only supports .wav and .mp3? Because then we should maybe be more exact.
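Worth noting the two docs pages cover different endpoints: the longer list (mp3, mp4, mpeg, mpga, m4a, wav, webm) is for the transcription endpoint, while the Chat Completions input_audio content part accepts only wav and mp3. A sketch of an error message that makes that explicit (the helper name is illustrative, not PyRIT's API):

```python
# The Chat Completions input_audio part accepts only "wav" and "mp3";
# the wider list on the speech-to-text page applies to the transcription
# endpoint, not chat completions.
SUPPORTED_CHAT_AUDIO_FORMATS = {".wav": "wav", ".mp3": "mp3"}

def chat_audio_format(extension: str) -> str:
    fmt = SUPPORTED_CHAT_AUDIO_FORMATS.get(extension.lower())
    if fmt is None:
        raise ValueError(
            f"Unsupported audio extension {extension!r}: the Chat Completions "
            "input_audio content part accepts only .wav and .mp3."
        )
    return fmt
```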

4 participants