-
Notifications
You must be signed in to change notification settings - Fork 642
FEAT: Adding audio and tool support to chat completions #1311
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
|
||
|
|
||
| @dataclass | ||
| class OpenAICompletionsAudioConfig: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
optional NIT: do we want to consider renaming to OpenAIChatAudioConfig to correspond to our OpenAIChatTarget and make it clear that it's not OpenAICompletionTarget
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I disagree, this is neither "nit" nor optional 😆
|
|
||
| # Voices supported by OpenAI Chat Completions API audio output. | ||
| # See: https://platform.openai.com/docs/guides/text-to-speech#voice-options | ||
| CompletionsAudioVoice = Literal["alloy", "ash", "ballad", "coral", "echo", "sage", "shimmer", "verse", "marin", "cedar"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
curious why this isn't exactly the same as the list on the platform.openai.com webpage that's linked above? (missing fable, nova, onyx)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add them!
| extension=extension, | ||
| ) | ||
|
|
||
| if audio_format == "pcm16": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
might be missing something, but is there a unit test for pcm16 specifically?
|
|
||
| # Skip audio for assistant messages - OpenAI only allows audio in user messages. | ||
| # For assistant responses, the transcript text piece should already be included. | ||
| if role == "assistant": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is assistant the only other option besides user ?
| if not pieces: | ||
| raise EmptyResponseException(message="Failed to extract any response content.") | ||
|
|
||
| return Message(message_pieces=pieces) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so right now you wouldn't be able to tell what is the transcript text and what is just text content, so hypothetically, there could be no transcipt but there could be text content and there would be no distinction. Do you see a distinction being useful (I'm not sure whether the content / value of text content vs transcript makes it obvious which is which so that being more explicit is unnecessary) ?
| content.append(entry) | ||
| elif message_piece.converted_value_data_type == "audio_path": | ||
| ext = DataTypeSerializer.get_extension(message_piece.converted_value) | ||
| if not ext or ext.lower() not in [".wav", ".mp3"]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://platform.openai.com/docs/guides/speech-to-text says "mp3, mp4, mpeg, mpga, m4a, wav, and webm" so is this just that pyrit + openai chat completions only supports .wav & .mp3 ? because then we should maybe more exact
Title says it all! Supporting audio for gpt-audio and also tool calls.
Tests: