Conversation

@HVbajoria commented May 22, 2025

Purpose

  • Added support for the new transcription models gpt-4o-transcribe and gpt-4o-mini-transcribe
  • Introduced optional language (ISO-639-1) and prompt parameters on the InputAudioTranscription interface (see the sketch after this list)
  • Applied the changes in both the JavaScript and Python SDKs
  • Updated the core library files and demonstrated usage in client_test under rtclient
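
For orientation, here is a minimal TypeScript sketch of what the extended interface could look like; the exact SDK definition may differ in naming and optionality:

    // Hypothetical sketch of the extended interface; field names follow
    // the test snippets in this PR, not necessarily the shipped SDK types.
    export interface InputAudioTranscription {
      model: "whisper-1" | "gpt-4o-transcribe" | "gpt-4o-mini-transcribe";
      language?: string; // optional ISO-639-1 code, e.g. "en"
      prompt?: string;   // optional hint; expected format depends on the model
    }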

Does this introduce a breaking change?

[ ] Yes
[x] No

Pull Request Type

What kind of change does this Pull Request introduce?

[ ] Bugfix
[x] Feature
[ ] Code style update (formatting, local variables)
[ ] Refactoring (no functional changes, no api changes)
[x] Documentation content changes
[ ] Other... Please describe:

How to Test

For JavaScript

  • Get the code

    git clone https://github.com/Azure-Samples/aoai-realtime-audio-sdk.git
    cd aoai-realtime-audio-sdk
    cd javascript
    git checkout Added_Model_Support
    npm install
  • Run the tests (e.g., npm test)

  • Manually test usage of the updated inputTranscription settings in standalone/test/client.spec.ts (line 120):

    input_audio_transcription: {
        model: "gpt-4o-transcribe",
        language: "en",
        prompt: "expect words related to technology",
    },
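
To see the new fields in context, here is a self-contained sketch of a session configuration object; the enclosing shape is illustrative, only input_audio_transcription mirrors the snippet above:

    // Illustrative only: the outer object shape is an assumption for this
    // example; the inner settings match the test snippet above.
    const sessionConfig = {
      input_audio_transcription: {
        model: "gpt-4o-transcribe",
        language: "en",
        prompt: "expect words related to technology",
      },
    };
    console.log(JSON.stringify(sessionConfig, null, 2));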

For Python

  • Get the code

    git clone https://github.com/Azure-Samples/aoai-realtime-audio-sdk.git
    cd aoai-realtime-audio-sdk
    cd python
    cd samples
    git checkout Added_Model_Support
    pip3 install -r requirements.txt
    cd ..
  • Test using client_test under rtclient:

    • Update the test snippet:

      input_audio_transcription = InputAudioTranscription(
          model="gpt-4o-transcribe",
          language="en",
          prompt="expect words related to technology"
      )
    • Try with different model values:

      • "whisper-1"
      • "gpt-4o-mini-transcribe"
      • "gpt-4o-transcribe"

What to Check

Verify the following:

  • InputAudioTranscription in both Python and JavaScript accepts model, language, and prompt

  • All new values (gpt-4o-transcribe, gpt-4o-mini-transcribe) are allowed and passed correctly

  • Prompt format is respected based on the model (see the sketch after this list):

    • Whisper: comma-separated keywords
    • GPT-4o models: free text
  • No regressions in existing behavior when using only whisper-1

  • Functional parity between Python and JavaScript implementations
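
As a concrete illustration of the two prompt styles (the variable names and strings here are made-up examples, not values from this PR):

    // Assumed examples of the two prompt conventions:
    const whisperPrompt = "API, SDK, latency, Azure";          // whisper-1: comma-separated keywords
    const gpt4oPrompt = "Expect words related to technology."; // gpt-4o models: free text
    console.log(whisperPrompt, gpt4oPrompt);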

Other Information

  • The updates are backward compatible
  • Comments and type hints added where applicable
  • Example test cases show use with different model values and prompt formats
  • Enables developers to take advantage of OpenAI’s latest audio transcription capabilities across SDKs

@HVbajoria (Author) commented May 22, 2025

Hi @glecaros, @jpalvarezl, @trrwilson,

I have referred to this document: https://learn.microsoft.com/en-us/azure/ai-services/openai/realtime-audio-reference, under RealtimeAudioInputTranscriptionSettings.

Could you please take a look?

@juliannicolas90 commented
This would be great to have!

