
AsyncRealtime class incompatible with transcription mode due to required model parameter #2652


Description

@fedirz

Confirm this is an issue with the Python library and not an underlying OpenAI API

  • This is an issue with the Python library

Describe the bug

The AsyncRealtime class cannot be used for real-time transcription because its requirements conflict with the API's:

  1. The AsyncRealtime class requires callers to provide a model parameter
  2. When connecting with intent=transcription, the API explicitly forbids the model parameter
  3. Even when model=None is passed, the parameter is still serialized into the query string, so the API rejects the request (sketched below)
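
A minimal sketch of the conflict (illustrative only; the exact query-string serialization is an assumption based on the error below):

from openai import AsyncOpenAI

client = AsyncOpenAI()

# Providing a model satisfies the client's signature, but the API rejects it
# for transcription sessions once the connection is opened.
mgr = client.beta.realtime.connect(model="gpt-4o-transcribe", extra_query={"intent": "transcription"})

# Passing model=None fails type checking, and the parameter still ends up in
# the query string, so the API returns the same error.
mgr = client.beta.realtime.connect(model=None, extra_query={"intent": "transcription"})  # type: ignore[arg-type]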

Error received:

{
  "error": {
    "message": "You must not provide a model parameter for transcription sessions.",
    "type": "invalid_request_error", 
    "code": "invalid_model",
    "event_id": null,
    "param": null
  },
  "event_id": "xxx",
  "type": "error"
}

Expected behavior:
The model parameter should be optional in the AsyncRealtime class. When model=None is passed (or not provided), it should not be included as a query parameter in the request. This would allow transcription mode to work properly while maintaining backward compatibility for other use cases.
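
For illustration, a hypothetical call shape once model is optional could look like the following (this is not the library's current signature, hence the type: ignore):

import asyncio

from openai import AsyncOpenAI


async def main() -> None:
    client = AsyncOpenAI()
    # Hypothetical: only valid once `model` is optional and is omitted from
    # the query string when not provided; intent=transcription selects the mode.
    async with client.beta.realtime.connect(extra_query={"intent": "transcription"}) as conn:  # type: ignore[call-arg]
        await conn.transcription_session.update(session={"input_audio_format": "pcm16"})


asyncio.run(main())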

Current workaround:
Bypass AsyncRealtime.connect() and open the websocket manually, then wrap it in AsyncRealtimeConnection so the SDK's typed helpers still work (the imports added below reflect where these names appear to live in this library version):

# The awaits below must run inside an async function. WEBSOCKET_BASE_URL is
# assumed to end with "?" (e.g. "wss://api.openai.com/v1/realtime?").
from websockets.asyncio.client import connect

from openai import AsyncOpenAI
from openai.resources.beta.realtime.realtime import AsyncRealtimeConnection

realtime_client = AsyncOpenAI(max_retries=0).beta.realtime
# Open the websocket directly so no model query parameter is sent, then wrap
# it so the SDK's typed connection helpers remain available.
conn = AsyncRealtimeConnection(
    await connect(
        WEBSOCKET_BASE_URL + "intent=transcription",
        additional_headers={
            "Authorization": "Bearer " + realtime_client._client.api_key,
        },
    )
)
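
Once wrapped, the SDK's typed helpers should work as usual over this connection, e.g. (sketch mirroring the session update in the reproduction below):

await conn.transcription_session.update(
    session={"input_audio_format": "pcm16"},
)
async for event in conn:
    print(event.type)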

To Reproduce

import asyncio
import logging

from openai import AsyncOpenAI
from openai.types.beta.realtime import transcription_session_update_param

MODEL = "gpt-4o-transcribe"

logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)


async def main() -> None:
    realtime_client = AsyncOpenAI(max_retries=0).beta.realtime

    async with realtime_client.connect(model=MODEL, extra_query={"intent": "transcription"}) as conn:
        await conn.transcription_session.update(
            session=transcription_session_update_param.Session(
                input_audio_format="pcm16",
                input_audio_transcription=transcription_session_update_param.SessionInputAudioTranscription(
                    model="gpt-4o-transcribe"
                ),
                turn_detection=transcription_session_update_param.SessionTurnDetection(
                    type="server_vad",
                    threshold=0.9,
                    prefix_padding_ms=300,
                    silence_duration_ms=500,
                ),
                input_audio_noise_reduction=transcription_session_update_param.SessionInputAudioNoiseReduction(
                    type="near_field"
                ),
            )
        )


if __name__ == "__main__":
    asyncio.run(main())


OS

MacOS

Python version

Python 3.12

Library version

openai[realtime]==1.108.0
