
A new event handler is needed for the SDK @openai/agents-realtime #550

@scottywm

Description


INTRODUCTION

OK, I'm back again, but this time I've figured out exactly what the problem is and where and when it's occurring.

Let me explain the problem again just to refresh your memory.

I'm building an app in Capacitor. A WebRTC connection is created on the client side that streams audio from the client to OpenAI, and when the client speaks into the microphone, OpenAI replies back in audio. It's awesome.

THE PROBLEM

The problem I'm having is that when audio first starts to play in the Capacitor app on an iOS device, it repeatedly stops and starts during that first playback. After the first playback it works perfectly. There are absolutely no problems when the Capacitor app runs on an Android device, or even when the same code runs in Chrome or Safari. This error only occurs in apps like Capacitor when they are installed on iOS devices.

The reason for this error is that Capacitor apps on iOS run inside a special wrapper that mimics Safari. The error doesn't occur in the Safari browser itself because Safari's audio handling has been built to handle this situation, but the wrappers used by frameworks like Capacitor haven't been yet. Let me explain this in more detail.

THE SOLUTION

This whole problem can be 100% avoided if OpenAI provides another event callback in their SDK (@openai/agents-realtime) that triggers when, and only when, the media stream and its tracks are first created, before they get mounted to any audio element or anywhere else. This callback function MUST receive the media stream and its track(s) as a parameter so that apps built with Capacitor, or any other framework, can programmatically insert the track into a media stream that their own code has already created and primed. This will work not just for Capacitor but for several other frameworks whose wrappers mimic Safari in the same way WKWebView does.
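For context, the "already created and primed" stream mentioned above might be set up like this. This is only a sketch: `createPrimedSink` is an illustrative name, not part of @openai/agents-realtime, and the document and MediaStream constructor are passed in so the wiring is explicit.

```javascript
// Sketch of the app-side priming step (createPrimedSink is a made-up name,
// not an SDK API). The document and MediaStream constructor are injected
// so the wiring is explicit and easy to follow.
function createPrimedSink(doc, MediaStreamCtor) {
  // Create an empty stream up front, before OpenAI produces any tracks.
  const stream = new MediaStreamCtor();

  // Attach it to an <audio> element immediately, so the wrapper's audio
  // pipeline is already running by the time the remote track arrives.
  const audio = doc.createElement('audio');
  audio.autoplay = true;
  audio.srcObject = stream;
  doc.body.appendChild(audio);

  return { stream, audio };
}
```

In a real page this would be called as `createPrimedSink(document, MediaStream)`, ideally from inside a user-gesture handler so that iOS autoplay restrictions are satisfied before any remote audio arrives.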

All I'm asking is that OpenAI create a new event handler that gives us access to the media stream and its tracks as soon as OpenAI creates them, when the WebRTC connection is first established.

The event handler would look something like this:

```javascript
session.on('media.stream-created', (mediaStream) => {
  // Here I will take the tracks out of this stream and add them to the
  // stream I created and attached to the audio element.
});
```
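To make the intent of that comment concrete, the handler body could delegate to a small helper that moves every track from the SDK-created stream into the pre-primed one. This is a sketch under the assumption that the proposed event existed; `transferTracks` and `primedStream` are illustrative names, not SDK APIs.

```javascript
// Hypothetical helper: move all tracks from the stream OpenAI created into
// the stream the app created and primed earlier. It works with any objects
// exposing the standard MediaStream getTracks()/addTrack() methods.
function transferTracks(sourceStream, targetStream) {
  for (const track of sourceStream.getTracks()) {
    targetStream.addTrack(track);
  }
  return targetStream;
}

// Intended wiring, assuming the proposed event were added to the SDK:
// session.on('media.stream-created', (mediaStream) => {
//   transferTracks(mediaStream, primedStream);
// });
```

Because the helper only touches `getTracks()` and `addTrack()`, the same code path works whether the target stream lives in Safari, Chrome, or a WKWebView-based wrapper.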

I have logged all the events the SDK currently provides, and there is no event that fires the moment the media stream is created and gives me the stream and its track, so at this stage I cannot fix this stop-and-start bug on the first playback.

My hands are tied here because the SDK doesn't give me such an event, so for now I'm begging you (OpenAI) to add this new feature. It's not much work at all on your end, and it is genuinely needed, because many apps, not just Capacitor ones, will encounter this problem too.

I have made this video for you to share with OpenAI so they can see what the issue looks like.

Please refer this to OpenAI, ask them to fix it as soon as possible, and get back to me with the status, because I'm 90% through building my app and I'm counting on this.

https://youtu.be/EY5epJqta_0

P.S.: Sorry about the inconvenience the first time.
