It would be very useful to have a way to connect with Ultravox via WebRTC, similar to how OpenAI Realtime provides it:
https://platform.openai.com/docs/guides/realtime
```javascript
async function init() {
  // Get an ephemeral key from your server (server code not shown here)
  const tokenResponse = await fetch("/session");
  const data = await tokenResponse.json();
  const EPHEMERAL_KEY = data.client_secret.value;

  // Create a peer connection
  const pc = new RTCPeerConnection();

  // Set up to play remote audio from the model
  const audioEl = document.createElement("audio");
  audioEl.autoplay = true;
  pc.ontrack = (e) => (audioEl.srcObject = e.streams[0]);

  // Add local audio track for microphone input
  const ms = await navigator.mediaDevices.getUserMedia({ audio: true });
  pc.addTrack(ms.getTracks()[0]);

  // Set up data channel for sending and receiving events
  const dc = pc.createDataChannel("oai-events");
  dc.addEventListener("message", (e) => {
    // Realtime server events appear here!
    console.log(e);
  });

  // Start the session using the Session Description Protocol (SDP)
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);

  // NOTE: This is the important part! They provide a way to send the offer and get back the answer;
  // all the WebRTC machinery can run on any WebRTC implementation, not just the web browser.
  const baseUrl = "https://api.openai.com/v1/realtime";
  const model = "gpt-4o-realtime-preview-2025-06-03";
  const sdpResponse = await fetch(`${baseUrl}?model=${model}`, {
    method: "POST",
    body: offer.sdp,
    headers: {
      Authorization: `Bearer ${EPHEMERAL_KEY}`,
      "Content-Type": "application/sdp",
    },
  });

  const answer = {
    type: "answer",
    sdp: await sdpResponse.text(),
  };
  await pc.setRemoteDescription(answer);
}

init();
```

This pattern is incredibly handy: you send the SDP offer to an HTTP POST endpoint and receive the SDP answer in the response body, which is all that is needed to establish the connection. It works on any platform that supports WebRTC, not just browsers.
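The signaling step is small enough to isolate into one function. What follows is a minimal sketch, not part of the OpenAI or Ultravox SDKs: `exchangeSdp` and its `fetchImpl` parameter are hypothetical names, and the sketch assumes only what the snippet above shows, namely that the endpoint takes the raw offer SDP as an `application/sdp` POST body with a Bearer token and returns the raw answer SDP as text. Because it needs nothing but HTTP, the same exchange could be written against any WebRTC stack that can produce an offer, such as Unity WebRTC on Android.

```javascript
// Minimal sketch of SDP-over-HTTP signaling (hypothetical helper, not an SDK API).
// Assumes the endpoint accepts the raw offer SDP as an "application/sdp" POST body,
// authenticates with a Bearer token, and replies with the raw answer SDP as text.
async function exchangeSdp(endpointUrl, bearerToken, offerSdp, fetchImpl = fetch) {
  const response = await fetchImpl(endpointUrl, {
    method: "POST",
    body: offerSdp,
    headers: {
      Authorization: `Bearer ${bearerToken}`,
      "Content-Type": "application/sdp",
    },
  });
  if (!response.ok) {
    // Surface signaling failures instead of handing an error body to setRemoteDescription().
    throw new Error(`SDP exchange failed with HTTP ${response.status}`);
  }
  // Same shape that RTCPeerConnection.setRemoteDescription() accepts.
  return { type: "answer", sdp: await response.text() };
}
```

In a browser, this replaces the inline `fetch` in `init()`: `await pc.setRemoteDescription(await exchangeSdp(url, EPHEMERAL_KEY, offer.sdp));`. Injecting `fetchImpl` keeps the function independent of any particular HTTP client, so the exchange can also be exercised without a browser or a live server.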
The current Ultravox SDK clients are quite limiting by comparison. For example, I’m using OpenAI Voice Realtime on Android via Unity WebRTC and am considering migrating to Ultravox. However, I’ve hit a roadblock: to proceed, I’d need to implement a LiveKit client in C# myself.
Providing a way to initiate a raw WebRTC session, as OpenAI does, would be a significant improvement.
I understand that the current SDK, built on top of LiveKit, abstracts away a lot of complexity, which is valuable. However, it's also too rigid for many use cases. It would be ideal if you could offer a lower-level WebRTC integration as well. OpenAI does this by supporting both a LiveKit-based option and standalone WebRTC endpoints for developers who need more control.