Feature Request: Support WebRTC Session Initialization via HTTP POST (like OpenAI Realtime) #25

@blasco

It would be very useful to be able to connect to Ultravox over plain WebRTC, similar to what the OpenAI Realtime API provides:

https://platform.openai.com/docs/guides/realtime

async function init() {
  // Get an ephemeral key from your server (see the server code in the OpenAI guide linked above)
  const tokenResponse = await fetch("/session");
  const data = await tokenResponse.json();
  const EPHEMERAL_KEY = data.client_secret.value;

  // Create a peer connection
  const pc = new RTCPeerConnection();

  // Set up to play remote audio from the model
  const audioEl = document.createElement("audio");
  audioEl.autoplay = true;
  pc.ontrack = e => audioEl.srcObject = e.streams[0];

  // Add local audio track for microphone input
  const ms = await navigator.mediaDevices.getUserMedia({ audio: true });
  pc.addTrack(ms.getTracks()[0]);

  // Set up data channel for sending and receiving events
  const dc = pc.createDataChannel("oai-events");
  dc.addEventListener("message", (e) => {
    // Realtime server events appear here!
    console.log(e);
  });

  // Start the session using the Session Description Protocol (SDP)
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);

  // NOTE: This is the important part! OpenAI exposes an HTTP endpoint that accepts the SDP offer
  // and returns the SDP answer, so the WebRTC side can run on any WebRTC implementation, not just a web browser.
  const baseUrl = "https://api.openai.com/v1/realtime";
  const model = "gpt-4o-realtime-preview-2025-06-03";
  const sdpResponse = await fetch(`${baseUrl}?model=${model}`, {
    method: "POST",
    body: offer.sdp,
    headers: {
      Authorization: `Bearer ${EPHEMERAL_KEY}`,
      "Content-Type": "application/sdp"
    },
  });

  const answer = {
    type: "answer",
    sdp: await sdpResponse.text(),
  };
  await pc.setRemoteDescription(answer);
}

This pattern is incredibly handy: you simply send the WebRTC offer to an HTTP POST endpoint and receive the answer to establish the connection. It works on any platform that supports WebRTC—not just browsers.
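
To make the request concrete, here is a minimal sketch of the offer/answer exchange pulled out into a standalone helper. The function name `exchangeSdp`, its parameters, and any endpoint URL passed to it are hypothetical; Ultravox does not currently document such an endpoint, and the request shape simply mirrors the OpenAI snippet above.

```javascript
// Sketch only: `exchangeSdp` and the endpoint it targets are hypothetical,
// not part of any existing Ultravox or OpenAI SDK.
// `fetchImpl` is injectable so the helper also works where `fetch` is not a global.
async function exchangeSdp(endpointUrl, token, offerSdp, fetchImpl = globalThis.fetch) {
  // POST the raw SDP offer, authenticated with a short-lived token.
  const response = await fetchImpl(endpointUrl, {
    method: "POST",
    body: offerSdp,
    headers: {
      Authorization: `Bearer ${token}`,
      "Content-Type": "application/sdp",
    },
  });

  // Surface HTTP failures instead of passing an error body on as SDP.
  if (!response.ok) {
    throw new Error(`SDP exchange failed with HTTP ${response.status}`);
  }

  // The response body is assumed to be the raw SDP answer.
  return { type: "answer", sdp: await response.text() };
}
```

With a helper like this, the end of `init()` would reduce to `await pc.setRemoteDescription(await exchangeSdp(url, EPHEMERAL_KEY, offer.sdp))`, and the same two-step exchange could be reproduced from any WebRTC stack that can make an HTTP request.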

The current Ultravox SDK clients are quite limiting in comparison. For example, I’m using OpenAI Voice Realtime on Android via Unity WebRTC and am considering migrating to Ultravox. However, I’ve hit a roadblock: to proceed, I’d need to implement a LiveKit client in C#...

Providing a way to initiate a raw WebRTC session—like OpenAI does—would be a significant improvement.

I understand that the current SDK, built on top of LiveKit, abstracts away a lot of complexity, which is valuable. However, it is also too rigid for many use cases. It would be ideal if you could offer a lower-level WebRTC integration as well. OpenAI does this by supporting both a LiveKit-based option and standalone WebRTC endpoints for developers who need more control.
