|
1 | | -# OpenAI Realtime WebRTC Get Started Example |
| 1 | +# OpenAI Realtime SIP Get Started Example |
2 | 2 |
|
3 | | -This is a minimal WebRTC application demonstrating interaction with OpenAI's [Realtime API](https://platform.openai.com/docs/guides/realtime-webrtc). It sets up a peer connection and streams audio from a Windows audio device. Once connected, it sends a message to start the conversation and prints transcription results from both the user and the assistant. |
| 3 | +This example places a SIP call to OpenAI's Realtime SIP endpoint and, once OpenAI delivers the incoming-call webhook, accepts the call and opens a realtime WebSocket session for it. Audio is captured from and played on the default Windows input/output devices (via SIPSorcery's Windows audio endpoint) and is sent as PCM; as of 05 Sep 2025, Opus does not negotiate successfully. |
4 | 4 |
|
5 | | -> ⚠️ **Note**: As of 10 May 2025, this example successfully establishes an audio stream and receives data channel messages. However, echo cancellation is not implemented—use a headset or ensure your audio device supports echo cancellation to avoid feedback loops. |
| 5 | +> ⚠️ Note (05 Sep 2025): The example successfully places a SIP call, receives the webhook, accepts the call, and establishes a realtime WebSocket. Echo cancellation is NOT implemented. Use a headset or a device with hardware echo cancellation to avoid the assistant talking to itself. |
6 | 6 |
|
7 | | -## Features |
| 7 | +## What This Sample Does |
8 | 8 |
|
9 | | -- Establishes a WebRTC connection with OpenAI's realtime endpoint |
10 | | -- Streams audio directly from Windows devices |
11 | | -- Sends a response prompt to trigger conversation |
12 | | -- Handles and logs transcription deltas and completions for both user and assistant |
| 9 | +1. Starts a minimal ASP.NET Core web server to receive OpenAI webhook callbacks at `/webhook`. |
| 10 | +2. Places an outbound SIP TLS call to `sip.api.openai.com` using your OpenAI Project ID as the user part: `<PROJECT_ID>@sip.api.openai.com`. |
| 11 | +3. Waits for OpenAI to POST a webhook containing the `call_id`. |
| 12 | +4. Accepts the call via `POST /v1/realtime/calls/{call_id}/accept`. |
| 13 | +5. Opens a realtime WebSocket: `wss://api.openai.com/v1/realtime?call_id=...`. |
| 14 | +6. Sends an initial `response.create` instruction ("Say Hi."). |
| 15 | +7. Logs all incoming WebSocket text messages (JSON events from OpenAI). |
| 16 | +8. Streams audio between your local microphone/speakers and OpenAI (PCM). |
13 | 17 |
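Steps 4 to 6 above can be sketched as follows. This is a minimal illustration in Python, not the sample's C# code: the function names are hypothetical, and the accept body fields (`type`, `model`) and the `response.instructions` shape are assumptions about the payloads rather than something this README specifies. The sketch only builds the request, URL, and message; it does not send anything.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.openai.com"


def build_accept_request(call_id: str, api_key: str, model: str) -> urllib.request.Request:
    """Build (but do not send) the POST that accepts an incoming call."""
    # Assumed body shape: a realtime session config naming the model.
    body = json.dumps({"type": "realtime", "model": model}).encode("utf-8")
    safe_id = urllib.parse.quote(call_id, safe="")
    return urllib.request.Request(
        f"{API_BASE}/v1/realtime/calls/{safe_id}/accept",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def realtime_ws_url(call_id: str) -> str:
    """WebSocket URL for the session attached to an accepted call."""
    query = urllib.parse.urlencode({"call_id": call_id})
    return f"wss://api.openai.com/v1/realtime?{query}"


def initial_response_create(instructions: str = "Say Hi.") -> str:
    """JSON text of the first message sent over the WebSocket."""
    # Assumed event shape for response.create.
    return json.dumps(
        {"type": "response.create", "response": {"instructions": instructions}}
    )
```

Keeping request construction separate from sending makes each step easy to log and inspect, which helps when the accept POST or the WebSocket handshake fails.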
|
14 | 18 | ## Requirements |
15 | 19 |
|
16 | | -- Windows OS with audio devices |
17 | | -- [.NET 8.0 SDK](https://dotnet.microsoft.com/en-us/download/dotnet/8.0) |
18 | | -- OpenAI API key with access to the Realtime API |
| 20 | +- Windows OS (required by `WindowsAudioEndPoint` in this demo) |
| 21 | +- .NET 8.0 SDK |
| 22 | +- OpenAI API key with Realtime + SIP access |
| 23 | +- OpenAI Project ID (e.g. `proj_...`) |
| 24 | +- A publicly accessible HTTPS endpoint for webhooks (ngrok recommended) |
19 | 25 |
|
20 | | -## Getting Started |
| 26 | +## Environment Variables |
21 | 27 |
|
22 | | -1. **Set your OpenAI API key as an environment variable**: |
| 28 | +Set these before running: |
23 | 29 |
|
24 | | -```bash |
| 30 | +Windows (cmd.exe): |
| 31 | +``` |
25 | 32 | set OPENAI_API_KEY=your_openai_key |
| 33 | +set OPENAI_PROJECT_ID=your_openai_project_id |
26 | 34 | ``` |
27 | 35 |
|
28 | | -2. **Run the application**: |
| 36 | +PowerShell: |
| 37 | +``` |
| 38 | +$env:OPENAI_API_KEY="your_openai_key" |
| 39 | +$env:OPENAI_PROJECT_ID="your_openai_project_id" |
| 40 | +``` |
29 | 41 |
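As a rough illustration of how these two variables are consumed, the sketch below checks that both are set and builds the SIP destination from the Project ID. It is written in Python for brevity and is not the sample's code; the helper names are hypothetical, and the `sip:` scheme prefix is an assumption (the README only shows the `<PROJECT_ID>@sip.api.openai.com;transport=tls` part).

```python
import os

REQUIRED_VARS = ("OPENAI_API_KEY", "OPENAI_PROJECT_ID")


def missing_env_vars(env=None):
    """Return the names of required variables that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


def sip_destination(project_id: str) -> str:
    """SIP URI for the call; the Project ID is the user part."""
    return f"sip:{project_id}@sip.api.openai.com;transport=tls"


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        raise SystemExit("Missing environment variables: " + ", ".join(missing))
    print(sip_destination(os.environ["OPENAI_PROJECT_ID"]))
```

Failing fast on a missing variable avoids a confusing authentication or SIP error later in the run.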
|
30 | | -```bash |
31 | | -dotnet run |
| 42 | +## Exposing the Webhook (ngrok) |
| 43 | + |
| 44 | +1. Reserve / configure a domain in the ngrok dashboard (recommended) or use a temporary forwarding URL. |
| 45 | +2. In the OpenAI dashboard: Settings -> Webhooks -> Add webhook |
| 46 | + - URL: `https://<your-ngrok-domain>/webhook` |
| 47 | +3. Start ngrok to forward to the Kestrel HTTPS port from `launchSettings.json` (default shown there is `https://localhost:53742`): |
| 48 | +``` |
| 49 | +ngrok http --url=<your-ngrok-domain> https://localhost:53742 |
32 | 50 | ``` |
33 | 51 |
|
34 | | -3. **Interact**: |
| 52 | +## Run |
| 53 | + |
| 54 | +``` |
| 55 | +dotnet run |
| 56 | +``` |
35 | 57 |
|
36 | | -Speak into your microphone and observe the transcription logs for both your voice and the assistant’s responses. |
| 58 | +You should see logs indicating: |
| 59 | +- Web server started |
| 60 | +- SIP call attempt to `<PROJECT_ID>@sip.api.openai.com;transport=tls` |
| 61 | +- Incoming webhook with `call_id` |
| 62 | +- Accept POST success |
| 63 | +- WebSocket connected and subsequent JSON event logs |
37 | 64 |
|
38 | 65 | ## File Overview |
39 | 66 |
|
40 | 67 | ### Program.cs |
| 68 | +Core sample logic: |
| 69 | +- Configures Serilog logging. |
| 70 | +- Validates `OPENAI_API_KEY` and `OPENAI_PROJECT_ID` env vars. |
| 71 | +- Registers an HTTP POST `/webhook` endpoint to receive call events. |
| 72 | +- On webhook: extracts `call_id`, sends accept request, starts WebSocket task. |
| 73 | +- Initiates SIP call using SIPSorcery (`SIPUserAgent`, `VoIPMediaSession`). |
| 74 | +- Opens WebSocket and sends an initial `response.create` instruction. |
| 75 | +- Streams and logs incoming WebSocket messages. |
| 76 | + |
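The webhook step in the list above (extract `call_id`, then accept) might look like the following in outline. This is a hedged Python sketch rather than the sample's C# handler: the function name is hypothetical, and the event type `realtime.call.incoming` and the `data.call_id` location are assumptions about the webhook payload that should be checked against the events your endpoint actually logs.

```python
import json


def extract_call_id(raw_body: bytes):
    """Return the call_id from an incoming-call webhook body, or None."""
    try:
        event = json.loads(raw_body)
    except ValueError:
        # Covers malformed JSON and undecodable bytes.
        return None
    if not isinstance(event, dict):
        return None
    # Assumed event type for an incoming SIP call.
    if event.get("type") != "realtime.call.incoming":
        return None
    data = event.get("data")
    call_id = data.get("call_id") if isinstance(data, dict) else None
    return call_id if isinstance(call_id, str) and call_id else None
```

Returning `None` for anything unexpected lets the handler ignore unrelated or malformed events instead of attempting an accept request with a bad ID.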
| 77 | +### launchSettings.json |
| 78 | +Specifies the local HTTPS port (used for your ngrok forwarding target). |
| 79 | + |
| 80 | +## Audio Notes |
| 81 | + |
| 82 | +- Example uses `WindowsAudioEndPoint` with default input/output devices. |
| 83 | +- Opus was attempted (see the commented-out line), but only PCM negotiates successfully at the time of writing. |
| 84 | +- No echo cancellation; prefer headset. |
| 85 | + |
| 86 | +## Customizing |
| 87 | + |
| 88 | +- Change initial instruction: edit the anonymous object `responseCreate` in `StartWebSocketConnection`. |
| 89 | +- Provide different model/instructions for acceptance by altering `call_accept` record fields. |
| 90 | +- Add parsing of WebSocket JSON events to handle partial transcripts, tool calls, etc. |
| 91 | + |
| 92 | +## Troubleshooting |
41 | 93 |
|
42 | | -Contains the core application logic: |
43 | | -- Initializes the OpenAI WebRTC endpoint |
44 | | -- Connects audio from the default Windows input device |
45 | | -- Sends session updates and creates a response to initiate conversation |
46 | | -- Logs transcription updates and completions |
| 94 | +- No webhook received: verify ngrok is running and the correct HTTPS URL is registered in OpenAI settings. |
| 95 | +- 401 / auth errors: confirm the `OPENAI_API_KEY` environment variable is set in the same shell where you run `dotnet run`. |
| 96 | +- SIP call fails immediately: check outbound TLS (firewall/proxy) and that your Project ID is correct. |
| 97 | +- WebSocket closes: inspect logged close status and ensure accept POST succeeded. |
47 | 98 |
|
48 | | -## Notes |
| 99 | +## Security |
49 | 100 |
|
50 | | -- Echo cancellation is not handled in this demo. If you're using speakers, OpenAI may end up responding to itself. Use a headset for clean operation. |
51 | | -- Transcription is enabled using the `Whisper1` model. |
52 | | -- This demo is part of the `SIPSorcery.OpenAI.WebRTC` library. |
| 101 | +- Do NOT hardcode your API key. Use environment variables or a secure secrets store. |
| 102 | +- Restrict exposure of your webhook endpoint. ngrok URLs are public; rotate as needed. |
53 | 103 |
|
54 | 104 | ## License |
55 | 105 |
|
|