Commit 1b9c59c

copilot readme update.
1 parent ef95823 commit 1b9c59c

1 file changed

+78
-28
lines changed

examples/GetStarted/README.md

Lines changed: 78 additions & 28 deletions
@@ -1,55 +1,105 @@
1-
# OpenAI Realtime WebRTC Get Started Example
1+
# OpenAI Realtime SIP Get Started Example
22

3-
This is a minimal WebRTC application demonstrating interaction with OpenAI's [Realtime API](https://platform.openai.com/docs/guides/realtime-webrtc). It sets up a peer connection and streams audio from a Windows audio device. Once connected, it sends a message to start the conversation and prints transcription results from both the user and the assistant.
3+
This example places a SIP call to OpenAI's Realtime SIP endpoint and, once the incoming webhook arrives, accepts the call and opens a realtime WebSocket session for it. Audio is captured from and played to the default Windows input/output devices (via SIPSorcery and its Windows audio endpoint) and is sent as PCM; as of 05 Sep 2025, Opus does not negotiate successfully.
44

5-
> ⚠️ **Note**: As of 10 May 2025, this example successfully establishes an audio stream and receives data channel messages. However, echo cancellation is not implemented—use a headset or ensure your audio device supports echo cancellation to avoid feedback loops.
5+
> ⚠️ Note (05 Sep 2025): The example successfully places a SIP call, receives the webhook, accepts the call, and establishes a realtime WebSocket. Echo cancellation is NOT implemented. Use a headset or a device with hardware echo cancellation to avoid the assistant talking to itself.
66
7-
## Features
7+
## What This Sample Does
88

9-
- Establishes a WebRTC connection with OpenAI's realtime endpoint
10-
- Streams audio directly from Windows devices
11-
- Sends a response prompt to trigger conversation
12-
- Handles and logs transcription deltas and completions for both user and assistant
9+
1. Starts a minimal ASP.NET Core web server to receive OpenAI webhook callbacks at `/webhook`.
10+
2. Places an outbound SIP TLS call to `sip.api.openai.com` using your OpenAI Project ID as the user part: `<PROJECT_ID>@sip.api.openai.com`.
11+
3. Waits for OpenAI to POST a webhook containing the `call_id`.
12+
4. Accepts the call via `POST /v1/realtime/calls/{call_id}/accept`.
13+
5. Opens a realtime WebSocket: `wss://api.openai.com/v1/realtime?call_id=...`.
14+
6. Sends an initial `response.create` instruction ("Say Hi.").
15+
7. Logs all incoming WebSocket text messages (JSON events from OpenAI).
16+
8. Streams audio between your local microphone/speakers and OpenAI (PCM).
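
The addresses and the first message used in steps 2, 4, 5 and 6 can be sketched as below. This is illustrative Python, not the sample's C# code. The host names and paths come from the steps above; the `sip:` scheme prefix, the helper names, and the exact shape of the `response.create` payload are assumptions to check against the OpenAI documentation.

```python
import json
from urllib.parse import quote

SIP_HOST = "sip.api.openai.com"
API_BASE = "https://api.openai.com/v1/realtime"
WS_BASE = "wss://api.openai.com/v1/realtime"


def sip_uri(project_id: str) -> str:
    """Step 2: the Project ID is the user part of the SIP URI (scheme assumed)."""
    return f"sip:{project_id}@{SIP_HOST};transport=tls"


def accept_url(call_id: str) -> str:
    """Step 4: endpoint that accepts the call identified by the webhook."""
    return f"{API_BASE}/calls/{quote(call_id, safe='')}/accept"


def websocket_url(call_id: str) -> str:
    """Step 5: realtime WebSocket bound to the same call."""
    return f"{WS_BASE}?call_id={quote(call_id, safe='')}"


def response_create(instructions: str) -> str:
    """Step 6: initial event sent over the WebSocket (payload shape assumed)."""
    return json.dumps({"type": "response.create",
                       "response": {"instructions": instructions}})
```

For example, `websocket_url("abc")` yields `wss://api.openai.com/v1/realtime?call_id=abc`. Percent-encoding the `call_id` is a defensive choice in this sketch, not something the sample is known to do.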
1317

1418
## Requirements
1519

16-
- Windows OS with audio devices
17-
- [.NET 8.0 SDK](https://dotnet.microsoft.com/en-us/download/dotnet/8.0)
18-
- OpenAI API key with access to the Realtime API
20+
- Windows OS (for WindowsAudioEndPoint in this demo)
21+
- .NET 8.0 SDK
22+
- OpenAI API key with Realtime + SIP access
23+
- OpenAI Project ID (e.g. `proj_...`)
24+
- A publicly accessible HTTPS endpoint for webhooks (ngrok recommended)
1925

20-
## Getting Started
26+
## Environment Variables
2127

22-
1. **Set your OpenAI API key as an environment variable**:
28+
Set these before running:
2329

24-
```bash
30+
Windows (cmd.exe):
31+
```
2532
set OPENAI_API_KEY=your_openai_key
33+
set OPENAI_PROJECT_ID=your_openai_project_id
2634
```
2735

28-
2. **Run the application**:
36+
PowerShell:
37+
```
38+
$env:OPENAI_API_KEY="your_openai_key"
39+
$env:OPENAI_PROJECT_ID="your_openai_project_id"
40+
```
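
A quick way to confirm both variables are visible to a process started from the current shell is a check along these lines. It is a minimal Python sketch, independent of the sample's own C# validation; the function name `missing_env` is hypothetical.

```python
import os

REQUIRED = ("OPENAI_API_KEY", "OPENAI_PROJECT_ID")


def missing_env(environ=os.environ, required=REQUIRED):
    """Return the required variable names that are unset or blank."""
    return [name for name in required if not environ.get(name, "").strip()]


# Typical use: report the problem before attempting any call.
# missing = missing_env()
# if missing:
#     print("Missing environment variables:", ", ".join(missing))
```

Treating a whitespace-only value as missing is a choice made here because a blank key would otherwise only surface later as an authentication error.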
2941

30-
```bash
31-
dotnet run
42+
## Exposing the Webhook (ngrok)
43+
44+
1. Reserve / configure a domain in the ngrok dashboard (recommended) or use a temporary forwarding URL.
45+
2. In the OpenAI dashboard: Settings -> Webhooks -> Add webhook
46+
- URL: `https://<your-ngrok-domain>/webhook`
47+
3. Start ngrok to forward to the Kestrel HTTPS port from `launchSettings.json` (default shown there is `https://localhost:53742`):
48+
```
49+
ngrok http --url=<your-ngrok-domain> https://localhost:53742
3250
```
3351

34-
3. **Interact**:
52+
## Run
53+
54+
```
55+
dotnet run
56+
```
3557

36-
Speak into your microphone and observe the transcription logs for both your voice and the assistant’s responses.
58+
You should see logs indicating:
59+
- Web server started
60+
- SIP call attempt to `<PROJECT_ID>@sip.api.openai.com;transport=tls`
61+
- Incoming webhook with `call_id`
62+
- Accept POST success
63+
- WebSocket connected and subsequent JSON event logs
3764

3865
## File Overview
3966

4067
### Program.cs
68+
Core sample logic:
69+
- Configures Serilog logging.
70+
- Validates `OPENAI_API_KEY` and `OPENAI_PROJECT_ID` env vars.
71+
- Registers an HTTP POST `/webhook` endpoint to receive call events.
72+
- On webhook: extracts `call_id`, sends accept request, starts WebSocket task.
73+
- Initiates SIP call using SIPSorcery (`SIPUserAgent`, `VoIPMediaSession`).
74+
- Opens WebSocket and sends an initial `response.create` instruction.
75+
- Streams and logs incoming WebSocket messages.
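
The "extracts `call_id`" step can be sketched as follows, in illustrative Python rather than the sample's C#. The payload layout (`type` equal to `realtime.call.incoming`, with the identifier under `data.call_id`) is an assumption about the webhook format and should be checked against a real logged payload.

```python
import json
from typing import Optional


def extract_call_id(body: bytes) -> Optional[str]:
    """Return the call_id from an incoming-call webhook body, or None."""
    try:
        event = json.loads(body)
    except ValueError:  # covers malformed JSON and undecodable bytes
        return None
    if not isinstance(event, dict):
        return None
    # Assumed event type for an incoming SIP call.
    if event.get("type") != "realtime.call.incoming":
        return None
    data = event.get("data")
    call_id = data.get("call_id") if isinstance(data, dict) else None
    return call_id if isinstance(call_id, str) and call_id else None
```

Returning `None` for anything unexpected lets the handler skip the accept request instead of failing on unrelated or malformed webhook deliveries.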
76+
77+
### launchSettings.json
78+
Specifies the local HTTPS port (used for your ngrok forwarding target).
79+
80+
## Audio Notes
81+
82+
- Example uses `WindowsAudioEndPoint` with default input/output devices.
83+
- Opus was attempted (commented line), but only PCM negotiates successfully at the time of writing.
84+
- No echo cancellation; prefer headset.
85+
86+
## Customizing
87+
88+
- Change initial instruction: edit the anonymous object `responseCreate` in `StartWebSocketConnection`.
89+
- Provide different model/instructions for acceptance by altering `call_accept` record fields.
90+
- Add parsing of WebSocket JSON events to handle partial transcripts, tool calls, etc.
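
As a starting point for parsing the WebSocket JSON events, the sketch below accumulates partial transcripts per item. It is illustrative Python, not the sample's C#. The event names used in the usage note and the `delta` and `item_id` fields are assumptions about the realtime event format; compare them with the messages the sample logs.

```python
import json
from collections import defaultdict


class TranscriptCollector:
    """Accumulates transcript deltas and records every other event type."""

    def __init__(self):
        self.partial = defaultdict(str)   # item_id -> transcript so far
        self.other_types = []             # event types seen but not handled

    def handle(self, message: str) -> None:
        event = json.loads(message)
        etype = event.get("type", "")
        delta = event.get("delta")
        # Assumed convention: transcript pieces arrive as "...transcript...delta".
        if "transcript" in etype and etype.endswith(".delta") and isinstance(delta, str):
            self.partial[event.get("item_id", "")] += delta
        else:
            self.other_types.append(etype)
```

Feeding it two messages of type `response.output_audio_transcript.delta` for the same `item_id`, with deltas `"Hi "` and `"there"`, leaves `"Hi there"` in `partial` for that item. Tool calls or other events would get their own branches in `handle`.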
91+
92+
## Troubleshooting
4193

42-
Contains the core application logic:
43-
- Initializes the OpenAI WebRTC endpoint
44-
- Connects audio from the default Windows input device
45-
- Sends session updates and creates a response to initiate conversation
46-
- Logs transcription updates and completions
94+
- No webhook received: verify ngrok is running and the correct HTTPS URL is registered in OpenAI settings.
95+
- 401 / auth errors: confirm the `OPENAI_API_KEY` environment variable is set in the same shell in which you run `dotnet run`.
96+
- SIP call fails immediately: check outbound TLS (firewall/proxy) and that your Project ID is correct.
97+
- WebSocket closes: inspect logged close status and ensure accept POST succeeded.
4798

48-
## Notes
99+
## Security
49100

50-
- Echo cancellation is not handled in this demo. If you're using speakers, OpenAI may end up responding to itself. Use a headset for clean operation.
51-
- Transcription is enabled using the `Whisper1` model.
52-
- This demo is part of the `SIPSorcery.OpenAI.WebRTC` library.
101+
- Do NOT hardcode your API key. Use environment variables or a secure secrets store.
102+
- Restrict exposure of your webhook endpoint. ngrok URLs are public; rotate as needed.
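
One way to restrict who can trigger the webhook is to verify a signature on each request. The sketch below assumes the sender follows the Standard Webhooks signing scheme (an HMAC-SHA256 over `id.timestamp.body`, with a `whsec_`-prefixed base64 secret); whether and how this applies to your webhook must be confirmed in the OpenAI documentation. It is illustrative Python, not part of the sample, and the function name is hypothetical.

```python
import base64
import hashlib
import hmac


def verify_signature(secret: str, msg_id: str, timestamp: str,
                     body: bytes, signature_header: str) -> bool:
    """Check a Standard Webhooks style signature (scheme assumed, see above)."""
    key = base64.b64decode(secret.removeprefix("whsec_"))
    signed = f"{msg_id}.{timestamp}.".encode() + body
    expected = base64.b64encode(
        hmac.new(key, signed, hashlib.sha256).digest()).decode()
    # The header may carry several space-separated "v1,<base64>" entries.
    for entry in signature_header.split():
        version, _, candidate = entry.partition(",")
        if version == "v1" and hmac.compare_digest(candidate, expected):
            return True
    return False
```

A production check would also reject requests whose timestamp is too old, which this sketch omits. The signature must be computed over the raw request body, before any JSON parsing.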
53103

54104
## License
55105
