Commit 2ac4ef2

docs: real-time audio bytes integration (#3373)
Add new section on Real-Time Audio Bytes integration, including guide, key points, example request, and use cases
1 parent e16b47e commit 2ac4ef2

1 file changed (+30, -0 lines changed)

docs/doc/developer/apps/Integrations.mdx

Lines changed: 30 additions & 0 deletions
@@ -40,6 +40,36 @@ These apps process conversation transcripts as they occur, enabling real-time an
- Real-time web searches or fact-checking
- Emotional state analysis and supportive responses

### 3. 🔊 Real-Time Audio Bytes (audio_bytes)

#### Guide:

Omi can stream raw audio bytes from a DevKit directly to an external endpoint. This integration uses the trigger type `audio_bytes` and is useful for applications that need raw PCM audio (VAD, custom ASR, feature extraction, or long-term storage).

Key points:

- Trigger name: `audio_bytes`
- Request: POST
- Query parameters appended by the backend: `sample_rate` (e.g. 16000) and `uid`
- Content-Type: `application/octet-stream` (raw PCM16 bytes)

Example request:

`POST /your-endpoint?sample_rate=16000&uid=user123` with the request body containing raw PCM16 (16-bit little-endian) audio bytes.
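
A receiving endpoint for a request of this shape could be sketched as follows. This is a minimal sketch using only Python's standard library, not Omi's reference implementation. The names `AudioBytesHandler`, `parse_audio_request`, `serve`, the in-memory `received` store, and the port are assumptions made for illustration, and the sketch assumes the request carries a `Content-Length` header.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

# Hypothetical in-memory store: uid -> list of (sample_rate, raw PCM16 chunk).
received = {}


def parse_audio_request(path):
    """Extract (uid, sample_rate) from a path such as
    /your-endpoint?sample_rate=16000&uid=user123."""
    query = parse_qs(urlparse(path).query)
    uid = query["uid"][0]
    sample_rate = int(query["sample_rate"][0])
    return uid, sample_rate


class AudioBytesHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        try:
            uid, sample_rate = parse_audio_request(self.path)
        except (KeyError, ValueError):
            self.send_error(400, "expected sample_rate and uid query parameters")
            return
        # Assumption: the body length is given by Content-Length.
        length = int(self.headers.get("Content-Length", 0))
        chunk = self.rfile.read(length)  # raw PCM16 little-endian bytes
        received.setdefault(uid, []).append((sample_rate, chunk))
        self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


def serve(port=8000):
    """Run the sketch server until interrupted (the port is an arbitrary choice)."""
    HTTPServer(("0.0.0.0", port), AudioBytesHandler).serve_forever()
```

Calling `serve()` starts the listener. A production endpoint would likely persist or process each chunk rather than hold it in memory.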

Notes and recommendations:

- The bytes are raw PCM16 (16-bit little-endian), 2 bytes per sample. To produce a playable WAV file, concatenate the received chunks and prepend a WAV header.
- You control delivery frequency in the Omi app's Developer Settings: the webhook field accepts `url,seconds`, where the second value sets how many seconds of audio are sent per request.
- For an example implementation and deployment-ready code, see the Real-Time Audio Streaming guide: `doc/developer/apps/AudioStreaming.mdx`.
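
The first two notes can be sketched in Python with the standard-library `wave` module, which writes the WAV header for you. This is an illustrative sketch, not the code from the linked guide. The function names are hypothetical, and mono audio (one channel) is an assumption, since the channel count is not stated above.

```python
import io
import wave

BYTES_PER_SAMPLE = 2  # PCM16: 16-bit samples, 2 bytes each


def expected_chunk_bytes(sample_rate, seconds, channels=1):
    """Approximate payload size for `seconds` of audio per request."""
    return sample_rate * seconds * channels * BYTES_PER_SAMPLE


def pcm16_chunks_to_wav(chunks, sample_rate, channels=1):
    """Concatenate raw PCM16 chunks and return the bytes of a WAV file.

    channels=1 (mono) is an assumption for this sketch.
    """
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(BYTES_PER_SAMPLE)
        wav_file.setframerate(sample_rate)
        # The header is finalized when the writer is closed.
        wav_file.writeframes(b"".join(chunks))
    return buffer.getvalue()
```

Under the mono assumption, a `sample_rate` of 16000 with `seconds` set to 5 gives `expected_chunk_bytes(16000, 5) == 160000`, so each request body would be roughly 160 KB.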

#### Example Use Cases

- Voice activity detection (VAD) and endpointing
- Feed custom ASR or speech models for research or improved accuracy
- Extract audio features (spectrograms, embeddings) in real time
- Store raw audio chunks in cloud storage for later processing
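
As an illustration of the first use case, a very simple energy-based voice activity check over one received PCM16 chunk might look like the sketch below. Energy thresholding is a stand-in technique chosen for brevity and is not something the Omi docs prescribe. The function names, the 20 ms frame length, and the threshold value are all assumptions, and mono audio is assumed.

```python
import math
import struct


def frame_rms(frame):
    """Root-mean-square amplitude of one frame of PCM16 little-endian samples."""
    count = len(frame) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", frame[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


def detect_speech_frames(chunk, sample_rate, frame_ms=20, threshold=500.0):
    """Return one boolean per frame: True where RMS energy exceeds `threshold`.

    frame_ms and threshold are hypothetical defaults; tune them on real audio.
    """
    # 2 bytes per sample; never let the frame size drop to zero.
    frame_bytes = max(2, int(sample_rate * frame_ms / 1000) * 2)
    return [
        frame_rms(chunk[start : start + frame_bytes]) > threshold
        for start in range(0, len(chunk), frame_bytes)
    ]
```

A fixed threshold is sensitive to microphone gain and background noise, so a real deployment would more likely use an adaptive threshold or a trained VAD model.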

## Creating an Integration App

### Step 1: Define Your App 🎯
