
Commit 9215834

Migrate examples to the new Runner API (#301)
1 parent 492a117 commit 9215834

60 files changed: +548 -562 lines


DEVELOPMENT.md

Lines changed: 54 additions & 20 deletions
@@ -11,18 +11,27 @@ pre-commit install
```

To set up your .env
+
```bash
cp env.example .env
```

## Running
+
```bash
-uv run examples/01_simple_agent_example/simple_agent_example.py
+uv run examples/01_simple_agent_example/simple_agent_example.py run
```

### Running with a video file as input
+
```bash
-uv run <path-to-example> --video-track-override <path-to-video>
+uv run <path-to-example> run --video-track-override <path-to-video>
+```
+
+### Running as an HTTP server
+
+```bash
+uv run <path-to-example> serve --host=<host> --port=<port>
```

## Tests
@@ -34,6 +43,7 @@ uv run py.test -m "not integration" -n auto
```

Integration tests (require secrets in place, see the .env setup)
+
```
uv run py.test -m "integration" -n auto
```
@@ -60,7 +70,6 @@ uv run ruff check --fix

### Mypy type checks

-
```
uv run mypy --install-types --non-interactive -p vision_agents
```
@@ -119,8 +128,10 @@ To see how the agent work open up agents.py
Some important things about audio inside the library:

1. WebRTC uses Opus 48khz stereo but inside the library audio is always in PCM format
-2. Plugins / AI models work with different PCM formats; passing bytes around without a container type leads to chaos and is forbidden
-3. PCM data is always passed around using the `PcmData` object which contains information about sample rate, channels and format
+2. Plugins / AI models work with different PCM formats; passing bytes around without a container type leads to chaos and
+is forbidden
+3. PCM data is always passed around using the `PcmData` object which contains information about sample rate, channels
+and format
4. Audio resampling can be done using the `PcmData.resample` method
5. Adjusting from stereo to mono and vice-versa can be done using the `PcmData.resample` method
6. `PcmData` comes with convenience constructor methods to build from bytes, iterators, ndarray, ...
@@ -132,6 +143,7 @@ import asyncio
from getstream.video.rtc.track_util import PcmData
from openai import AsyncOpenAI

+
async def example():
    client = AsyncOpenAI(api_key="sk-42")

@@ -162,6 +174,7 @@ async def example():

    await play_pcm_with_ffplay(resampled_pcm)

+
if __name__ == "__main__":
    asyncio.run(example())
```
@@ -177,6 +190,7 @@ Sometimes you need to test audio manually, here's some tips:
## Creating PcmData

### from_bytes
+
Build from raw PCM bytes

```python
@@ -186,6 +200,7 @@ PcmData.from_bytes(audio_bytes, sample_rate=16000, format=AudioFormat.S16, chann
```

### from_numpy
+
Build from numpy arrays with automatic dtype/shape conversion

```python
@@ -194,6 +209,7 @@ PcmData.from_numpy(np.array([1, 2], np.int16), sample_rate=16000, format=AudioFo
```

### from_response
+
Construct from API response (bytes, iterators, async iterators, objects with .data)

```python
@@ -204,6 +220,7 @@ PcmData.from_response(
```

### from_av_frame
+
Create from PyAV AudioFrame

```python
@@ -213,27 +230,31 @@ PcmData.from_av_frame(frame)
## Converting Format

### to_float32
+
Convert samples to float32 in [-1, 1]

```python
pcm_f32 = pcm.to_float32()
```

### to_int16
+
Convert samples to int16 PCM format

```python
pcm_s16 = pcm.to_int16()
```

### to_bytes
+
Return interleaved PCM bytes

```python
audio_bytes = pcm.to_bytes()
```

### to_wav_bytes
+
Return WAV file bytes (header + frames)

```python
@@ -253,20 +274,23 @@ pcm = pcm.resample(16000, target_channels=1) # to 16khz, mono
## Manipulating Audio

### append
+
Append another PcmData in-place (adjusts format/rate automatically)

```python
pcm.append(other_pcm)
```

### copy
+
Create a deep copy

```python
pcm_copy = pcm.copy()
```

### clear
+
Clear all samples in-place (keeps metadata)

```python
@@ -276,20 +300,23 @@ pcm.clear()
## Slicing and Chunking

### head
+
Keep only the first N seconds

```python
pcm_head = pcm.head(duration_s=3.0)
```

### tail
+
Keep only the last N seconds

```python
pcm_tail = pcm.tail(duration_s=5.0)
```

### chunks
+
Iterate over fixed-size chunks with optional overlap

```python
@@ -318,7 +345,8 @@ pcm = await queue.get_duration(100)

# AudioTrack

-Use `getstream.video.rtc.AudioTrack` if you need to publish audio using PyAV; this class ensures that `recv` paces audio correctly every 20ms.
+Use `getstream.video.rtc.AudioTrack` if you need to publish audio using PyAV; this class ensures that `recv` paces audio
+correctly every 20ms.

- Use the `.write()` method to enqueue audio (PcmData)
- Use `.flush()` to empty all the enqueued audio (e.g. on a barge-in event)
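
For reference, here is a minimal sketch of the publish/flush flow described in the bullets above. Only `.write()`, `.flush()` and the 20ms pacing of `recv` come from this document; the constructor arguments, the `from_numpy` defaults and whether `write()` must be awaited are assumptions to verify against the class itself.

```python
import numpy as np

from getstream.video.rtc import AudioTrack           # import path taken from the note above
from getstream.video.rtc.track_util import PcmData   # as used elsewhere in this document

# 20ms of 48kHz mono silence, just to have something to enqueue.
# from_numpy's format/channel defaults are assumed here.
silence = PcmData.from_numpy(np.zeros(960, dtype=np.int16), sample_rate=48000)

track = AudioTrack()   # assumed default constructor; check the class for required arguments
track.write(silence)   # enqueue PcmData; recv() then paces it out in 20ms frames
track.flush()          # e.g. on a barge-in event, drop whatever is still queued
```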
@@ -347,8 +375,10 @@ This prevents mistakes related to handling audio with different formats, sample

### Testing

-Many of the underlying APIs change daily. To ensure things work we keep two sets of tests: integration tests and unit tests.
-Integration tests run once a day to verify that changes to underlying APIs didn't break the framework. Some testing guidelines:
+Many of the underlying APIs change daily. To ensure things work we keep two sets of tests: integration tests and unit
+tests.
+Integration tests run once a day to verify that changes to underlying APIs didn't break the framework. Some testing
+guidelines:

- Every plugin needs an integration test
- Limit usage of response-capturing style testing (since it diverges from reality)
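
To illustrate the unit/integration split, a hypothetical test module might look like the sketch below. The test names and assertions are placeholders; only the `integration` marker matches the `py.test -m` commands shown earlier, and the `PcmData` calls reuse signatures documented above.

```python
import numpy as np
import pytest

from getstream.video.rtc.track_util import PcmData


def test_resample_to_mono_16k():
    # Unit test: runs in `uv run py.test -m "not integration" -n auto`.
    # from_numpy's format/channel defaults are assumed here.
    pcm = PcmData.from_numpy(np.zeros(4800, dtype=np.int16), sample_rate=48000)
    mono = pcm.resample(16000, target_channels=1)
    assert mono is not None


@pytest.mark.integration
def test_plugin_against_live_api():
    # Integration test: only runs in the daily `uv run py.test -m "integration" -n auto`
    # job and needs the secrets from the .env setup.
    ...
```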
@@ -442,11 +472,13 @@ metrics.set_meter_provider(
start_http_server(port=9464)
```

-You can now see the metrics at `http://localhost:9464/metrics` (make sure that your Python program keeps running); after this you can set up your Prometheus server to scrape this endpoint.
+You can now see the metrics at `http://localhost:9464/metrics` (make sure that your Python program keeps running); after
+this you can set up your Prometheus server to scrape this endpoint.

### Profiling

-The `Profiler` class uses `pyinstrument` to profile your agent's performance and generate an HTML report showing where time is spent during execution.
+The `Profiler` class uses `pyinstrument` to profile your agent's performance and generate an HTML report showing where
+time is spent during execution.

#### Example usage:

@@ -456,6 +488,7 @@ from vision_agents.core import User, Agent
from vision_agents.core.profiling import Profiler
from vision_agents.plugins import getstream, gemini, deepgram, elevenlabs, vogent

+
async def start_agent() -> None:
    agent = Agent(
        edge=getstream.Edge(),
@@ -475,12 +508,13 @@ async def start_agent() -> None:
```

The profiler automatically:
+
- Starts profiling when the agent is created
- Stops profiling when the agent finishes (on `AgentFinishEvent`)
- Saves an HTML report to the specified output path (default: `./profile.html`)

-You can open the generated HTML file in a browser to view the performance profile, which shows a timeline of function calls and where time is spent during agent execution.
-
+You can open the generated HTML file in a browser to view the performance profile, which shows a timeline of function
+calls and where time is spent during agent execution.

### Queuing

@@ -498,21 +532,23 @@ You can open the generated HTML file in a browser to view the performance profil

### Video Frames & Tracks

-- Track.recv errors will fail silently. The API is to return a frame: never return None, and wait till the next frame is available
-- When using frame.to_ndarray(format="rgb24"), specify the format. Typically you want rgb24 when connecting/sending to Yolo etc
+- Track.recv errors will fail silently. The API is to return a frame: never return None, and wait till the next frame is
+available
+- When using frame.to_ndarray(format="rgb24"), specify the format. Typically you want rgb24 when connecting/sending to
+Yolo etc
- QueuedVideoTrack is a writable/queued video track implementation which is useful when forwarding video
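
To make the first two bullets concrete, here is a hedged sketch of a frame-consuming loop. `track` is assumed to be an incoming video track with an async `recv()` as described above, and `model` is a placeholder for whatever consumes the frames (YOLO etc.); neither name comes from the framework.

```python
import numpy as np


async def consume_frames(track, model) -> None:
    """Sketch only: `track` and `model` are assumed to be provided by the caller."""
    while True:
        frame = await track.recv()                           # contract: always a frame, never None
        rgb: np.ndarray = frame.to_ndarray(format="rgb24")   # be explicit about the format
        detections = model(rgb)                              # placeholder downstream consumer
        # ... forward `detections`, e.g. by writing frames to a QueuedVideoTrack
```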

-
### Loading Resources in Plugins (aka "warmup")
+
Some plugins need to download and use external resources, such as models, to work.

For example:

- `TurnDetection` plugins using a Silero VAD model to detect voice activity in the audio track.
- Video processors using `YOLO` models

In order to standardise how these resources are loaded and to make it performant, the framework provides a special ABC,
`vision_agents.core.warmup.Warmable`.

To use it, simply subclass it and define the required methods.
Note that `Warmable` supports generics to leverage type checking.
@@ -551,12 +587,10 @@ class FasterWhisperSTT(STT, Warmable[WhisperModel]):
        # This method will be called every time a new agent is initialized.
        # The warmup process is now complete.
        self._whisper_model = whisper

    ...
```

-
-
## Onboarding Plan for new contributors

**Audio Formats**
