Commit 09ecaae

juntao and claude committed

Add performance benchmarks to README

M4 Mac Mini results: 13x–24x faster than real-time on warm requests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

1 parent 285c73e

File tree

1 file changed: +26 −0 lines

README.md

Lines changed: 26 additions & 0 deletions
```diff
@@ -381,3 +381,29 @@ or use the macOS MLX backend on Apple hardware.
 **`ELF section name out of range` at link time** (Linux)
 libtorch is on a Docker volume-mounted macOS path. Move it to a native Linux
 path such as `/opt/libtorch`.
+
+---
+
+## Performance
+
+Benchmarked on an **M4 Mac Mini** (MLX/Metal GPU backend). Model loading takes a few
+seconds on first run, but once loaded, inference is **13x–24x faster than real-time**.
+NVIDIA GPU builds (CUDA libtorch) will be significantly faster still.
+
+### CLI
+
+| Audio | Duration | Wall time | Notes |
+|-------|----------|-----------|-------|
+| Demo WAV (real speech) | 5.44 s | 12.9 s | Includes 3.3 s model load + Metal compile |
+| Synthetic 60 s tone | 60 s | 18.1 s | Tests chunking (chunks split at ~35 s) |
+
+### API Server
+
+The server loads the model once at startup. Subsequent requests skip model loading
+and Metal shader compilation, making warm requests extremely fast.
+
+| Request | Audio | Response time | Notes |
+|---------|-------|---------------|-------|
+| 1st request (cold Metal) | 5.44 s | 8.3 s | Includes Metal shader compilation |
+| 2nd+ requests (warm) | 5.44 s | 0.4 s | 13x faster than audio duration |
+| 60 s chunked (warm) | 60 s | 2.5 s | 24x faster than audio duration |
```
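The speedups quoted in the tables are real-time factors: audio duration divided by wall-clock time. As an illustrative sanity check of that arithmetic (not part of the project's code):

```python
def realtime_factor(audio_seconds: float, wall_seconds: float) -> float:
    """Seconds of audio processed per second of wall-clock time."""
    return audio_seconds / wall_seconds

# Warm API request: 5.44 s of audio transcribed in 0.4 s
print(f"{realtime_factor(5.44, 0.4):.1f}x")  # ~13.6x, reported as 13x
# Warm chunked request: 60 s of audio transcribed in 2.5 s
print(f"{realtime_factor(60, 2.5):.1f}x")    # 24.0x
```

Note the CLI numbers include one-time model load and Metal shader compilation, which is why only the warm server requests reach the 13x–24x range.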
