or use the macOS MLX backend on Apple hardware.
**`ELF section name out of range` at link time** (Linux)

libtorch is on a Docker volume-mounted macOS path. Move it to a native Linux path such as `/opt/libtorch`.

---

## Performance

Benchmarked on an **M4 Mac Mini** (MLX/Metal GPU backend). Model loading takes a few seconds on first run, but once loaded, inference is **10x–24x faster than real-time**. NVIDIA GPU builds (CUDA libtorch) will be significantly faster still.

### CLI

| Audio | Duration | Wall time | Notes |
|-------|----------|-----------|-------|
| Demo WAV (real speech) | 5.44 s | 12.9 s | Includes 3.3 s model load + Metal compile |
| Synthetic 60 s tone | 60 s | 18.1 s | Tests chunking (chunks split at ~35 s) |

### API Server

The server loads the model once at startup. Subsequent requests skip model loading and Metal shader compilation, making warm requests extremely fast.
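The load-once pattern can be sketched as follows. This is a minimal illustration with stand-in names (`load_model`, `transcribe` are hypothetical), not the server's actual code:

```python
import functools
import time

@functools.lru_cache(maxsize=1)
def load_model():
    """Stand-in for the expensive one-time model load + Metal shader compile."""
    time.sleep(0.05)  # simulate the startup cost
    return {"name": "stand-in-model"}

def transcribe(audio_path):
    model = load_model()  # cached: only the first call pays the load cost
    return f"transcript of {audio_path} using {model['name']}"

t0 = time.perf_counter(); transcribe("demo.wav"); cold = time.perf_counter() - t0
t0 = time.perf_counter(); transcribe("demo.wav"); warm = time.perf_counter() - t0
print(f"cold: {cold:.3f}s, warm: {warm:.3f}s")  # warm call skips the load entirely
```

The same effect is achieved in the real server by loading the model in the startup path rather than per request; only the first request ever sees the Metal compile cost.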

| Request | Audio | Response time | Notes |
|---------|-------|---------------|-------|
| 1st request (cold Metal) | 5.44 s | 8.3 s | Includes Metal shader compilation |
| 2nd+ requests (warm) | 5.44 s | 0.4 s | 13x faster than audio duration |
| 60 s chunked (warm) | 60 s | 2.5 s | 24x faster than audio duration |
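The speedup figures above are real-time factors: audio duration divided by wall-clock time. Checking the warm-path rows from the table:

```python
# Real-time factor = seconds of audio transcribed per second of wall-clock time.
def rtf(audio_s, wall_s):
    return audio_s / wall_s

print(f"warm 5.44 s request: {rtf(5.44, 0.4):.1f}x real-time")  # ~13.6x
print(f"warm 60 s chunked:   {rtf(60, 2.5):.1f}x real-time")    # 24.0x
print(f"cold 1st request:    {rtf(5.44, 8.3):.2f}x real-time")  # below 1x due to Metal compile
```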