This discussion was converted from issue #14958 on July 30, 2025 10:49.
Hi team,
I recently came across uzu, an open-source inference engine written in Rust, which claims to be ~30-40% faster than llama.cpp (specifically on Apple Silicon, per their documentation). The project is actively maintained and targets on-device inference workloads.
For transparency:

- I haven’t personally verified these performance claims.
- No independent benchmarks or third-party comparisons appear to be publicly available yet (as of my search).
- Their GitHub/docs (trymirai.com) state the results are based on their internal testing.
This isn’t a feature request or bug report—just a heads-up in case you’d like to evaluate their approach or methodology. If relevant, it might be worth investigating how their optimizations (e.g., Rust implementation, Metal backend tweaks) compare to llama.cpp’s current Apple Silicon support.
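For anyone who wants to sanity-check the llama.cpp side of such a comparison, a minimal starting point could be the bundled `llama-bench` tool on an Apple Silicon machine. This is only a sketch: the model path, quantization, and test sizes below are placeholders I chose, not values from uzu's report, and uzu's own benchmarking commands would have to come from their docs.

```sh
# Build llama.cpp with Metal (enabled by default on macOS) and run the bundled benchmark.
# Model path and prompt/generation sizes are placeholders, not uzu's test configuration.
cmake -B build && cmake --build build --config Release -j
./build/bin/llama-bench -m models/llama-3.2-3b-instruct-q4_k_m.gguf -p 512 -n 128 -ngl 99 -r 5
```

The prompt and generation tokens/s this reports would only be comparable to uzu's figures if the same model, quantization, and hardware are used, which their published internal numbers don't make easy to verify.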
Feel free to close this if it’s not actionable, but I thought it worth flagging given the overlap in use cases. Cheers!