This discussion was converted from issue #14958 on July 30, 2025 10:49.
Hi team,
I recently came across uzu, an open-source inference engine written in Rust, which claims to be ~30-40% faster than llama.cpp (specifically on Apple Silicon, per their documentation). The project is actively maintained and targets on-device inference workloads.
For transparency:

- I haven’t personally verified these performance claims.
- No independent benchmarks or third-party comparisons appear to be publicly available yet (as of my search).
- Their GitHub/docs (trymirai.com) state the results are based on their internal testing.
This isn’t a feature request or bug report—just a heads-up in case you’d like to evaluate their approach or methodology. If relevant, it might be worth investigating how their optimizations (e.g., Rust implementation, Metal backend tweaks) compare to llama.cpp’s current Apple Silicon support.
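For anyone who wants to sanity-check the llama.cpp side of such a comparison, a minimal starting point could be the bundled `llama-bench` tool on an Apple Silicon machine. This is only a sketch: the model path, quantization, and test sizes below are placeholders I chose, not values from uzu's report, and uzu's own benchmarking commands would have to come from their docs.

```sh
# Build llama.cpp with Metal (enabled by default on macOS) and run the bundled benchmark.
# Model path and prompt/generation sizes are placeholders, not uzu's test configuration.
cmake -B build && cmake --build build --config Release -j
./build/bin/llama-bench -m models/llama-3.2-3b-instruct-q4_k_m.gguf -p 512 -n 128 -ngl 99 -r 5
```

The prompt and generation tokens/s this reports would only be comparable to uzu's figures if the same model, quantization, and hardware are used, which their published internal numbers don't make easy to verify.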
Feel free to close this if it’s not actionable, but I thought it worth flagging given the overlap in use cases. Cheers!