Espresso: direct ANE inference at 4.76x CoreML — potential complementary approach? #372

christopherkarani · 2026-03-16T21:46:34Z

Hi MLX team,

I wanted to share Espresso (https://github.com/christopherkarani/Espresso) and get your technical perspective.

Espresso is an open-source (MIT), pure-Swift framework for Apple Silicon that accesses the Apple Neural Engine (ANE) directly via the private MIL text dialect, bypassing CoreML's abstraction layer. On an M3 Max we measure:

4.76x faster inference than standard CoreML
519 tok/s for transformer inference with fused KV-cache decode
1.93 ms/token end-to-end
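
As a quick consistency check on the last two figures, per-token latency is the reciprocal of decode throughput. The sketch below is illustrative only: the helper name `ms_per_token` is made up here and is not part of Espresso, and it assumes both numbers describe the same steady-state decode run.

```python
def ms_per_token(tokens_per_second: float) -> float:
    """Convert decode throughput (tokens/s) into per-token latency (ms)."""
    if tokens_per_second <= 0:
        raise ValueError("throughput must be positive")
    return 1000.0 / tokens_per_second


# 519 tok/s is the throughput quoted above; the helper name is hypothetical.
latency = ms_per_token(519.0)
print(f"{latency:.2f} ms/token")  # prints "1.93 ms/token"
```

So the 519 tok/s and 1.93 ms/token figures agree with each other to rounding.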

I see Espresso as potentially complementary to MLX rather than competitive: MLX provides an excellent general-purpose ML framework, while Espresso demonstrates the performance ceiling for ANE inference on specific transformer architectures.

A few questions for the team:

Are there aspects of the MIL approach that could inform an ANE dispatch path for MLX?
Is there interest in a community benchmark that compares MLX-swift vs. Espresso vs. CoreML for transformer inference?
Any technical feedback on our kernel fusion approach?

Happy to discuss further in this thread or via a GitHub issue.

— Chris
https://github.com/christopherkarani/Espresso

Replies: 0 comments