MatMul-free Ternary (Sherry/Tequila) Support #3941
Replies: 2 comments 1 reply
Ternary/MatMul-free support would be huge for edge deployment! At RevolutionAI (https://revolutionai.io) we are exploring these architectures for client devices.

We have been following the Sherry paper closely. The 1.58-bit approach is fascinating, especially the claim that performance scales with model size even at extreme quantization. Would love to test this on our edge inference stack once it is supported! 🔥
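For readers unfamiliar with the 1.58-bit idea mentioned above: such schemes typically constrain weights to {-1, 0, +1} plus a per-tensor scale, so matrix multiplies reduce to additions and subtractions. A minimal NumPy sketch of absmean-style ternary quantization (an illustration of the general technique, not the actual Sherry/Tequila implementation):

```python
import numpy as np

def absmean_ternary(w: np.ndarray, eps: float = 1e-5):
    """Quantize a weight matrix to {-1, 0, +1} with a per-tensor scale.

    Sketch of the absmean scheme used by 1.58-bit models: scale by the
    mean absolute weight, round, and clip to the ternary range.
    """
    scale = np.abs(w).mean() + eps               # per-tensor absmean scale
    w_q = np.clip(np.round(w / scale), -1, 1)    # ternary values in {-1, 0, 1}
    return w_q.astype(np.int8), float(scale)

def dequantize(w_q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate full-precision matrix for comparison."""
    return w_q.astype(np.float32) * scale

# Example: quantize a random weight matrix and inspect the result.
rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4)).astype(np.float32)
w_q, s = absmean_ternary(w)
w_hat = dequantize(w_q, s)
```

Because every quantized weight is -1, 0, or +1, the inner products in a forward pass need no multiplications beyond the single final rescale, which is where the edge-inference speedups come from.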
MatMul-free ternary models are exciting! At RevolutionAI (https://revolutionai.io) we explore efficient architectures, and this capability fits several of our use cases. Would love to see this in Unsloth!
There is new research coming out that promises a 4x speed boost (Falcon-Edge is QAT, not PTQ), and I wonder if Unsloth can help with the process of creating these models: https://github.com/Tencent/AngelSlim/tree/sherry/Sherry https://github.com/Tencent/AngelSlim/tree/tequila/TernaryQuant

I am also asking for support in other major libraries: ggml-org/llama.cpp#19123 vllm-project/vllm#33142
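The QAT-vs-PTQ distinction matters here: in quantization-aware training the forward pass uses quantized weights, while gradients update latent full-precision weights via a straight-through estimator (STE). A toy NumPy sketch of that mechanism (a hypothetical illustration, not the Sherry/Tequila training recipe):

```python
import numpy as np

def ternarize(w: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Absmean ternary fake-quantization used in the QAT forward pass."""
    s = np.abs(w).mean() + eps
    return np.clip(np.round(w / s), -1, 1) * s

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 8))   # latent full-precision weights kept during training
x = rng.normal(size=(4, 8))   # a batch of activations

# Forward: the layer computes with the quantized weights.
y = x @ ternarize(w).T

# Backward (STE): the rounding step is treated as identity, so the
# gradient w.r.t. the quantized weights flows straight through to the
# latent full-precision weights.
grad_y = np.ones_like(y)      # pretend upstream gradient (loss = sum(y))
grad_w = grad_y.T @ x         # shape matches w via y = x @ W^T
w -= 0.01 * grad_w            # latent weights update in full precision
```

This is why a QAT checkpoint like Falcon-Edge can hold accuracy at 1.58 bits where post-training quantization of an ordinary checkpoint usually cannot: the model learns around the rounding error during training instead of absorbing it afterwards.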