MatMul-free Ternary (Sherry/Tequila) Support #3941
Replies: 2 comments 1 reply
Ternary/MatMul-free support would be huge for edge deployment! At RevolutionAI (https://revolutionai.io) we are exploring these architectures for client devices.

We have been following the Sherry paper closely. The 1.58-bit approach is fascinating, especially the claim that performance scales with model size even at extreme quantization. Would love to test this on our edge inference stack once it is supported! 🔥
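For readers unfamiliar with the 1.58-bit idea mentioned above: such schemes typically constrain weights to {-1, 0, +1} plus a per-tensor scale, so matrix multiplies reduce to additions and subtractions. A minimal NumPy sketch of absmean-style ternary quantization (an illustration of the general technique, not the actual Sherry/Tequila implementation):

```python
import numpy as np

def absmean_ternary(w: np.ndarray, eps: float = 1e-5):
    """Quantize a weight matrix to {-1, 0, +1} with a per-tensor scale.

    Sketch of the absmean scheme used by 1.58-bit models: scale by the
    mean absolute weight, round, and clip to the ternary range.
    """
    scale = np.abs(w).mean() + eps               # per-tensor absmean scale
    w_q = np.clip(np.round(w / scale), -1, 1)    # ternary values in {-1, 0, 1}
    return w_q.astype(np.int8), float(scale)

def dequantize(w_q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate full-precision matrix for comparison."""
    return w_q.astype(np.float32) * scale

# Example: quantize a random weight matrix and inspect the result.
rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4)).astype(np.float32)
w_q, s = absmean_ternary(w)
w_hat = dequantize(w_q, s)
```

Because every quantized weight is -1, 0, or +1, the inner products in a forward pass need no multiplications beyond the single final rescale, which is where the edge-inference speedups come from.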
MatMul-free ternary models are exciting! At RevolutionAI (https://revolutionai.io) we explore efficient architectures, and this capability fits several of our use cases. Would love to see this in Unsloth!
There is new research coming out that promises a 4x speed boost (Falcon-Edge is QAT, not PTQ), and I wonder if Unsloth can help with the process of creating these models: https://github.com/Tencent/AngelSlim/tree/sherry/Sherry https://github.com/Tencent/AngelSlim/tree/tequila/TernaryQuant

I am also asking for support in other major libraries: ggml-org/llama.cpp#19123 vllm-project/vllm#33142
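The QAT-vs-PTQ distinction matters here: in quantization-aware training the forward pass uses quantized weights, while gradients update latent full-precision weights via a straight-through estimator (STE). A toy NumPy sketch of that mechanism (a hypothetical illustration, not the Sherry/Tequila training recipe):

```python
import numpy as np

def ternarize(w: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Absmean ternary fake-quantization used in the QAT forward pass."""
    s = np.abs(w).mean() + eps
    return np.clip(np.round(w / s), -1, 1) * s

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 8))   # latent full-precision weights kept during training
x = rng.normal(size=(4, 8))   # a batch of activations

# Forward: the layer computes with the quantized weights.
y = x @ ternarize(w).T

# Backward (STE): the rounding step is treated as identity, so the
# gradient w.r.t. the quantized weights flows straight through to the
# latent full-precision weights.
grad_y = np.ones_like(y)      # pretend upstream gradient (loss = sum(y))
grad_w = grad_y.T @ x         # shape matches w via y = x @ W^T
w -= 0.01 * grad_w            # latent weights update in full precision
```

This is why a QAT checkpoint like Falcon-Edge can hold accuracy at 1.58 bits where post-training quantization of an ordinary checkpoint usually cannot: the model learns around the rounding error during training instead of absorbing it afterwards.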