Open
Labels
enhancement: New feature or request
Description
Prerequisites
- I am running the latest code. Mention the version if possible as well.
- I carefully followed the README.md.
- I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- I reviewed the Discussions, and have a new and useful enhancement to share.
Feature Description
Huawei's SINQ quantization method might be a low-hanging-fruit quantization scheme to support in llama.cpp:
https://github.com/huawei-csl/SINQ/blob/main/README.md#3-quantize-any-llm-with-sinq
Motivation
`⚡️ A fast, plug-and-play, model-agnostic quantization technique delivering state-of-the-art performance for Large Language Models without sacrificing accuracy.
💡 Want to run a large model on your GPU but don’t have enough memory? With SINQ, you can deploy models that would otherwise be too big drastically reducing memory usage while preserving LLM quality.
⏱️ SINQ quantizes Qwen3-14B in just ~21 sec and DeepSeekV2.5-236B in ~5 min`
Possible Implementation
The SINQ reference code is Apache 2.0 licensed, so a lot of it could be adapted from here: https://github.com/huawei-csl/SINQ/blob/main/README.md#3-quantize-any-llm-with-sinq
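To give a rough idea of what a port might involve, here is a minimal pure-Python sketch of dual-scale quantization, which is how I understand the core idea of SINQ: each weight matrix gets a per-row and a per-column scale, found by alternately normalizing row and column standard deviations (Sinkhorn-style), and the normalized matrix is then rounded to a low-bit grid. This is not the SINQ code. The function names (`sinkhorn_like_scales`, `quantize_rtn`, `dequantize`), the iteration count, and the single global step size are all assumptions made for illustration, and the real implementation likely differs in details such as grouping and how scales are stored.

```python
import math
import random


def _std(values):
    """Population standard deviation of a list of floats."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


def sinkhorn_like_scales(W, iters=16, eps=1e-8):
    """Alternately normalize row and column std-devs (illustrative only).

    Returns (M, row_scale, col_scale) such that
    W[i][j] ~= M[i][j] * row_scale[i] * col_scale[j].
    """
    rows, cols = len(W), len(W[0])
    row_scale = [1.0] * rows
    col_scale = [1.0] * cols
    M = [row[:] for row in W]
    for _ in range(iters):
        for i in range(rows):  # bring every row to unit std
            s = _std(M[i]) + eps
            row_scale[i] *= s
            M[i] = [v / s for v in M[i]]
        for j in range(cols):  # bring every column to unit std
            s = _std([M[i][j] for i in range(rows)]) + eps
            col_scale[j] *= s
            for i in range(rows):
                M[i][j] /= s
    return M, row_scale, col_scale


def quantize_rtn(M, bits=4):
    """Uniform round-to-nearest quantization with one global step and offset."""
    levels = 2 ** bits - 1
    lo = min(min(r) for r in M)
    hi = max(max(r) for r in M)
    step = (hi - lo) / levels or 1.0  # avoid a zero step for constant input
    Q = [[min(levels, max(0, round((v - lo) / step))) for v in r] for r in M]
    return Q, step, lo


def dequantize(Q, step, lo, row_scale, col_scale):
    """Undo the integer grid, then re-apply both scale vectors."""
    return [
        [(q * step + lo) * row_scale[i] * col_scale[j] for j, q in enumerate(r)]
        for i, r in enumerate(Q)
    ]


if __name__ == "__main__":
    random.seed(0)
    # Toy matrix with one outlier column (column 2 has ~6x larger values).
    W = [[random.gauss(0, 1) * (6 if j == 2 else 1) for j in range(8)]
         for _ in range(6)]
    M, rs, cs = sinkhorn_like_scales(W)
    Q, step, lo = quantize_rtn(M, bits=4)
    W_hat = dequantize(Q, step, lo, rs, cs)
    err = max(abs(W[i][j] - W_hat[i][j]) for i in range(6) for j in range(8))
    print("max abs reconstruction error:", err)
```

For llama.cpp, the interesting question would be how the extra column-scale vector fits into the existing block-based quant formats, since current types store scales per block rather than per row and per column.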