Open topic: refactoring dequantized inference in InfiniLM #3
YdrMaster
announced in
Announcements
Replies: 0 comments
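InfiniLM currently supports dequantized inference in llama-cpu: it loads a GGUF quantized model and dequantizes each parameter matrix just before that matrix is used in computation. The current dequantization implementation is not general enough, so it needs to be refactored.
In llama inference, only the linear-layer parameters need to be quantized, so the refactoring plan is to modify the matrix multiplication operator in the operator library. Dequantized matrix multiplication is treated as mixed-precision matrix multiplication, with a workspace used to store the dequantized parameters. Of the three matrix multiplication operands, a and b may be of any quantized type; they are dequantized to the data type of c for the computation.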
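To make the proposal concrete, here is a minimal sketch of a mixed-precision matrix multiplication that dequantizes its inputs into a caller-provided workspace. It is written in plain Python for illustration only and is not InfiniLM's operator API: `Q8Matrix`, `quantize`, `workspace_size`, and `mat_mul` are hypothetical names, and the single quantized type shown (blocks of 32 int8 values sharing one scale, similar in spirit to GGUF's Q8_0) stands in for "any quantized type". Python floats play the role of the data type of c.

```python
from dataclasses import dataclass

BLOCK = 32  # elements per quantization block (assumption, modeled on Q8_0)


@dataclass
class Q8Matrix:
    """Hypothetical quantized matrix: int8 values plus one scale per block.

    Simplification: blocks run over the flat row-major buffer.
    """
    rows: int
    cols: int
    scales: list  # one float per block
    quants: list  # rows * cols integers in [-127, 127]


def quantize(values, rows, cols):
    """Quantize a flat row-major list of floats into a Q8Matrix."""
    assert len(values) == rows * cols and len(values) % BLOCK == 0
    scales, quants = [], []
    for start in range(0, len(values), BLOCK):
        block = values[start:start + BLOCK]
        amax = max(abs(v) for v in block)
        d = amax / 127 if amax else 1.0  # scale maps the block onto [-127, 127]
        scales.append(d)
        quants.extend(round(v / d) for v in block)
    return Q8Matrix(rows, cols, scales, quants)


def workspace_size(*operands):
    """Number of workspace elements needed to dequantize the quantized operands."""
    return sum(x.rows * x.cols for x in operands if isinstance(x, Q8Matrix))


def mat_mul(c, a, b, m, k, n, workspace):
    """Compute c (m x n) = a (m x k) @ b (k x n), all flat row-major.

    a and b may each be a plain list of floats or a Q8Matrix. Quantized
    operands are first dequantized into `workspace`; the multiplication
    itself then runs entirely in the data type of c.
    """
    offset = 0
    dense = []  # (buffer, base offset) for a and b
    for x in (a, b):
        if isinstance(x, Q8Matrix):
            for i, q in enumerate(x.quants):
                workspace[offset + i] = x.scales[i // BLOCK] * q
            dense.append((workspace, offset))
            offset += x.rows * x.cols
        else:
            dense.append((x, 0))
    (a_buf, a_off), (b_buf, b_off) = dense
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                acc += a_buf[a_off + i * k + p] * b_buf[b_off + p * n + j]
            c[i * n + j] = acc


# Usage: float activations times a quantized weight matrix.
weights = [float((i * 5) % 11 - 5) for i in range(64)]  # 32 x 2
b = quantize(weights, 32, 2)
a = [0.25 * ((i % 4) - 1) for i in range(64)]           # 2 x 32
ws = [0.0] * workspace_size(a, b)
c = [0.0] * 4
mat_mul(c, a, b, 2, 32, 2, ws)
```

In this sketch the caller sizes and owns the workspace, so the operator itself allocates nothing, and supporting a new quantized type would only mean adding another dequantization branch rather than touching the multiplication loop.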