About speeding up inference with quantized models #2602
Unanswered
zhuchen1109 asked this question in Q&A

I am running inference with the InternVL2-26B INT4 model using the PyTorch backend, and I found that vLLM's awq_marlin kernel is faster. Which settings in lmdeploy can be adjusted to speed up inference? Would TurboMind perform better on quantized models?

Replies: 1 comment
- You can give TurboMind a try.
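
For anyone landing here: TurboMind ships dedicated W4A16 (AWQ) kernels, which is typically why it outperforms the plain PyTorch backend on 4-bit checkpoints. Below is a minimal sketch (not from the thread) of switching an AWQ/int4 InternVL2 checkpoint to the TurboMind backend through lmdeploy's `pipeline` API. The model path and the config values (`quant_policy`, `cache_max_entry_count`, `session_len`) are assumptions to adapt, not recommended settings.

```python
# Minimal sketch: AWQ/int4 InternVL2 on the TurboMind backend via lmdeploy.
from lmdeploy import pipeline, TurbomindEngineConfig
from lmdeploy.vl import load_image

# Model path is an assumption; point it at your own int4/AWQ checkpoint.
model_path = 'OpenGVLab/InternVL2-26B-AWQ'

engine_config = TurbomindEngineConfig(
    model_format='awq',          # tell TurboMind the weights are AWQ-quantized
    quant_policy=8,              # optional: int8 KV cache to save GPU memory
    cache_max_entry_count=0.8,   # fraction of free GPU memory reserved for KV cache
    session_len=8192,
)

pipe = pipeline(model_path, backend_config=engine_config)

image = load_image(
    'https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg'
)
print(pipe(('describe this image', image)))
```

If `pipeline` is created without a `backend_config`, lmdeploy picks a backend automatically; passing `TurbomindEngineConfig` explicitly is how you force the TurboMind path for comparison against the PyTorch backend.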