Effect of max-batch-size on GPU memory usage #2479
Unanswered · idontlikelongname asked this question in Q&A · Replies: 1 comment, 6 replies
-
It is not a linear relationship; memory usage does not depend on batch size.
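A likely explanation for the flat memory curve (my own sketch, not stated in the thread): serving engines such as LMDeploy's TurboMind preallocate the KV-cache pool at startup as a fixed fraction of free GPU memory (the `cache_max_entry_count` parameter, default 0.8), so the observed footprint is weights plus a fixed pool, regardless of concurrency. The model shape constants below (80 layers, 8 KV heads, head dim 128 for Llama-3.1-70B) are assumptions from the published model config, and `free_gpu_bytes` is a hypothetical input:

```python
def kv_cache_bytes_per_token(num_layers=80, num_kv_heads=8,
                             head_dim=128, dtype_bytes=2):
    """Per-token KV-cache cost: K and V tensors for every layer.

    Assumed Llama-3.1-70B shape: 80 layers, GQA with 8 KV heads,
    head dim 128, fp16 cache (2 bytes per value).
    """
    return 2 * num_layers * num_kv_heads * head_dim * dtype_bytes


def preallocated_pool_bytes(free_gpu_bytes, cache_max_entry_count=0.8):
    """Size of the KV-cache pool a TurboMind-style engine reserves up front.

    The pool is a fixed fraction of free GPU memory and is carved into
    cache blocks later; raising concurrency only changes how many blocks
    are in use, not how much memory is reserved.
    """
    return int(free_gpu_bytes * cache_max_entry_count)


# ~320 KB of KV cache per token under the assumed shape.
per_token = kv_cache_bytes_per_token()

# With, say, 40 GB free after loading W4A16 weights, the pool is fixed
# at 32 GB whether concurrency is 16 or 128.
pool = preallocated_pool_bytes(40 * 1024**3)
print(per_token, pool)
```

Under this model, batch size trades off against achievable context length within the fixed pool: more concurrent sequences means fewer cache blocks available per sequence, not more total memory.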
-
I benchmarked the throughput of a W4A16-quantized Meta-Llama-3.1-70B-Instruct on an H800-80G using the benchmark/profile_throughput.py script. With concurrency set to 16 and to 128, GPU memory usage showed no noticeable difference.
Why does this happen? My understanding was that memory usage grows roughly linearly with batch size, but the observed behavior suggests otherwise.
What is the actual relationship between GPU memory usage and batch size for autoregressive models?