Cannot fully utilize the GPU during inference #555
Closed
lostmaniac started this conversation in Bad Case
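Running on two GPUs is inefficient to begin with, so a single GPU is recommended. Also, this uses the default Hugging Face (HF) loading path with no inference optimization applied; you can try an optimized inference framework such as vLLM.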
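A rough estimate of why a second GPU helps little here. This assumes the model is split layer-wise across the cards (as Hugging Face `device_map="auto"` does) and requests are served one at a time. The original post does not say how the model was loaded. Under that assumption, each decoding step passes through the GPUs in sequence, so each card idles while the other works. Let $t_i$ be the time GPU $i$ spends on its layers per generated token and $u_i$ its busy fraction:

```math
u_i = \frac{t_i}{\sum_{j=1}^{N} t_j}, \qquad t_1 = \dots = t_N \;\Rightarrow\; u_i = \frac{1}{N}, \qquad N = 2 \;\Rightarrow\; u_i = 50\%
```

That is in the same range as the 40%–55% reported in the question. Inter-GPU transfers and kernel-launch overhead add idle time to every step and would tend to push the figure lower. Batching many requests together, which serving frameworks such as vLLM do, is what raises utilization rather than adding cards.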
Environment:
./openai_api.py
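CUDA monitoring:
Modified file: openai_api.py
During generation, GPU utilization stays at roughly 40%–55% the whole time.
What should I adjust so the GPU is fully utilized and generation runs faster? Thanks.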
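The following sketch checks whether every card shows this pattern, rather than relying on a single screenshot. It averages per-GPU utilization by polling `nvidia-smi --query-gpu=index,utilization.gpu --format=csv,noheader,nounits`. This script is an assumption and not part of the original setup. The helper names `parse_utilization` and `sample` are hypothetical.

```python
import shutil
import subprocess
import time

# nvidia-smi prints one "index, utilization" line per GPU, e.g. "0, 47"
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,utilization.gpu",
    "--format=csv,noheader,nounits",
]


def parse_utilization(csv_text: str) -> dict[int, int]:
    """Parse nvidia-smi CSV lines like '0, 47' into {gpu_index: percent}."""
    result: dict[int, int] = {}
    for line in csv_text.strip().splitlines():
        idx, util = (field.strip() for field in line.split(","))
        result[int(idx)] = int(util)
    return result


def sample(seconds: float = 10.0, interval: float = 1.0) -> dict[int, float]:
    """Poll nvidia-smi for `seconds` and return the mean utilization per GPU."""
    totals: dict[int, int] = {}
    n = 0
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
        for idx, util in parse_utilization(out).items():
            totals[idx] = totals.get(idx, 0) + util
        n += 1
        time.sleep(interval)
    return {idx: total / n for idx, total in totals.items()}


if __name__ == "__main__":
    if shutil.which("nvidia-smi") is None:
        print("nvidia-smi not found; run this on the GPU host")
    else:
        # Run this while a generation request is in progress
        for idx, avg in sorted(sample().items()):
            print(f"GPU {idx}: {avg:.1f}% average utilization")
```

Run it in a second terminal while `openai_api.py` is generating. If each card hovers near 50% while the other is busy, the layer-split explanation in the reply fits. If a single card is already low, the bottleneck is more likely batch-size-1 decoding itself.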