Replies: 2 comments 3 replies
-
@irexyc please do the favor
-
You could do it this way:
-
Hi, I am getting the following error:
torch.cuda.OutOfMemoryError: CUDA out of memory
while deploying a 4-bit quantized Llama 2 70B model with the following command:
python3 -m lmdeploy.serve.turbomind.deploy llama2 llama2-chat-70b-w4 --model-format awq --group-size 128 --tp 4
I am using a 4 x NVIDIA A10G (24 GB VRAM each) setup, but this deployment command uses only one of the 4 GPUs and runs out of memory. Is there a way to use all 4 GPUs for the deploy command?
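For reference, a minimal sanity check (not part of the original thread) using plain PyTorch, which the TurboMind pipeline already depends on: it prints how many GPUs the deploy process can actually see and how much memory is free on each, which helps separate a visibility problem (only one GPU exposed to the process) from the converter simply not sharding the weights.

import torch

# Report every GPU visible to this process and its free/total memory.
print("CUDA available:", torch.cuda.is_available())
print("Visible GPUs:", torch.cuda.device_count())
for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    free, total = torch.cuda.mem_get_info(i)
    print(f"GPU {i}: {props.name}, {free / 1024**3:.1f} / {total / 1024**3:.1f} GiB free")

If fewer than 4 GPUs show up, exporting CUDA_VISIBLE_DEVICES=0,1,2,3 before running the deploy command makes all of them visible to the process; whether the converter then spreads the 70B weights across them is exactly the question above and depends on the lmdeploy version in use.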