Custom tensor parallelism #5438
Unanswered · sarathkondeti asked this question in Q&A
Hello,
I'm deploying Vicuna-33B across 2× A6000s (48 GB each) with TP 2.
It consumes about ~37 GB on each card.
I have a new scenario where another workload on gpu0 consumes 24 GB of VRAM.
Does DeepSpeed currently do anything smart, like a 25:75 tensor slice, to adapt to the resulting 24 GB + 48 GB of free GPU memory?
I see that tp_shard.py doesn't expose any parameters for this, but I wanted to confirm the behavior.
Thank you very much.
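For reference, here is a minimal sketch of the kind of TP-2 deployment described above. The checkpoint id, dtype, and launch command are illustrative assumptions, not taken from the original setup; the point is that the tensor-parallel config only takes a degree (`tp_size`), with no knob for an uneven per-GPU split.

```python
# Launch with: deepspeed --num_gpus 2 infer.py
import torch
import deepspeed
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "lmsys/vicuna-33b-v1.3"  # assumed checkpoint id
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)

# DeepSpeed shards each weight matrix evenly across the tp_size ranks;
# there is no parameter here to request, e.g., a 25:75 split.
engine = deepspeed.init_inference(
    model,
    tensor_parallel={"tp_size": 2},
    dtype=torch.float16,
)
```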