1 instance per GPU 👌
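A minimal sketch of what "one instance per GPU" can look like, assuming the openai-whisper Python package; the device strings and audio file names are placeholders, not anything from the thread:

```python
import whisper

# Each instance loads its own copy of the model onto a dedicated device,
# so the ~10 GB of "large" weights lives once per GPU rather than being shared.
model_gpu0 = whisper.load_model("large", device="cuda:0")
model_gpu1 = whisper.load_model("large", device="cuda:1")

# The two models transcribe independently; "audio0.wav" and "audio1.wav"
# are placeholder file names.
print(model_gpu0.transcribe("audio0.wav")["text"])
print(model_gpu1.transcribe("audio1.wav")["text"])
```

Note that in a single Python process the two calls above still run one after the other; for actual parallelism you would run one such worker per GPU in separate processes (for example by restricting each process with CUDA_VISIBLE_DEVICES).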
Hello everyone!
I am just starting with Whisper, so sorry if this seems obvious and basic, but I am wondering how Whisper scales with concurrent calls. For instance, if I have a 16 GB GPU and run Whisper's large model, which uses ~10 GB, am I likely to get an out-of-memory error if I start a second Whisper command while the first instance is still running?
Does the model persist in memory between calls? If so, does that happen by default?
On the other hand, if I add a second GPU, how does it scale? Is all the compute pooled together, transparently to how the model and Whisper are handled, or does it work differently? How does this all play out with concurrent calls to Whisper?
Thanks for any insights!
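On the persistence question, a minimal sketch assuming the openai-whisper Python package: in a long-running process the weights are loaded once and stay resident in GPU memory, so later calls reuse them. By contrast, each separate `whisper` CLI invocation is a new process that loads its own ~10 GB copy, which is how a second concurrent run can push a 16 GB GPU out of memory. The file names below are placeholders:

```python
import whisper

# Load the ~10 GB "large" model once; it stays in GPU memory for the
# lifetime of this process.
model = whisper.load_model("large")

# Subsequent calls reuse the already-loaded weights instead of reloading them.
for path in ["clip1.wav", "clip2.wav", "clip3.wav"]:
    result = model.transcribe(path)
    print(path, result["text"])
```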