Hi, I'm working on a project that parallelizes tasks with Whisper, using CUDA with the model loaded in VRAM. Is there a way to parallelize tasks with a single large-v3 model loaded in memory (~12 GB of VRAM), instead of loading the model once for every task that needs concurrency? My current method uses Python's multiprocessing to load the model twice in memory for at least 2x throughput. I've also tried threading with the same model shared across multiple transcriptions, but that raises an exception about tensor weights. Thanks in advance!
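For reference, a minimal sketch of the multiprocessing approach described above, assuming the openai-whisper package and a CUDA device; `NUM_WORKERS`, `init_worker`, and the file list are illustrative names, not from the original post. Each worker process loads its own copy of the model once, then transcribes files handed out by the pool:

```python
import multiprocessing as mp

import whisper

NUM_WORKERS = 2  # each worker holds its own ~12 GB model copy in VRAM

_model = None  # per-process model, set once by the pool initializer


def init_worker() -> None:
    # Runs once in each worker process; loads a private model copy.
    global _model
    _model = whisper.load_model("large-v3", device="cuda")


def transcribe(path: str) -> str:
    # Runs inside a worker; _model was loaded by init_worker.
    return _model.transcribe(path)["text"]


if __name__ == "__main__":
    # "spawn" avoids forking a parent process that has touched CUDA.
    ctx = mp.get_context("spawn")
    files = ["a.wav", "b.wav", "c.wav", "d.wav"]  # placeholder inputs
    with ctx.Pool(NUM_WORKERS, initializer=init_worker) as pool:
        for text in pool.map(transcribe, files):
            print(text)
```

Each extra worker costs another ~12 GB of VRAM. The single-shared-model alternative would be guarding `model.transcribe` with a `threading.Lock`; that likely avoids the tensor-weights exception (transcribe appears to install per-call caching state on the model object, so concurrent calls on one instance clash), but the lock serializes the GPU work, so it buys safety rather than parallelism.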
Replies: 1 comment

Is this possible in this Python implementation, or not? Or does it depend on the implementation? Thanks.