Parallel inference with whisper #2424
Unanswered. SaidiSouhaieb asked this question in Q&A.
Replies: 1 comment
- @SaidiSouhaieb did you find any solution?
- Hello, I am trying to deploy Whisper on a production server behind a websocket. My problem is that each payload is transcribed sequentially; I want Whisper to utilize more of my GPU (an NVIDIA RTX 4000 Ada Generation) and run inference on multiple requests in parallel. I tried multithreading (which worked on CPU) and switched between several libraries, but nothing worked.
Any help would be appreciated.
p.s.: if this requires multiple GPUs, can someone guide me through that path?
Thank you
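One common pattern for this (not from this thread, just a sketch) is to keep a small pool of independent model replicas and hand each incoming payload to a free one, since calls into a single model instance are effectively serialized. The `load_model` stub below is a hypothetical placeholder for whatever actually loads your model (e.g. `whisper.load_model(...)` or an equivalent from another library); the queue-plus-executor pattern is the part being illustrated. Note that each replica costs GPU memory, so the pool size is bounded by VRAM, and real throughput gains depend on the backend releasing the GIL during inference.

```python
import queue
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for loading a Whisper model; in a real server this
# would be the actual model-loading call of whichever library you use.
def load_model():
    class _StubModel:
        def transcribe(self, audio):
            return {"text": f"transcript of {audio}"}
    return _StubModel()

NUM_WORKERS = 4  # number of model replicas; bounded by available VRAM

# One model instance per slot: a single model object serializes its
# inferences, so parallelism needs independent replicas (or a batching
# backend). The queue lends a free replica to each request.
_models = queue.Queue()
for _ in range(NUM_WORKERS):
    _models.put(load_model())

def transcribe(audio):
    model = _models.get()      # borrow a free replica (blocks if all busy)
    try:
        return model.transcribe(audio)["text"]
    finally:
        _models.put(model)     # return the replica to the pool

pool = ThreadPoolExecutor(max_workers=NUM_WORKERS)

# In the websocket handler, submit each payload instead of calling
# transcribe() inline, so requests stop queueing behind one another:
futures = [pool.submit(transcribe, f"payload-{i}") for i in range(8)]
results = [f.result() for f in futures]
```

For multiple GPUs the same pattern extends naturally: load each replica onto a different device at pool-creation time, so the queue hands out models pinned to different GPUs.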