Parallel inference with whisper #2424
Unanswered. SaidiSouhaieb asked this question in Q&A.
Replies: 1 comment
- @SaidiSouhaieb did you find any solution?
- Hello, I am trying to deploy Whisper on a production server behind a websocket. My problem is that each payload is transcribed sequentially; I want Whisper to utilize more of my GPU (an NVIDIA RTX 4000 Ada Generation) and run inference on multiple requests in parallel. I tried multithreading (which worked on CPU) and switched between several libraries, but nothing worked.
Any help would be appreciated.
p.s.: if this requires multiple GPUs, can someone guide me through that path?
Thank you
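One common pattern for this (not from this thread, just a sketch) is to keep a small pool of independent model replicas and hand each incoming payload to a free one, since calls into a single model instance are effectively serialized. The `load_model` stub below is a hypothetical placeholder for whatever actually loads your model (e.g. `whisper.load_model(...)` or an equivalent from another library); the queue-plus-executor pattern is the part being illustrated. Note that each replica costs GPU memory, so the pool size is bounded by VRAM, and real throughput gains depend on the backend releasing the GIL during inference.

```python
import queue
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for loading a Whisper model; in a real server this
# would be the actual model-loading call of whichever library you use.
def load_model():
    class _StubModel:
        def transcribe(self, audio):
            return {"text": f"transcript of {audio}"}
    return _StubModel()

NUM_WORKERS = 4  # number of model replicas; bounded by available VRAM

# One model instance per slot: a single model object serializes its
# inferences, so parallelism needs independent replicas (or a batching
# backend). The queue lends a free replica to each request.
_models = queue.Queue()
for _ in range(NUM_WORKERS):
    _models.put(load_model())

def transcribe(audio):
    model = _models.get()      # borrow a free replica (blocks if all busy)
    try:
        return model.transcribe(audio)["text"]
    finally:
        _models.put(model)     # return the replica to the pool

pool = ThreadPoolExecutor(max_workers=NUM_WORKERS)

# In the websocket handler, submit each payload instead of calling
# transcribe() inline, so requests stop queueing behind one another:
futures = [pool.submit(transcribe, f"payload-{i}") for i in range(8)]
results = [f.result() for f in futures]
```

For multiple GPUs the same pattern extends naturally: load each replica onto a different device at pool-creation time, so the queue hands out models pinned to different GPUs.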