whisper concurrent request in websockets #2195

saboorniazi · 2024-05-31T13:26:21Z

I want to give my users real-time transcription. I have built the system, but when many users (around 50) connect at the same time, transcription is no longer delivered in real time. Can anyone tell me what the problem is? I am also using asyncio to create tasks.
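The post does not include code, so the cause can only be guessed at. One common pattern that produces this symptom is calling a blocking function directly inside a coroutine: the asyncio tasks then run one after another and the event loop stalls. A minimal sketch of the difference, where `fake_transcribe` is a hypothetical stand-in for the model call (it only sleeps) and the handler names are made up for illustration:

```python
import asyncio
import time


def fake_transcribe(chunk: bytes) -> str:
    """Hypothetical stand-in for a blocking model call (not a Whisper API)."""
    time.sleep(0.05)  # simulate work that blocks the calling thread
    return f"text for {len(chunk)} bytes"


async def handle_blocking(chunk: bytes) -> str:
    # Calling the blocking function directly stalls the event loop,
    # so no other task can make progress until it returns.
    return fake_transcribe(chunk)


async def handle_offloaded(chunk: bytes) -> str:
    # asyncio.to_thread runs the call in a worker thread and awaits the result,
    # leaving the event loop free to serve other connections.
    return await asyncio.to_thread(fake_transcribe, chunk)


async def run_clients(handler, n_clients: int = 4) -> float:
    # Start all simulated clients at once and measure the total wall time.
    start = time.perf_counter()
    await asyncio.gather(*(handler(b"x" * 10) for _ in range(n_clients)))
    return time.perf_counter() - start


def compare() -> tuple[float, float]:
    blocking = asyncio.run(run_clients(handle_blocking))
    offloaded = asyncio.run(run_clients(handle_offloaded))
    return blocking, offloaded
```

Calling `compare()` returns the wall time of both variants; the blocking one grows linearly with the number of clients. With a real model the gain depends on how much of the call releases the GIL, and the GPU remains a shared resource either way.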

alvynabranches · 2024-06-01T15:30:12Z

Usage of FastAPI to create web sockets

My first advice is to use FastAPI web sockets: they keep the code concise and clear, and tend to produce fewer errors.

Bottleneck of web sockets and limitations of Python

Next, web sockets themselves can be a bottleneck. If you stick to Python alone, it will be difficult to scale up.

Use Kubernetes on top of web socket implementation.

Try Kubernetes or OpenShift to spread the load across multiple nodes, because a single server won't handle that many concurrent requests. (I don't know which model or GPU you are using; my assumption is the large-v3 model on an A100 GPU.)

Better approach

I think this problem is better served by moving from Python to JavaScript, since JavaScript is built around non-blocking I/O. There you have the option of using WebRTC, which gives you more room for customization. With it you can handle multiple connections, much like Zoom or Google Meet.

Limitation of this approach

No technology is free of limitations. Even if one seems to have none today, a few years from now someone will find a flaw in it. The same applies here.

The better approach will solve your problem but won't suffice for long. With the infrastructure and model from my assumptions, it could probably handle 100, or at most 250, concurrent requests. Kubernetes on top of it is therefore the longer-lasting approach: scaling happens automatically and you don't need to manage the replicas yourself. You can also start with a single node and add nodes as traffic grows. This is not a manual task; you can write YAML configuration to automate it.
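To make "increase the nodes according to the traffic" concrete, here is a small sketch of the scaling rule documented for the Kubernetes Horizontal Pod Autoscaler, `ceil(currentReplicas * currentMetric / targetMetric)`. The function name is made up for illustration, and the target of 100 connections per replica is only the lower estimate from the paragraph above, not a measured figure:

```python
import math


def desired_replicas(current_replicas: int, current_load: float, target_load: float) -> int:
    """HPA-style rule: ceil(current_replicas * current_load / target_load).

    current_load and target_load are averages per replica, for example
    concurrent connections per replica (an assumed metric).
    """
    if current_replicas < 1 or target_load <= 0:
        raise ValueError("need at least one replica and a positive target")
    # Never scale below one replica, even with no traffic.
    return max(1, math.ceil(current_replicas * current_load / target_load))


# One replica currently holding 250 connections, target of 100 per replica.
print(desired_replicas(1, 250, 100))  # 3
```

A real autoscaler also applies tolerances and cool-down windows, so treat this as the arithmetic only.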

2 replies

alvynabranches Jun 1, 2024

Please comment on whether the suggestions are good or bad, and where they need to change.

saboorniazi Jun 3, 2024
Author

I used FastAPI and WebSockets for this system. The issue is that I have one Whisper model loaded globally, so when multiple clients (say 50) connect and access that model at the same time, it raises an error. When I use threads with thread locks, the error goes away, but I still want to give users real-time transcription.
I am running this code on two RTX 3090 GPUs.
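One way to avoid concurrent access without a single global lock is a small pool of model instances, one per GPU, handed out through an `asyncio.Queue`. This is a sketch under assumptions, not the actual setup from this thread: `FakeModel` is a hypothetical stand-in that sleeps instead of running inference, and the device names assume one instance per RTX 3090.

```python
import asyncio
import time


class FakeModel:
    """Hypothetical stand-in for one loaded model instance on one device."""

    def __init__(self, device: str) -> None:
        self.device = device
        self.busy = False

    def transcribe(self, chunk: bytes) -> str:
        # Guard that mimics "concurrent access to one model raises an error".
        if self.busy:
            raise RuntimeError("model used concurrently")
        self.busy = True
        try:
            time.sleep(0.01)  # simulate blocking inference
            return f"{self.device}:{len(chunk)}"
        finally:
            self.busy = False


async def transcribe_with_pool(pool: asyncio.Queue, chunk: bytes) -> str:
    model = await pool.get()  # wait until some model instance is free
    try:
        # Run the blocking call in a worker thread so the event loop stays responsive.
        return await asyncio.to_thread(model.transcribe, chunk)
    finally:
        pool.put_nowait(model)  # hand the instance back to the pool


async def serve(n_clients: int = 50) -> list[str]:
    pool: asyncio.Queue = asyncio.Queue()
    for device in ("cuda:0", "cuda:1"):  # assumption: one instance per GPU
        pool.put_nowait(FakeModel(device))
    chunks = [b"x" * (i + 1) for i in range(n_clients)]
    return await asyncio.gather(*(transcribe_with_pool(pool, c) for c in chunks))
```

`asyncio.run(serve(50))` completes without the concurrency error because each instance is only ever used by one request at a time. It does not make 50 streams real-time on its own: requests still queue behind two GPUs, so latency grows with queue depth, and batching or more replicas would likely be needed beyond that.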
