whisper concurrent request in websockets #2195
Usage of FastAPI to create WebSockets
My first advice is to use FastAPI WebSockets. They keep the code concise and clear, and they are less error-prone.

Bottleneck of WebSockets and limitations of Python
Next, there is a bottleneck with WebSockets. If you stick to Python alone, it will be difficult to scale up.

Use Kubernetes on top of the WebSocket implementation
Try Kubernetes or OpenShift to spread the load over multiple nodes, because a single server won't handle that many concurrent requests. (I don't know which model or which GPU you are using. My assumption is the large-v3 model on an A100 GPU.)

Better approach
I think this problem is better solved by moving from Python to JavaScript, since JavaScript runtimes such as Node.js are built around non-blocking I/O. There you have the option of using WebRTC, which gives you more room for customization. That way you can handle multiple connections, just like a Zoom or Google Meet application.

Limitation of this approach
No technology is free of limitations. Even if a technology shows none right now, some researcher will find a flaw in it a few years later. The same applies here. The better approach will solve your problem, but it won't suffice for long. It could handle roughly 100, or at most 250, concurrent requests with the infrastructure and model stated in my assumptions. Hence Kubernetes on top of this is the longer-lasting approach: scaling happens automatically and you don't need to manage the replicas yourself. You can also limit the number of nodes to 1 initially and then increase it according to the traffic. This is not a manual task; you can write YAML configurations to automate the process.
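To make the single-server bottleneck concrete, here is a minimal sketch of how an async WebSocket handler (in FastAPI or any other asyncio framework) might hand audio chunks to a blocking model call without freezing the event loop. Everything named here is an assumption for illustration: `fake_transcribe` is a hypothetical stand-in for the real Whisper call, `MAX_PARALLEL_JOBS` is an arbitrary limit, and `simulate_clients` replaces real WebSocket connections.

```python
import asyncio
import time

# Assumption: how many transcriptions the hardware can really run at once.
MAX_PARALLEL_JOBS = 4


def fake_transcribe(chunk: bytes) -> str:
    """Hypothetical stand-in for a blocking Whisper inference call."""
    time.sleep(0.05)  # simulates inference time; blocks the calling thread
    return f"{len(chunk)} bytes transcribed"


async def handle_chunk(chunk: bytes, limiter: asyncio.Semaphore) -> str:
    # The semaphore caps how many jobs run at the same time.
    async with limiter:
        # Run the blocking call in a worker thread so the event loop
        # stays free to serve the other connections.
        return await asyncio.to_thread(fake_transcribe, chunk)


async def simulate_clients(n_clients: int) -> list[str]:
    # Each "client" sends one chunk; a real server would read these
    # from WebSocket connections instead.
    limiter = asyncio.Semaphore(MAX_PARALLEL_JOBS)
    chunks = [b"\x00" * (i + 1) for i in range(n_clients)]
    return await asyncio.gather(*(handle_chunk(c, limiter) for c in chunks))


if __name__ == "__main__":
    start = time.perf_counter()
    results = asyncio.run(simulate_clients(50))
    elapsed = time.perf_counter() - start
    print(f"{len(results)} chunks handled in {elapsed:.2f} s")
```

With 50 simulated clients and only a few parallel jobs, most chunks wait in a queue, so per-user latency grows with the number of users. Calling the model directly inside an `async` function, without a worker thread, would be worse still, because it stalls every connection at once. Once the one machine is saturated, adding replicas is the remaining lever.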
I want to give my users real-time transcription. I have built the system, but when many users (around 50) connect at the same time, the transcription is no longer real time. Can anyone tell me what the problem is? I am also using asyncio to create tasks.