Subtitle generation web app using Next.js, the Whisper model, and ffmpeg.wasm #2568
Replies: 1 comment
-
Just sharing what I learned about the Whisper model while working on my previous project. The model makes mistakes on long-duration videos, especially ones with long background music tracks. I believe this could be improved with audio diarization, but that's a different area. For the web app I worked on, I used the model through a Hugging Face inference endpoint, but I also tried setting up Whisper on my local machine. I attempted to translate a video from Hindi to English. When I loaded and translated long audio files with the large model (1550M parameters), it took a significant amount of time. However, when I processed the audio in 3-minute chunks, it was much faster, and the chunks were transcribed quite accurately. I also tried translation along with word timestamps. This is a pretty interesting problem. Please share your views. I would love to learn from different perspectives!
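To make the chunking idea concrete, here is a minimal TypeScript sketch of the bookkeeping involved: splitting a track into fixed 3-minute windows and shifting each chunk's chunk-relative segment timestamps back onto the full-video timeline. The names (`Segment`, `chunkBoundaries`, `offsetSegments`) are illustrative assumptions, not taken from the project.

```typescript
// Hypothetical sketch: chunk a long audio track and re-offset Whisper-style
// segment timestamps. Not the project's actual code.

interface Segment {
  start: number; // seconds, relative to the chunk it came from
  end: number;   // seconds, relative to the chunk it came from
  text: string;
}

const CHUNK_SECONDS = 180; // 3-minute chunks, as in the post

// Compute [start, end) boundaries (in seconds) for each chunk of a track.
export function chunkBoundaries(durationSec: number): Array<[number, number]> {
  const chunks: Array<[number, number]> = [];
  for (let start = 0; start < durationSec; start += CHUNK_SECONDS) {
    chunks.push([start, Math.min(start + CHUNK_SECONDS, durationSec)]);
  }
  return chunks;
}

// Shift chunk-relative segments onto the full-video timeline by adding
// the chunk's start offset to every timestamp.
export function offsetSegments(segments: Segment[], chunkStart: number): Segment[] {
  return segments.map((s) => ({
    ...s,
    start: s.start + chunkStart,
    end: s.end + chunkStart,
  }));
}
```

One caveat with this approach: naive fixed-length cuts can split a word or sentence at a chunk boundary, which is one reason chunked output sometimes differs from a single long-form pass.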
-
Hey, I built a web app over the weekend as a side project to simplify subtitle generation using the open-source Whisper model and ffmpeg.wasm. It transcribes spoken words into precisely timed text, making videos more accessible and professional. One cool aspect of this project is that it uses the WebAssembly build of FFmpeg, so all the processing happens in the client's browser without putting load on the server.
Please check it out whenever you find some time and give a star to the repo if you like the project ⭐️
Github Repo: https://github.com/iyashjayesh/captune-ai
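For anyone curious what the subtitle-generation step looks like, here is a small TypeScript sketch of turning Whisper-style timed segments into an SRT file. This is my own illustrative version, not code from the repo; the `Segment` shape and function names are assumptions.

```typescript
// Hypothetical sketch: serialize timed transcript segments to SRT.

interface Segment {
  start: number; // seconds
  end: number;   // seconds
  text: string;
}

function pad(n: number, width: number): string {
  return n.toString().padStart(width, "0");
}

// Format a time in seconds as the SRT timestamp "HH:MM:SS,mmm".
export function srtTimestamp(seconds: number): string {
  const ms = Math.round(seconds * 1000);
  const h = Math.floor(ms / 3_600_000);
  const m = Math.floor((ms % 3_600_000) / 60_000);
  const s = Math.floor((ms % 60_000) / 1000);
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)},${pad(ms % 1000, 3)}`;
}

// Build the full SRT document: index, time range, then the cue text.
export function toSrt(segments: Segment[]): string {
  return segments
    .map(
      (seg, i) =>
        `${i + 1}\n${srtTimestamp(seg.start)} --> ${srtTimestamp(seg.end)}\n${seg.text.trim()}\n`
    )
    .join("\n");
}
```

SRT is a handy target because both browsers (as WebVTT after a near-trivial conversion) and most video players accept it directly.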