|
1 | | -# Whisper Web |
| 1 | +# RunAnywhere Voice Pipeline |
2 | 2 |
|
3 | | -Whisper-web is a webapplication that allows you to transcribe sound files to text completely locally in your web browser. |
| 3 | +Ultra-low latency voice AI conversations in your browser - ~50% faster than ElevenLabs. |
4 | 4 |
|
5 | | - |
| 5 | + |
6 | 6 |
|
7 | | -This repository is a fork of [Xenova/whisper-web](https://github.com/xenova/whisper-web). |
| 7 | +## About RunAnywhere |
8 | 8 |
|
9 | | -Here are the main differences: |
| 9 | +[RunAnywhere](https://www.runanywhere.ai/) is building the future of local AI inference, making powerful AI models run efficiently on any device. This voice pipeline demonstrates our commitment to high-performance, privacy-preserving AI that runs directly in your browser. |
10 | 10 |
|
11 | | -- Actively maintained |
12 | | -- Up-to-date dependencies, including transformers.js |
13 | | -- Ability to use WebGPU or CPU |
14 | | -- More user-friendly interface |
15 | | -- User interface in several languages |
16 | | -- Available as a progressive web app (so usable offline if added to your homescreen) |
17 | | -- Transcription is rendered continuously and not at the end |
18 | | -- Export to SRT |
19 | | -- Choose between a larger range of models (for example Swedish and Norwegian finetunes from the countries' national libraries) |
20 | | -- Choose your own quantization level for the model |
21 | | -- Clear cache with a button |
| 11 | +Check out our [SDKs and tools](https://github.com/RunanywhereAI/runanywhere-sdks) for more ways to run AI locally. |
22 | 12 |
|
23 | | -The main application is available at [whisper-web.mesu.re](https://whisper-web.mesu.re). It is hosted on [statichost.eu](https://statichost.eu). |
| 13 | +## What We Built |
24 | 14 |
|
25 | | -## KB-Whisper |
| 15 | +A complete end-to-end voice AI pipeline with ultra-fast response times: |
| 16 | +- **Moonshine STT** → **OpenAI LLM** → **Kokoro TTS** |
| 17 | +- Fully local speech recognition and synthesis |
| 18 | +- ~50% faster than ElevenLabs cloud pipeline |
| 19 | +- Side-by-side comparison mode for benchmarking |
| 20 | +- Comprehensive performance metrics at every stage |
26 | 21 |
|
27 | | -Initially, this project aimed at making the [Swedish KB-Whisper models](https://huggingface.co/collections/KBLab/kb-whisper-67af9eafb24da903b63cc4aa) fine-tuned by the [Swedish National library](https://www.kb.se/samverkan-och-utveckling/nytt-fran-kb/nyheter-samverkan-och-utveckling/2025-02-20-valtranad-ai-modell-forvandlar-tal-till-text.html) ♥️ more available for easy transcription of Swedish audio. |
| 22 | +## Key Features |
28 | 23 |
|
29 | | -A version of the website with Swedish as default language is still available at [kb-whisper.mesu.re](https://kb-whisper.mesu.re) (hosted in the EU by [statichost.eu](https://statichost.eu)) and the source code is on the [swedish branch](https://github.com/PierreMesure/whisper-web/tree/swedish) but it is identical to the other version at [whisper-web.mesu.re](https://whisper-web.mesu.re). |
| 24 | +- Ultra-low latency conversational AI |
| 25 | +- WebGPU/WebAssembly acceleration |
| 26 | +- Voice Activity Detection with echo prevention |
| 27 | +- 16 voice options (Kokoro) or native browser TTS |
| 28 | +- Progressive Web App (works offline) |
| 29 | + |
| 30 | +## Credits |
| 31 | + |
| 32 | +Built on top of amazing open source projects: |
| 33 | +- Original [Whisper Web](https://github.com/xenova/whisper-web) by Xenova |
| 34 | +- Enhanced fork by [Pierre Mesure](https://github.com/PierreMesure/whisper-web) |
| 35 | +- [Transformers.js](https://github.com/xenova/transformers.js) for WebAssembly ML |
| 36 | +- [Moonshine STT](https://github.com/usefulsensors/moonshine) models |
| 37 | +- [Kokoro TTS](https://huggingface.co/hexgrad/Kokoro-82M) by hexgrad |
| 38 | + |
| 39 | +For technical details, see [Voice Pipeline Architecture](./docs/VOICE_PIPELINE_ARCHITECTURE.md). |
30 | 40 |
|
31 | 41 | ## Running locally |
32 | 42 |
|