|
11 | 11 |
|
12 | 12 | OpenReader WebUI is an open-source text-to-speech document reader web app built using Next.js, offering a TTS read-along experience with narration for **EPUB, PDF, TXT, MD, and DOCX documents**. It supports multiple TTS providers including OpenAI, Deepinfra, and custom OpenAI-compatible endpoints like [Kokoro-FastAPI](https://github.com/remsky/Kokoro-FastAPI) and [Orpheus-FastAPI](https://github.com/Lex-au/Orpheus-FastAPI). |
13 | 13 |
|
14 | | -- 🧠 *(New)* **Smart Sentence-Aware Narration** merges sentences across pages/chapters for smoother TTS |
15 | | -- 🎧 *(New)* **Reliable Audiobook Export** in **m4b/mp3**, with resumable, chapter-based export and regeneration |
16 | 14 | - 🎯 *(New)* **Multi-Provider TTS Support** |
17 | 15 | - [**Kokoro-FastAPI**](https://github.com/remsky/Kokoro-FastAPI): Supporting multi-voice combinations (like `af_heart+af_bella`) |
18 | 16 | - [**Orpheus-FastAPI**](https://github.com/Lex-au/Orpheus-FastAPI) |
19 | 17 | - **Custom OpenAI-compatible**: Any TTS API with `/v1/audio/voices` and `/v1/audio/speech` endpoints |
20 | 18 | - **Cloud TTS Providers (requiring API keys)** |
21 | 19 | - [**Deepinfra**](https://deepinfra.com/models/text-to-speech): Kokoro-82M + models with support for cloned voices and more |
22 | 20 | - [**OpenAI API ($$)**](https://platform.openai.com/docs/pricing#transcription-and-speech): tts-1, tts-1-hd, and gpt-4o-mini-tts w/ instructions |
23 | | -- 🚀 *(New)* **Optimized Next.js TTS Proxy** with audio caching and optimized repeat playback |
24 | | -- 💾 *(Updated)* **Local-First Architecture** stores documents and more in-browser with Dexie.js |
25 | 21 | - 📖 *(Updated)* **Read Along Experience** providing real-time text highlighting during playback (PDF/EPUB) |
| 22 | + - *(New)* **Word-by-word highlighting** uses per-word timestamps generated server-side with [*whisper.cpp*](https://github.com/ggml-org/whisper.cpp) (optional) |
| 23 | +- 🧠 *(New)* **Smart Sentence-Aware Narration** merges sentences across pages/chapters for smoother TTS |
| 24 | +- 🎧 *(New)* **Reliable Audiobook Export** in **m4b/mp3**, with resumable, chapter-based export and regeneration |
| 25 | +- 🚀 *(New)* **Optimized Next.js TTS Proxy** with audio caching and optimized repeat playback |
| 26 | +- 💾 **Local-First Architecture** stores documents and more in-browser with Dexie.js |
26 | 27 | - 🛜 **Optional Server-side documents** using backend `/docstore` for all users |
27 | 28 | - 🎨 **Customizable Experience** |
28 | 29 | - 🎨 Multiple app theme options |
29 | 30 | - ⚙️ Various TTS and document handling settings |
30 | 31 | - And more ... |
31 | 32 |
|
32 | | -<details> |
33 | | -<summary> |
34 | | - |
35 | | -### 🆕 What's New in v1.0.0 |
36 | | - |
37 | | -</summary> |
38 | | - |
39 | | -- 🧠 **Smart sentence continuation** |
40 | | - - Improved NLP handling of complex structures and quoted dialogue provides more natural sentence boundaries and a smoother audio-text flow. |
41 | | - - EPUB and PDF playback now use smarter sentence splitting and continuation metadata so sentences that cross page/chapter boundaries are merged before hitting the TTS API. |
42 | | - - This yields more natural narration and fewer awkward pauses when a sentence spans multiple pages or EPUB spine items. |
43 | | -- 📄 **Modernized PDF text highlighting pipeline** |
44 | | - - Real-time PDF text highlighting is now offloaded to a dedicated Web Worker so scrolling and playback controls remain responsive during narration. |
45 | | - - A new overlay-based highlighting system draws independent highlight layers on top of the PDF, avoiding interference with the underlying text layer. |
46 | | - - Upgraded fuzzy matching with Dice-based similarity improves the accuracy of mapping spoken words to on-screen text. |
47 | | - - A new per-device setting lets you enable or disable real-time PDF highlighting during playback for a more tailored reading experience. |
48 | | -- 🎧 **Chapter/page-based audiobook export with resume & regeneration** |
49 | | - - Per-chapter/per-page generation to disk with persistent `bookId` |
50 | | - - Resumable generation (can cancel and continue later) |
51 | | - - Per-chapter regeneration & deletion |
52 | | - - Final combined **M4B** or **MP3** download with embedded chapter metadata. |
53 | | -- 💾 **Dexie-backed local storage & sync** |
54 | | - - All document types (PDF, EPUB, TXT/MD-as-HTML) and config are stored via a unified Dexie layer on top of IndexedDB. |
55 | | - - Document lists use live Dexie queries (no manual refresh needed), and server sync now correctly includes text/markdown documents as part of the library backup. |
56 | | -- 🗣️ **Kokoro multi-voice selection & utilities** |
57 | | - - Kokoro models now support multi-voice combination, with provider-aware limits and helpers (not supported on OpenAI or Deepinfra) |
58 | | -- ⚡ **Faster, more efficient TTS backend proxy** |
59 | | - - In-memory **LRU caching** for audio responses with configurable size/TTL |
60 | | - - **ETag** support (`304` on cache hits) + `X-Cache` headers (`HIT` / `MISS` / `INFLIGHT`) |
61 | | -- 📄 **More robust DOCX → PDF conversion** |
62 | | - - DOCX conversion now uses isolated per-job LibreOffice profiles and temp directories, polls for a stable output file size, and aggressively cleans up temp files. |
63 | | - - This reduces cross-job interference and flakiness when converting multiple DOCX files in parallel. |
64 | | -- ♿ **Accessibility & layout improvements** |
65 | | - - Dialogs and folder toggles expose proper roles and ARIA attributes. |
66 | | - - PDF/EPUB/HTML readers use a full-height app shell with a sticky bottom TTS bar, improved scrollbars, and refined focus styles. |
67 | | -- ✅ **End-to-end Playwright test suite with TTS mocks** |
68 | | - - Deterministic TTS responses in tests via a reusable Playwright route mock. |
69 | | - - Coverage for accessibility, upload, navigation, folder management, deletion flows, audiobook generation/export and playback across all document types. |
70 | | - |
71 | | -</details> |
72 | | - |
73 | 33 | ## 🐳 Docker Quick Start |
74 | 34 |
|
75 | 35 | ### Prerequisites |
@@ -194,6 +154,20 @@ Optionally required for different features: |
194 | 154 | ```bash |
195 | 155 | brew install libreoffice |
196 | 156 | ``` |
| 157 | +- [whisper.cpp](https://github.com/ggml-org/whisper.cpp) (optional, required for word-by-word highlighting) |
| 158 | + ```bash |
| 159 | + # clone and build whisper.cpp (no model download needed – OpenReader handles that) |
| 160 | + git clone https://github.com/ggml-org/whisper.cpp.git |
| 161 | + cd whisper.cpp |
| 162 | + cmake -B build |
| 163 | + cmake --build build -j --config Release |
| 164 | +
|
| 165 | + # print the .env line that points OpenReader to the compiled whisper-cli binary |
| 166 | + echo WHISPER_CPP_BIN=\"$(pwd)/build/bin/whisper-cli\" |
| 167 | + ``` |
| 168 | + |
| 169 | + > **Note:** The `WHISPER_CPP_BIN` path should be set in your `.env` file for OpenReader to use word-by-word highlighting features. |
| 170 | + |
197 | 171 | ### Steps |
198 | 172 |
|
199 | 173 | 1. Clone the repository: |
|
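The `echo` command in the whisper.cpp snippet only prints the `WHISPER_CPP_BIN=...` line; it still has to end up in `.env`, as the note says. A minimal sketch of that step follows, assuming the `.env` file sits in the directory you run the commands from and that whisper.cpp was cloned into `./whisper.cpp`. Both locations, and the `WHISPER_DIR` and `ENV_FILE` variable names, are illustrative and not part of OpenReader.

```bash
# Hypothetical locations: override WHISPER_DIR and ENV_FILE to match your setup.
WHISPER_DIR="${WHISPER_DIR:-$PWD/whisper.cpp}"
ENV_FILE="${ENV_FILE:-.env}"

# Path where the cmake build from the README places the CLI binary.
WHISPER_BIN="$WHISPER_DIR/build/bin/whisper-cli"

# Warn (but do not abort) if the binary has not been built yet.
if [ ! -x "$WHISPER_BIN" ]; then
  echo "warning: $WHISPER_BIN not found or not executable" >&2
fi

# Append the setting only if .env does not already define it.
if ! grep -q '^WHISPER_CPP_BIN=' "$ENV_FILE" 2>/dev/null; then
  printf 'WHISPER_CPP_BIN="%s"\n' "$WHISPER_BIN" >> "$ENV_FILE"
fi
```

Running it a second time leaves `.env` unchanged, since the `grep` guard skips the append once the variable is present.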