OpenReader WebUI is an open-source text-to-speech document reader web app built with Next.js, offering a TTS read-along experience with narration for EPUB, PDF, TXT, MD, and DOCX documents. It supports multiple TTS providers, including OpenAI, Deepinfra, and custom OpenAI-compatible endpoints such as [Kokoro-FastAPI](https://github.com/remsky/Kokoro-FastAPI) and [Orpheus-FastAPI](https://github.com/Lex-au/Orpheus-FastAPI).
- 🧠 **(New) Smart Sentence-Aware Narration**: EPUB and PDF playback use shared NLP (via the `compromise` library) and smart sentence continuation to merge sentences that span pages/chapters, giving smoother TTS with fewer hard cuts at page breaks
- 🎧 **(New) Reliable Audiobook Export**: Create and export audiobooks from PDF and EPUB files **(in m4b or mp3 format using ffmpeg)** with resumable, chapter/page-based export and per-chapter regeneration
- 🎯 **(New) Multi-Provider TTS Support**:
  - **Deepinfra**: Kokoro-82M, Orpheus-3B, Sesame-1B models with extensive voice libraries
  - **OpenAI API ($$)**: tts-1, tts-1-hd, gpt-4o-mini-tts models
  - **Kokoro-FastAPI**: Self-hosted OpenAI-compatible TTS API server supporting Kokoro-82M and multi-voice combinations (like `af_heart+bf_emma`)
  - **Orpheus-FastAPI**: Self-hosted OpenAI-compatible TTS API server supporting Orpheus-3B
  - Other custom OpenAI-compatible endpoints that expose a `/v1/audio/voices` endpoint
- 🚀 **(New) Optimized TTS Pipeline**: Next.js TTS backend with in-memory LRU audio cache, ETag-aware responses, and in-flight request de-duplication for faster repeat playback
- 💾 **Local-First Architecture**: IndexedDB browser storage for documents and settings (now using Dexie.js)
- 🛜 **Optional Server-side documents**: Manually upload documents to the Next.js backend (and Docker `docstore`) for all users to download
- 📖 **Read Along Experience**: Follow along with real-time highlighted text as the TTS narrates PDF files, using an overlay-based highlighter, per-sentence navigation, and skip controls
- 📄 **Document formats**: EPUB, PDF, TXT, MD, DOCX (DOCX requires LibreOffice; DOCX→PDF conversion has been hardened for reliability)
- 🎨 **Customizable Experience**:
  - 🔑 Select TTS provider (OpenAI, Deepinfra, or Custom OpenAI-compatible)
- Improved NLP handling of complex structures and quoted dialogue provides more natural sentence boundaries and a smoother audio-text flow.
- EPUB and PDF playback now use smarter sentence splitting and continuation metadata, so sentences that cross page or chapter boundaries are merged before they reach the TTS API. This yields more natural narration and fewer awkward pauses when a sentence spans multiple pages or EPUB spine items.
- 📄 **Modernized PDF text highlighting pipeline**
  - Real-time PDF text highlighting is now offloaded to a dedicated Web Worker, so scrolling and playback controls remain responsive during narration.
  - A new overlay-based highlighting system draws independent highlight layers on top of the PDF, avoiding interference with the underlying text layer.
  - Upgraded fuzzy matching with Dice-based similarity improves the accuracy of mapping spoken words to on-screen text.
  - A new per-device setting lets you enable or disable real-time PDF highlighting during playback.
- 🎧 **Chapter/page-based audiobook export with resume & regeneration**
  - Per-chapter/per-page generation to disk with a persistent `bookId`
  - Resumable generation (cancel and continue later)
  - Per-chapter regeneration and deletion
  - Final combined **M4B** or **MP3** download with embedded chapter metadata
- 💾 **Dexie-backed local storage & sync**
  - All document types (PDF, EPUB, TXT/MD-as-HTML) and config are stored via a unified Dexie layer on top of IndexedDB.
  - Document lists use live Dexie queries (no manual refresh needed), and server sync now includes text/markdown documents in the library backup.
- 🗣️ **Kokoro multi-voice selection & utilities**
  - Kokoro models support multi-voice combinations, with provider-aware limits and helpers (not supported on OpenAI or Deepinfra).
- ⚡ **Faster, more efficient TTS backend proxy**
  - In-memory **LRU caching** for audio responses with configurable size/TTL
  - **ETag** support (`304` on cache hits) plus `X-Cache` headers (`HIT` / `MISS` / `INFLIGHT`)
- 📄 **More robust DOCX → PDF conversion**
  - DOCX conversion now uses isolated per-job LibreOffice profiles and temp directories, polls until the output file size is stable, and aggressively cleans up temp files.
  - This reduces cross-job interference and flakiness when converting multiple DOCX files in parallel.
- ♿ **Accessibility & layout improvements**
  - Dialogs and folder toggles expose proper roles and ARIA attributes.
  - PDF/EPUB/HTML readers use a full-height app shell with a sticky bottom TTS bar, improved scrollbars, and refined focus styles.
- ✅ **End-to-end Playwright test suite with TTS mocks**
  - Deterministic TTS responses in tests via a reusable Playwright route mock.
  - Coverage for accessibility, upload, navigation, folder management, deletion flows, audiobook generation/export, and playback across all document types.

</details>
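
The "embedded chapter metadata" in the audiobook export can be illustrated with ffmpeg's FFMETADATA format. This is a minimal sketch, not OpenReader's exporter: it assumes chapter titles and durations (in milliseconds) are already known, and the `make_ffmetadata` helper and its `Title|duration_ms` argument format are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: write an FFMETADATA1 chapter list to stdout.
# Each argument is "Title|duration_ms" (a hypothetical format for this example).
# Assumes titles contain no '|', newlines, or the characters ffmpeg requires
# escaping in metadata values ('=', ';', '#', '\').
make_ffmetadata() {
  local start=0 entry title dur
  printf ';FFMETADATA1\n'
  for entry in "$@"; do
    title="${entry%|*}"   # text before the last '|'
    dur="${entry##*|}"    # text after the last '|'
    printf '\n[CHAPTER]\nTIMEBASE=1/1000\nSTART=%d\nEND=%d\ntitle=%s\n' \
      "$start" "$((start + dur))" "$title"
    start=$((start + dur))  # next chapter begins where this one ends
  done
}
```

For example, `make_ffmetadata "Chapter 1|60000" "Chapter 2|45000" > chapters.txt` describes two chapters (0 to 60 s and 60 to 105 s), and `ffmpeg -i book.m4a -i chapters.txt -map_metadata 1 -map_chapters 1 -c copy book.m4b` (file names are placeholders) attaches them without re-encoding the audio.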

## 🐳 Docker Quick Start

### Prerequisites
- A recent version of Docker installed on your machine
- A TTS API server (Kokoro-FastAPI, Orpheus-FastAPI, Deepinfra, OpenAI, etc.) running and accessible

> **Note:** If you have capable hardware (ideally an NVIDIA GPU or Apple Silicon), you can run [Kokoro-FastAPI with Docker locally](#🗣️-local-kokoro-fastapi-quick-start-cpu-or-gpu) (see below).
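
Before starting the container, it can help to confirm that the TTS server answers. This sketch queries the `/v1/audio/voices` endpoint that Kokoro-FastAPI and similar custom servers expose; OpenAI's hosted API does not offer that endpoint, and hosted providers may also require an `Authorization` header. The helper names and the `http://localhost:8880/v1` default (Kokoro-FastAPI's usual local address) are assumptions.

```bash
#!/usr/bin/env bash
# Build the voices URL from an API base such as http://localhost:8880/v1,
# stripping trailing slashes first so the result never contains "//audio".
voices_url() {
  local base="$1"
  while [ "${base%/}" != "$base" ]; do
    base="${base%/}"
  done
  printf '%s/audio/voices\n' "$base"
}

# Return 0 if the server answers GET <base>/audio/voices within 5 seconds.
# Falls back to $API_BASE, then to Kokoro-FastAPI's usual local address.
check_tts_server() {
  local url
  url="$(voices_url "${1:-${API_BASE:-http://localhost:8880/v1}}")"
  if curl -fsS --max-time 5 -o /dev/null "$url"; then
    echo "OK: $url"
  else
    echo "Unreachable: $url" >&2
    return 1
  fi
}
```

Run `check_tts_server http://localhost:8880/v1` from the host. Inside the OpenReader container, `localhost` refers to the container itself, so a server running on the host is usually reached through `host.docker.internal` on Docker Desktop.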

### 1. 🐳 Start the Docker container:
```bash
docker run --name openreader-webui \
  --restart unless-stopped \
  -p 3003:3003 \
  -v openreader_docstore:/app/docstore \
  ghcr.io/richardr1126/openreader-webui:latest
```

(Optional) You can also set the TTS `API_BASE` URL and/or `API_KEY` as defaults for all devices.
### 🗣️ Local Kokoro-FastAPI Quick-start (CPU or GPU)

You can run the Kokoro TTS API server directly with Docker. **We are not responsible for issues with [Kokoro-FastAPI](https://github.com/remsky/Kokoro-FastAPI).** For best performance, use an NVIDIA GPU (for the GPU version) or Apple Silicon (for the CPU version).
> - These commands are for running the Kokoro TTS API server only. For issues or support, see the [Kokoro-FastAPI repository](https://github.com/remsky/Kokoro-FastAPI).
> - The GPU version requires NVIDIA Docker support and works best with NVIDIA GPUs. The CPU version works best on Apple Silicon or modern x86 CPUs.
> - Adjust environment variables as needed for your hardware and use case.
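
Once Kokoro-FastAPI is running, its OpenAI-compatible `/v1/audio/speech` endpoint can be exercised directly, including the multi-voice `af_heart+bf_emma` syntax from the feature list. A minimal sketch, assuming the server accepts `"model": "kokoro"`; the helper functions are hypothetical, and the JSON escaping only handles backslashes and double quotes.

```bash
#!/usr/bin/env bash
# Join voice names with "+", e.g. af_heart bf_emma -> af_heart+bf_emma
combine_voices() {
  local IFS='+'
  printf '%s\n' "$*"
}

# Minimal JSON string escaping: backslashes and double quotes only.
# Assumes the text contains no control characters (newlines, tabs).
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Build an OpenAI-style speech request body.
# $1 = text to speak, remaining arguments = Kokoro voice names
speech_body() {
  local text="$1"
  shift
  printf '{"model":"kokoro","input":"%s","voice":"%s","response_format":"mp3"}\n' \
    "$(json_escape "$text")" "$(combine_voices "$@")"
}
```

For example: `curl -fsS http://localhost:8880/v1/audio/speech -H 'Content-Type: application/json' -d "$(speech_body 'Hello there.' af_heart bf_emma)" -o hello.mp3`. Per the feature list, voice combinations like this apply to Kokoro only, not to OpenAI or Deepinfra.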

## Local Development Installation

### Prerequisites
- Node.js (recommended: use [nvm](https://github.com/nvm-sh/nvm))
- pnpm (recommended) or npm
  ```bash
  npm install -g pnpm
  ```
- A TTS API server (Kokoro-FastAPI, Orpheus-FastAPI, Deepinfra, OpenAI, etc.) running and accessible

Optional dependencies, needed only for specific features:
- [FFmpeg](https://ffmpeg.org) (required only for audiobook export, M4B or MP3)
  ```bash
  brew install ffmpeg
  ```
- [LibreOffice](https://www.libreoffice.org) (required for DOCX files)
  ```bash
  brew install libreoffice
  ```
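
The DOCX hardening described above (isolated per-job LibreOffice profiles, polling for a stable output size, temp-file cleanup) can be approximated from the shell. This is a sketch of the idea, not the app's conversion code: it assumes `soffice` is on your `PATH`, and the function names, poll count, and interval are hypothetical. Fractional `sleep` values work with GNU coreutils and macOS but are not POSIX.

```bash
#!/usr/bin/env bash
# Sketch only: per-job LibreOffice isolation for DOCX -> PDF conversion.

# Byte size of a file (wc works on GNU and BSD/macOS; tr strips BSD padding)
file_size() {
  wc -c < "$1" | tr -d ' '
}

# Succeed once the file exists, is non-empty, and has the same size on two
# consecutive polls. $1 = path, $2 = max polls (default 20), $3 = delay in seconds (default 0.5)
wait_for_stable_size() {
  local path="$1" tries="${2:-20}" delay="${3:-0.5}" prev="" cur
  while [ "$tries" -gt 0 ]; do
    if [ -s "$path" ]; then
      cur="$(file_size "$path")"
      [ "$cur" = "$prev" ] && return 0
      prev="$cur"
    fi
    tries=$((tries - 1))
    sleep "$delay"
  done
  return 1
}

# Convert one DOCX into directory $2 using a throwaway profile and output dir.
convert_docx_to_pdf() {
  local input="$1" outdir="$2" job pdf rc=1
  job="$(mktemp -d)" || return 1
  # A separate UserInstallation keeps parallel soffice runs from sharing a profile
  soffice -env:UserInstallation="file://$job/profile" --headless \
    --convert-to pdf --outdir "$job/out" "$input" >/dev/null 2>&1
  pdf="$job/out/$(basename "${input%.*}").pdf"
  if wait_for_stable_size "$pdf"; then
    mkdir -p "$outdir" && mv "$pdf" "$outdir/" && rc=0
  fi
  rm -rf "$job"  # remove the profile and any partial output
  return "$rc"
}
```

Because each call owns its own temp directory, several `convert_docx_to_pdf report.docx ./pdfs &` jobs can run side by side without contending for a shared LibreOffice profile.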

### Steps

1. Clone the repository:

With pnpm (recommended):
```bash
pnpm i # or npm i
```

3. Configure the environment:

With pnpm (recommended):
```bash
pnpm dev # or npm run dev
```

Or build and run the production server:

```bash
pnpm build # or npm run build
pnpm start # or npm start
```

Visit [http://localhost:3003](http://localhost:3003) to run the app.
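
With the app running, the TTS proxy's caching headers from the feature list (`ETag`, `X-Cache: HIT` / `MISS` / `INFLIGHT`) can be read from raw response headers. The proxy's route is not documented in this README, so this sketch only parses a header dump: copy the actual TTS request from your browser's network tab (for example with "Copy as cURL") and add `-sS -D - -o /dev/null` to print its headers. The helper names are hypothetical.

```bash
#!/usr/bin/env bash
# Print the value of one response header (case-insensitive) from a raw
# header dump on stdin, e.g. the output of `curl -sS -D - -o /dev/null ...`.
# Assumes the header value itself does not contain ": ".
header_value() {
  local name
  name="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
  tr -d '\r' | awk -F': *' -v want="$name" 'tolower($1) == want { print $2; exit }'
}

x_cache_status() { header_value 'X-Cache'; }
etag_value() { header_value 'ETag'; }
```

Piping the same request through `x_cache_status` twice should show `MISS` and then `HIT`; per the feature list, repeating it with `-H 'If-None-Match: <etag>'` should return `304` on a cache hit.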
- **Framework:** Next.js (React)
- **Containerization:** Docker
- **Storage:** Dexie + IndexedDB (in-browser local database)