refactor(db): migrate to Dexie with reactive queries and simplify data layer
Replaces custom IndexedDB implementation with Dexie ORM, eliminating 850+ lines of
boilerplate code and introducing reactive live queries across all document types.
Transforms document management from imperative refresh patterns to automatic
reactive updates using dexie-react-hooks.
Simplifies TTS backend by removing concurrency semaphore while maintaining
request de-duplication through in-flight tracking. Streamlines document hooks
by removing manual state management and refresh methods.
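As an illustration of request de-duplication through in-flight tracking, here is a minimal TypeScript sketch. The helper name `dedupe`, the `inFlight` map, and the string key are assumptions made for illustration, not the project's actual code. Concurrent callers asking for the same key share one pending promise, and the entry is dropped once the request settles.

```typescript
// Hypothetical helper: names and shapes are illustrative, not taken from the project.
type Fetcher<T> = () => Promise<T>;

// Maps a request key to the promise of the request that is currently in flight.
const inFlight = new Map<string, Promise<unknown>>();

function dedupe<T>(key: string, fetcher: Fetcher<T>): Promise<T> {
  const existing = inFlight.get(key);
  if (existing) {
    // An identical request is already running, so share its result.
    return existing as Promise<T>;
  }
  // Start the request and forget it once it settles (success or failure),
  // so that later calls with the same key trigger a fresh request.
  const pending = fetcher().finally(() => {
    inFlight.delete(key);
  });
  inFlight.set(key, pending);
  return pending;
}
```

A key could, for example, combine the text, voice, and speed of a TTS request. That is an assumption; the commit message does not say how requests are keyed.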
Updates package dependencies and type definitions to support new database
architecture while maintaining full backward compatibility for existing
documents and settings.
BREAKING CHANGE: Document hooks no longer expose refresh() methods as updates
are now reactive through live queries.
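To show what this change means for callers, the sketch below uses a small dependency-free store as a stand-in for a live query. It is not Dexie's API and not the project's code; `LiveStore`, `subscribe`, and `add` are hypothetical names. Subscribers are notified again on every write, so there is nothing left for a `refresh()` method to do.

```typescript
// Dependency-free stand-in for the live-query idea; this is NOT Dexie's API.
type Listener<T> = (items: readonly T[]) => void;

class LiveStore<T> {
  private items: T[] = [];
  private listeners = new Set<Listener<T>>();

  // The listener receives the current items immediately and again after every write.
  // The returned function removes the subscription.
  subscribe(listener: Listener<T>): () => void {
    this.listeners.add(listener);
    listener(this.items);
    return () => {
      this.listeners.delete(listener);
    };
  }

  // A write replaces the array and pushes the new state to all subscribers.
  add(item: T): void {
    this.items = [...this.items, item];
    this.listeners.forEach((listener) => listener(this.items));
  }
}

const docs = new LiveStore<string>();
const seen: number[] = [];
const unsubscribe = docs.subscribe((items) => seen.push(items.length));
docs.add("book.epub");   // subscriber is notified without any manual refresh
docs.add("notes.md");
unsubscribe();
docs.add("ignored.pdf"); // no longer observed
// seen is now [0, 1, 2]
```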
OpenReader WebUI is a document reader with Text-to-Speech capabilities, offering a TTS read along experience with narration for EPUB, PDF, TXT, MD, and DOCX documents. It supports multiple TTS providers including OpenAI, Deepinfra, and custom OpenAI-compatible endpoints like [Kokoro-FastAPI](https://github.com/remsky/Kokoro-FastAPI) and [Orpheus-FastAPI](https://github.com/Lex-au/Orpheus-FastAPI)
+OpenReader WebUI is an open source text to speech document reader web app built using Next.js, offering a TTS read along experience with narration for EPUB, PDF, TXT, MD, and DOCX documents. It supports multiple TTS providers including OpenAI, Deepinfra, and custom OpenAI-compatible endpoints like [Kokoro-FastAPI](https://github.com/remsky/Kokoro-FastAPI) and [Orpheus-FastAPI](https://github.com/Lex-au/Orpheus-FastAPI)
- 🧠 **(New) Smart Sentence-Aware Narration**: EPUB and PDF playback use shared NLP (compromise) and smart sentence continuation to merge sentences that span pages/chapters for smoother TTS trying to prevent hard cuts at page breaks
+- 🎧 **(New) Reliable Audiobook Export**: Create and export audiobooks from PDF and EPUB files **(in m4b or mp3 format using ffmpeg)** with resumable, chapter/page-based export and per-chapter regeneration
+- 🎯 **(New) Multi-Provider TTS Support**:
   - **Deepinfra**: Kokoro-82M, Orpheus-3B, Sesame-1B models with extensive voice libraries
-  - **Custom OpenAI-Compatible**: Any OpenAI-compatible endpoint with custom voice sets
-- 💾 **Local-First Architecture**: Uses IndexedDB browser storage for documents
-- 🛜 **Optional Server-side documents**: Manually upload documents to the next backend for all users to download
-- 📖 **Read Along Experience**: Follow along with highlighted text as the TTS narrates
- EPUB and PDF playback now use smarter sentence splitting and continuation metadata so sentences that cross page/chapter boundaries are merged before hitting the TTS API.
+  - This yields more natural narration and fewer awkward pauses when a sentence spans multiple pages or EPUB spine items
+- 🎧 **Chapter/page-based audiobook export with resume & regeneration**
+  - Per-chapter/per-page generation to disk with persistent `bookId`
+  - Resumable generation (can cancel and continue later)
+  - Per-chapter regeneration & deletion
+  - Final combined **M4B** or **MP3** download with embedded chapter metadata.
+- 💾 **Dexie-backed local storage & sync**
+  - All document types (PDF, EPUB, TXT/MD-as-HTML) and config are stored via a unified Dexie layer on top of IndexedDB.
+  - Document lists use live Dexie queries (no manual refresh needed), and server sync now correctly includes text/markdown documents as part of the library backup.
+- 🗣️ **Kokoro multi-voice selection & utilities**
+  - Kokoro models now support multi-voice combination, with provider-aware limits and helpers (not supported on OpenAI or Deepinfra)
+- ⚡ **Faster, more efficient TTS backend proxy**
+  - In-memory **LRU caching** for audio responses with configurable size/TTL
+  - **ETag** support (`304` on cache hits) + `X-Cache` headers (`HIT` / `MISS` / `INFLIGHT`)
+- 📄 **More robust DOCX → PDF conversion**
+  - DOCX conversion now uses isolated per-job LibreOffice profiles and temp directories, polls for a stable output file size, and aggressively cleans up temp files.
+  - This reduces cross-job interference and flakiness when converting multiple DOCX files in parallel.
+- ♿ **Accessibility & layout improvements**
+  - Dialogs and folder toggles expose proper roles and ARIA attributes.
+  - PDF/EPUB/HTML readers use a full-height app shell with a sticky bottom TTS bar, improved scrollbars, and refined focus styles.
+- ✅ **End-to-end Playwright test suite with TTS mocks**
+  - Deterministic TTS responses in tests via a reusable Playwright route mock.
+  - Coverage for accessibility, upload, navigation, folder management, deletion flows, and playback across all document types.
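The in-memory LRU caching with configurable size/TTL listed under the TTS backend proxy could look roughly like the sketch below. It is an illustration under stated assumptions, not the project's implementation: `LruTtlCache`, `maxEntries`, `ttlMs`, and the injectable `now` clock are hypothetical names, and the eviction order relies on `Map` preserving insertion order.

```typescript
// Hypothetical sketch; class and option names are assumptions, not the project's code.
class LruTtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  private maxEntries: number;
  private ttlMs: number;
  private now: () => number;

  constructor(maxEntries: number, ttlMs: number, now: () => number = () => Date.now()) {
    this.maxEntries = maxEntries;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      // The entry outlived its TTL, so drop it and report a miss.
      this.entries.delete(key);
      return undefined;
    }
    // Re-insert the entry so it becomes the most recently used one.
    this.entries.delete(key);
    this.entries.set(key, entry);
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
    if (this.entries.size > this.maxEntries) {
      // The first key in iteration order is the least recently used.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
  }
}
```

A caller would treat a defined result from `get` as a cache hit and `undefined` as a miss. How the real proxy maps these outcomes to its `X-Cache` values is not shown in the source.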
### 🗣️ Local Kokoro-FastAPI Quick-start (CPU or GPU)

-You can run the Kokoro TTS API server directly with Docker. **We are not responsible for issues with Kokoro-FastAPI.** For best performance, use an NVIDIA GPU (for GPU version) or Apple Silicon (for CPU version).
+You can run the Kokoro TTS API server directly with Docker. **We are not responsible for issues with [Kokoro-FastAPI](https://github.com/remsky/Kokoro-FastAPI).** For best performance, use an NVIDIA GPU (for GPU version) or Apple Silicon (for CPU version).

 > **Note:** When using these, set the `API_BASE` env var to `http://host.docker.internal:8880/v1` or `http://kokoro-tts:8880/v1`.
 > You can also use the example `docker-compose.yml` in `examples/docker-compose.yml` if you prefer Docker Compose.

-**CPU Version:**
+<details>
+<summary>
+
+**Docker CPU**
+
+</summary>
+
 ```bash
 docker run -d \
   --name kokoro-tts \
@@ -99,7 +140,15 @@ docker run -d \
   ghcr.io/remsky/kokoro-fastapi-cpu:v0.2.4
 ```

-**GPU Version:**
+</details>
+
+<details>
+<summary>
+
+**Docker GPU**
+
+</summary>
+
 ```bash
 docker run -d \
   --name kokoro-tts \
@@ -113,23 +162,31 @@ docker run -d \
   ghcr.io/remsky/kokoro-fastapi-gpu:v0.2.4
 ```

+</details>
+
 > **Note:**
 > - These commands are for running the Kokoro TTS API server only. For issues or support, see the [Kokoro-FastAPI repository](https://github.com/remsky/Kokoro-FastAPI).
 > - The GPU version requires NVIDIA Docker support and works best with NVIDIA GPUs. The CPU version works best on Apple Silicon or modern x86 CPUs.
 > - Adjust environment variables as needed for your hardware and use case.

-## Dev Installation
+## Local Development Installation

 ### Prerequisites
-- Node.js & npm or pnpm (recommended: use [nvm](https://github.com/nvm-sh/nvm) for Node.js)
+- Node.js (recommended: use [nvm](https://github.com/nvm-sh/nvm))
+- pnpm (recommended) or npm
+```bash
+npm install -g pnpm
+```
+- A TTS API server (Kokoro-FastAPI, Orpheus-FastAPI, Deepinfra, OpenAI, etc.) running and accessible
 Optionally required for different features:
 - [FFmpeg](https://ffmpeg.org) (required for audiobook m4b creation only)
-  - On Linux: `sudo apt install ffmpeg`
-  - On MacOS: `brew install ffmpeg`
+```bash
+brew install ffmpeg
+```
 - [libreoffice](https://www.libreoffice.org) (required for DOCX files)
-  - On Linux: `sudo apt install libreoffice`
-  - On MacOS: `brew install libreoffice`
-
+```bash
+brew install libreoffice
+```
 ### Steps

 1. Clone the repository:
@@ -142,12 +199,7 @@ Optionally required for different features:

 With pnpm (recommended):
 ```bash
-pnpm install
-```
-
-Or with npm:
-```bash
-npm install
+pnpm i # or npm i
 ```

 3. Configure the environment:
@@ -161,26 +213,15 @@ Optionally required for different features:

 With pnpm (recommended):
 ```bash
-pnpm dev
-```
-
-Or with npm:
-```bash
-npm run dev
+pnpm dev # or npm run dev
 ```

 or build and run the production server:

 With pnpm:
 ```bash
-pnpm build
-pnpm start
-```
-
-Or with npm:
-```bash
-npm run build
-npm start
+pnpm build # or npm run build
+pnpm start # or npm start
 ```

 Visit [http://localhost:3003](http://localhost:3003) to run the app.
@@ -217,7 +258,7 @@ This project would not be possible without standing on the shoulders of these gi

 - **Framework:** Next.js (React)
 - **Containerization:** Docker
-- **Storage:** IndexedDB (inbrowser db store)
+- **Storage:** Dexie + IndexedDB (in-browser local database)