
Commit 5316596

refactor(db): migrate to Dexie with reactive queries and simplify data layer
Replaces custom IndexedDB implementation with Dexie ORM, eliminating 850+ lines of boilerplate code and introducing reactive live queries across all document types. Transforms document management from imperative refresh patterns to automatic reactive updates using dexie-react-hooks.

Simplifies TTS backend by removing concurrency semaphore while maintaining request de-duplication through in-flight tracking. Streamlines document hooks by removing manual state management and refresh methods. Updates package dependencies and type definitions to support new database architecture while maintaining full backward compatibility for existing documents and settings.

BREAKING CHANGE: Document hooks no longer expose refresh() methods as updates are now reactive through live queries.
1 parent e3799e4 commit 5316596
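
The move from `refresh()` calls to live queries can be pictured with a small, library-free sketch. This is not Dexie's implementation or API (the commit relies on live queries from `dexie-react-hooks`); `LiveStore`, `subscribe`, `put`, and `remove` are hypothetical names used only to show why a consumer no longer needs to ask for a refresh once every write notifies its subscribers.

```typescript
// Hypothetical in-memory store, for illustration only (not Dexie).
type Listener<T> = (rows: T[]) => void;

class LiveStore<T extends { id: string }> {
  private rows = new Map<string, T>();
  private listeners = new Set<Listener<T>>();

  // Register a listener; it receives the current rows immediately
  // and again after every write. Returns an unsubscribe function.
  subscribe(listener: Listener<T>): () => void {
    this.listeners.add(listener);
    listener(this.snapshot());
    return () => {
      this.listeners.delete(listener);
    };
  }

  // Insert or overwrite a row, then notify subscribers.
  put(row: T): void {
    this.rows.set(row.id, row);
    this.notify();
  }

  // Delete a row; subscribers are notified only if something changed.
  remove(id: string): void {
    if (this.rows.delete(id)) this.notify();
  }

  private snapshot(): T[] {
    return Array.from(this.rows.values());
  }

  private notify(): void {
    const current = this.snapshot();
    this.listeners.forEach((listener) => listener(current));
  }
}

// Usage: the "UI" only subscribes; it never calls a refresh method.
const docs = new LiveStore<{ id: string; name: string }>();
const stop = docs.subscribe((rows) => console.log("documents:", rows.length));
docs.put({ id: "a", name: "book.epub" });
stop();
```

With this shape, the breaking change in the commit message follows naturally: once reads are subscriptions, a hook has no need to expose `refresh()`.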

19 files changed, with 607 additions and 1164 deletions


README.md

Lines changed: 85 additions & 44 deletions
@@ -7,28 +7,63 @@

 [![Discussions](https://img.shields.io/badge/Discussions-Ask%20a%20Question-blue)](../../discussions)

-# OpenReader WebUI 📄🔊
+# 📄🔊 OpenReader WebUI

-OpenReader WebUI is a document reader with Text-to-Speech capabilities, offering a TTS read along experience with narration for EPUB, PDF, TXT, MD, and DOCX documents. It supports multiple TTS providers including OpenAI, Deepinfra, and custom OpenAI-compatible endpoints like [Kokoro-FastAPI](https://github.com/remsky/Kokoro-FastAPI) and [Orpheus-FastAPI](https://github.com/Lex-au/Orpheus-FastAPI)
+OpenReader WebUI is an open source text to speech document reader web app built using Next.js, offering a TTS read along experience with narration for EPUB, PDF, TXT, MD, and DOCX documents. It supports multiple TTS providers including OpenAI, Deepinfra, and custom OpenAI-compatible endpoints like [Kokoro-FastAPI](https://github.com/remsky/Kokoro-FastAPI) and [Orpheus-FastAPI](https://github.com/Lex-au/Orpheus-FastAPI)

-- 🎯 **Multi-Provider TTS Support**:
-  - **OpenAI**: tts-1, tts-1-hd, gpt-4o-mini-tts models with voices (alloy, echo, fable, onyx, nova, shimmer)
+- 🧠 **(New) Smart Sentence-Aware Narration**: EPUB and PDF playback use shared NLP (compromise) and smart sentence continuation to merge sentences that span pages/chapters for smoother TTS trying to prevent hard cuts at page breaks
+- 🎧 **(New) Reliable Audiobook Export**: Create and export audiobooks from PDF and EPUB files **(in m4b or mp3 format using ffmpeg)** with resumable, chapter/page-based export and per-chapter regeneration
+- 🎯 **(New) Multi-Provider TTS Support**:
   - **Deepinfra**: Kokoro-82M, Orpheus-3B, Sesame-1B models with extensive voice libraries
-  - **Custom OpenAI-Compatible**: Any OpenAI-compatible endpoint with custom voice sets
-- 💾 **Local-First Architecture**: Uses IndexedDB browser storage for documents
-- 🛜 **Optional Server-side documents**: Manually upload documents to the next backend for all users to download
-- 📖 **Read Along Experience**: Follow along with highlighted text as the TTS narrates
-- 📄 **Document formats**: EPUB, PDF, TXT, MD, DOCX (with libreoffice installed)
-- 🎧 **Audiobook Creation**: Create and export audiobooks from PDF and ePub files **(in m4b format with ffmpeg and aac TTS output)**
+  - **OpenAI API ($$)**: tts-1, tts-1-hd, gpt-4o-mini-tts models
+  - **Kokoro-FastAPI**: Self-hosted OpenAI-compatible TTS API server supporting Kokoro-82M and multi-voice combinations (like `af_heart+bf_emma`)
+  - **Orpheus-FastAPI**: Self-hosted OpenAI-compatible TTS API server supporting Orpheus-3B
+  - And other Custom OpenAI-compatible endpoints with a `/v1/audio/voices` endpoint
+- 🚀 **(New) Optimized TTS Pipeline**: Next.js TTS backend with in-memory LRU audio cache, ETag-aware responses, and in-flight request de-duplication for faster repeat playback
+- 💾 **Local-First Architecture**: IndexedDB browser storage for documents and settings (now using Dexie.js)
+- 🛜 **Optional Server-side documents**: Manually upload documents to the Next.js backend (and Docker `docstore`) for all users to download
+- 📖 **Read Along Experience**: Follow along with highlighted text as the TTS narrates PDF files, with per-sentence navigation and skip controls
+- 📄 **Document formats**: EPUB, PDF, TXT, MD, DOCX (with libreoffice installed, plus hardened DOCX→PDF conversion for better reliability)
 - 🎨 **Customizable Experience**:
   - 🔑 Select TTS provider (OpenAI, Deepinfra, or Custom OpenAI-compatible)
   - 🔐 Set TTS API base URL and optional API key
   - 🎨 Multiple app theme options
   - And more...

-### 🛠️ Work in progress
-- [ ] **Native .docx support** (currently requires libreoffice)
-- [ ] **Accessibility Improvements**
+<details>
+<summary>
+
+### 🆕 What's New in v1.0.0
+
+</summary>
+
+- 🧠 **Smart sentence continuation**
+  - EPUB and PDF playback now use smarter sentence splitting and continuation metadata so sentences that cross page/chapter boundaries are merged before hitting the TTS API.
+  - This yields more natural narration and fewer awkward pauses when a sentence spans multiple pages or EPUB spine items
+- 🎧 **Chapter/page-based audiobook export with resume & regeneration**
+  - Per-chapter/per-page generation to disk with persistent `bookId`
+  - Resumable generation (can cancel and continue later)
+  - Per-chapter regeneration & deletion
+  - Final combined **M4B** or **MP3** download with embedded chapter metadata.
+- 💾 **Dexie-backed local storage & sync**
+  - All document types (PDF, EPUB, TXT/MD-as-HTML) and config are stored via a unified Dexie layer on top of IndexedDB.
+  - Document lists use live Dexie queries (no manual refresh needed), and server sync now correctly includes text/markdown documents as part of the library backup.
+- 🗣️ **Kokoro multi-voice selection & utilities**
+  - Kokoro models now support multi-voice combination, with provider-aware limits and helpers (not supported on OpenAI or Deepinfra)
+- **Faster, more efficient TTS backend proxy**
+  - In-memory **LRU caching** for audio responses with configurable size/TTL
+  - **ETag** support (`304` on cache hits) + `X-Cache` headers (`HIT` / `MISS` / `INFLIGHT`)
+- 📄 **More robust DOCX → PDF conversion**
+  - DOCX conversion now uses isolated per-job LibreOffice profiles and temp directories, polls for a stable output file size, and aggressively cleans up temp files.
+  - This reduces cross-job interference and flakiness when converting multiple DOCX files in parallel.
+- **Accessibility & layout improvements**
+  - Dialogs and folder toggles expose proper roles and ARIA attributes.
+  - PDF/EPUB/HTML readers use a full-height app shell with a sticky bottom TTS bar, improved scrollbars, and refined focus styles.
+- **End-to-end Playwright test suite with TTS mocks**
+  - Deterministic TTS responses in tests via a reusable Playwright route mock.
+  - Coverage for accessibility, upload, navigation, folder management, deletion flows, and playback across all document types.
+
+</details>

 ## 🐳 Docker Quick Start

@@ -78,12 +113,18 @@ docker pull ghcr.io/richardr1126/openreader-webui:latest

 ### 🗣️ Local Kokoro-FastAPI Quick-start (CPU or GPU)

-You can run the Kokoro TTS API server directly with Docker. **We are not responsible for issues with Kokoro-FastAPI.** For best performance, use an NVIDIA GPU (for GPU version) or Apple Silicon (for CPU version).
+You can run the Kokoro TTS API server directly with Docker. **We are not responsible for issues with [Kokoro-FastAPI](https://github.com/remsky/Kokoro-FastAPI).** For best performance, use an NVIDIA GPU (for GPU version) or Apple Silicon (for CPU version).

 > **Note:** When using these, set the `API_BASE` env var to `http://host.docker.internal:8880/v1` or `http://kokoro-tts:8880/v1`.
 > You can also use the example `docker-compose.yml` in `examples/docker-compose.yml` if you prefer Docker Compose.

-**CPU Version:**
+<details>
+<summary>
+
+**Docker CPU**
+
+</summary>
+
 ```bash
 docker run -d \
   --name kokoro-tts \
@@ -99,7 +140,15 @@ docker run -d \
   ghcr.io/remsky/kokoro-fastapi-cpu:v0.2.4
 ```

-**GPU Version:**
+</details>
+
+<details>
+<summary>
+
+**Docker GPU**
+
+</summary>
+
 ```bash
 docker run -d \
   --name kokoro-tts \
@@ -113,23 +162,31 @@ docker run -d \
   ghcr.io/remsky/kokoro-fastapi-gpu:v0.2.4
 ```

+</details>
+
 > **Note:**
 > - These commands are for running the Kokoro TTS API server only. For issues or support, see the [Kokoro-FastAPI repository](https://github.com/remsky/Kokoro-FastAPI).
 > - The GPU version requires NVIDIA Docker support and works best with NVIDIA GPUs. The CPU version works best on Apple Silicon or modern x86 CPUs.
 > - Adjust environment variables as needed for your hardware and use case.

-## Dev Installation
+## Local Development Installation

 ### Prerequisites
-- Node.js & npm or pnpm (recommended: use [nvm](https://github.com/nvm-sh/nvm) for Node.js)
+- Node.js (recommended: use [nvm](https://github.com/nvm-sh/nvm))
+- pnpm (recommended) or npm
+  ```bash
+  npm install -g pnpm
+  ```
+- A TTS API server (Kokoro-FastAPI, Orpheus-FastAPI, Deepinfra, OpenAI, etc.) running and accessible
 Optionally required for different features:
 - [FFmpeg](https://ffmpeg.org) (required for audiobook m4b creation only)
-  - On Linux: `sudo apt install ffmpeg`
-  - On MacOS: `brew install ffmpeg`
+  ```bash
+  brew install ffmpeg
+  ```
 - [libreoffice](https://www.libreoffice.org) (required for DOCX files)
-  - On Linux: `sudo apt install libreoffice`
-  - On MacOS: `brew install libreoffice`
-
+  ```bash
+  brew install libreoffice
+  ```
 ### Steps

 1. Clone the repository:
@@ -142,12 +199,7 @@ Optionally required for different features:

 With pnpm (recommended):
 ```bash
-pnpm install
-```
-
-Or with npm:
-```bash
-npm install
+pnpm i # or npm i
 ```

 3. Configure the environment:
@@ -161,26 +213,15 @@ Optionally required for different features:

 With pnpm (recommended):
 ```bash
-pnpm dev
-```
-
-Or with npm:
-```bash
-npm run dev
+pnpm dev # or npm run dev
 ```

 or build and run the production server:

 With pnpm:
 ```bash
-pnpm build
-pnpm start
-```
-
-Or with npm:
-```bash
-npm run build
-npm start
+pnpm build # or npm run build
+pnpm start # or npm start
 ```

 Visit [http://localhost:3003](http://localhost:3003) to run the app.
@@ -217,7 +258,7 @@ This project would not be possible without standing on the shoulders of these gi

 - **Framework:** Next.js (React)
 - **Containerization:** Docker
-- **Storage:** IndexedDB (in browser db store)
+- **Storage:** Dexie + IndexedDB (in-browser local database)
 - **PDF:**
   - [react-pdf](https://github.com/wojtekmaj/react-pdf)
   - [pdf.js](https://mozilla.github.io/pdf.js/)

package.json

Lines changed: 2 additions & 0 deletions
@@ -17,6 +17,8 @@
     "cmpstr": "^3.0.4",
     "compromise": "^14.14.4",
     "core-js": "^3.46.0",
+    "dexie": "^4.2.1",
+    "dexie-react-hooks": "^4.2.0",
     "epubjs": "^0.3.93",
     "howler": "^2.2.4",
     "lru-cache": "^11.2.2",

pnpm-lock.yaml

Lines changed: 24 additions & 0 deletions
Some generated files are not rendered by default.

src/app/api/documents/route.ts

Lines changed: 5 additions & 3 deletions
@@ -1,6 +1,7 @@
 import { writeFile, readFile, readdir, mkdir, unlink } from 'fs/promises';
 import { NextRequest, NextResponse } from 'next/server';
 import path from 'path';
+import type { BaseDocument, SyncedDocument } from '@/types/documents';

 const DOCS_DIR = path.join(process.cwd(), 'docstore');

@@ -17,9 +18,10 @@ export async function POST(req: NextRequest) {
   try {
     await ensureDocsDir();
     const data = await req.json();
+    const documents = data.documents as SyncedDocument[];

     // Save document metadata and content
-    for (const doc of data.documents) {
+    for (const doc of documents) {
       const docPath = path.join(DOCS_DIR, `${doc.id}.json`);
       const contentPath = path.join(DOCS_DIR, `${doc.id}.${doc.type}`);

@@ -49,7 +51,7 @@ export async function POST(req: NextRequest) {
 export async function GET() {
   try {
     await ensureDocsDir();
-    const documents = [];
+    const documents: SyncedDocument[] = [];

     const files = await readdir(DOCS_DIR);
     const jsonFiles = files.filter(file => file.endsWith('.json'));
@@ -58,7 +60,7 @@ export async function GET() {
       const docPath = path.join(DOCS_DIR, file);

       try {
-        const metadata = JSON.parse(await readFile(docPath, 'utf8'));
+        const metadata = JSON.parse(await readFile(docPath, 'utf8')) as BaseDocument;
         const contentPath = path.join(DOCS_DIR, `${metadata.id}.${metadata.type}`);
         const content = await readFile(contentPath);

6466

src/app/api/tts/route.ts

Lines changed: 1 addition & 33 deletions
@@ -23,36 +23,6 @@ const ttsAudioCache = new LRUCache<string, AudioBufferValue>({
   ttl: TTS_CACHE_TTL_MS,
 });

-// Concurrency controls and in-flight de-duplication
-const TTS_MAX_CONCURRENCY = Number(process.env.TTS_MAX_CONCURRENCY || 4);
-
-class Semaphore {
-  private permits: number;
-  private queue: Array<() => void> = [];
-  constructor(max: number) {
-    this.permits = Math.max(1, max);
-  }
-  async acquire(): Promise<() => void> {
-    if (this.permits > 0) {
-      this.permits -= 1;
-      return this.release.bind(this);
-    }
-    return new Promise<() => void>((resolve) => {
-      this.queue.push(() => {
-        this.permits -= 1;
-        resolve(this.release.bind(this));
-      });
-    });
-  }
-  private release() {
-    this.permits += 1;
-    const next = this.queue.shift();
-    if (next) next();
-  }
-}
-
-const ttsSemaphore = new Semaphore(TTS_MAX_CONCURRENCY);
-
 type InflightEntry = {
   promise: Promise<ArrayBuffer>;
   controller: AbortController;
@@ -211,7 +181,7 @@ export async function POST(req: NextRequest) {
       });
     }

-    // De-duplicate identical in-flight requests and bound upstream concurrency
+    // De-duplicate identical in-flight requests
     const existing = inflightRequests.get(cacheKey);
     if (existing) {
       console.log('TTS in-flight JOIN for key:', cacheKey.slice(0, 8));
@@ -247,14 +217,12 @@ export async function POST(req: NextRequest) {
       controller,
       consumers: 1,
       promise: (async () => {
-        const release = await ttsSemaphore.acquire();
         try {
           const buffer = await fetchTTSBufferWithRetry(openai, createParams, controller.signal);
           // Save to cache
           ttsAudioCache.set(cacheKey, buffer);
           return buffer;
         } finally {
-          release();
           inflightRequests.delete(cacheKey);
         }
       })()
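
As far as the hunks above show, the route keeps its in-flight map while dropping the `Semaphore`, so identical concurrent requests still share one upstream call, but the number of distinct upstream calls is no longer capped. The following is a minimal sketch of that de-duplication idea on its own. `inflightSketch` and `dedupeInflight` are hypothetical names, and the sketch leaves out the consumer counting, `AbortController` handling, and LRU cache write that the real route performs.

```typescript
// Hypothetical registry of requests currently running, keyed by cache key.
const inflightSketch = new Map<string, Promise<ArrayBuffer>>();

// Run `start` at most once per key while a request for that key is pending.
// Later callers with the same key join the pending promise instead of
// triggering another upstream call.
function dedupeInflight(
  key: string,
  start: () => Promise<ArrayBuffer>
): Promise<ArrayBuffer> {
  const existing = inflightSketch.get(key);
  if (existing) return existing;

  // `start` runs in a microtask, so the map entry below is always set
  // before the cleanup in `finally` can remove it.
  const promise = Promise.resolve()
    .then(start)
    .finally(() => {
      inflightSketch.delete(key);
    });

  inflightSketch.set(key, promise);
  return promise;
}
```

Calling `dedupeInflight` twice with the same key before the first call settles returns the same promise both times; once it settles, the key is cleared and the next call starts a fresh request.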
