Commit d7a94a6

Merge pull request #62 from richardr1126/version1.0.0
Merge v1.0.0 to main branch
2 parents e7ce1a3 + fb4ede1 commit d7a94a6

83 files changed: 7157 additions and 3021 deletions

.github/workflows/playwright.yml

Lines changed: 5 additions & 3 deletions
@@ -1,7 +1,7 @@
 name: Playwright Tests
 on:
   push:
-    branches: [ main, master ]
+    branches: [ main, master, version1.0.0 ]
   pull_request:
     branches: [ main, master ]
 jobs:
@@ -16,12 +16,14 @@ jobs:
     - uses: pnpm/action-setup@v4
       with:
         version: 9
-    - name: Install Deps (FFmpeg is install through Playwright)
+    - name: Install system dependencies
       run: |
         sudo apt-get update
-        sudo apt-get install -y libreoffice-writer
+        sudo apt-get install -y libreoffice-writer ffmpeg
     - name: Install dependencies
       run: pnpm install --frozen-lockfile
+    - name: Verify ffprobe
+      run: ffprobe -version
     - name: Install Playwright Browsers
       run: pnpm exec playwright install --with-deps
     - name: Run Playwright tests

Dockerfile

Lines changed: 1 addition & 1 deletion
@@ -21,7 +21,7 @@ COPY . .
 
 # Build the Next.js application
 RUN pnpm exec next telemetry disable
-RUN pnpm run build
+RUN pnpm build
 
 # Expose the port the app runs on
 EXPOSE 3003

README.md

Lines changed: 127 additions & 64 deletions
@@ -7,38 +7,82 @@
 
 [![Discussions](https://img.shields.io/badge/Discussions-Ask%20a%20Question-blue)](../../discussions)
 
-# OpenReader WebUI 📄🔊
+# 📄🔊 OpenReader WebUI
 
-OpenReader WebUI is a document reader with Text-to-Speech capabilities, offering a TTS read along experience with narration for EPUB, PDF, TXT, MD, and DOCX documents. It supports multiple TTS providers including OpenAI, Deepinfra, and custom OpenAI-compatible endpoints like [Kokoro-FastAPI](https://github.com/remsky/Kokoro-FastAPI) and [Orpheus-FastAPI](https://github.com/Lex-au/Orpheus-FastAPI)
+OpenReader WebUI is an open source text to speech document reader web app built using Next.js, offering a TTS read along experience with narration for EPUB, PDF, TXT, MD, and DOCX documents. It supports multiple TTS providers including OpenAI, Deepinfra, and custom OpenAI-compatible endpoints like [Kokoro-FastAPI](https://github.com/remsky/Kokoro-FastAPI) and [Orpheus-FastAPI](https://github.com/Lex-au/Orpheus-FastAPI)
 
-- 🎯 **Multi-Provider TTS Support**:
-  - **OpenAI**: tts-1, tts-1-hd, gpt-4o-mini-tts models with voices (alloy, echo, fable, onyx, nova, shimmer)
+- 🧠 **(New) Smart Sentence-Aware Narration**: EPUB and PDF playback use shared NLP (compromise) and smart sentence continuation to merge sentences that span pages/chapters for smoother TTS trying to prevent hard cuts at page breaks
+- 🎧 **(New) Reliable Audiobook Export**: Create and export audiobooks from PDF and EPUB files **(in m4b or mp3 format using ffmpeg)** with resumable, chapter/page-based export and per-chapter regeneration
+- 🎯 **(New) Multi-Provider TTS Support**:
   - **Deepinfra**: Kokoro-82M, Orpheus-3B, Sesame-1B models with extensive voice libraries
-  - **Custom OpenAI-Compatible**: Any OpenAI-compatible endpoint with custom voice sets
-- 💾 **Local-First Architecture**: Uses IndexedDB browser storage for documents
-- 🛜 **Optional Server-side documents**: Manually upload documents to the next backend for all users to download
-- 📖 **Read Along Experience**: Follow along with highlighted text as the TTS narrates
-- 📄 **Document formats**: EPUB, PDF, TXT, MD, DOCX (with libreoffice installed)
-- 🎧 **Audiobook Creation**: Create and export audiobooks from PDF and ePub files **(in m4b format with ffmpeg and aac TTS output)**
+  - **OpenAI API ($$)**: tts-1, tts-1-hd, gpt-4o-mini-tts models
+  - **Kokoro-FastAPI**: Self-hosted OpenAI-compatible TTS API server supporting Kokoro-82M and multi-voice combinations (like `af_heart+bf_emma`)
+  - **Orpheus-FastAPI**: Self-hosted OpenAI-compatible TTS API server supporting Orpheus-3B
+  - And other Custom OpenAI-compatible endpoints with a `/v1/audio/voices` endpoint
+- 🚀 **(New) Optimized TTS Pipeline**: Next.js TTS backend with in-memory LRU audio cache, ETag-aware responses, and in-flight request de-duplication for faster repeat playback
+- 💾 **Local-First Architecture**: IndexedDB browser storage for documents and settings (now using Dexie.js)
+- 🛜 **Optional Server-side documents**: Manually upload documents to the Next.js backend (and Docker `docstore`) for all users to download
+- 📖 **Read Along Experience**: Follow along with real-time highlighted text as the TTS narrates PDF files, using an overlay-based highlighter, per-sentence navigation, and skip controls
+- 📄 **Document formats**: EPUB, PDF, TXT, MD, DOCX (with libreoffice installed, plus hardened DOCX→PDF conversion for better reliability)
 - 🎨 **Customizable Experience**:
   - 🔑 Select TTS provider (OpenAI, Deepinfra, or Custom OpenAI-compatible)
   - 🔐 Set TTS API base URL and optional API key
   - 🎨 Multiple app theme options
   - And more...
 
-### 🛠️ Work in progress
-- [ ] **Native .docx support** (currently requires libreoffice)
-- [ ] **Accessibility Improvements**
+<details>
+<summary>
+
+### 🆕 What's New in v1.0.0
+
+</summary>
+
+- 🧠 **Smart sentence continuation**
+  - Improved NLP handling of complex structures and quoted dialogue provides more natural sentence boundaries and a smoother audio-text flow.
+  - EPUB and PDF playback now use smarter sentence splitting and continuation metadata so sentences that cross page/chapter boundaries are merged before hitting the TTS API.
+  - This yields more natural narration and fewer awkward pauses when a sentence spans multiple pages or EPUB spine items.
+- 📄 **Modernized PDF text highlighting pipeline**
+  - Real-time PDF text highlighting is now offloaded to a dedicated Web Worker so scrolling and playback controls remain responsive during narration.
+  - A new overlay-based highlighting system draws independent highlight layers on top of the PDF, avoiding interference with the underlying text layer.
+  - Upgraded fuzzy matching with Dice-based similarity improves the accuracy of mapping spoken words to on-screen text.
+  - A new per-device setting lets you enable or disable real-time PDF highlighting during playback for a more tailored reading experience.
+- 🎧 **Chapter/page-based audiobook export with resume & regeneration**
+  - Per-chapter/per-page generation to disk with persistent `bookId`
+  - Resumable generation (can cancel and continue later)
+  - Per-chapter regeneration & deletion
+  - Final combined **M4B** or **MP3** download with embedded chapter metadata.
+- 💾 **Dexie-backed local storage & sync**
+  - All document types (PDF, EPUB, TXT/MD-as-HTML) and config are stored via a unified Dexie layer on top of IndexedDB.
+  - Document lists use live Dexie queries (no manual refresh needed), and server sync now correctly includes text/markdown documents as part of the library backup.
+- 🗣️ **Kokoro multi-voice selection & utilities**
+  - Kokoro models now support multi-voice combination, with provider-aware limits and helpers (not supported on OpenAI or Deepinfra)
+- **Faster, more efficient TTS backend proxy**
+  - In-memory **LRU caching** for audio responses with configurable size/TTL
+  - **ETag** support (`304` on cache hits) + `X-Cache` headers (`HIT` / `MISS` / `INFLIGHT`)
+- 📄 **More robust DOCX → PDF conversion**
+  - DOCX conversion now uses isolated per-job LibreOffice profiles and temp directories, polls for a stable output file size, and aggressively cleans up temp files.
+  - This reduces cross-job interference and flakiness when converting multiple DOCX files in parallel.
+- **Accessibility & layout improvements**
+  - Dialogs and folder toggles expose proper roles and ARIA attributes.
+  - PDF/EPUB/HTML readers use a full-height app shell with a sticky bottom TTS bar, improved scrollbars, and refined focus styles.
+- **End-to-end Playwright test suite with TTS mocks**
+  - Deterministic TTS responses in tests via a reusable Playwright route mock.
+  - Coverage for accessibility, upload, navigation, folder management, deletion flows, audiobook generation/export and playback across all document types.
+
+</details>
 
 ## 🐳 Docker Quick Start
 
 ### Prerequisites
 - Recent version of Docker installed on your machine
 - A TTS API server (Kokoro-FastAPI, Orpheus-FastAPI, Deepinfra, OpenAI, etc.) running and accessible
 
+> **Note:** If you have good hardware, you can run [Kokoro-FastAPI with Docker locally](#🗣️-local-kokoro-fastapi-quick-start-cpu-or-gpu) (see below).
+
 ### 1. 🐳 Start the Docker container:
 ```bash
 docker run --name openreader-webui \
+  --restart unless-stopped \
   -p 3003:3003 \
   -v openreader_docstore:/app/docstore \
   ghcr.io/richardr1126/openreader-webui:latest
@@ -47,6 +91,7 @@ OpenReader WebUI is a document reader with Text-to-Speech capabilities, offering
 (Optionally): Set the TTS `API_BASE` URL and/or `API_KEY` to be default for all devices
 ```bash
 docker run --name openreader-webui \
+  --restart unless-stopped \
   -e API_KEY=none \
   -e API_BASE=http://host.docker.internal:8880/v1 \
   -p 3003:3003 \
@@ -72,48 +117,82 @@ docker rm openreader-webui && \
 docker pull ghcr.io/richardr1126/openreader-webui:latest
 ```
 
-### (Alternate) 🐳 Configuration with Docker Compose and Kokoro-FastAPI
+### 🗣️ Local Kokoro-FastAPI Quick-start (CPU or GPU)
 
-A complete example docker-compose file with Kokoro-FastAPI and OpenReader WebUI is available in [`docs/examples/docker-compose.yml`](docs/examples/docker-compose.yml). You can download and use it:
+You can run the Kokoro TTS API server directly with Docker. **We are not responsible for issues with [Kokoro-FastAPI](https://github.com/remsky/Kokoro-FastAPI).** For best performance, use an NVIDIA GPU (for GPU version) or Apple Silicon (for CPU version).
 
-```bash
-# Download example docker-compose.yml
-curl --create-dirs -L -o openreader-compose/docker-compose.yml https://raw.githubusercontent.com/richardr1126/OpenReader-WebUI/main/docs/examples/docker-compose.yml
+> **Note:** When using these, set the `API_BASE` env var to `http://host.docker.internal:8880/v1` or `http://kokoro-tts:8880/v1`.
+> You can also use the example `docker-compose.yml` in `examples/docker-compose.yml` if you prefer Docker Compose.
+
+<details>
+<summary>
 
-cd openreader-compose
-docker compose up -d
+**Docker CPU**
+
+</summary>
+
+```bash
+docker run -d \
+  --name kokoro-tts \
+  --restart unless-stopped \
+  -p 8880:8880 \
+  -e ONNX_NUM_THREADS=8 \
+  -e ONNX_INTER_OP_THREADS=4 \
+  -e ONNX_EXECUTION_MODE=parallel \
+  -e ONNX_OPTIMIZATION_LEVEL=all \
+  -e ONNX_MEMORY_PATTERN=true \
+  -e ONNX_ARENA_EXTEND_STRATEGY=kNextPowerOfTwo \
+  -e API_LOG_LEVEL=DEBUG \
+  ghcr.io/remsky/kokoro-fastapi-cpu:v0.2.4
 ```
 
-Or add OpenReader WebUI to your existing `docker-compose.yml`:
-```yaml
-services:
-  openreader-webui:
-    container_name: openreader-webui
-    image: ghcr.io/richardr1126/openreader-webui:latest
-    environment:
-      - API_BASE=http://host.docker.internal:8880/v1
-    ports:
-      - "3003:3003"
-    volumes:
-      - docstore:/app/docstore
-    restart: unless-stopped
-
-volumes:
-  docstore:
+</details>
+
+<details>
+<summary>
+
+**Docker GPU**
+
+</summary>
+
+```bash
+docker run -d \
+  --name kokoro-tts \
+  --gpus all \
+  --user 1001:1001 \
+  --restart unless-stopped \
+  -p 8880:8880 \
+  -e USE_GPU=true \
+  -e PYTHONUNBUFFERED=1 \
+  -e API_LOG_LEVEL=DEBUG \
+  ghcr.io/remsky/kokoro-fastapi-gpu:v0.2.4
 ```
 
-## Dev Installation
+</details>
+
+> **Note:**
+> - These commands are for running the Kokoro TTS API server only. For issues or support, see the [Kokoro-FastAPI repository](https://github.com/remsky/Kokoro-FastAPI).
+> - The GPU version requires NVIDIA Docker support and works best with NVIDIA GPUs. The CPU version works best on Apple Silicon or modern x86 CPUs.
+> - Adjust environment variables as needed for your hardware and use case.
+
+## Local Development Installation
 
 ### Prerequisites
-- Node.js & npm or pnpm (recommended: use [nvm](https://github.com/nvm-sh/nvm) for Node.js)
+- Node.js (recommended: use [nvm](https://github.com/nvm-sh/nvm))
+- pnpm (recommended) or npm
+  ```bash
+  npm install -g pnpm
+  ```
+- A TTS API server (Kokoro-FastAPI, Orpheus-FastAPI, Deepinfra, OpenAI, etc.) running and accessible
 Optionally required for different features:
 - [FFmpeg](https://ffmpeg.org) (required for audiobook m4b creation only)
-  - On Linux: `sudo apt install ffmpeg`
-  - On MacOS: `brew install ffmpeg`
+  ```bash
+  brew install ffmpeg
+  ```
 - [libreoffice](https://www.libreoffice.org) (required for DOCX files)
-  - On Linux: `sudo apt install libreoffice`
-  - On MacOS: `brew install libreoffice`
-
+  ```bash
+  brew install libreoffice
+  ```
 ### Steps
 
 1. Clone the repository:
@@ -126,12 +205,7 @@ Optionally required for different features:
 
 With pnpm (recommended):
 ```bash
-pnpm install
-```
-
-Or with npm:
-```bash
-npm install
+pnpm i # or npm i
 ```
 
 3. Configure the environment:
@@ -145,26 +219,15 @@ Optionally required for different features:
 
 With pnpm (recommended):
 ```bash
-pnpm dev
-```
-
-Or with npm:
-```bash
-npm run dev
+pnpm dev # or npm run dev
 ```
 
 or build and run the production server:
 
 With pnpm:
 ```bash
-pnpm build
-pnpm start
-```
-
-Or with npm:
-```bash
-npm run build
-npm start
+pnpm build # or npm run build
+pnpm start # or npm start
 ```
 
 Visit [http://localhost:3003](http://localhost:3003) to run the app.
@@ -201,7 +264,7 @@ This project would not be possible without standing on the shoulders of these gi
 
 - **Framework:** Next.js (React)
 - **Containerization:** Docker
-- **Storage:** IndexedDB (in browser db store)
+- **Storage:** Dexie + IndexedDB (in-browser local database)
 - **PDF:**
   - [react-pdf](https://github.com/wojtekmaj/react-pdf)
   - [pdf.js](https://mozilla.github.io/pdf.js/)
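
The "What's New in v1.0.0" notes in the README diff mention Dice-based fuzzy matching for mapping spoken words to on-screen PDF text, but the diff shown here does not include that code. As a rough illustration only, a Sørensen-Dice coefficient over character bigrams could be sketched as below. The names `bigramCounts`, `diceSimilarity`, and `bestMatch` are hypothetical, and the choice of character bigrams and lowercase normalization is an assumption, not the project's confirmed implementation.

```typescript
// Sketch only: hypothetical helpers, not taken from the OpenReader codebase.

// Count overlapping character bigrams after lowercasing and collapsing whitespace.
function bigramCounts(text: string): Record<string, number> {
  const normalized = text.toLowerCase().replace(/\s+/g, " ").trim();
  const counts: Record<string, number> = {};
  for (let i = 0; i + 1 < normalized.length; i++) {
    const gram = normalized.slice(i, i + 2);
    counts[gram] = (counts[gram] ?? 0) + 1;
  }
  return counts;
}

// Dice coefficient: 2 * |shared bigrams| / (|bigrams of a| + |bigrams of b|).
// Returns a value in [0, 1]; inputs too short to form a bigram score 0.
function diceSimilarity(a: string, b: string): number {
  const countsA = bigramCounts(a);
  const countsB = bigramCounts(b);
  let totalA = 0;
  let totalB = 0;
  let shared = 0;
  for (const gram of Object.keys(countsA)) {
    const inA = countsA[gram] ?? 0;
    totalA += inA;
    shared += Math.min(inA, countsB[gram] ?? 0);
  }
  for (const gram of Object.keys(countsB)) {
    totalB += countsB[gram] ?? 0;
  }
  if (totalA + totalB === 0) return 0;
  return (2 * shared) / (totalA + totalB);
}

// Pick the candidate (for example, a line of PDF text) most similar to the spoken text.
function bestMatch(spoken: string, candidates: string[]): { index: number; score: number } {
  let best = { index: -1, score: 0 };
  candidates.forEach((candidate, index) => {
    const score = diceSimilarity(spoken, candidate);
    if (score > best.score) best = { index, score };
  });
  return best;
}
```

For example, `diceSimilarity("night", "nacht")` shares one bigram (`ht`) out of eight in total, giving 0.25, while identical strings score 1. A caller would typically apply a minimum score threshold before accepting a match; any such threshold is a tuning choice not stated in the source.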

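
The README diff also describes a TTS backend proxy with an in-memory LRU cache (configurable size and TTL), in-flight request de-duplication, and `X-Cache` values of `HIT`, `MISS`, and `INFLIGHT`. The sketch below shows one way those three pieces can fit together, under stated assumptions: `DedupingLruCache` and its `get` method are hypothetical names, the cache key is assumed to be a string derived from the request, and ETag handling is left out. It is not the project's actual code.

```typescript
// Sketch only: hypothetical class, not taken from the OpenReader codebase.
type CacheStatus = "HIT" | "MISS" | "INFLIGHT";

interface CacheEntry<V> {
  value: V;
  expiresAt: number; // epoch milliseconds after which the entry is stale
}

class DedupingLruCache<V> {
  // A Map iterates in insertion order, so the first key is the least recently used.
  private entries = new Map<string, CacheEntry<V>>();
  // Loads that have started but not finished, keyed like the cache.
  private inflight = new Map<string, Promise<V>>();
  private maxEntries: number;
  private ttlMs: number;

  constructor(maxEntries: number, ttlMs: number) {
    this.maxEntries = maxEntries;
    this.ttlMs = ttlMs;
  }

  async get(key: string, load: () => Promise<V>): Promise<{ value: V; status: CacheStatus }> {
    const cached = this.entries.get(key);
    if (cached && cached.expiresAt > Date.now()) {
      // Re-insert so this key becomes the most recently used.
      this.entries.delete(key);
      this.entries.set(key, cached);
      return { value: cached.value, status: "HIT" };
    }

    // An identical request is already running: wait for it instead of loading again.
    const pending = this.inflight.get(key);
    if (pending) {
      return { value: await pending, status: "INFLIGHT" };
    }

    const promise = load();
    this.inflight.set(key, promise);
    try {
      const value = await promise;
      this.entries.delete(key);
      this.entries.set(key, { value, expiresAt: Date.now() + this.ttlMs });
      // Evict least recently used entries until the cache fits its size limit.
      while (this.entries.size > this.maxEntries) {
        const oldest = this.entries.keys().next().value;
        if (oldest === undefined) break;
        this.entries.delete(oldest);
      }
      return { value, status: "MISS" };
    } finally {
      // Clear the in-flight marker whether the load succeeded or failed.
      this.inflight.delete(key);
    }
  }
}
```

A route handler could call `cache.get(key, () => fetchAudio(...))` and copy the returned `status` into an `X-Cache` response header, where `fetchAudio` stands in for whatever upstream TTS call the proxy makes. Failed loads are not cached in this sketch, so a later request retries the upstream call.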