Commit e3799e4

feat(tts): add smart sentence continuation
Introduce configurable smart sentence splitting with persisted state, a UI toggle, and EPUB/PDF contexts that send continuation metadata. Enhance the TTS pipeline to merge sentences that span pages, manage carryover text, and trigger visual page changes during playback for smoother narration. Update the README with Kokoro quick-start guidance, remove the legacy issues mapping doc, and add deterministic TTS mocks plus sample audio for Playwright tests.
1 parent d7ef0fa commit e3799e4
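
The commit message describes merging sentences that span a page break and carrying the unfinished tail forward. The TTS pipeline files are not part of this excerpt, so the following is only a minimal sketch of that idea, not the commit's implementation. `PageSplit`, `splitPageText`, and `carryover` are hypothetical names, and the sentence splitter is deliberately naive.

```typescript
// Hypothetical sketch: none of these names are taken from the commit.
interface PageSplit {
  /** Complete sentences that can be sent to TTS now. */
  sentences: string[];
  /** Unfinished trailing text to prepend to the next page. */
  carryover: string;
}

function splitPageText(pageText: string, carryover: string = ""): PageSplit {
  // Join the previous page's unfinished tail with this page and normalize whitespace.
  const text = `${carryover} ${pageText}`.replace(/\s+/g, " ").trim();
  if (text === "") return { sentences: [], carryover: "" };

  // Naive splitter: a piece ends at ., !, or ?. It will mis-split
  // abbreviations ("Dr.") and decimals ("3.14").
  const pieces = (text.match(/[^.!?]*[.!?]+|[^.!?]+$/g) ?? [])
    .map((piece) => piece.trim())
    .filter((piece) => piece.length > 0);

  const last = pieces[pieces.length - 1];
  if (last === undefined || /[.!?]$/.test(last)) {
    // The page ends on a sentence boundary, so nothing is carried over.
    return { sentences: pieces, carryover: "" };
  }
  // The last piece is incomplete: hold it back for the next page.
  return { sentences: pieces.slice(0, -1), carryover: last };
}
```

Calling `splitPageText(nextPageText, previous.carryover)` for each page in turn keeps a sentence that spans two pages in one piece. A real implementation would also need to decide when the visual page change happens relative to the merged sentence.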

14 files changed: +642 -198 lines changed
README.md

Lines changed: 39 additions & 23 deletions
@@ -36,9 +36,12 @@ OpenReader WebUI is a document reader with Text-to-Speech capabilities, offering
 - Recent version of Docker installed on your machine
 - A TTS API server (Kokoro-FastAPI, Orpheus-FastAPI, Deepinfra, OpenAI, etc.) running and accessible
 
+> **Note:** If you have good hardware, you can run [Kokoro-FastAPI with Docker locally](#🗣️-local-kokoro-fastapi-quick-start-cpu-or-gpu) (see below).
+
 ### 1. 🐳 Start the Docker container:
 ```bash
 docker run --name openreader-webui \
+  --restart unless-stopped \
   -p 3003:3003 \
   -v openreader_docstore:/app/docstore \
   ghcr.io/richardr1126/openreader-webui:latest
@@ -47,6 +50,7 @@ OpenReader WebUI is a document reader with Text-to-Speech capabilities, offering
 (Optionally): Set the TTS `API_BASE` URL and/or `API_KEY` to be default for all devices
 ```bash
 docker run --name openreader-webui \
+  --restart unless-stopped \
   -e API_KEY=none \
   -e API_BASE=http://host.docker.internal:8880/v1 \
   -p 3003:3003 \
@@ -72,36 +76,48 @@ docker rm openreader-webui && \
 docker pull ghcr.io/richardr1126/openreader-webui:latest
 ```
 
-### (Alternate) 🐳 Configuration with Docker Compose and Kokoro-FastAPI
+### 🗣️ Local Kokoro-FastAPI Quick-start (CPU or GPU)
 
-A complete example docker-compose file with Kokoro-FastAPI and OpenReader WebUI is available in [`docs/examples/docker-compose.yml`](docs/examples/docker-compose.yml). You can download and use it:
+You can run the Kokoro TTS API server directly with Docker. **We are not responsible for issues with Kokoro-FastAPI.** For best performance, use an NVIDIA GPU (for GPU version) or Apple Silicon (for CPU version).
 
-```bash
-# Download example docker-compose.yml
-curl --create-dirs -L -o openreader-compose/docker-compose.yml https://raw.githubusercontent.com/richardr1126/OpenReader-WebUI/main/docs/examples/docker-compose.yml
+> **Note:** When using these, set the `API_BASE` env var to `http://host.docker.internal:8880/v1` or `http://kokoro-tts:8880/v1`.
+> You can also use the example `docker-compose.yml` in `examples/docker-compose.yml` if you prefer Docker Compose.
 
-cd openreader-compose
-docker compose up -d
+**CPU Version:**
+```bash
+docker run -d \
+  --name kokoro-tts \
+  --restart unless-stopped \
+  -p 8880:8880 \
+  -e ONNX_NUM_THREADS=8 \
+  -e ONNX_INTER_OP_THREADS=4 \
+  -e ONNX_EXECUTION_MODE=parallel \
+  -e ONNX_OPTIMIZATION_LEVEL=all \
+  -e ONNX_MEMORY_PATTERN=true \
+  -e ONNX_ARENA_EXTEND_STRATEGY=kNextPowerOfTwo \
+  -e API_LOG_LEVEL=DEBUG \
+  ghcr.io/remsky/kokoro-fastapi-cpu:v0.2.4
 ```
 
-Or add OpenReader WebUI to your existing `docker-compose.yml`:
-```yaml
-services:
-  openreader-webui:
-    container_name: openreader-webui
-    image: ghcr.io/richardr1126/openreader-webui:latest
-    environment:
-      - API_BASE=http://host.docker.internal:8880/v1
-    ports:
-      - "3003:3003"
-    volumes:
-      - docstore:/app/docstore
-    restart: unless-stopped
-
-volumes:
-  docstore:
+**GPU Version:**
+```bash
+docker run -d \
+  --name kokoro-tts \
+  --gpus all \
+  --user 1001:1001 \
+  --restart unless-stopped \
+  -p 8880:8880 \
+  -e USE_GPU=true \
+  -e PYTHONUNBUFFERED=1 \
+  -e API_LOG_LEVEL=DEBUG \
+  ghcr.io/remsky/kokoro-fastapi-gpu:v0.2.4
 ```
 
+> **Note:**
+> - These commands are for running the Kokoro TTS API server only. For issues or support, see the [Kokoro-FastAPI repository](https://github.com/remsky/Kokoro-FastAPI).
+> - The GPU version requires NVIDIA Docker support and works best with NVIDIA GPUs. The CPU version works best on Apple Silicon or modern x86 CPUs.
+> - Adjust environment variables as needed for your hardware and use case.
+
 ## Dev Installation
 
 ### Prerequisites

docs/issues-to-components.md

Lines changed: 0 additions & 100 deletions
This file was deleted.

playwright.config.ts

Lines changed: 2 additions & 13 deletions
@@ -1,28 +1,17 @@
 import { defineConfig, devices } from '@playwright/test';
 
-/**
- * Read environment variables from file.
- * https://github.com/motdotla/dotenv
- */
-// import dotenv from 'dotenv';
-// import path from 'path';
-// dotenv.config({ path: path.resolve(__dirname, '.env') });
-
 /**
  * See https://playwright.dev/docs/test-configuration.
  */
 export default defineConfig({
   testDir: './tests',
   timeout: 30 * 1000,
   outputDir: './tests/results',
-  /* Run tests in files in parallel */
-  fullyParallel: true,
+  fullyParallel: false,
   /* Fail the build on CI if you accidentally left test.only in the source code. */
   forbidOnly: !!process.env.CI,
-  /* Retry on CI only */
   retries: process.env.CI ? 2 : 0,
-  /* Opt out of parallel tests on CI. */
-  workers: undefined,
+  workers: process.env.CI ? '100%' : '75%',
   /* Reporter to use. See https://playwright.dev/docs/test-reporters */
   reporter: 'html',
   /* Shared settings for all the projects below. See https://playwright.dev/docs/api/class-testoptions. */

src/app/api/tts/route.ts

Lines changed: 2 additions & 2 deletions
@@ -217,7 +217,7 @@ export async function POST(req: NextRequest) {
       console.log('TTS in-flight JOIN for key:', cacheKey.slice(0, 8));
       existing.consumers += 1;
 
-      const onAbort = (_evt: Event) => {
+      const onAbort = () => {
         existing.consumers = Math.max(0, existing.consumers - 1);
         if (existing.consumers === 0) {
           existing.controller.abort();
@@ -262,7 +262,7 @@ export async function POST(req: NextRequest) {
 
     inflightRequests.set(cacheKey, entry);
 
-    const onAbort = (_evt: Event) => {
+    const onAbort = () => {
       entry.consumers = Math.max(0, entry.consumers - 1);
       if (entry.consumers === 0) {
         entry.controller.abort();
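
The two hunks above touch a pattern in which several callers share one in-flight TTS request, and the shared work is aborted only once the last caller has gone away. The rest of `route.ts` is not shown here, so the following is a self-contained sketch of that reference-counting idea under stated assumptions. `joinOrStart`, `InflightEntry`, and the `start` callback are hypothetical; only `inflightRequests`, `cacheKey`, `consumers`, `controller`, and `onAbort` echo names visible in the diff.

```typescript
// Hypothetical sketch of reference-counted in-flight requests.
interface InflightEntry<T> {
  controller: AbortController; // cancels the shared upstream work
  consumers: number; // callers still interested in the result
  promise: Promise<T>;
}

const inflightRequests = new Map<string, InflightEntry<unknown>>();

async function joinOrStart<T>(
  cacheKey: string,
  callerSignal: AbortSignal,
  start: (signal: AbortSignal) => Promise<T>,
): Promise<T> {
  let entry = inflightRequests.get(cacheKey) as InflightEntry<T> | undefined;
  if (entry === undefined) {
    const controller = new AbortController();
    const run = async (): Promise<T> => {
      try {
        return await start(controller.signal);
      } finally {
        // Drop the entry once the work settles, unless it was already replaced.
        if (inflightRequests.get(cacheKey) === created) inflightRequests.delete(cacheKey);
      }
    };
    // Defer the start by one microtask so the entry is registered first.
    const created: InflightEntry<T> = {
      controller,
      consumers: 0,
      promise: Promise.resolve().then(run),
    };
    inflightRequests.set(cacheKey, created);
    entry = created;
  }
  const shared = entry;
  shared.consumers += 1;

  const onAbort = () => {
    shared.consumers = Math.max(0, shared.consumers - 1);
    if (shared.consumers === 0) {
      // The last caller left: cancel the work and stop accepting joiners.
      shared.controller.abort();
      if (inflightRequests.get(cacheKey) === shared) inflightRequests.delete(cacheKey);
    }
  };
  if (callerSignal.aborted) {
    onAbort();
  } else {
    callerSignal.addEventListener("abort", onAbort, { once: true });
  }

  try {
    return await shared.promise;
  } finally {
    callerSignal.removeEventListener("abort", onAbort);
  }
}
```

Because `onAbort` ignores its event argument, declaring it without a parameter (as the commit does) avoids an unused-variable warning and does not change behavior.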

src/components/DocumentSettings.tsx

Lines changed: 19 additions & 0 deletions
@@ -27,6 +27,7 @@ export function DocumentSettings({ isOpen, setIsOpen, epub, html }: {
     viewType,
     skipBlank,
     epubTheme,
+    smartSentenceSplitting,
     headerMargin,
     footerMargin,
     leftMargin,
@@ -308,6 +309,24 @@ export function DocumentSettings({ isOpen, setIsOpen, epub, html }: {
                     Automatically skip pages with no text content
                   </p>
                 </div>}
+                {!html && (
+                  <div className="space-y-1">
+                    <label className="flex items-center space-x-2">
+                      <input
+                        type="checkbox"
+                        checked={smartSentenceSplitting}
+                        onChange={(e) => updateConfigKey('smartSentenceSplitting', e.target.checked)}
+                        className="form-checkbox h-4 w-4 text-accent rounded border-muted"
+                      />
+                      <span className="text-sm font-medium text-foreground">
+                        Smart sentence splitting
+                      </span>
+                    </label>
+                    <p className="text-sm text-muted pl-6">
+                      Merge sentences across page or section breaks for smoother TTS.
+                    </p>
+                  </div>
+                )}
                 {epub && (
                   <div className="space-y-1">
                     <label className="flex items-center space-x-2">

src/components/PDFViewer.tsx

Lines changed: 1 addition & 1 deletion
@@ -239,4 +239,4 @@ export function PDFViewer({ zoomLevel }: PDFViewerProps) {
       </Document>
     </div>
   );
-}
+}

src/contexts/ConfigContext.tsx

Lines changed: 13 additions & 1 deletion
@@ -29,6 +29,7 @@ type ConfigValues = {
   ttsModel: string;
   ttsInstructions: string;
   savedVoices: SavedVoices;
+  smartSentenceSplitting: boolean;
 };
 
 /** Interface defining the configuration context shape and functionality */
@@ -41,6 +42,7 @@ interface ConfigContextType {
   voice: string;
   skipBlank: boolean;
   epubTheme: boolean;
+  smartSentenceSplitting: boolean;
   headerMargin: number;
   footerMargin: number;
   leftMargin: number;
@@ -73,6 +75,7 @@ export function ConfigProvider({ children }: { children: ReactNode }) {
   const [voice, setVoice] = useState<string>('af_sarah');
   const [skipBlank, setSkipBlank] = useState<boolean>(true);
   const [epubTheme, setEpubTheme] = useState<boolean>(false);
+  const [smartSentenceSplitting, setSmartSentenceSplitting] = useState<boolean>(true);
   const [headerMargin, setHeaderMargin] = useState<number>(0.07);
   const [footerMargin, setFooterMargin] = useState<number>(0.07);
   const [leftMargin, setLeftMargin] = useState<number>(0.07);
@@ -103,6 +106,7 @@ export function ConfigProvider({ children }: { children: ReactNode }) {
         const cachedAudioPlayerSpeed = await getItem('audioPlayerSpeed');
         const cachedSkipBlank = await getItem('skipBlank');
         const cachedEpubTheme = await getItem('epubTheme');
+        const cachedSmartSentenceSplitting = await getItem('smartSentenceSplitting');
         const cachedHeaderMargin = await getItem('headerMargin');
         const cachedFooterMargin = await getItem('footerMargin');
         const cachedLeftMargin = await getItem('leftMargin');
@@ -187,6 +191,7 @@ export function ConfigProvider({ children }: { children: ReactNode }) {
         setAudioPlayerSpeed(parseFloat(cachedAudioPlayerSpeed || '1'));
         setSkipBlank(cachedSkipBlank === 'false' ? false : true);
         setEpubTheme(cachedEpubTheme === 'true');
+        setSmartSentenceSplitting(cachedSmartSentenceSplitting === 'false' ? false : true);
         setHeaderMargin(parseFloat(cachedHeaderMargin || '0.07'));
         setFooterMargin(parseFloat(cachedFooterMargin || '0.07'));
         setLeftMargin(parseFloat(cachedLeftMargin || '0.07'));
@@ -211,6 +216,9 @@ export function ConfigProvider({ children }: { children: ReactNode }) {
         if (cachedEpubTheme === null) {
           await setItem('epubTheme', 'false');
         }
+        if (cachedSmartSentenceSplitting === null) {
+          await setItem('smartSentenceSplitting', 'true');
+        }
         if (cachedHeaderMargin === null) await setItem('headerMargin', '0.07');
         if (cachedFooterMargin === null) await setItem('footerMargin', '0.07');
         if (cachedLeftMargin === null) await setItem('leftMargin', '0.0');
@@ -353,6 +361,9 @@ export function ConfigProvider({ children }: { children: ReactNode }) {
         case 'epubTheme':
           setEpubTheme(value as boolean);
           break;
+        case 'smartSentenceSplitting':
+          setSmartSentenceSplitting(value as boolean);
+          break;
         case 'headerMargin':
           setHeaderMargin(value as number);
           break;
@@ -388,6 +399,7 @@ export function ConfigProvider({ children }: { children: ReactNode }) {
       voice,
       skipBlank,
       epubTheme,
+      smartSentenceSplitting,
       headerMargin,
       footerMargin,
       leftMargin,
@@ -417,4 +429,4 @@ export function useConfig() {
     throw new Error('useConfig must be used within a ConfigProvider');
   }
   return context;
-}
+}
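
The new setting is persisted as the string `'true'` or `'false'` and defaults to enabled when nothing is stored, mirroring how `skipBlank` is handled. As a small illustration of that parsing rule, here is a sketch of a helper that makes the default explicit. `parseStoredBoolean` is a hypothetical name and is not part of the commit, and the `string | null | undefined` input type is an assumption about what `getItem` returns.

```typescript
// Hypothetical helper, not part of the commit: turn a stored string flag
// into a boolean, falling back to a default when the value is missing
// or not recognised.
function parseStoredBoolean(
  raw: string | null | undefined,
  defaultValue: boolean,
): boolean {
  if (raw === "true") return true;
  if (raw === "false") return false;
  return defaultValue;
}

// Same results as the expressions in the diff:
//   cachedSmartSentenceSplitting === 'false' ? false : true
const smartSentenceSplittingOn = parseStoredBoolean(null, true);
//   cachedEpubTheme === 'true'
const epubThemeOn = parseStoredBoolean(null, false);
```

With a default of `true`, a missing key yields `true`, so the feature is enabled for existing users whose stores predate this commit.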
