Commit b21bdb3

feat(tts): add multi-provider TTS support with Deepinfra and custom OpenAI-compatible endpoints

Add comprehensive multi-provider TTS support enabling users to choose between OpenAI, Deepinfra, and custom OpenAI-compatible endpoints. Implement provider-specific voice management with automatic voice restoration per provider-model combination, and migrate package manager to pnpm for improved dependency handling.

Key changes:
- Add TTS provider selection (OpenAI, Deepinfra, custom-openai) in settings UI
- Implement provider-specific model and voice lists with dynamic fetching
- Add voice persistence per provider-model combination in savedVoices
- Support Deepinfra models: Kokoro-82M, Orpheus-3B, Sesame-1B with their voice libraries
- Migrate to pnpm with frozen lockfile for reproducible builds
- Update Docker configuration to use pnpm and Deepinfra API defaults
- Add migration logic for existing users to infer provider from stored baseUrl
- Update test helpers and Playwright configuration for Deepinfra API
- Add example docker-compose.yml with Kokoro-FastAPI integration

BREAKING CHANGE: Voice selection is now provider-model specific. Previously saved voices will be migrated to the new savedVoices structure, but users may need to reselect voices if switching providers.

1 parent e56736f commit b21bdb3
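
The commit message mentions inferring the provider from a stored `baseUrl` and persisting voices per provider-model combination in `savedVoices`. The application code is not part of the files shown here, so the following is only a minimal TypeScript sketch of how such logic might look. Every name in it (`TTSProvider`, `inferProvider`, `voiceKey`, `migrateLegacyVoice`, `restoreVoice`), the `provider:model` key format, and the URL-matching heuristic are assumptions made for illustration, not the commit's actual implementation.

```typescript
// Hypothetical sketch: none of these names or rules are taken from the commit.
type TTSProvider = "openai" | "deepinfra" | "custom-openai";

// Assumed shape: one saved voice per "provider:model" key.
type SavedVoices = Record<string, string>;

// Guess the provider from a previously stored base URL (assumed heuristic).
// Deepinfra is checked first because its OpenAI-compatible path also
// contains the word "openai".
function inferProvider(baseUrl: string | undefined): TTSProvider {
  if (!baseUrl) return "custom-openai"; // assumption: unknown means custom
  const url = baseUrl.toLowerCase();
  if (url.indexOf("deepinfra.com") !== -1) return "deepinfra";
  if (url.indexOf("api.openai.com") !== -1) return "openai";
  return "custom-openai";
}

// Build the lookup key for a provider-model combination (assumed format).
function voiceKey(provider: TTSProvider, model: string): string {
  return `${provider}:${model}`;
}

// Move a single legacy voice setting into the per-combination structure,
// without overwriting an entry that already exists.
function migrateLegacyVoice(
  legacy: { baseUrl?: string; model: string; voice?: string },
  saved: SavedVoices = {},
): SavedVoices {
  if (!legacy.voice) return saved;
  const key = voiceKey(inferProvider(legacy.baseUrl), legacy.model);
  return key in saved ? saved : { ...saved, [key]: legacy.voice };
}

// Restore the saved voice for a combination if the provider still offers it;
// otherwise fall back to the first available voice, if any.
function restoreVoice(
  saved: SavedVoices,
  provider: TTSProvider,
  model: string,
  available: string[],
): string | undefined {
  const stored = saved[voiceKey(provider, model)];
  if (stored !== undefined && available.indexOf(stored) !== -1) return stored;
  return available.length > 0 ? available[0] : undefined;
}
```

Under these assumptions, a user who previously stored `https://api.openai.com/v1` with model `tts-1` and voice `alloy` would end up with a single `openai:tts-1` entry, while switching to a different provider finds no saved entry and falls back to the first available voice. That fallback is consistent with the breaking-change note that users may need to reselect voices.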

19 files changed (+6713 -9777 lines changed)
.github/workflows/playwright.yml

Lines changed: 2 additions & 2 deletions
@@ -24,8 +24,8 @@ jobs:
     - name: Run Playwright tests
       env:
         NEXT_PUBLIC_NODE_ENV: test
-        API_BASE: https://tts.richardr.dev/v1
-        API_KEY: not-needed
+        API_BASE: https://api.deepinfra.com/v1/openai
+        API_KEY: ${{ secrets.DEEPINFRA_API_KEY }}
       run: npx playwright test --reporter=list,github,html
     - uses: actions/upload-artifact@v4
       if: ${{ !cancelled() }}

.npmrc

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+# pnpm configuration
+auto-install-peers=true
+strict-peer-dependencies=false

Dockerfile

Lines changed: 7 additions & 4 deletions
@@ -4,23 +4,26 @@ FROM node:current-alpine
 # Add ffmpeg and libreoffice using Alpine package manager
 RUN apk add --no-cache ffmpeg libreoffice-writer

+# Install pnpm globally
+RUN npm install -g pnpm
+
 # Create app directory
 WORKDIR /app

 # Copy package files
-COPY package*.json ./
+COPY package.json pnpm-lock.yaml ./

 # Install dependencies
-RUN npm install
+RUN pnpm install --frozen-lockfile

 # Copy project files
 COPY . .

 # Build the Next.js application
-RUN npm run build
+RUN pnpm run build

 # Expose the port the app runs on
 EXPOSE 3003

 # Start the application
-CMD ["npm", "start"]
+CMD ["pnpm", "start"]

README.md

Lines changed: 77 additions & 44 deletions
@@ -9,75 +9,83 @@

 # OpenReader WebUI 📄🔊

-OpenReader WebUI is a document reader with Text-to-Speech capabilities, offering a TTS read along experience with narration for EPUB, PDF, TXT, MD, and DOCX documents. It can use any OpenAI compatible TTS endpoint, including [Kokoro-FastAPI](https://github.com/remsky/Kokoro-FastAPI) and [Orpheus-FastAPI](https://github.com/Lex-au/Orpheus-FastAPI)
+OpenReader WebUI is a document reader with Text-to-Speech capabilities, offering a TTS read along experience with narration for EPUB, PDF, TXT, MD, and DOCX documents. It supports multiple TTS providers including OpenAI, Deepinfra, and custom OpenAI-compatible endpoints like [Kokoro-FastAPI](https://github.com/remsky/Kokoro-FastAPI) and [Orpheus-FastAPI](https://github.com/Lex-au/Orpheus-FastAPI)

-- 🎯 **TTS API Integration**:
-  - Compatible with OpenAI text to speech API and GPT-4o Mini TTS, Kokoro-FastAPI TTS, Orpheus FastAPI or any other compatible service
-  - Support for TTS models (tts-1, tts-1-hd, gpt-4o-mini-tts, kokoro, and custom)
+- 🎯 **Multi-Provider TTS Support**:
+  - **OpenAI**: tts-1, tts-1-hd, gpt-4o-mini-tts models with voices (alloy, echo, fable, onyx, nova, shimmer)
+  - **Deepinfra**: Kokoro-82M, Orpheus-3B, Sesame-1B models with extensive voice libraries
+  - **Custom OpenAI-Compatible**: Any OpenAI-compatible endpoint with custom voice sets
+  - Provider-specific voice management with automatic voice restoration per provider-model combination
 - 💾 **Local-First Architecture**: Uses IndexedDB browser storage for documents
 - 🛜 **Optional Server-side documents**: Manually upload documents to the next backend for all users to download
 - 📖 **Read Along Experience**: Follow along with highlighted text as the TTS narrates
 - 📄 **Document formats**: EPUB, PDF, TXT, MD, DOCX (with libreoffice installed)
 - 🎧 **Audiobook Creation**: Create and export audiobooks from PDF and ePub files **(in m4b format with ffmpeg and aac TTS output)**
-- 📲 **Mobile Support**: Works on mobile devices, and can be added as a PWA web app
 - 🎨 **Customizable Experience**:
-  - 🔑 Set TTS API base URL (and optional API key)
-  - 🎯 Set model-specific instructions for GPT-4o Mini TTS
-  - 🏎️ Adjustable playback speed
-  - 📐 Customize PDF text extraction margins
-  - 🗣️ Multiple voice options (checks `/v1/audio/voices` endpoint)
+  - 🔑 Select TTS provider (OpenAI, Deepinfra, or Custom OpenAI-compatible)
+  - 🔐 Set TTS API base URL and optional API key
   - 🎨 Multiple app theme options
+  - And more...

 ### 🛠️ Work in progress
 - [ ] **Native .docx support** (currently requires libreoffice)
-- [ ] **Support non-OpenAI TTS APIs**: ElevenLabs, etc.
 - [ ] **Accessibility Improvements**

 ## 🐳 Docker Quick Start

 ### Prerequisites
 - Recent version of Docker installed on your machine
-- A TTS API server (Kokoro-FastAPI, Orpheus-FastAPI, or OpenAI API)
-
-```bash
-docker run --name openreader-webui \
-  -p 3003:3003 \
-  -v openreader_docstore:/app/docstore \
-  ghcr.io/richardr1126/openreader-webui:latest
-```
-
-(Optionally): Set the TTS `API_BASE` URL and/or `API_KEY` to be default for all devices
-```bash
-docker run --name openreader-webui \
-  -e API_BASE=http://host.docker.internal:8880/v1 \
-  -p 3003:3003 \
-  -v openreader_docstore:/app/docstore \
-  ghcr.io/richardr1126/openreader-webui:latest
-```
-
-> Requesting audio from the TTS API happens on the Next.js server not the client. So the base URL for the TTS API should be accessible and relative to the Next.js server. If it is in a Docker you may need to use `host.docker.internal` to access the host machine, instead of `localhost`.
-
-Visit [http://localhost:3003](http://localhost:3003) to run the app and set your settings.
-
-> **Note:** The `openreader_docstore` volume is used to store server-side documents. You can mount a local directory instead. Or remove it if you don't need server-side documents.
-
-### ⬆️ Update Docker Image
+- A TTS API server (Kokoro-FastAPI, Orpheus-FastAPI, Deepinfra, OpenAI, etc.) running and accessible
+
+### 1. 🐳 Start the Docker container:
+```bash
+docker run --name openreader-webui \
+  -p 3003:3003 \
+  -v openreader_docstore:/app/docstore \
+  ghcr.io/richardr1126/openreader-webui:latest
+```
+
+(Optionally): Set the TTS `API_BASE` URL and/or `API_KEY` to be default for all devices
+```bash
+docker run --name openreader-webui \
+  -e API_KEY=none \
+  -e API_BASE=http://host.docker.internal:8880/v1 \
+  -p 3003:3003 \
+  -v openreader_docstore:/app/docstore \
+  ghcr.io/richardr1126/openreader-webui:latest
+```
+
+> **Note:** Requesting audio from the TTS API happens on the Next.js server not the client. So the base URL for the TTS API should be accessible and relative to the Next.js server. If it is in a Docker you may need to use `host.docker.internal` to access the host machine, instead of `localhost`.
+
+Visit [http://localhost:3003](http://localhost:3003) to run the app and set your settings.
+
+> **Note:** The `openreader_docstore` volume is used to store server-side documents. You can mount a local directory instead. Or remove it if you don't need server-side documents.
+
+### 2. ⚙️ Configure the app settings in the UI:
+- Set the TTS Provider and Model in the Settings modal
+- Set the TTS API Base URL and API Key if needed (more secure to set in env vars)
+- Select your model's voice from the dropdown (voices try to be fetched from TTS Provider API)
+
+### 3. ⬆️ Updating Docker Image
 ```bash
 docker stop openreader-webui && \
 docker rm openreader-webui && \
 docker pull ghcr.io/richardr1126/openreader-webui:latest
 ```

-### Adding to a Docker Compose (i.e. with open-webui or Kokoro-FastAPI)
+### (Alternate) 🐳 Configuration with Docker Compose and Kokoro-FastAPI

-> Note: This is an example of how to add OpenReader WebUI to a docker-compose file. You can add it to your existing docker-compose file or create a new one in this directory. Then run `docker-compose up --build` to start the services.
+A complete example docker-compose file with Kokoro-FastAPI and OpenReader WebUI is available in [`examples/docker-compose.yml`](examples/docker-compose.yml). You can download and use it:

+```bash
+mkdir -p openreader-compose
+cd openreader-compose
+curl -O https://raw.githubusercontent.com/richardr1126/OpenReader-WebUI/main/examples/docker-compose.yml
+docker compose up -d
+```

-Create or add to a `docker-compose.yml`:
+Or add OpenReader WebUI to your existing `docker-compose.yml`:
 ```yaml
-volumes:
-  docstore:
-
 services:
   openreader-webui:
     container_name: openreader-webui
@@ -89,12 +97,15 @@ services:
     volumes:
       - docstore:/app/docstore
     restart: unless-stopped
+
+volumes:
+  docstore:
 ```

 ## Dev Installation

 ### Prerequisites
-- Node.js & npm (recommended: use [nvm](https://github.com/nvm-sh/nvm))
+- Node.js & npm or pnpm (recommended: use [nvm](https://github.com/nvm-sh/nvm) for Node.js)
 Optionally required for different features:
 - [FFmpeg](https://ffmpeg.org) (required for audiobook m4b creation only)
   - On Linux: `sudo apt install ffmpeg`
@@ -112,6 +123,13 @@ Optionally required for different features:
    ```

 2. Install dependencies:
+
+   With pnpm (recommended):
+   ```bash
+   pnpm install
+   ```
+
+   Or with npm:
    ```bash
    npm install
    ```
@@ -124,11 +142,26 @@ Optionally required for different features:
    > Note: The base URL for the TTS API should be accessible and relative to the Next.js server

 4. Start the development server:
+
+   With pnpm (recommended):
+   ```bash
+   pnpm dev
+   ```
+
+   Or with npm:
    ```bash
    npm run dev
    ```

    or build and run the production server:
+
+   With pnpm:
+   ```bash
+   pnpm build
+   pnpm start
+   ```
+
+   Or with npm:
    ```bash
    npm run build
    npm start
@@ -183,9 +216,9 @@ This project would not be possible without standing on the shoulders of these gi
   - [Headless UI](https://headlessui.com)
   - [@tailwindcss/typography](https://tailwindcss.com/docs/typography-plugin)
 - **TTS:** (tested on)
+  - [Deepinfra API](https://deepinfra.com) (Kokoro-82M, Orpheus-3B, Sesame-1B)
   - [Kokoro FastAPI TTS](https://github.com/remsky/Kokoro-FastAPI/tree/v0.0.5post1-stable)
   - [Orpheus FastAPI TTS](https://github.com/Lex-au/Orpheus-FastAPI)
-  - [OpenAI API](https://platform.openai.com/docs/api-reference/text-to-speech)
 - **NLP:** [compromise](https://github.com/spencermountain/compromise) NLP library for sentence splitting

 ## License

examples/docker-compose.yml

Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
+services:
+  kokoro-tts:
+    container_name: kokoro-tts
+    image: ghcr.io/remsky/kokoro-fastapi-cpu:v0.2.4
+    ports:
+      - "8880:8880"
+    environment:
+      # ONNX Optimization Settings for vectorized operations
+      - ONNX_NUM_THREADS=8  # Maximize core usage for vectorized ops
+      - ONNX_INTER_OP_THREADS=4  # Higher inter-op for parallel matrix operations
+      - ONNX_EXECUTION_MODE=parallel
+      - ONNX_OPTIMIZATION_LEVEL=all
+      - ONNX_MEMORY_PATTERN=true
+      - ONNX_ARENA_EXTEND_STRATEGY=kNextPowerOfTwo
+      - API_LOG_LEVEL=DEBUG
+    restart: unless-stopped
+
+  openreader-webui:
+    container_name: openreader-webui
+    image: ghcr.io/richardr1126/openreader-webui:latest
+    environment:
+      - API_BASE=http://host.docker.internal:8880/v1
+    ports:
+      - "3003:3003"
+    volumes:
+      - docstore:/app/docstore
+    restart: unless-stopped
+
+volumes:
+  docstore:
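
To illustrate how a server-side client might target the `API_BASE` configured in this compose file, here is a small TypeScript sketch that builds a request for the OpenAI-style `POST /audio/speech` endpoint, which OpenAI-compatible servers are expected to expose under the base URL. The helper name `buildSpeechRequest`, the `SpeechRequest` shape, and the `example-voice` placeholder are hypothetical; the actual request code in OpenReader WebUI is not shown in this commit view and may differ.

```typescript
// Hypothetical helper, not taken from the commit: assembles the pieces of an
// OpenAI-compatible text-to-speech request without sending it.
interface SpeechRequest {
  url: string;
  headers: Record<string, string>;
  body: string;
}

function buildSpeechRequest(
  apiBase: string,            // e.g. the API_BASE value from the compose file
  apiKey: string | undefined, // optional; local servers may not require one
  model: string,
  voice: string,
  input: string,
): SpeechRequest {
  // Strip trailing slashes so the joined path has exactly one separator.
  const base = apiBase.replace(/\/+$/, "");
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  if (apiKey) headers["Authorization"] = `Bearer ${apiKey}`;
  return {
    url: `${base}/audio/speech`,
    headers,
    body: JSON.stringify({ model, voice, input }),
  };
}

// Example: a request aimed at the Kokoro-FastAPI service from the compose file.
// "example-voice" is a placeholder; real voice names depend on the server.
const exampleSpeechRequest = buildSpeechRequest(
  "http://host.docker.internal:8880/v1",
  undefined,
  "kokoro",
  "example-voice",
  "Hello from OpenReader",
);
```

The result could then be passed to `fetch(exampleSpeechRequest.url, { method: "POST", headers: exampleSpeechRequest.headers, body: exampleSpeechRequest.body })` from the server process. Because the request originates inside the `openreader-webui` container, the base URL has to be reachable from that container, which is why the compose file uses `host.docker.internal` rather than `localhost`.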
