richardr1126
diff --git a/.github/workflows/playwright.yml b/.github/workflows/playwright.yml
Lines changed: 2 additions & 2 deletions
diff --git a/.npmrc b/.npmrc
Lines changed: 3 additions & 0 deletions
diff --git a/Dockerfile b/Dockerfile
Lines changed: 7 additions & 4 deletions
diff --git a/README.md b/README.md
Lines changed: 77 additions & 44 deletions
diff --git a/examples/docker-compose.yml b/examples/docker-compose.yml
Lines changed: 30 additions & 0 deletions
@@ -24,8 +24,8 @@ jobs:
     - name: Run Playwright tests
       env:
         NEXT_PUBLIC_NODE_ENV: test
-        API_BASE: https://tts.richardr.dev/v1
-        API_KEY: not-needed
+        API_BASE: https://api.deepinfra.com/v1/openai
+        API_KEY: ${{ secrets.DEEPINFRA_API_KEY }}
       run: npx playwright test --reporter=list,github,html
     - uses: actions/upload-artifact@v4
       if: ${{ !cancelled() }}
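
The workflow hunk above now takes `API_KEY` from the `DEEPINFRA_API_KEY` secret, so a local Playwright run needs both `API_BASE` and `API_KEY` set by hand. A minimal sketch of a pre-flight check, assuming a POSIX shell; `require_tts_env` is a hypothetical helper name and is not part of the repository:

```shell
# Hypothetical helper: fail early when the variables the Playwright job
# expects (API_BASE, API_KEY) are unset or empty.
require_tts_env() {
  missing=""
  for name in API_BASE API_KEY; do
    # Indirect lookup of the variable named by $name (POSIX-compatible).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      missing="$missing $name"
    fi
  done
  if [ -n "$missing" ]; then
    echo "missing:$missing" >&2
    return 1
  fi
  return 0
}
```

For example, `require_tts_env && npx playwright test` would skip the test run when either variable is missing.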
 
@@ -0,0 +1,3 @@
+# pnpm configuration
+auto-install-peers=true
+strict-peer-dependencies=false
@@ -4,23 +4,26 @@ FROM node:current-alpine
 # Add ffmpeg and libreoffice using Alpine package manager
 RUN apk add --no-cache ffmpeg libreoffice-writer
 
+# Install pnpm globally
+RUN npm install -g pnpm
+
 # Create app directory
 WORKDIR /app
 
 # Copy package files
-COPY package*.json ./
+COPY package.json pnpm-lock.yaml ./
 
 # Install dependencies
-RUN npm install
+RUN pnpm install --frozen-lockfile
 
 # Copy project files
 COPY . .
 
 # Build the Next.js application
-RUN npm run build
+RUN pnpm run build
 
 # Expose the port the app runs on
 EXPOSE 3003
 
 # Start the application
-CMD ["npm", "start"]
+CMD ["pnpm", "start"]
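
With this Dockerfile change, `COPY package.json pnpm-lock.yaml ./` names the lockfile explicitly, so the build stops at that step if `pnpm-lock.yaml` is absent from the build context. `pnpm install --frozen-lockfile` also fails if the lockfile is out of sync with `package.json`, instead of updating it. A minimal sketch of a check to run before building, assuming a POSIX shell; `check_build_context` is a hypothetical helper name:

```shell
# Hypothetical pre-build check: both files named in the Dockerfile's COPY
# instruction must exist in the build context directory.
check_build_context() {
  dir="${1:-.}"
  for file in package.json pnpm-lock.yaml; do
    if [ ! -f "$dir/$file" ]; then
      echo "missing $file in $dir" >&2
      return 1
    fi
  done
  return 0
}
```

A possible use is `check_build_context . && docker build -t openreader-webui:local .`, where the `openreader-webui:local` tag is only an illustrative choice. The check covers file presence only; it does not detect a stale lockfile.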
@@ -9,75 +9,83 @@
 
 # OpenReader WebUI 📄🔊
 
-OpenReader WebUI is a document reader with Text-to-Speech capabilities, offering a TTS read along experience with narration for EPUB, PDF, TXT, MD, and DOCX documents. It can use any OpenAI compatible TTS endpoint, including [Kokoro-FastAPI](https://github.com/remsky/Kokoro-FastAPI) and [Orpheus-FastAPI](https://github.com/Lex-au/Orpheus-FastAPI)
+OpenReader WebUI is a document reader with Text-to-Speech capabilities, offering a TTS read along experience with narration for EPUB, PDF, TXT, MD, and DOCX documents. It supports multiple TTS providers including OpenAI, Deepinfra, and custom OpenAI-compatible endpoints like [Kokoro-FastAPI](https://github.com/remsky/Kokoro-FastAPI) and [Orpheus-FastAPI](https://github.com/Lex-au/Orpheus-FastAPI)
 
-- 🎯 **TTS API Integration**: 
-  - Compatible with OpenAI text to speech API and GPT-4o Mini TTS, Kokoro-FastAPI TTS, Orpheus FastAPI or any other compatible service
-  - Support for TTS models (tts-1, tts-1-hd, gpt-4o-mini-tts, kokoro, and custom)
+- 🎯 **Multi-Provider TTS Support**: 
+  - **OpenAI**: tts-1, tts-1-hd, gpt-4o-mini-tts models with voices (alloy, echo, fable, onyx, nova, shimmer)
+  - **Deepinfra**: Kokoro-82M, Orpheus-3B, Sesame-1B models with extensive voice libraries
+  - **Custom OpenAI-Compatible**: Any OpenAI-compatible endpoint with custom voice sets
+  - Provider-specific voice management with automatic voice restoration per provider-model combination
 - 💾 **Local-First Architecture**: Uses IndexedDB browser storage for documents
 - 🛜 **Optional Server-side documents**: Manually upload documents to the next backend for all users to download
 - 📖 **Read Along Experience**: Follow along with highlighted text as the TTS narrates
 - 📄 **Document formats**: EPUB, PDF, TXT, MD, DOCX (with libreoffice installed)
 - 🎧 **Audiobook Creation**: Create and export audiobooks from PDF and ePub files **(in m4b format with ffmpeg and aac TTS output)**
-- 📲 **Mobile Support**: Works on mobile devices, and can be added as a PWA web app
 - 🎨 **Customizable Experience**: 
-  - 🔑 Set TTS API base URL (and optional API key)
-  - 🎯 Set model-specific instructions for GPT-4o Mini TTS
-  - 🏎️ Adjustable playback speed
-  - 📐 Customize PDF text extraction margins
-  - 🗣️ Multiple voice options (checks `/v1/audio/voices` endpoint)
+  - 🔑 Select TTS provider (OpenAI, Deepinfra, or Custom OpenAI-compatible)
+  - 🔐 Set TTS API base URL and optional API key
   - 🎨 Multiple app theme options
+  - And more...
 
 ### 🛠️ Work in progress
 - [ ] **Native .docx support** (currently requires libreoffice)
-- [ ] **Support non-OpenAI TTS APIs**: ElevenLabs, etc.
 - [ ] **Accessibility Improvements**
 
 ## 🐳 Docker Quick Start
 
 ### Prerequisites
 - Recent version of Docker installed on your machine
-- A TTS API server (Kokoro-FastAPI, Orpheus-FastAPI, or OpenAI API)
-
-```bash
-docker run --name openreader-webui \
-  -p 3003:3003 \
-  -v openreader_docstore:/app/docstore \
-  ghcr.io/richardr1126/openreader-webui:latest
-```
-
-(Optionally): Set the TTS `API_BASE` URL and/or `API_KEY` to be default for all devices
-```bash
-docker run --name openreader-webui \
-  -e API_BASE=http://host.docker.internal:8880/v1 \
-  -p 3003:3003 \
-  -v openreader_docstore:/app/docstore \
-  ghcr.io/richardr1126/openreader-webui:latest
-```
-
-> Requesting audio from the TTS API happens on the Next.js server not the client. So the base URL for the TTS API should be accessible and relative to the Next.js server. If it is in a Docker you may need to use `host.docker.internal` to access the host machine, instead of `localhost`.
-
-Visit [http://localhost:3003](http://localhost:3003) to run the app and set your settings.
-
-> **Note:** The `openreader_docstore` volume is used to store server-side documents. You can mount a local directory instead. Or remove it if you don't need server-side documents.
-
-### ⬆️ Update Docker Image
+- A TTS API server (Kokoro-FastAPI, Orpheus-FastAPI, Deepinfra, OpenAI, etc.) running and accessible
+
+### 1. 🐳 Start the Docker container:
+  ```bash
+  docker run --name openreader-webui \
+    -p 3003:3003 \
+    -v openreader_docstore:/app/docstore \
+    ghcr.io/richardr1126/openreader-webui:latest
+  ```
+
+  (Optionally): Set the TTS `API_BASE` URL and/or `API_KEY` to be default for all devices
+  ```bash
+  docker run --name openreader-webui \
+    -e API_KEY=none \
+    -e API_BASE=http://host.docker.internal:8880/v1 \
+    -p 3003:3003 \
+    -v openreader_docstore:/app/docstore \
+    ghcr.io/richardr1126/openreader-webui:latest
+  ```
+
+  > **Note:** Audio requests to the TTS API are made by the Next.js server, not the client, so the TTS API base URL must be reachable from the Next.js server. If the server runs in a Docker container, you may need to use `host.docker.internal` instead of `localhost` to reach the host machine.
+
+  Visit [http://localhost:3003](http://localhost:3003) to run the app and set your settings.
+
+  > **Note:** The `openreader_docstore` volume is used to store server-side documents. You can mount a local directory instead. Or remove it if you don't need server-side documents.
+
+### 2. ⚙️ Configure the app settings in the UI:
+  - Set the TTS Provider and Model in the Settings modal
+  - Set the TTS API Base URL and API Key if needed (more secure to set in env vars)
+  - Select your model's voice from the dropdown (the app tries to fetch the voice list from the TTS provider's API)
+
+### 3. ⬆️ Updating Docker Image
 ```bash
 docker stop openreader-webui && \
 docker rm openreader-webui && \
 docker pull ghcr.io/richardr1126/openreader-webui:latest
 ```
 
-### Adding to a Docker Compose (i.e. with open-webui or Kokoro-FastAPI)
+### (Alternate) 🐳 Configuration with Docker Compose and Kokoro-FastAPI
 
-> Note: This is an example of how to add OpenReader WebUI to a docker-compose file. You can add it to your existing docker-compose file or create a new one in this directory. Then run `docker-compose up --build` to start the services.
+A complete example docker-compose file with Kokoro-FastAPI and OpenReader WebUI is available in [`examples/docker-compose.yml`](examples/docker-compose.yml). You can download and use it:
 
+```bash
+mkdir -p openreader-compose
+cd openreader-compose
+curl -O https://raw.githubusercontent.com/richardr1126/OpenReader-WebUI/main/examples/docker-compose.yml
+docker compose up -d
+```
 
-Create or add to a `docker-compose.yml`:
+Or add OpenReader WebUI to your existing `docker-compose.yml`:
 ```yaml
-volumes:
-  docstore:
-
 services:
   openreader-webui:
     container_name: openreader-webui
@@ -89,12 +97,15 @@ services:
     volumes:
       - docstore:/app/docstore
     restart: unless-stopped
+
+volumes:
+  docstore:
 ```
 
 ## Dev Installation
 
 ### Prerequisites
-- Node.js & npm (recommended: use [nvm](https://github.com/nvm-sh/nvm))
+- Node.js & npm or pnpm (recommended: use [nvm](https://github.com/nvm-sh/nvm) for Node.js)
 Optionally required for different features:
 - [FFmpeg](https://ffmpeg.org) (required for audiobook m4b creation only)
   - On Linux: `sudo apt install ffmpeg`
@@ -112,6 +123,13 @@ Optionally required for different features:
    ```
 
 2. Install dependencies:
+   
+   With pnpm (recommended):
+   ```bash
+   pnpm install
+   ```
+   
+   Or with npm:
    ```bash
    npm install
    ```
@@ -124,11 +142,26 @@ Optionally required for different features:
    > Note: The base URL for the TTS API should be accessible and relative to the Next.js server
 
 4. Start the development server:
+   
+   With pnpm (recommended):
+   ```bash
+   pnpm dev
+   ```
+   
+   Or with npm:
    ```bash
    npm run dev
    ```
 
    or build and run the production server:
+   
+   With pnpm:
+   ```bash
+   pnpm build
+   pnpm start
+   ```
+   
+   Or with npm:
    ```bash
    npm run build
    npm start
@@ -183,9 +216,9 @@ This project would not be possible without standing on the shoulders of these gi
   - [Headless UI](https://headlessui.com)
   - [@tailwindcss/typography](https://tailwindcss.com/docs/typography-plugin)
 - **TTS:** (tested on)
+  - [Deepinfra API](https://deepinfra.com) (Kokoro-82M, Orpheus-3B, Sesame-1B)
   - [Kokoro FastAPI TTS](https://github.com/remsky/Kokoro-FastAPI/tree/v0.0.5post1-stable)
   - [Orpheus FastAPI TTS](https://github.com/Lex-au/Orpheus-FastAPI)
-  - [OpenAI API](https://platform.openai.com/docs/api-reference/text-to-speech)
 - **NLP:** [compromise](https://github.com/spencermountain/compromise) NLP library for sentence splitting
 
 ## License
 
@@ -0,0 +1,30 @@
+services:
+  kokoro-tts:
+    container_name: kokoro-tts
+    image: ghcr.io/remsky/kokoro-fastapi-cpu:v0.2.4
+    ports:
+      - "8880:8880"
+    environment:
+      # ONNX Optimization Settings for vectorized operations
+      - ONNX_NUM_THREADS=8  # Maximize core usage for vectorized ops
+      - ONNX_INTER_OP_THREADS=4  # Higher inter-op for parallel matrix operations
+      - ONNX_EXECUTION_MODE=parallel
+      - ONNX_OPTIMIZATION_LEVEL=all
+      - ONNX_MEMORY_PATTERN=true
+      - ONNX_ARENA_EXTEND_STRATEGY=kNextPowerOfTwo
+      - API_LOG_LEVEL=DEBUG
+    restart: unless-stopped
+
+  openreader-webui:
+    container_name: openreader-webui
+    image: ghcr.io/richardr1126/openreader-webui:latest
+    environment:
+      - API_BASE=http://host.docker.internal:8880/v1
+    ports:
+      - "3003:3003"
+    volumes:
+      - docstore:/app/docstore
+    restart: unless-stopped
+
+volumes:
+  docstore:
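
Once the stack from the compose file above is running, one way to confirm the TTS service is reachable is to query its voice list. The README text removed in this change mentions a `/v1/audio/voices` endpoint; the sketch below assumes Kokoro-FastAPI serves it on the published port 8880. `voices_url` is a hypothetical helper name:

```shell
# Hypothetical helper: build the voice-list URL from an API base such as
# http://localhost:8880/v1, tolerating one trailing slash.
voices_url() {
  base="${1%/}"
  printf '%s/audio/voices\n' "$base"
}

# Example (requires the compose stack to be running):
#   curl -fsS "$(voices_url http://localhost:8880/v1)"
```

From the host, the service is published on `localhost:8880`. Inside the `openreader-webui` container, the compose file points `API_BASE` at `host.docker.internal` instead.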