
Commit 452bc2a

feat: Add issue triage and mapping documentation for OpenReader v1
- Created `issues-to-components.md` to summarize open issues and their mappings to components for v1 development.
- Documented key features, bugs, and proposed solutions for issues #59, #48, #47, #44, and #40.
- Outlined global guardrails and cross-cutting improvements for the v1 architecture.

chore: Establish v1 todo and planning framework
- Created `todo.md` to capture the 1.0 rewrite plan, including scope, architecture overview, phased milestones, and a master checklist.
- Defined action items, dependency ordering, and issue mapping alignment for efficient development.

chore: Update pnpm workspace configuration
- Added `ignoredBuiltDependencies` for `canvas` in `pnpm-workspace.yaml` to prevent build issues.
1 parent 77c955b commit 452bc2a


6 files changed, +1123 -69 lines changed

Lines changed: 250 additions & 0 deletions
@@ -0,0 +1,250 @@
# ADR-0001 Playback architecture for v1

Status: Accepted
Date: 2025-11-10
Owner: @richardr1126
Related:
- Plan checklist [docs/v1/todo.md](docs/v1/todo.md)
- Issue triage and mapping [docs/v1/issues-to-components.md](docs/v1/issues-to-components.md)

## Decision
Adopt a single new playback engine built around HTMLAudioElement, using Media Source Extensions (MSE) where available, and replace Howler entirely. Introduce a provider-agnostic TTS interface, along with document adapters that yield stable location tokens and sentence blocks. Replace the existing IndexedDB utility with Dexie.js for persistence and caching. Ship the new engine as a clean cutover rather than running the old and new engines side by side.

## Context
The current 0.x implementation couples TTS, viewers, and playback control in ways that create fragile flows and race conditions. Adding a playback feature requires edits across multiple contexts, and playback is sensitive to timing between NLP processing, preloading, and the Howler lifecycle. Open issues highlight problems with large export downloads, dialog chunking, and PDF margin extraction, as well as requests for new features such as voice combination and chapter-based exports.

Guiding constraints from the v1 scope:
- Streaming-first playback
- Replace Howler
- Dexie.js as the client storage layer
- Preserve m4b audiobook export and add chapter-based MP3 export
- Keep server-side document sync
- Browsers: Chrome, Firefox, Edge, Safari 16+

## Goals
- Simplify the playback pipeline with a clear state machine and strict cancellation
- Decouple document parsing from playback via adapters
- Standardize provider integration behind a unified TTS interface
- Improve resilience for long-running operations and large audio artifacts
- Make preloading, skipping, and voice switching predictable and race-free
- Persist user state and caches using Dexie repositories

## Non-goals
- Running the legacy engine in parallel with v1
- Rewriting existing viewers wholesale beyond adapter wiring and highlighting seams
- Guaranteeing true streaming for providers that only return whole-file responses

## Architecture overview

```mermaid
flowchart TD
Views["PDF, EPUB, and HTML viewers"] --> Adapters["Document adapters"]
Adapters --> Splitter["Sentence splitter and mapping"]
Splitter --> Queue["Sentence queue and preloader"]
Queue --> Engine["Playback engine state machine"]
Engine --> Media["Media controller: HTMLAudioElement and MSE"]
Media --> Output["Audio output, Media Session, background handling"]
Engine --> Cache["Audio cache (Dexie)"]
Engine --> TTS["TTS providers: OpenAI, Deepinfra, custom"]
Engine --> Position["Resume position store"]
```

## Component responsibilities

- Adapters
  - Yield text blocks plus a stable `locationToken`
  - Handle next/previous navigation semantics per format
  - Provide highlight mapping strategies
  - Files:
    - [src/v1/adapters/DocumentAdapter.ts](src/v1/adapters/DocumentAdapter.ts)
    - [src/v1/adapters/PdfAdapter.ts](src/v1/adapters/PdfAdapter.ts)
    - [src/v1/adapters/EpubAdapter.ts](src/v1/adapters/EpubAdapter.ts)
    - [src/v1/adapters/HtmlAdapter.ts](src/v1/adapters/HtmlAdapter.ts)

- NLP splitter
  - Builds sentence blocks with quote-aware grouping
  - Exposes a mapping back to the raw sentences for highlighting
  - Files:
    - [src/v1/nlp/sentences.ts](src/v1/nlp/sentences.ts)
    - Uses [src/utils/nlp.ts](src/utils/nlp.ts)

- Playback engine
  - Drives state transitions, cancellation, preloading, and error handling
  - Integrates with the MediaController and TTS providers
  - Files:
    - [src/v1/playback/state.ts](src/v1/playback/state.ts)
    - [src/v1/playback/queue.ts](src/v1/playback/queue.ts)
    - [src/v1/playback/engine.ts](src/v1/playback/engine.ts)
    - [src/v1/playback/hooks/usePlayback.ts](src/v1/playback/hooks/usePlayback.ts)

- Media controller
  - Owns the HTMLAudioElement lifecycle, plus Media Source Extensions when supported
  - Provides a blob fallback and gapless segment chaining for Safari 16+
  - Integrates Media Session and background visibility behaviors
  - Files:
    - [src/v1/playback/media/MediaController.ts](src/v1/playback/media/MediaController.ts)
    - [src/v1/playback/media/mediaSession.ts](src/v1/playback/media/mediaSession.ts)
    - [src/v1/playback/media/background.ts](src/v1/playback/media/background.ts)

- TTS providers
  - Unified interface for synthesis requests and voice listing
  - Pass custom voice strings through unchanged, including plus (`+`) syntax for combined voices, where the provider supports it
  - Files:
    - [src/v1/tts/types.ts](src/v1/tts/types.ts)
    - [src/v1/tts/Provider.ts](src/v1/tts/Provider.ts)
    - [src/v1/tts/providers/OpenAIProvider.ts](src/v1/tts/providers/OpenAIProvider.ts)
    - [src/v1/tts/providers/DeepinfraProvider.ts](src/v1/tts/providers/DeepinfraProvider.ts)
    - [src/v1/tts/providers/CustomOpenAIProvider.ts](src/v1/tts/providers/CustomOpenAIProvider.ts)
    - [src/v1/tts/voices.ts](src/v1/tts/voices.ts)

- Persistence and caching
  - Dexie schema for documents, config, audio cache, positions, and voices
  - Repositories expose typed APIs and transactions
  - Files:
    - [src/v1/db/schema.ts](src/v1/db/schema.ts)
    - [src/v1/db/client.ts](src/v1/db/client.ts)
    - [src/v1/db/repositories/DocumentsRepo.ts](src/v1/db/repositories/DocumentsRepo.ts)
    - [src/v1/db/repositories/ConfigRepo.ts](src/v1/db/repositories/ConfigRepo.ts)
    - [src/v1/db/repositories/AudioCacheRepo.ts](src/v1/db/repositories/AudioCacheRepo.ts)
    - [src/v1/db/repositories/VoicesRepo.ts](src/v1/db/repositories/VoicesRepo.ts)
    - [src/v1/playback/positionStore.ts](src/v1/playback/positionStore.ts)

- API surface
  - Streaming route for providers that support chunked responses
  - Range-enabled audio download for large m4b artifacts
  - Files:
    - [src/app/api/tts/stream/route.ts](src/app/api/tts/stream/route.ts)
    - [src/app/api/tts/route.ts](src/app/api/tts/route.ts)
    - [src/app/api/tts/voices/route.ts](src/app/api/tts/voices/route.ts)
    - [src/app/api/audio/convert/route.ts](src/app/api/audio/convert/route.ts)
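The adapter and provider seams described above can be sketched as two small TypeScript interfaces. This is a minimal illustration under assumptions: `TextBlock`, `SynthesisRequest`, `InMemoryAdapter`, and every method name are hypothetical, not the contents of `DocumentAdapter.ts` or `types.ts`. `synthesize` returns either an async iterable of chunks or a whole buffer, to reflect providers with and without streaming support.

```typescript
// Hypothetical v1 seams; every name here is illustrative, not the project's API.

/** A unit of text plus a stable token identifying where it came from. */
export interface TextBlock {
  text: string;
  /** Opaque, format-specific location (for example a PDF page or an EPUB CFI). */
  locationToken: string;
}

/** Format-agnostic view of a document, as the playback engine consumes it. */
export interface DocumentAdapter {
  /** Text blocks at the current location. */
  current(): Promise<TextBlock[]>;
  /** Move forward/backward; resolves false when already at the boundary. */
  next(): Promise<boolean>;
  prev(): Promise<boolean>;
}

export interface SynthesisRequest {
  text: string;
  /** Passed through verbatim, so combined voices such as "voiceA+voiceB" survive. */
  voice: string;
  model: string;
  /** Voice speed applied at synthesis time (distinct from playbackRate). */
  speed: number;
  signal?: AbortSignal;
}

export interface TTSProvider {
  readonly id: string;
  listVoices(model: string): Promise<string[]>;
  /** Chunks as they arrive when the provider streams, or one whole buffer otherwise. */
  synthesize(req: SynthesisRequest): Promise<AsyncIterable<Uint8Array> | ArrayBuffer>;
}

/** Toy adapter over pre-split pages, only to illustrate the contract. */
export class InMemoryAdapter implements DocumentAdapter {
  private readonly pages: string[][];
  private page = 0;

  constructor(pages: string[][]) {
    this.pages = pages;
  }

  async current(): Promise<TextBlock[]> {
    const sentences = this.pages[this.page] ?? [];
    return sentences.map((text, i) => ({ text, locationToken: `page:${this.page}:${i}` }));
  }

  async next(): Promise<boolean> {
    if (this.page + 1 >= this.pages.length) return false;
    this.page += 1;
    return true;
  }

  async prev(): Promise<boolean> {
    if (this.page === 0) return false;
    this.page -= 1;
    return true;
  }
}
```

Keeping `voice` a free-form string is what lets combined-voice syntax reach providers that understand it without the engine needing to parse it.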

## Playback state machine

States:
- idle
- preparing
- buffering
- playing
- paused
- stopping
- error

Transitions:
- idle -> preparing on play, when the queue head is valid
- preparing -> buffering after the first audio segment is requested
- buffering -> playing once enough data is available
- playing -> buffering on buffer underflow, skip, or voice change
- playing -> paused on user pause
- any -> stopping on stop: clear the queue and cancel in-flight requests
- any -> error on an unrecoverable error, carrying its context
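The states and transitions above map naturally onto a pure transition function that the engine could call for every event. The sketch below is hypothetical: the event names are invented for illustration, and the `resume` (paused -> playing) and `stopped` (stopping -> idle) events are assumptions, added because the list above does not say how the machine leaves those two states.

```typescript
export type PlaybackState =
  | "idle" | "preparing" | "buffering" | "playing" | "paused" | "stopping" | "error";

// Hypothetical event names; the ADR describes the triggers only in prose.
export type PlaybackEvent =
  | { type: "play"; queueHeadValid: boolean }
  | { type: "segmentRequested" }
  | { type: "canPlay" }     // enough data buffered
  | { type: "underflow" }
  | { type: "skip" }
  | { type: "voiceChange" }
  | { type: "pause" }
  | { type: "resume" }      // assumption: not listed in the ADR
  | { type: "stop" }
  | { type: "stopped" }     // assumption: teardown finished
  | { type: "fatal"; error: string };

/** Returns the next state, or the current one when the event does not apply. */
export function transition(state: PlaybackState, event: PlaybackEvent): PlaybackState {
  // "any" transitions take precedence over state-specific ones.
  if (event.type === "stop") return "stopping";
  if (event.type === "fatal") return "error";

  switch (state) {
    case "idle":
      return event.type === "play" && event.queueHeadValid ? "preparing" : state;
    case "preparing":
      return event.type === "segmentRequested" ? "buffering" : state;
    case "buffering":
      return event.type === "canPlay" ? "playing" : state;
    case "playing":
      if (event.type === "underflow" || event.type === "skip" || event.type === "voiceChange") {
        return "buffering";
      }
      return event.type === "pause" ? "paused" : state;
    case "paused":
      return event.type === "resume" ? "playing" : state;
    case "stopping":
      return event.type === "stopped" ? "idle" : state;
    case "error":
      return state;
  }
}
```

Keeping the function pure makes the race-prone paths, such as a skip or voice change while buffering, easy to unit test without any audio.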

Guards and effects:
- Every request carries an AbortController scoped to the current token
- A config change produces a new token and cancels in-flight requests
- Preloading is capped and respects cache budgets
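A minimal sketch of the token guard, assuming each token owns one `AbortController` (a standard API in browsers and Node.js). `TokenScope` and its methods are hypothetical names. Renewing the scope aborts every request that was handed the old signal, and `isCurrent` lets late results that were not aborted in time be discarded.

```typescript
/** Issues playback tokens; issuing a new one aborts everything tied to the old one. */
export class TokenScope {
  private counter = 0;
  private controller = new AbortController();

  /** Current token id and the signal that requests should carry. */
  get current(): { token: number; signal: AbortSignal } {
    return { token: this.counter, signal: this.controller.signal };
  }

  /** Call on config change, skip, or stop: cancels in-flight work and returns the new scope. */
  renew(reason = "superseded"): { token: number; signal: AbortSignal } {
    this.controller.abort(reason);
    this.controller = new AbortController();
    this.counter += 1;
    return this.current;
  }

  /** Results that arrive for an old token must be dropped, even if they were not aborted. */
  isCurrent(token: number): boolean {
    return token === this.counter;
  }
}
```

A synthesis call would then pass the signal along, for example `fetch(url, { method: "POST", body, signal: scope.current.signal })`, and check `scope.isCurrent(token)` before enqueueing the returned audio.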

## Media pipeline

- Try MSE with a SourceBuffer of type `audio/mpeg` or AAC when available
- Otherwise, use short blob segments and chain their playback with minimal gaps
- Apply audio player speed through `playbackRate`, separately from the voice speed set at synthesis time
- Integrate Media Session actions: play, pause, next, and previous
- Pause when the page moves to the background, and auto-resume on return to the foreground if the user was playing
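The MSE-or-blob decision can be made once at startup with the standard static check `MediaSource.isTypeSupported()`. In this sketch, the candidate MIME types, the `chooseMediaStrategy` name, and the injected `env` parameter (which lets the logic run and be tested outside a browser) are assumptions; in a browser, `window` would be passed as the environment.

```typescript
export type MediaStrategy = { kind: "mse"; mimeType: string } | { kind: "blob" };

/** Minimal slice of the browser globals the check needs (injectable for tests). */
export interface MediaEnv {
  MediaSource?: { isTypeSupported(type: string): boolean };
}

// Candidate SourceBuffer types in preference order (assumption).
const CANDIDATES: readonly string[] = ["audio/mpeg", "audio/aac"];

export function chooseMediaStrategy(env: MediaEnv): MediaStrategy {
  const ms = env.MediaSource;
  if (ms) {
    for (const mimeType of CANDIDATES) {
      if (ms.isTypeSupported(mimeType)) return { kind: "mse", mimeType };
    }
  }
  // No MSE, or no supported type: fall back to short blob segments played back to back.
  return { kind: "blob" };
}
```

On the blob path, each synthesized segment would become an object URL, with the next segment loaded when the current one fires `ended`; keeping segments short is what keeps the gaps small.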

## Text and highlighting

- Adapters provide a raw-to-processed sentence mapping for highlighting
- The PDF adapter normalizes x positions to the page width and respects left and right margins
- The EPUB adapter yields location tokens and section navigation
- The HTML adapter passes text through and uses Markdown rendering only for display
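The PDF margin rule can be expressed as a filter over text items: divide each item's x position and width by the page width, then keep the items whose normalized span lies between the left and right margins. This is a sketch under assumptions: the `PdfTextItem` shape, the margin fractions, and the `avgCharWidth` estimate used when an item reports zero width (one possible form of the width fallback mentioned for #40) are illustrative, not taken from the PDF adapter.

```typescript
export interface PdfTextItem {
  str: string;
  x: number;     // left edge, in PDF user-space units
  width: number; // some producers report 0
}

export interface Margins {
  left: number;  // fraction of the page width, e.g. 0.07
  right: number; // fraction of the page width
}

// Hypothetical fallback: estimate the width from the glyph count when it is missing.
function effectiveWidth(item: PdfTextItem, avgCharWidth: number): number {
  return item.width > 0 ? item.width : item.str.length * avgCharWidth;
}

/** Keeps the items that lie entirely between the left and right margins. */
export function withinMargins(
  items: PdfTextItem[],
  pageWidth: number,
  margins: Margins,
  avgCharWidth = 5,
): PdfTextItem[] {
  if (pageWidth <= 0) return items; // cannot normalize, so trim nothing
  return items.filter((item) => {
    const start = item.x / pageWidth;
    const end = (item.x + effectiveWidth(item, avgCharWidth)) / pageWidth;
    return start >= margins.left && end <= 1 - margins.right;
  });
}
```

Working in fractions of the page width keeps the same margin settings meaningful across pages of different sizes.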

## Dexie schema outline

Tables and indicative indexes:
- documents: id, type, name, lastModified, size, dataRef
- config: key, value
- audioCache: key, createdAt, expiresAt, size, bytesRef or chunkRefs
- positions: docId, locationToken, sentenceIndex, updatedAt
- voices: provider, model, voices, updatedAt

Exact table definitions will be codified in [src/v1/db/schema.ts](src/v1/db/schema.ts).
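As a possible starting point for `schema.ts`, the outline above translates into Dexie store strings: the first entry of each string is the primary key, the remaining entries are indexes, and `[a+b]` declares a compound key. The row types, the choice of indexed fields, the `"openreader-v1"` database name, and `primaryKeyOf` are assumptions for illustration; fields that are not indexed are still stored with each row. The Dexie class itself appears only in a comment so that the block runs without the `dexie` package.

```typescript
// Hypothetical row shapes for the tables in the outline above.
export interface DocumentRow {
  id: string;
  type: "pdf" | "epub" | "html"; // assumption
  name: string;
  lastModified: number;
  size: number;
  dataRef: string;
}
export interface ConfigRow { key: string; value: unknown }
export interface AudioCacheRow {
  key: string;
  createdAt: number;
  expiresAt: number;
  size: number;
  bytesRef?: string;    // either a single blob reference...
  chunkRefs?: string[]; // ...or a list of chunk references
}
export interface PositionRow { docId: string; locationToken: string; sentenceIndex: number; updatedAt: number }
export interface VoicesRow { provider: string; model: string; voices: string[]; updatedAt: number }

// Dexie stores() syntax: first entry = primary key, the rest = indexes,
// "[a+b]" = compound key. Which fields to index is an assumption.
export const SCHEMA_V1 = {
  documents: "id, type, name, lastModified",
  config: "key",
  audioCache: "key, createdAt, expiresAt",
  positions: "docId, updatedAt",
  voices: "[provider+model], updatedAt",
} as const;

/** Primary-key part of a store spec, e.g. "[provider+model]". */
export function primaryKeyOf(spec: string): string {
  return spec.split(",")[0]?.trim() ?? "";
}

// With the `dexie` package installed, registration would look roughly like:
//   import Dexie, { type Table } from "dexie";
//   class OpenReaderDB extends Dexie {
//     documents!: Table<DocumentRow, string>;
//     constructor() {
//       super("openreader-v1");
//       this.version(1).stores(SCHEMA_V1);
//     }
//   }
```

Bumping to `version(2)` with a new store map is how Dexie handles later schema evolution, which is one of the reasons the raw IndexedDB helper was set aside.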

## API notes

- TTS stream route
  - POST returns chunked audio where the provider supports it
  - Otherwise, falls back to a full ArrayBuffer response with progressive delivery
- Audio convert route
  - Supports per-chapter MP3 mode and m4b
  - Adds a GET download with `Accept-Ranges` support for large files

References:
- Current TTS route [src/app/api/tts/route.ts](src/app/api/tts/route.ts)
- Current voices route [src/app/api/tts/voices/route.ts](src/app/api/tts/voices/route.ts)
- Current audio convert route [src/app/api/audio/convert/route.ts](src/app/api/audio/convert/route.ts)
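For the range-enabled GET download, the server has to parse the `Range` request header and answer `206 Partial Content` with a `Content-Range` header, or `416` when the range cannot be satisfied. The helper below is a minimal single-range parser for the `bytes=` forms defined in RFC 9110; the function names are hypothetical, and multi-range requests are deliberately ignored, which means they are served as a full `200` response.

```typescript
/** Inclusive byte range within a file of known size. */
export interface ByteRange { start: number; end: number }

// Parses a single-range `Range` header.
//   null            -> no usable range: serve the whole file with 200
//   "unsatisfiable" -> respond 416 with `Content-Range: bytes */<size>`
export function parseRange(header: string | null, size: number): ByteRange | "unsatisfiable" | null {
  if (!header) return null;
  // Only "bytes=start-end", "bytes=start-" and "bytes=-suffix"; lists fall through.
  const match = /^bytes=(\d*)-(\d*)$/.exec(header.trim());
  if (!match) return null;
  const startStr = match[1] ?? "";
  const endStr = match[2] ?? "";
  if (startStr === "" && endStr === "") return null;
  if (size <= 0) return "unsatisfiable";

  if (startStr === "") {
    // Suffix form: the last N bytes.
    const suffix = Number(endStr);
    if (suffix === 0) return "unsatisfiable";
    return { start: Math.max(0, size - suffix), end: size - 1 };
  }
  const start = Number(startStr);
  if (start >= size) return "unsatisfiable";
  const end = endStr === "" ? size - 1 : Math.min(Number(endStr), size - 1);
  // last-pos < first-pos makes the range invalid, so the header is ignored.
  if (end < start) return null;
  return { start, end };
}

export function contentRangeHeader(range: ByteRange, size: number): string {
  return `bytes ${range.start}-${range.end}/${size}`;
}
```

A route handler would answer a parsed range with status 206 plus `Content-Range`, `Accept-Ranges: bytes`, and a `Content-Length` of `end - start + 1`, streaming only that slice of the artifact so multi-gigabyte m4b files never have to be held in memory.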

## Migration plan

- A one-time importer reads from the legacy store helpers in [src/utils/indexedDB.ts](src/utils/indexedDB.ts) and writes to Dexie
- The import reports progress in the UI and runs as retryable steps
- After cutover, remove the legacy modules and dependencies, including Howler
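The importer's progress reporting and retryable steps can be modeled as an ordered list of idempotent steps, each retried a bounded number of times, with a progress callback driving the UI. This is a hypothetical sketch: `MigrationStep`, `runMigration`, the default of three attempts, and the callback shape are assumptions, and it relies on each step being safe to re-run (for example, by writing rows keyed by their legacy ids so a retry overwrites rather than duplicates).

```typescript
export interface MigrationStep {
  name: string;
  /** Must be idempotent: a retry may re-run a partially completed step. */
  run(): Promise<void>;
}

export interface Progress { completed: number; total: number; step: string; attempt: number }

export async function runMigration(
  steps: MigrationStep[],
  onProgress: (p: Progress) => void,
  maxAttempts = 3,
): Promise<void> {
  for (let i = 0; i < steps.length; i++) {
    const step = steps[i]!;
    for (let attempt = 1; ; attempt++) {
      onProgress({ completed: i, total: steps.length, step: step.name, attempt });
      try {
        await step.run();
        break; // step succeeded, move to the next one
      } catch (err) {
        // Give up after maxAttempts so the UI can offer a manual retry.
        if (attempt >= maxAttempts) {
          throw new Error(`Migration step "${step.name}" failed after ${attempt} attempts: ${String(err)}`);
        }
      }
    }
  }
  onProgress({ completed: steps.length, total: steps.length, step: "done", attempt: 0 });
}
```

Because completed steps are never re-run within one call, a manual retry after a failure could simply restart from the first step that has not finished.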

## Issue alignment

- #59: chapter MP3 export via a chapterized pipeline and a streamed zip
- #48: large m4b downloads via the range-enabled download endpoint and persistent temporary artifacts
- #47: voice combination via free-form voice string pass-through for the Deepinfra and custom providers
- #44: dialog chunking via quote-aware grouping in the splitter
- #40: PDF margins via normalized x/width and a better width fallback
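To make the #44 item concrete, here is one possible quote-aware grouping pass: after a sentence splitter has produced sentences, consecutive sentences are merged while a double quote is still open, so a line of dialog spanning several sentences reaches TTS as one block. The counting heuristic (straight and curly double quotes only), the `maxChars` cap, and the function name are assumptions; the project's splitter in `src/v1/nlp/sentences.ts` may work differently.

```typescript
// Straight and curly double quotes; single quotes are ignored because
// apostrophes ("don't") would make them ambiguous.
const DOUBLE_QUOTES = /["\u201C\u201D]/g;

function countQuotes(text: string): number {
  return (text.match(DOUBLE_QUOTES) ?? []).length;
}

/** Merges consecutive sentences while a quotation is open. */
export function groupQuotedSentences(sentences: string[], maxChars = 600): string[] {
  const blocks: string[] = [];
  let buffer = "";
  let quoteCount = 0; // running total, so the open/closed state survives a forced flush
  for (const sentence of sentences) {
    buffer = buffer ? `${buffer} ${sentence}` : sentence;
    quoteCount += countQuotes(sentence);
    const insideQuote = quoteCount % 2 === 1;
    // Flush once the quotation closes, or if an unclosed quote grows too long.
    if (!insideQuote || buffer.length >= maxChars) {
      blocks.push(buffer);
      buffer = "";
    }
  }
  if (buffer) blocks.push(buffer); // unbalanced trailing quote
  return blocks;
}
```

Sentences outside any quotation pass through one per block, which is the "without regressing non-dialog cases" property from the acceptance criteria.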

See details in [docs/v1/issues-to-components.md](docs/v1/issues-to-components.md).

## Alternatives considered

- Keep Howler and harden it with retries
  - Rejected due to continued complexity and limited streaming control
- Keep the raw IndexedDB helper
  - Rejected due to its ergonomics, schema evolution needs, and the desired repository patterns
- Dual-engine migration
  - Rejected to avoid extra complexity and surface area during the refactor

## Risks and mitigations

- MSE availability and Safari variance
  - Provide a blob segment fallback and small-segment chaining
- Provider streaming differences
  - Design the stream route with capability detection and fallbacks
- Large artifact memory pressure
  - Range-enabled downloads and file-backed buffers where possible
- Cache growth
  - Dexie-backed TTL and LRU eviction with size budget enforcement, plus telemetry
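The cache-growth mitigation can be reduced to a pure selection step: first drop expired entries (TTL), then evict least-recently-used entries until the remaining total fits the size budget. In this sketch, `lastAccessedAt` is an assumed field that the schema outline does not list, and `selectEvictions` only chooses keys; a Dexie-backed repository could then delete them with `Table.bulkDelete` and report the evicted count and bytes as telemetry.

```typescript
export interface CacheEntry {
  key: string;
  size: number;           // bytes
  expiresAt: number;      // epoch ms
  lastAccessedAt: number; // epoch ms (assumed field, used for LRU order)
}

/** Returns keys to evict so that no expired entries remain and total size <= budget. */
export function selectEvictions(entries: CacheEntry[], budgetBytes: number, now: number): string[] {
  const evict: string[] = [];
  const live: CacheEntry[] = [];
  for (const e of entries) {
    if (e.expiresAt <= now) evict.push(e.key); // TTL: expired entries always go
    else live.push(e);
  }
  let total = live.reduce((sum, e) => sum + e.size, 0);
  // LRU: least recently accessed first.
  live.sort((a, b) => a.lastAccessedAt - b.lastAccessedAt);
  for (const e of live) {
    if (total <= budgetBytes) break;
    evict.push(e.key);
    total -= e.size;
  }
  return evict;
}
```

Keeping the policy free of database calls lets the budget rules be tested directly, with the Dexie transaction only applying the result.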

## Rollout

- Alpha
  - HTML adapter wired end to end with the engine and streaming
  - Basic Dexie schema and caches
- Beta
  - PDF and EPUB adapters with highlighting and resume
  - Chapter MP3 export
  - Migration UI
- GA
  - m4b export and sync hardened
  - E2E and performance checks
  - Legacy removal

## Acceptance criteria

- For cached sentences, time from starting playback to speech stays within a reasonable latency
- A mid-playback voice change cancels in-flight work and resumes with a single buffer rebuild
- m4b exports of 1 to 2 GB download reliably in Docker with Range support
- Chapter zip exports are correct and stream without UI stalls
- Dialog is chunked appropriately without regressing non-dialog text
- PDF margin trimming is reliable across the test samples

## Next actions

- Finalize the checklist and sequencing in [docs/v1/todo.md](docs/v1/todo.md)
- Create the v1 code skeleton and Dexie schema
- Implement the engine state machine and a baseline MediaController
- Wire the HTML adapter and stream route for the first alpha milestone
