Context: I'm implementing a read-aloud navigator for Android, which speaks the publication content independently of any visual rendition. Besides separating concerns, this enables background playback without any view. As HTML parsing is not trivial and can be time-consuming, I expect the guided navigation documents to be the only source of content available to the component, so they should contain all the data it needs.
The Readium toolkits have used various location formats in different contexts, and new ones are still emerging within the scope of the annotation specification. For read aloud (specifically TTS), we currently use, quite successfully, a combination of a CSS selector and textual context. The processing algorithm is as follows:
- locate the first DOM element matching the CSS selector
- within the scope of this DOM element, identify the text chunk matching the textual context
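The two-step algorithm above can be sketched in plain Kotlin. To stay self-contained, this sketch assumes a hypothetical minimal `Element` model instead of a real HTML parser, supports only simple `#id`, `.class` and tag selectors, and picks the occurrence of the highlight whose surroundings agree best with the `before`/`after` context. The names `Element`, `TextContext`, `findTextChunk` and `locate` are illustrative only, not Readium APIs.

```kotlin
// Hypothetical, simplified DOM model: an element's own text comes before its children's text.
data class Element(
    val tag: String,
    val id: String? = null,
    val classes: Set<String> = emptySet(),
    val ownText: String = "",
    val children: List<Element> = emptyList(),
) {
    /** Text content of this element and its descendants, in document order. */
    val textContent: String
        get() = ownText + children.joinToString("") { it.textContent }

    /** Pre-order traversal, i.e. document order. */
    fun descendantsAndSelf(): Sequence<Element> = sequence {
        yield(this@Element)
        for (child in children) yieldAll(child.descendantsAndSelf())
    }
}

// Assumption: only simple selectors (`#id`, `.class`, `tag`) are handled in this sketch.
fun Element.matches(selector: String): Boolean = when {
    selector.startsWith("#") -> id == selector.drop(1)
    selector.startsWith(".") -> selector.drop(1) in classes
    else -> tag.equals(selector, ignoreCase = true)
}

/** Step 1: the first element in document order matching the selector. */
fun Element.querySelector(selector: String): Element? =
    descendantsAndSelf().firstOrNull { it.matches(selector) }

/** Textual context in the before / highlight / after style. */
data class TextContext(val before: String = "", val highlight: String, val after: String = "")

/** Step 2: the range of the highlight occurrence whose surroundings best match the context. */
fun findTextChunk(scope: Element, context: TextContext): IntRange? {
    if (context.highlight.isEmpty()) return null
    val text = scope.textContent
    var best: IntRange? = null
    var bestScore = -1
    var start = text.indexOf(context.highlight)
    while (start >= 0) {
        val end = start + context.highlight.length
        // Score = characters of agreement with the text just before and just after.
        val score = text.substring(0, start).commonSuffixWith(context.before).length +
            text.substring(end).commonPrefixWith(context.after).length
        if (score > bestScore) {
            best = start until end
            bestScore = score
        }
        start = text.indexOf(context.highlight, start + 1)
    }
    return best
}

/** Both steps: returns the matched element and the character range inside its text. */
fun locate(root: Element, cssSelector: String, context: TextContext): Pair<Element, IntRange>? {
    val element = root.querySelector(cssSelector) ?: return null
    val range = findTextChunk(element, context) ?: return null
    return element to range
}
```

Scoring the context rather than requiring an exact match keeps the lookup tolerant of small differences in the surrounding text, at the cost of possibly picking a wrong occurrence when the context is too short.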
Like any navigator, the read-aloud navigator has to expose locations to enable features such as synchronization with, and highlighting within, a visual rendition, bookmarking, etc.
The guided navigation spec is far more restrictive regarding the location format than what we're used to. A node rendered with TTS contains two kinds of data: a text object, containing plain text or SSML, and a textref URL. Currently, the fragment of such a URL can carry two kinds of data for text publications: element fragments (pointing to HTML elements by their ID) and text fragments. According to the URL Fragment Text Directive spec, text fragments take precedence over element fragments, so an element ID does not scope the text search the way our Readium algorithm does: the two approaches are not compatible. Besides, general CSS selectors are not supported, nor are DOM ranges or any other format.
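To make the precedence rule concrete, here is a sketch of how a textref fragment could be split into its element ID and its text directives, following the `text=[prefix-,]textStart[,textEnd][,-suffix]` syntax and the `:~:` fragment directive delimiter of the URL Fragment Text Directive spec. The types and function names (`TextDirective`, `TextrefFragment`, `parseTextref`, `primaryTarget`) are hypothetical, and the parser is simplified (for example, it ignores directives other than `text=`).

```kotlin
import java.net.URI
import java.net.URLDecoder

/** One `text=` directive. */
data class TextDirective(val prefix: String?, val textStart: String, val textEnd: String?, val suffix: String?)

/** What a textref fragment can carry: an element ID and/or text directives. */
data class TextrefFragment(val elementId: String?, val textDirectives: List<TextDirective>)

// URLDecoder turns '+' into a space, which is wrong for URL fragments, so keep '+' literal.
fun decode(term: String): String = URLDecoder.decode(term.replace("+", "%2B"), "UTF-8")

fun parseTextDirective(value: String): TextDirective? {
    val tokens = value.split(',').toMutableList()
    if (tokens.size > 4 || tokens.any { it.isEmpty() }) return null
    var prefix: String? = null
    if (tokens.first().endsWith("-")) {
        prefix = decode(tokens.removeAt(0).dropLast(1))
        if (tokens.isEmpty()) return null // a prefix alone is invalid
    }
    var suffix: String? = null
    if (tokens.last().startsWith("-")) {
        suffix = decode(tokens.removeAt(tokens.lastIndex).drop(1))
        if (tokens.isEmpty()) return null // a suffix alone is invalid
    }
    if (tokens.size > 2) return null
    return TextDirective(prefix, decode(tokens[0]), tokens.getOrNull(1)?.let { decode(it) }, suffix)
}

fun parseTextref(textref: String): TextrefFragment {
    val fragment = URI(textref).rawFragment ?: return TextrefFragment(null, emptyList())
    val delimiter = fragment.indexOf(":~:")
    val idPart = if (delimiter >= 0) fragment.substring(0, delimiter) else fragment
    val directives = if (delimiter >= 0) {
        fragment.substring(delimiter + 3).split('&')
            .filter { it.startsWith("text=") }
            .mapNotNull { parseTextDirective(it.removePrefix("text=")) }
    } else {
        emptyList()
    }
    return TextrefFragment(idPart.takeIf { it.isNotEmpty() }?.let { decode(it) }, directives)
}

/** A text directive is tried first; the element ID only matters as a fallback. */
fun primaryTarget(f: TextrefFragment): String = when {
    f.textDirectives.isNotEmpty() -> "text directive"
    f.elementId != null -> "element #${f.elementId}"
    else -> "document start"
}
```

For example, `chapter1.xhtml#p12:~:text=The-,cat%2C%20sat,ran,-away` yields the element ID `p12` and one directive, but the text directive is what a conforming processor searches for, across the whole document.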
Are we comfortable with moving towards text fragments only? I can think of two possible issues with them: performance and copyright. It's far more efficient to locate a DOM element by its ID and then match text only inside it than to search the whole document for the text. Concerning copyright, text fragments embed the text itself, so the locations emitted by the read-aloud navigator could be persisted or exported as bookmarks, making it possible to export the whole text.
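The performance argument can be illustrated with a small sketch. It assumes a hypothetical pre-parsed chapter (`IndexedChapter`) that maps each element ID to the character range it covers; when an ID is given, only that element's text is scanned, otherwise the whole chapter is, as a bare text fragment requires. Note that this scoping rule is our own Readium-style behavior, not what the Text Fragments spec defines.

```kotlin
/** Hypothetical pre-parsed chapter: full text plus the range each element ID covers. */
class IndexedChapter(val text: String, private val rangesById: Map<String, IntRange>) {

    /**
     * Finds [needle]. With a known [elementId], the scan is limited to that element
     * (the ID lookup is a single hash-map access); with no ID or an unknown one,
     * the whole chapter is scanned.
     */
    fun find(needle: String, elementId: String? = null): IntRange? {
        val scope = elementId?.let { rangesById[it] } ?: text.indices
        val window = text.subSequence(scope.first, scope.last + 1)
        val index = window.indexOf(needle)
        if (index < 0) return null
        val start = scope.first + index
        return start until start + needle.length
    }
}
```

Besides being cheaper, scoping also disambiguates repeated phrases: in `"The cat sat. The cat ran."`, searching `"The cat"` inside the second sentence's element returns its own occurrence rather than the first one in the chapter.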
If we're not comfortable with text fragments only, textref could contain structured JSON instead, allowing us to use our own custom formats independently of how the URL fragment standards evolve.
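As an illustration of what a structured textref might look like, the sketch below serializes our current CSS selector + textual context location to JSON. The `StructuredTextref` type is hypothetical, and the field layout is only loosely modeled on Readium's Locator JSON (`href`, `locations.cssSelector`, `text.before/highlight/after`); nothing here is part of the guided navigation spec. A hand-written JSON string escaper is used so that the example needs no serialization library.

```kotlin
/** Encodes [value] as a JSON string literal, escaping quotes, backslashes and control characters. */
fun jsonString(value: String): String = buildString {
    append('"')
    for (c in value) {
        when (c) {
            '"' -> append("\\\"")
            '\\' -> append("\\\\")
            '\n' -> append("\\n")
            '\r' -> append("\\r")
            '\t' -> append("\\t")
            else -> if (c < ' ') append("\\u%04x".format(c.code)) else append(c)
        }
    }
    append('"')
}

/** Hypothetical structured textref carrying a CSS selector plus textual context. */
data class StructuredTextref(
    val href: String,
    val cssSelector: String,
    val before: String,
    val highlight: String,
    val after: String,
) {
    fun toJson(): String =
        "{\"href\":${jsonString(href)}," +
            "\"locations\":{\"cssSelector\":${jsonString(cssSelector)}}," +
            "\"text\":{\"before\":${jsonString(before)}," +
            "\"highlight\":${jsonString(highlight)}," +
            "\"after\":${jsonString(after)}}}"
}
```

Since the location is an object rather than a URL fragment, new selector types (DOM ranges, other annotation selectors) could be added as extra fields without depending on URL fragment syntax.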