Modified script based on Chris Bremseth's awesome epub-to-speech script.
This tool allows you to extract content from EPUB files and convert it to speech using OpenAI's Text-to-Speech API. This version has some modifications that make it suitable for the ORM/HTMLBook use case.
- Extract content and headers from EPUB files and save as Markdown
- Convert Markdown content to speech files using OpenAI's TTS API
- Process EPUBs to speech in a single command
- Customizable voice selection
- Support for large files by chunking them and then reassembling the resulting audio
- Suitable for HTMLBook specifically: Only chapter-level sections with specific data-types are retained ('chapter', 'preface', etc.)
- Headings after the first in a chapter are promoted, to keep chapter content together
- Support for OpenAI, Google Cloud, and Azure TTS services
- Support for limited Speech Markdown features and conversion to SSML (for use with Azure TTS service)
- Python 3.7+
- OpenAI API key (or Google Cloud or Azure credentials, depending on the TTS service you use)
- ffmpeg or libav (used by pydub to edit/combine the MP3s)
- Clone the repository or download the source files:
git clone https://github.com/cbremseth/epub-to-speech.git
cd epub-to-speech
- Install required dependencies:
pip install -r requirements.txt
- Create a .env file in the root directory to store the credentials for the TTS services you'll use, as applicable:
# OpenAI
echo "OPENAI_API_KEY=sk-your-api-key-here" >> .env
# Google
# 1. obtain and save key.json to project
# 2. then save path as an env variable:
echo "GOOGLE_APPLICATION_CREDENTIALS=/path/to/key_file" >> .env
# Azure
echo "SPEECH_KEY=your-key-here" >> .env
echo "SPEECH_REGION=your-region" >> .env
- Install ffmpeg or libav on your system. On macOS, you can install ffmpeg with Homebrew:
brew install ffmpeg
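As a quick sanity check on the credentials step above, the following sketch reports which TTS services have all of their variables set. The variable names come from the .env examples; the `REQUIRED_VARS` mapping and the `configured_services` helper are hypothetical and not part of this project. It reads from the process environment, so values stored in .env need to be loaded first (for example with `load_dotenv()` from python-dotenv).

```python
import os

# Variable names taken from the .env examples above.
# The mapping itself is an assumption made for this sketch.
REQUIRED_VARS = {
    "openai": ["OPENAI_API_KEY"],
    "google": ["GOOGLE_APPLICATION_CREDENTIALS"],
    "azure": ["SPEECH_KEY", "SPEECH_REGION"],
}


def configured_services(env=None):
    """Return the services whose required variables are all set and non-empty."""
    env = os.environ if env is None else env
    return [
        name
        for name, keys in REQUIRED_VARS.items()
        if all(env.get(key) for key in keys)
    ]


if __name__ == "__main__":
    print("Configured TTS services:", configured_services() or "none")
```

A service is listed only when every variable it needs is present, so an Azure setup with SPEECH_KEY but no SPEECH_REGION is reported as not configured.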
The tool provides a command-line interface with three main commands:
python main.py extract path/to/book.epub --output book.md
Options:
- --output, -o: Output Markdown filename (default: same as input with .md extension)
- --replace-stripped-elements-with-comments, -c: When stripping unwanted elements from the EPUB HTML (e.g., images, pre blocks), insert a comment where each element was removed. By default, the elements are simply removed.
python main.py speak path/to/file.md --output-dir ./audio_files --voice nova
Options:
- --output-dir, -o: Directory for audio output files (default: ./audio_output)
- --voice, -v: Voice to use (OpenAI: alloy, echo, fable, onyx, nova, shimmer; Google: female, male; Azure: cora, adam, nancy, emma, jane, jason, davis, samuel)
- --split-at-subheadings, -s: Split audio files by subheadings (all H1 and H2) instead of the default of chapter-level audio files (H1)
- --use-ssml, -u: Convert chunked Markdown content to SSML before passing it to the TTS service. Currently compatible with the Azure service only. A limited set of Speech Markdown conventions is supported.
python main.py process path/to/book.epub --output-dir ./audio_files --voice alloy --keep-markdown
Options:
- --output-dir, -o: Directory for audio output files (default: ./audio_output)
- --voice, -v: Voice to use (options: alloy, echo, fable, onyx, nova, shimmer)
- --keep-markdown, -k: Keep the intermediate Markdown file (default: removed after processing)
- --split-at-subheadings, -s: Split audio files by subheadings (all H1 and H2) instead of the default of chapter-level audio files (H1)
# Process "The Great Gatsby" to audio files with the "nova" voice
python main.py process books/great_gatsby.epub --output-dir ./gatsby_audio --voice nova --keep-markdown
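To illustrate what the --split-at-subheadings option changes, here is a minimal sketch of splitting Markdown into sections at H1 only, or at both H1 and H2. The `split_by_headings` function is hypothetical and is not taken from speech_generator.py; the actual implementation may differ. For simplicity it does not skip `#` lines inside fenced code blocks.

```python
import re


def split_by_headings(markdown_text, split_at_subheadings=False):
    """Split Markdown into (title, body) sections at H1, or at H1 and H2."""
    max_level = 2 if split_at_subheadings else 1
    # Matches ATX headings up to max_level, e.g. "# Title" or "## Title".
    heading = re.compile(r"^(#{1,%d})\s+(.*)$" % max_level)

    sections = []
    title, body = None, []

    def flush():
        # Emit the section collected so far, skipping empty leading text.
        if title is not None or any(line.strip() for line in body):
            sections.append((title or "untitled", "\n".join(body).strip()))

    for line in markdown_text.splitlines():
        match = heading.match(line)
        if match:
            flush()
            title, body = match.group(2).strip(), []
        else:
            body.append(line)
    flush()
    return sections


if __name__ == "__main__":
    doc = "# One\nintro\n## Sub\ndetail\n# Two\nend\n"
    print(len(split_by_headings(doc)))        # chapter-level sections
    print(len(split_by_headings(doc, True)))  # H1 and H2 sections
```

In chapter-level mode, H2 headings stay inside the body of their chapter, so each H1 yields one audio file. With subheading splitting enabled, each H1 and H2 starts its own section.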
- main.py: Command-line interface using Click
- epub_processor.py: EPUB to Markdown conversion functions
- speech_generator.py: Markdown to speech conversion using TTS service
- audio_concatenator.py: Combines audio files that represent parts of a section
- speech_services.py: Custom classes for the available TTS services
- requirements.txt: List of required Python packages
- click: Command-line interface creation
- ebooklib: EPUB file processing
- beautifulsoup4: HTML parsing
- markdown: Markdown processing
- openai: OpenAI API client
- python-dotenv: Environment variable management
- markdown_to_ssml_converter: Markdown to SSML conversion, with limited support for Speech Markdown conventions
- ffmpeg or libav (non-Python dependency): Cross-platform multimedia framework
- OpenAI's TTS API has a 4096 character limit per request, so long sections are split
- API rate limits may apply when processing large books
- Some EPUB formatting may not translate perfectly to Markdown
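The splitting mentioned in the first limitation can be sketched as follows. This is an illustrative approach, assuming chunks are built from whole sentences and that only a sentence longer than the limit is cut mid-text; the `chunk_text` function and `MAX_CHARS` constant are hypothetical names, and the project's own chunking logic may work differently.

```python
import re

MAX_CHARS = 4096  # per-request character limit noted above


def chunk_text(text, max_chars=MAX_CHARS):
    """Split text into chunks of at most max_chars, preferring sentence boundaries."""
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Hard-split any single sentence that exceeds the limit on its own.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    sample = "This is a sentence. " * 500
    parts = chunk_text(sample)
    print(len(parts), max(len(p) for p in parts))
```

Each chunk would then be sent as a separate TTS request, and the resulting audio segments joined back into a single file per section.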