Add text preprocessing and dialogue generation for TTS #2
Merged
…tness fixes

Major improvements to the TTS podcast pipeline:
- Add regex-based text preprocessing (preprocess.py): removes URLs, code blocks, citations, markdown artifacts; expands numbers, abbreviations, and currency to spoken words using num2words
- Add LLM-based preprocessing modes: "llm" rewrites articles for natural audio narration, "dialogue" generates two-speaker podcast scripts for use with the Dia model (both via Claude Haiku)
- Add Dia TTS backend (models/dia_tts.py) for multi-speaker podcast generation using [S1]/[S2] speaker tags
- Upgrade Kokoro: bump from v0.7.16 to v0.9.4+, expand voice list from 4 to 12 (American + British English), add explicit repo_id
- Upgrade ElevenLabs: output format from 32kbps to 128kbps, add eleven_v3 model option, include model ID in metadata
- Add /setpreprocess Telegram command (none/regex/llm/dialogue modes)
- Fix crash when extract_webpage_content returns None (bot.py)
- Fix temp file concurrency: use tempfile for concat.txt, article files
- Fix WAV file cleanup: always clean up in finally block
- Remove unused html_fetcher.py and spacy dependency

https://claude.ai/code/session_01UwZwqWxsZcVMcecNCfWCxj
no dialogues
e98b950 to 7cf8db1
Summary
This PR adds comprehensive text preprocessing capabilities and LLM-based dialogue generation to improve audio quality for text-to-speech conversion. It introduces multiple preprocessing modes, a new Dia TTS model integration, and refactors file handling to use temporary files for better concurrency.
Key Changes
New Preprocessing System
preprocess.py: New module with regex-based text cleaning for TTS consumption
llm_preprocess.py: Claude-based article rewriting for natural audio narration
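As a rough illustration of the regex-based cleaning described for preprocess.py, here is a minimal sketch using only the standard library. The function name `clean_for_tts` and all patterns are hypothetical and are not taken from the PR; the number, abbreviation, and currency expansion that the PR does with num2words is deliberately left out.

```python
import re

# Hypothetical patterns for illustration; the real preprocess.py may differ.
_CODE_BLOCK = re.compile(r"`{3}.*?`{3}", re.DOTALL)   # fenced code blocks
_MD_LINK = re.compile(r"\[([^\]]+)\]\([^)]*\)")       # [text](target)
_URL = re.compile(r"https?://\S+")                    # bare URLs
_CITATION = re.compile(r"\[\d+(?:,\s*\d+)*\]")        # [1] or [2, 3]
_MD_MARKS = re.compile(r"[*`#]+")                     # emphasis, inline code, headings
_WHITESPACE = re.compile(r"\s+")


def clean_for_tts(text: str) -> str:
    """Strip content that reads badly aloud and collapse whitespace."""
    text = _CODE_BLOCK.sub(" ", text)
    text = _MD_LINK.sub(r"\1", text)  # keep the link text, drop the target
    text = _URL.sub(" ", text)
    text = _CITATION.sub("", text)
    text = _MD_MARKS.sub("", text)
    return _WHITESPACE.sub(" ", text).strip()


if __name__ == "__main__":
    print(clean_for_tts("See **this** [1] at https://example.com now."))
```

Markdown links are rewritten before bare URLs are removed so that the link text survives; a real implementation would likely need more careful handling of punctuation that trails a URL.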
Bot Integration
/setpreprocess command
File Handling Improvements
extract_article.py: Refactored to use temporary files instead of fixed filenames
models/kokoro.py: Updated to use temporary files for concat operations
models/eleven.py: Added model selection support with AVAILABLE_MODELS dict
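The temporary-file change for concat operations could look roughly like the sketch below. It assumes that concat.txt is an ffmpeg concat-demuxer list (one `file '<path>'` entry per line); the helper name `temp_concat_list` is hypothetical and not taken from the PR. Each call gets a unique file name, so concurrent requests do not overwrite each other's list, and the file is removed in a `finally` block.

```python
import os
import tempfile
from contextlib import contextmanager
from typing import Iterable, Iterator


@contextmanager
def temp_concat_list(wav_paths: Iterable[str]) -> Iterator[str]:
    """Yield the path of a unique concat list file and remove it afterwards."""
    handle = tempfile.NamedTemporaryFile(
        mode="w", suffix=".txt", prefix="concat_", delete=False, encoding="utf-8"
    )
    try:
        with handle:  # closes the file once all entries are written
            for path in wav_paths:
                # Escape single quotes the way the concat list format expects.
                escaped = os.path.abspath(path).replace("'", "'\\''")
                handle.write(f"file '{escaped}'\n")
        yield handle.name
    finally:
        # Runs on success and on error, so no stale list files are left behind.
        if os.path.exists(handle.name):
            os.remove(handle.name)
```

A caller would wrap the concatenation step in `with temp_concat_list(paths) as list_path:` and pass `list_path` to ffmpeg, for example with `-f concat -safe 0 -i list_path`, where `-safe 0` is needed because the entries are absolute paths.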
Dependencies
num2words>=0.5.14 for number-to-word conversion
kokoro>=0.9.4 for improved compatibility
anthropic (implicit via llm modules) for Claude API access
spacy<=3.7.3 dependency (removed)
Testing
Notable Implementation Details
https://claude.ai/code/session_01UwZwqWxsZcVMcecNCfWCxj