Add text preprocessing and dialogue generation for TTS by razumau · Pull Request #2 · razumau/tts_podcast

razumau · 2026-02-20T18:45:09Z

Summary

This PR adds comprehensive text preprocessing capabilities and LLM-based dialogue generation to improve audio quality for text-to-speech conversion. It introduces multiple preprocessing modes, a new Dia TTS model integration, and refactors file handling to use temporary files for better concurrency.

Key Changes

New Preprocessing System

preprocess.py: New module with regex-based text cleaning for TTS consumption
- Removes URLs, emails, citation markers, code blocks, and markdown formatting
- Expands abbreviations (e.g., "e.g." → "for example") and numbers to words
- Handles currency, percentages, and year detection
- Removes HTML entities and normalizes whitespace
llm_preprocess.py: Claude-based article rewriting for natural audio narration
- Converts written text to spoken language with proper flow
- Removes visual references and technical elements

Bot Integration

bot.py: Added preprocessing mode selection via /setpreprocess command
- Four modes: "none", "regex", "llm"
- Applies selected preprocessing before TTS generation
- Displays preprocessing mode in episode descriptions

File Handling Improvements

extract_article.py: Refactored to use temporary files instead of fixed filenames
- Prevents concurrency issues when multiple requests are processed
- Properly cleans up temp files in finally block
models/kokoro.py: Updated to use temporary files for concat operations
- Improved cleanup logic with existence checks
models/eleven.py: Added model selection support with AVAILABLE_MODELS dict

Dependencies

Added num2words>=0.5.14 for number-to-word conversion
Updated kokoro>=0.9.4 for improved compatibility
Added anthropic (implicit via llm modules) for Claude API access
Removed spacy<=3.7.3 dependency

Testing

tests/test_preprocess.py: Comprehensive test suite for preprocessing pipeline
- Tests individual cleaning functions and full pipeline
- Validates abbreviation expansion, number conversion, and markdown removal

Notable Implementation Details

Temporary files use unique suffixes to avoid collisions in concurrent scenarios
Claude Haiku model used for cost-effective LLM preprocessing

https://claude.ai/code/session_01UwZwqWxsZcVMcecNCfWCxj

…tness fixes Major improvements to the TTS podcast pipeline: - Add regex-based text preprocessing (preprocess.py): removes URLs, code blocks, citations, markdown artifacts; expands numbers, abbreviations, and currency to spoken words using num2words - Add LLM-based preprocessing modes: "llm" rewrites articles for natural audio narration, "dialogue" generates two-speaker podcast scripts for use with the Dia model (both via Claude Haiku) - Add Dia TTS backend (models/dia_tts.py) for multi-speaker podcast generation using [S1]/[S2] speaker tags - Upgrade Kokoro: bump from v0.7.16 to v0.9.4+, expand voice list from 4 to 12 (American + British English), add explicit repo_id - Upgrade ElevenLabs: output format from 32kbps to 128kbps, add eleven_v3 model option, include model ID in metadata - Add /setpreprocess Telegram command (none/regex/llm/dialogue modes) - Fix crash when extract_webpage_content returns None (bot.py) - Fix temp file concurrency: use tempfile for concat.txt, article files - Fix WAV file cleanup: always clean up in finally block - Remove unused html_fetcher.py and spacy dependency https://claude.ai/code/session_01UwZwqWxsZcVMcecNCfWCxj

no dialogues

claude and others added 3 commits February 20, 2026 18:21

remove unnecessary options and libraries

ef6a73d

no dialogues

minor review fixes

7cf8db1

razumau force-pushed the claude/review-app-improvements-Eso8v branch from e98b950 to 7cf8db1 Compare March 1, 2026 22:31

razumau merged commit dbd1400 into main Mar 2, 2026
1 check passed

razumau deleted the claude/review-app-improvements-Eso8v branch March 2, 2026 20:18

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add text preprocessing and dialogue generation for TTS#2

Add text preprocessing and dialogue generation for TTS#2
razumau merged 3 commits intomainfrom
claude/review-app-improvements-Eso8v

razumau commented Feb 20, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

razumau commented Feb 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Key Changes

New Preprocessing System

Bot Integration

File Handling Improvements

Dependencies

Testing

Notable Implementation Details

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

razumau commented Feb 20, 2026 •

edited

Loading