Commit 504d856

Adds AI-powered script analysis and DOCX document generation.
1 parent c483e7e commit 504d856

9 files changed: 361 additions and 11 deletions

.env.example

Lines changed: 21 additions & 0 deletions
@@ -9,3 +9,24 @@ GEMINI_API_KEY=your_gemini_api_key_here
 
 # Default TTS provider elevenlabs or gemini
 DEFAULT_TTS_PROVIDER="elevenlabs"
+
+# Gemini model for TTS generation (optional, defaults to gemini-2.5-flash-preview-tts)
+GEMINI_TTS_MODEL=gemini-2.5-flash-preview-tts
+
+# Gemini model for transcript analysis (optional, defaults to gemini-2.5-flash)
+GEMINI_ANALYSIS_MODEL=gemini-2.5-flash
+
+# Analysis Configuration
+# ---------------------
+# To enable the "Generate DOCX Analysis" button in the web interface:
+# 1. Ensure GEMINI_API_KEY is configured above
+# 2. Create a file named 'analysis_prompt.txt' in the './config' directory
+# (for Docker) or in your app data directory (for desktop app)
+# 3. The prompt file should contain the instructions for the Gemini AI
+# to analyze podcast transcripts
+#
+# Example location:
+# - Docker/Local: ./config/analysis_prompt.txt
+# - macOS: ~/Library/Application Support/PodcastGenerator/analysis_prompt.txt
+# - Windows: %APPDATA%/PodcastGenerator/analysis_prompt.txt
+# - Linux: ~/.config/PodcastGenerator/analysis_prompt.txt
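
Review note: the two optional model variables above each have a documented default. A minimal sketch of how a caller might resolve them follows; `resolve_gemini_models` and the two constant names are hypothetical and do not appear in this commit. Unlike a plain `os.environ.get(name, default)`, this sketch also treats a variable that is set but empty as unset, which is a design choice of the sketch and not something the commit does.

```python
import os

# Defaults as stated in the .env.example comments above.
DEFAULT_TTS_MODEL = "gemini-2.5-flash-preview-tts"
DEFAULT_ANALYSIS_MODEL = "gemini-2.5-flash"


def resolve_gemini_models(env=None):
    """Return (tts_model, analysis_model), falling back to the documented defaults.

    Hypothetical helper for illustration only.
    """
    env = os.environ if env is None else env
    # `or` also covers a variable that is present but empty (e.g. `GEMINI_TTS_MODEL=`).
    tts_model = env.get("GEMINI_TTS_MODEL") or DEFAULT_TTS_MODEL
    analysis_model = env.get("GEMINI_ANALYSIS_MODEL") or DEFAULT_ANALYSIS_MODEL
    return tts_model, analysis_model
```

Passing a plain dict as `env` keeps the helper easy to exercise without touching the real process environment.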

.gitignore

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -25,3 +25,9 @@ instance/*
2525

2626
# Environment variables (contains API keys)
2727
.env
28+
29+
# Config directory (user customization)
30+
config/*
31+
!config/.gitkeep
32+
!config/*.example
33+
!config/README.md

CHANGELOG.md

Lines changed: 21 additions & 0 deletions
@@ -4,6 +4,27 @@ All notable changes to this project will be documented in this file.
 
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
 
+## [2.0.0b27]
+
+### Added
+- **AI-Powered Script Analysis**: New web interface feature for generating DOCX documents with AI analysis
+  - Generates comprehensive analysis including summaries, character lists, locations, and themes
+  - Creates comprehension questions for multiple language levels (A1, A1+/A2, A2+/B1)
+  - Perfect for language teachers and educational content creators
+  - Configurable via `config/analysis_prompt.txt` file
+  - Button appears automatically when both Gemini API key and prompt file are configured
+  - Files are named based on script content instead of random UUIDs
+- **Configuration Directory**: New `config/` directory for user-customizable settings
+  - Analysis prompt templates stored in `config/analysis_prompt.txt`
+  - Example file provided: `config/analysis_prompt.txt.example`
+  - Priority order: config dir → app data dir → bundled assets
+  - Fully compatible with Docker volume mounts
+- **Dependencies**: Added `python-docx` for DOCX document generation
+
+### Changed
+- **Filename Generation**: Both podcast MP3 and analysis DOCX files now use the beginning of the first sentence as filename instead of random UUIDs
+- **Environment Variables**: Added `GEMINI_ANALYSIS_MODEL` for configuring the Gemini model used for analysis (defaults to `gemini-2.5-flash`)
+
 ## [2.0.0b26]
 
 ### Changed
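
Review note: the 2.0.0b27 entry states a lookup priority for the prompt file (config dir → app data dir → bundled assets). The body of `get_analysis_prompt_path` is not part of this diff, so the following is only a sketch of a first-match lookup under that stated order; `find_analysis_prompt` and `PROMPT_FILENAME` are hypothetical names.

```python
from pathlib import Path

PROMPT_FILENAME = "analysis_prompt.txt"


def find_analysis_prompt(search_dirs):
    """Return the first existing prompt file among `search_dirs`, or None.

    Hypothetical helper: callers pass directories in priority order,
    e.g. [config_dir, app_data_dir, bundled_assets_dir].
    """
    for directory in search_dirs:
        candidate = Path(directory) / PROMPT_FILENAME
        if candidate.is_file():
            return candidate
    return None
```

Because the function returns `None` when nothing is found, wrapping the result in `bool(...)` yields the kind of flag the changelog describes for showing or hiding the button.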

README.md

Lines changed: 56 additions & 0 deletions
@@ -28,6 +28,7 @@ See also the [French README](docs/README-fr.md) for a version in French.
 
 - **Modern UI**: A clean, modern, and responsive interface built with `customtkinter` that adapts to your system's light or dark mode.
 - **Dual TTS Provider**: Choose between the high-quality voices of **Google Gemini** or **ElevenLabs**.
+- **AI-Powered Script Analysis**: Generate DOCX documents with AI-powered analysis of your podcast scripts, including summaries, comprehension questions for different language levels (A1, A2, B1), and key educational insights. Perfect for language teachers and content creators.
 - **Synchronized HTML Demo**: Automatically generate a shareable HTML page with your podcast audio and a synchronized, highlighted transcript.
 - **Flexible Formats**: Export your creations in **MP3** (default) or **WAV** formats.
 - **Customization**: Configure and save voices for each speaker in your scripts, with options for language and accent.
@@ -98,6 +99,61 @@ D’écritoire, monsieur, ou de boîte à ciseaux ? »
 
 💡 Note on Annotations: The app uses square brackets [emotion] for ElevenLabs' emotional cues. If you use Gemini, the app will automatically convert them to parentheses (emotion) for you.
 
+---
+
+## 📝 AI-Powered Script Analysis (Web Interface)
+
+The web interface includes an optional AI-powered analysis feature that generates professional DOCX documents analyzing your podcast scripts. This feature is particularly useful for **language teachers**, **content creators**, and **educational material developers**.
+
+### What's Included in the Analysis
+
+The generated DOCX document contains:
+- **Summary**: A concise overview of the podcast content
+- **Main Characters**: Key speakers and personalities mentioned
+- **Key Locations**: Important places referenced in the script
+- **Central Theme**: The main message or topic
+- **Comprehension Questions**: Tailored questions for different language proficiency levels:
+  - A1 (Beginner)
+  - A1+/A2 (Elementary)
+  - A2+/B1 (Intermediate)
+
+### Setup Instructions
+
+To enable this feature in the web interface:
+
+1. **Configure Gemini API Key**
+   Add your Gemini API key to the `.env` file:
+   ```bash
+   GEMINI_API_KEY=your_actual_key_here
+   ```
+
+2. **Create Analysis Prompt File**
+   Copy the example prompt configuration:
+   ```bash
+   cp config/analysis_prompt.txt.example config/analysis_prompt.txt
+   ```
+
+3. **Customize the Prompt (Optional)**
+   Edit `config/analysis_prompt.txt` to modify how the AI analyzes your scripts. You can adjust:
+   - The types of questions generated
+   - Language levels targeted
+   - Analysis depth and focus areas
+   - Output formatting preferences
+
+4. **Access the Feature**
+   Once configured, a purple "Generate DOCX Analysis" button will appear next to the "Generate Podcast" button in the web interface.
+
+### File Locations
+
+- **Docker**: `./config/analysis_prompt.txt`
+- **macOS**: `~/Library/Application Support/PodcastGenerator/analysis_prompt.txt`
+- **Windows**: `%APPDATA%/PodcastGenerator/analysis_prompt.txt`
+- **Linux**: `~/.config/PodcastGenerator/analysis_prompt.txt`
+
+For more details, see the `config/README.md` file.
+
+---
+
 ## 📦 Installation
 
 ### 1. Required Dependency: FFmpeg

app.py

Lines changed: 103 additions & 1 deletion
@@ -3,6 +3,7 @@
 from utils import sanitize_text, get_asset_path, get_app_data_dir
 from config import AVAILABLE_VOICES, DEFAULT_APP_SETTINGS, DEMO_AVAILABLE
 from create_demo import create_html_demo_whisperx
+from transcript_analyzer import generate_analysis_docx, get_analysis_prompt_path
 import os
 import tempfile
 import json
@@ -59,6 +60,69 @@ def save_settings(settings):
     with open(get_settings_path(), 'w') as f:
         json.dump(settings, f, indent=4)
 
+def extract_filename_from_script(script_text, extension, max_length=50):
+    """
+    Extracts a safe filename from the beginning of the first sentence in the script.
+
+    Args:
+        script_text: The script content
+        extension: File extension (e.g., 'mp3', 'docx')
+        max_length: Maximum length for the filename (default 50)
+
+    Returns:
+        A sanitized filename with the given extension
+    """
+    # Remove speaker labels and get the first sentence
+    lines = script_text.strip().split('\n')
+    first_dialogue = ""
+
+    for line in lines:
+        # Skip empty lines
+        if not line.strip():
+            continue
+        # Check if line has speaker format (Speaker: text)
+        match = re.match(r'^\s*([^:]+?)\s*:\s*(.+)$', line)
+        if match:
+            first_dialogue = match.group(2).strip()
+            break
+        else:
+            # If no speaker format, use the line as-is
+            first_dialogue = line.strip()
+            break
+
+    if not first_dialogue:
+        # Fallback to UUID if no content found
+        return f"podcast_{os.urandom(4).hex()}.{extension}"
+
+    # Remove any bracketed annotations like [playful], [laughing], etc.
+    first_dialogue = re.sub(r'\[.*?\]', '', first_dialogue).strip()
+
+    # Extract the beginning (up to first sentence or max_length)
+    # Split by sentence-ending punctuation
+    sentence_match = re.match(r'^([^.!?]+)', first_dialogue)
+    if sentence_match:
+        first_sentence = sentence_match.group(1).strip()
+    else:
+        first_sentence = first_dialogue
+
+    # Limit length
+    if len(first_sentence) > max_length:
+        first_sentence = first_sentence[:max_length].strip()
+
+    # Remove or replace characters that are unsafe for filenames
+    # Keep alphanumeric, spaces, hyphens, and underscores
+    safe_name = re.sub(r'[^\w\s\-]', '', first_sentence)
+    # Replace multiple spaces/hyphens with single underscore
+    safe_name = re.sub(r'[\s\-]+', '_', safe_name)
+    # Remove leading/trailing underscores
+    safe_name = safe_name.strip('_')
+
+    # If we ended up with an empty name, use fallback
+    if not safe_name:
+        return f"podcast_{os.urandom(4).hex()}.{extension}"
+
+    return f"{safe_name}.{extension}"
+
 # --- Routes ---
 @app.route('/')
 def index():
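
Review note: to make the behaviour of `extract_filename_from_script` easier to check, here is a condensed, self-contained restatement of the helper from this hunk (same regexes and the same order of steps, rewritten only for brevity), followed by a few sample inputs. The sample scripts are invented for illustration.

```python
import os
import re


def extract_filename_from_script(script_text, extension, max_length=50):
    """Condensed restatement of the helper in this diff, for experimentation."""
    first_dialogue = ""
    for line in script_text.strip().split("\n"):
        if not line.strip():
            continue
        # "Speaker: text" lines contribute only the text after the first colon.
        match = re.match(r"^\s*([^:]+?)\s*:\s*(.+)$", line)
        first_dialogue = match.group(2).strip() if match else line.strip()
        break

    # Drop [bracketed] annotations, then keep the text before the first . ! or ?
    first_dialogue = re.sub(r"\[.*?\]", "", first_dialogue).strip()
    sentence = re.match(r"^([^.!?]+)", first_dialogue)
    name = (sentence.group(1) if sentence else first_dialogue).strip()
    name = name[:max_length].strip()

    # Keep word characters only; collapse whitespace and hyphens into underscores.
    name = re.sub(r"[^\w\s\-]", "", name)
    name = re.sub(r"[\s\-]+", "_", name).strip("_")
    if not name:
        # Random 8-hex-digit fallback, as in the original helper.
        return f"podcast_{os.urandom(4).hex()}.{extension}"
    return f"{name}.{extension}"


print(extract_filename_from_script("John: [playful] Hello everyone, welcome back! Today...", "mp3"))
print(extract_filename_from_script("Samantha: It's 9:30 - time to start.", "docx"))
print(extract_filename_from_script("Élise: Ça va ?", "mp3"))
```

The three calls print `Hello_everyone_welcome_back.mp3`, `Its_930_time_to_start.docx`, and `Ça_va.mp3` (in Python 3, `\w` matches accented letters). One consequence worth noting: two scripts that open with the same sentence map to the same filename, so the later file would overwrite the earlier one in the temp directory.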
@@ -112,6 +176,7 @@ def get_settings():
     settings = load_settings()
     settings['has_elevenlabs_key'] = bool(os.environ.get("ELEVENLABS_API_KEY"))
     settings['has_gemini_key'] = bool(os.environ.get("GEMINI_API_KEY"))
+    settings['has_analysis_prompt'] = bool(get_analysis_prompt_path())
     return jsonify(settings)
 
 @app.route('/api/settings', methods=['POST'])
@@ -254,7 +319,7 @@ def handle_generate():
     app_settings_clean = sanitize_app_settings_for_backend(app_settings)
 
     task_id = str(uuid.uuid4())
-    output_filename = f"{task_id}.mp3"
+    output_filename = extract_filename_from_script(sanitized_script, 'mp3')
     output_filepath = os.path.join(app.config['TEMP_DIR'], output_filename)
 
     stop_event = threading.Event()
@@ -380,6 +445,43 @@ def download_demo_zip(demo_id):
 def get_temp_file(filename):
     return send_from_directory(app.config['TEMP_DIR'], filename)
 
+@app.route('/api/generate_analysis', methods=['POST'])
+def handle_generate_analysis():
+    """Generates a DOCX analysis document from a transcript using Gemini API."""
+    data = request.json
+    transcript = data.get('transcript', '')
+
+    if not transcript:
+        return jsonify({'error': 'Transcript is required.'}), 400
+
+    # Check if Gemini API key is available
+    api_key = os.environ.get("GEMINI_API_KEY")
+    if not api_key:
+        return jsonify({'error': 'Gemini API key not configured.'}), 403
+
+    try:
+        # Generate the analysis DOCX
+        docx_filename = extract_filename_from_script(transcript, 'docx')
+        docx_path = os.path.join(app.config['TEMP_DIR'], docx_filename)
+
+        generate_analysis_docx(
+            transcript=transcript,
+            output_path=docx_path,
+            api_key=api_key
+        )
+
+        return jsonify({
+            'download_url': f'/temp/{docx_filename}',
+            'filename': docx_filename
+        })
+
+    except ValueError as e:
+        logger.error(f"Validation error during analysis generation: {e}")
+        return jsonify({'error': str(e)}), 400
+    except Exception as e:
+        logger.error(f"Error during analysis generation: {e}", exc_info=True)
+        return jsonify({'error': 'An unexpected error occurred during analysis generation.'}), 500
+
 if __name__ == '__main__':
     from dotenv import load_dotenv
     load_dotenv()
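
Review note: the new `/api/generate_analysis` route expects a JSON body with a `transcript` key and, on success, returns `download_url` and `filename`. It answers 400 for an empty transcript or a `ValueError`, 403 when `GEMINI_API_KEY` is missing, and 500 otherwise. A minimal client-side sketch using only the standard library is below; the two helper names and the `http://localhost:5000` base URL are assumptions for illustration and are not defined anywhere in this commit.

```python
import json
import urllib.request


def build_analysis_request(base_url, transcript):
    """Build (but do not send) a POST request for /api/generate_analysis.

    Hypothetical helper; `base_url` is wherever the web app is served.
    """
    body = json.dumps({"transcript": transcript}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/generate_analysis",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def download_url_from_response(base_url, payload):
    """Turn the JSON payload returned on success into an absolute download URL."""
    return f"{base_url.rstrip('/')}{payload['download_url']}"


request = build_analysis_request("http://localhost:5000", "John: Hello everyone.")
print(request.get_method(), request.full_url)
```

Sending the request is then a matter of `urllib.request.urlopen(request)` and decoding the JSON response. Non-2xx responses raise `urllib.error.HTTPError`, whose body carries the `error` message from the route.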

docs/README-fr.md

Lines changed: 55 additions & 1 deletion
@@ -24,11 +24,12 @@ Commencez à créer vos podcasts en quelques minutes.
 
 - **Interface Moderne** : Une interface claire, moderne et réactive construite avec `customtkinter` qui s'adapte au mode clair ou sombre de votre système.
 - **Double Fournisseur TTS** : Choisissez entre les voix de haute qualité de **Google Gemini** ou **ElevenLabs**.
+- **Analyse IA des Scripts** : Générez des documents DOCX avec une analyse IA de vos scripts de podcast, incluant des résumés, des questions de compréhension pour différents niveaux de langue (A1, A2, B1), et des informations pédagogiques clés. Parfait pour les enseignants de langues et les créateurs de contenu.
 - **Démo HTML Synchronisée** : Générez automatiquement une page HTML partageable avec l'audio de votre podcast et une transcription synchronisée et surlignée.
 - **Formats flexibles** : Export en **MP3** (par défaut) ou **WAV**.
 - **Personnalisation** : Sauvegarde des voix et paramètres pour chaque locuteur.
 - **Guides vocaux** : Explorez et écoutez toutes les voix disponibles de Gemini et ElevenLabs directement depuis les réglages. Ajoutez vos voix préférées à votre liste de locuteurs en un seul clic.
-- **Lecture intégrée** : Écoutez et arrêtez vos créations directement depuis lapplication (**FFmpeg requis**).
+- **Lecture intégrée** : Écoutez et arrêtez vos créations directement depuis l'application (**FFmpeg requis**).
 - **Stockage sécurisé de la clé API** : Votre clé API Google Gemini est demandée une seule fois et enregistrée de manière sécurisée dans le trousseau du système (`keyring`).
 - **Support des accents et langues** : Créez des podcasts en plusieurs langues avec des voix et des accents distincts pour chaque langue (depuis les réglages des locuteurs avec l'API ElevenLabs ou depuis le prompt avec Gemini).
 - **Support Docker** : Exécutez l'application en tant que service web à l'aide de Docker. Cela simplifie le déploiement, ne nécessite aucune installation supplémentaire et peut fonctionner sur un petit serveur ou localement.
@@ -84,6 +85,59 @@ John (es): Hola a todos, bienvenidos a este nuevo episodio.
 
 ---
 
+## 📝 Analyse IA des Scripts (Interface Web)
+
+L'interface web inclut une fonctionnalité d'analyse IA optionnelle qui génère des documents DOCX professionnels analysant vos scripts de podcast. Cette fonctionnalité est particulièrement utile pour les **enseignants de langues**, les **créateurs de contenu** et les **développeurs de matériel pédagogique**.
+
+### Contenu de l'Analyse
+
+Le document DOCX généré contient :
+- **Résumé** : Un aperçu concis du contenu du podcast
+- **Personnages Principaux** : Les intervenants et personnalités clés mentionnés
+- **Lieux Importants** : Les endroits importants référencés dans le script
+- **Thème Central** : Le message ou sujet principal
+- **Questions de Compréhension** : Questions adaptées à différents niveaux de compétence linguistique :
+  - A1 (Débutant)
+  - A1+/A2 (Élémentaire)
+  - A2+/B1 (Intermédiaire)
+
+### Instructions de Configuration
+
+Pour activer cette fonctionnalité dans l'interface web :
+
+1. **Configurer la Clé API Gemini**
+   Ajoutez votre clé API Gemini au fichier `.env` :
+   ```bash
+   GEMINI_API_KEY=votre_clé_ici
+   ```
+
+2. **Créer le Fichier de Prompt d'Analyse**
+   Copiez l'exemple de configuration du prompt :
+   ```bash
+   cp config/analysis_prompt.txt.example config/analysis_prompt.txt
+   ```
+
+3. **Personnaliser le Prompt (Optionnel)**
+   Éditez `config/analysis_prompt.txt` pour modifier la façon dont l'IA analyse vos scripts. Vous pouvez ajuster :
+   - Les types de questions générées
+   - Les niveaux de langue ciblés
+   - La profondeur et les domaines d'analyse
+   - Les préférences de formatage de sortie
+
+4. **Accéder à la Fonctionnalité**
+   Une fois configuré, un bouton violet "Generate DOCX Analysis" apparaîtra à côté du bouton "Generate Podcast" dans l'interface web.
+
+### Emplacements des Fichiers
+
+- **Docker** : `./config/analysis_prompt.txt`
+- **macOS** : `~/Library/Application Support/PodcastGenerator/analysis_prompt.txt`
+- **Windows** : `%APPDATA%/PodcastGenerator/analysis_prompt.txt`
+- **Linux** : `~/.config/PodcastGenerator/analysis_prompt.txt`
+
+Pour plus de détails, consultez le fichier `config/README.md`.
+
+---
+
 ## 📦 Installation
 
 ### 1. Dépendance externe : FFmpeg (obligatoire)

generate_podcast.py

Lines changed: 6 additions & 1 deletion
@@ -158,7 +158,12 @@ def synthesize(self, script_text: str, speaker_mapping: dict, output_filepath: s
         gemini_script = script_text.replace('[', '(').replace(']', ')')
         logger.info("Converted script annotations from [] to () for Gemini.")
 
-        models_to_try = ["gemini-2.5-pro-preview-tts", "gemini-2.5-flash-preview-tts"]
+        # Get model from environment variable or use defaults
+        primary_model = os.environ.get("GEMINI_TTS_MODEL", "gemini-2.5-flash-preview-tts")
+        models_to_try = [primary_model, "gemini-2.5-pro-preview-tts", "gemini-2.5-flash-preview-tts"]
+        # Remove duplicates while preserving order
+        models_to_try = list(dict.fromkeys(models_to_try))
+
         contents = [types.Content(role="user", parts=[types.Part.from_text(text=gemini_script)])]
 
         num_speakers = len(speaker_mapping)
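
Review note: the `list(dict.fromkeys(...))` idiom in this hunk removes repeated model names while keeping the first occurrence of each in place. A small standalone sketch of the resulting fallback order follows; `build_model_fallbacks` is a hypothetical wrapper written for this note.

```python
def build_model_fallbacks(primary, defaults):
    """Put `primary` first, then the defaults, dropping repeats but keeping order.

    Hypothetical wrapper around the idiom used in the hunk above.
    """
    # dict keys are unique and keep insertion order (guaranteed since Python 3.7).
    return list(dict.fromkeys([primary, *defaults]))


defaults = ["gemini-2.5-pro-preview-tts", "gemini-2.5-flash-preview-tts"]
print(build_model_fallbacks("gemini-2.5-flash-preview-tts", defaults))
print(build_model_fallbacks("my-custom-tts-model", defaults))
```

With `GEMINI_TTS_MODEL` unset, the primary model is the flash preview, so the effective order becomes flash first and pro second. That reverses the order of the removed line, which tried the pro model first.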

requirements.txt

Lines changed: 2 additions & 1 deletion
@@ -11,4 +11,5 @@ elevenlabs
 
 # Utility
 requests
-keyring
+keyring
+python-docx
