Adds AI-powered script analysis and DOCX document generation.

laurentftech · laurentftech · commit 504d85650497 · 2025-12-09T23:12:27.000+01:00
diff --git a/.env.example b/.env.example
@@ -9,3 +9,24 @@ GEMINI_API_KEY=your_gemini_api_key_here
 
 # Default TTS provider elevenlabs or gemini
 DEFAULT_TTS_PROVIDER="elevenlabs"
+
+# Gemini model for TTS generation (optional, defaults to gemini-2.5-flash-preview-tts)
+GEMINI_TTS_MODEL=gemini-2.5-flash-preview-tts
+
+# Gemini model for transcript analysis (optional, defaults to gemini-2.5-flash)
+GEMINI_ANALYSIS_MODEL=gemini-2.5-flash
+
+# Analysis Configuration
+# ---------------------
+# To enable the "Generate DOCX Analysis" button in the web interface:
+# 1. Ensure GEMINI_API_KEY is configured above
+# 2. Create a file named 'analysis_prompt.txt' in the './config' directory
+#    (for Docker) or in your app data directory (for desktop app)
+# 3. The prompt file should contain the instructions for the Gemini AI
+#    to analyze podcast transcripts
+#
+# Example locations:
+#   - Docker/Local: ./config/analysis_prompt.txt
+#   - macOS: ~/Library/Application Support/PodcastGenerator/analysis_prompt.txt
+#   - Windows: %APPDATA%/PodcastGenerator/analysis_prompt.txt
+#   - Linux: ~/.config/PodcastGenerator/analysis_prompt.txt
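Review note: the lookup behind these locations lives in `get_analysis_prompt_path` (imported from `transcript_analyzer.py`), which this diff does not show. The following is only a minimal sketch of how such a lookup might work, assuming the priority order given in the changelog (config dir first, then app data dir). `find_prompt_file`, `app_data_dir`, and `default_search_dirs` are hypothetical names, and the bundled-assets fallback is left out because its location is not shown here.

```python
import os
import sys
from pathlib import Path
from typing import Iterable, List, Optional

PROMPT_FILENAME = "analysis_prompt.txt"
APP_NAME = "PodcastGenerator"


def app_data_dir() -> Path:
    """Per-platform app data directory, following the locations listed above."""
    if sys.platform == "darwin":
        return Path.home() / "Library" / "Application Support" / APP_NAME
    if sys.platform.startswith("win"):
        # Assumption: fall back to the home directory if APPDATA is unset.
        return Path(os.environ.get("APPDATA", str(Path.home()))) / APP_NAME
    return Path.home() / ".config" / APP_NAME


def find_prompt_file(search_dirs: Iterable[Path]) -> Optional[Path]:
    """Return the first existing prompt file; the order of search_dirs is the priority."""
    for directory in search_dirs:
        candidate = Path(directory) / PROMPT_FILENAME
        if candidate.is_file():
            return candidate
    return None


def default_search_dirs() -> List[Path]:
    # Hypothetical default order: local ./config first, then the app data dir.
    return [Path("config"), app_data_dir()]
```

Returning `None` when nothing is found fits the way `app.py` wraps the result in `bool(...)` to decide whether the button is shown.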
diff --git a/.gitignore b/.gitignore
@@ -25,3 +25,9 @@ instance/*
 
 # Environment variables (contains API keys)
 .env
+
+# Config directory (user customization)
+config/*
+!config/.gitkeep
+!config/*.example
+!config/README.md
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -4,6 +4,27 @@ All notable changes to this project will be documented in this file.
 
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
 
+## [2.0.0b27]
+
+### Added
+- **AI-Powered Script Analysis**: New web interface feature for generating DOCX documents with AI analysis
+  - Generates comprehensive analysis including summaries, character lists, locations, and themes
+  - Creates comprehension questions for multiple language levels (A1, A1+/A2, A2+/B1)
+  - Perfect for language teachers and educational content creators
+  - Configurable via `config/analysis_prompt.txt` file
+  - Button appears automatically when both Gemini API key and prompt file are configured
+  - Files are named based on script content instead of random UUIDs
+- **Configuration Directory**: New `config/` directory for user-customizable settings
+  - Analysis prompt templates stored in `config/analysis_prompt.txt`
+  - Example file provided: `config/analysis_prompt.txt.example`
+  - Priority order: config dir → app data dir → bundled assets
+  - Fully compatible with Docker volume mounts
+- **Dependencies**: Added `python-docx` for DOCX document generation
+
+### Changed
+- **Filename Generation**: Both podcast MP3 and analysis DOCX files are now named after the beginning of the script's first sentence instead of a random UUID
+- **Environment Variables**: Added `GEMINI_ANALYSIS_MODEL` for configuring the Gemini model used for analysis (defaults to `gemini-2.5-flash`)
+
 ## [2.0.0b26]
 
 ### Changed
diff --git a/README.md b/README.md
@@ -28,6 +28,7 @@ See also the [French README](docs/README-fr.md) for a version in French.
 
  - **Modern UI**: A clean, modern, and responsive interface built with `customtkinter` that adapts to your system's light or dark mode.
  - **Dual TTS Provider**: Choose between the high-quality voices of **Google Gemini** or **ElevenLabs**.
+ - **AI-Powered Script Analysis**: Generate DOCX documents with AI-powered analysis of your podcast scripts, including summaries, comprehension questions for different language levels (A1, A2, B1), and key educational insights. Perfect for language teachers and content creators.
  - **Synchronized HTML Demo**: Automatically generate a shareable HTML page with your podcast audio and a synchronized, highlighted transcript.
  - **Flexible Formats**: Export your creations in **MP3** (default) or **WAV** formats.
  - **Customization**: Configure and save voices for each speaker in your scripts, with options for language and accent.
@@ -98,6 +99,61 @@ D’écritoire, monsieur, ou de boîte à ciseaux ? »
 
 💡 Note on Annotations: The app uses square brackets [emotion] for ElevenLabs' emotional cues. If you use Gemini, the app will automatically convert them to parentheses (emotion) for you.
 
+---
+
+## 📝 AI-Powered Script Analysis (Web Interface)
+
+The web interface includes an optional AI-powered analysis feature that generates professional DOCX documents analyzing your podcast scripts. This feature is particularly useful for **language teachers**, **content creators**, and **educational material developers**.
+
+### What's Included in the Analysis
+
+The generated DOCX document contains:
+- **Summary**: A concise overview of the podcast content
+- **Main Characters**: Key speakers and personalities mentioned
+- **Key Locations**: Important places referenced in the script
+- **Central Theme**: The main message or topic
+- **Comprehension Questions**: Tailored questions for different language proficiency levels:
+  - A1 (Beginner)
+  - A1+/A2 (Elementary)
+  - A2+/B1 (Intermediate)
+
+### Setup Instructions
+
+To enable this feature in the web interface:
+
+1. **Configure Gemini API Key**
+   Add your Gemini API key to the `.env` file:
+   ```bash
+   GEMINI_API_KEY=your_actual_key_here
+   ```
+
+2. **Create Analysis Prompt File**
+   Copy the example prompt configuration:
+   ```bash
+   cp config/analysis_prompt.txt.example config/analysis_prompt.txt
+   ```
+
+3. **Customize the Prompt (Optional)**
+   Edit `config/analysis_prompt.txt` to modify how the AI analyzes your scripts. You can adjust:
+   - The types of questions generated
+   - Language levels targeted
+   - Analysis depth and focus areas
+   - Output formatting preferences
+
+4. **Access the Feature**
+   Once configured, a purple "Generate DOCX Analysis" button will appear next to the "Generate Podcast" button in the web interface.
+
+### File Locations
+
+- **Docker**: `./config/analysis_prompt.txt`
+- **macOS**: `~/Library/Application Support/PodcastGenerator/analysis_prompt.txt`
+- **Windows**: `%APPDATA%/PodcastGenerator/analysis_prompt.txt`
+- **Linux**: `~/.config/PodcastGenerator/analysis_prompt.txt`
+
+For more details, see the `config/README.md` file.
+
+---
+
 ## 📦 Installation
 
 ### 1. Required Dependency: FFmpeg
diff --git a/app.py b/app.py
@@ -3,6 +3,7 @@
 from utils import sanitize_text, get_asset_path, get_app_data_dir
 from config import AVAILABLE_VOICES, DEFAULT_APP_SETTINGS, DEMO_AVAILABLE
 from create_demo import create_html_demo_whisperx
+from transcript_analyzer import generate_analysis_docx, get_analysis_prompt_path
 import os
 import tempfile
 import json
@@ -59,6 +60,69 @@ def save_settings(settings):
     with open(get_settings_path(), 'w') as f:
         json.dump(settings, f, indent=4)
 
+def extract_filename_from_script(script_text, extension, max_length=50):
+    """
+    Extracts a safe filename from the beginning of the first sentence in the script.
+
+    Args:
+        script_text: The script content
+        extension: File extension (e.g., 'mp3', 'docx')
+        max_length: Maximum length for the filename (default 50)
+
+    Returns:
+        A sanitized filename with the given extension
+    """
+    # Find the first non-empty line and drop its speaker label, if any
+    lines = script_text.strip().split('\n')
+    first_dialogue = ""
+
+    for line in lines:
+        # Skip empty lines
+        if not line.strip():
+            continue
+        # Check if line has speaker format (Speaker: text)
+        match = re.match(r'^\s*([^:]+?)\s*:\s*(.+)$', line)
+        if match:
+            first_dialogue = match.group(2).strip()
+            break
+        else:
+            # If no speaker format, use the line as-is
+            first_dialogue = line.strip()
+            break
+
+    if not first_dialogue:
+        # Fall back to a random hex suffix if no content was found
+        return f"podcast_{os.urandom(4).hex()}.{extension}"
+
+    # Remove any bracketed annotations like [playful], [laughing], etc.
+    first_dialogue = re.sub(r'\[.*?\]', '', first_dialogue).strip()
+
+    # Extract the beginning (up to first sentence or max_length)
+    # Split by sentence-ending punctuation
+    sentence_match = re.match(r'^([^.!?]+)', first_dialogue)
+    if sentence_match:
+        first_sentence = sentence_match.group(1).strip()
+    else:
+        first_sentence = first_dialogue
+
+    # Limit length
+    if len(first_sentence) > max_length:
+        first_sentence = first_sentence[:max_length].strip()
+
+    # Remove or replace characters that are unsafe for filenames
+    # Keep alphanumeric, spaces, hyphens, and underscores
+    safe_name = re.sub(r'[^\w\s\-]', '', first_sentence)
+    # Replace each run of whitespace/hyphens with a single underscore
+    safe_name = re.sub(r'[\s\-]+', '_', safe_name)
+    # Remove leading/trailing underscores
+    safe_name = safe_name.strip('_')
+
+    # If we ended up with an empty name, use fallback
+    if not safe_name:
+        return f"podcast_{os.urandom(4).hex()}.{extension}"
+
+    return f"{safe_name}.{extension}"
+
 # --- Routes ---
 @app.route('/')
 def index():
@@ -112,6 +176,7 @@ def get_settings():
     settings = load_settings()
     settings['has_elevenlabs_key'] = bool(os.environ.get("ELEVENLABS_API_KEY"))
     settings['has_gemini_key'] = bool(os.environ.get("GEMINI_API_KEY"))
+    settings['has_analysis_prompt'] = bool(get_analysis_prompt_path())
     return jsonify(settings)
 
 @app.route('/api/settings', methods=['POST'])
@@ -254,7 +319,7 @@ def handle_generate():
     app_settings_clean = sanitize_app_settings_for_backend(app_settings)
 
     task_id = str(uuid.uuid4())
-    output_filename = f"{task_id}.mp3"
+    output_filename = extract_filename_from_script(sanitized_script, 'mp3')
     output_filepath = os.path.join(app.config['TEMP_DIR'], output_filename)
     
     stop_event = threading.Event()
@@ -380,6 +445,43 @@ def download_demo_zip(demo_id):
 def get_temp_file(filename):
     return send_from_directory(app.config['TEMP_DIR'], filename)
 
+@app.route('/api/generate_analysis', methods=['POST'])
+def handle_generate_analysis():
+    """Generates a DOCX analysis document from a transcript using Gemini API."""
+    data = request.get_json(silent=True) or {}
+    transcript = data.get('transcript', '')
+
+    if not transcript:
+        return jsonify({'error': 'Transcript is required.'}), 400
+
+    # Check if Gemini API key is available
+    api_key = os.environ.get("GEMINI_API_KEY")
+    if not api_key:
+        return jsonify({'error': 'Gemini API key not configured.'}), 403
+
+    try:
+        # Generate the analysis DOCX
+        docx_filename = extract_filename_from_script(transcript, 'docx')
+        docx_path = os.path.join(app.config['TEMP_DIR'], docx_filename)
+
+        generate_analysis_docx(
+            transcript=transcript,
+            output_path=docx_path,
+            api_key=api_key
+        )
+
+        return jsonify({
+            'download_url': f'/temp/{docx_filename}',
+            'filename': docx_filename
+        })
+
+    except ValueError as e:
+        logger.error(f"Validation error during analysis generation: {e}")
+        return jsonify({'error': str(e)}), 400
+    except Exception as e:
+        logger.error(f"Error during analysis generation: {e}", exc_info=True)
+        return jsonify({'error': 'An unexpected error occurred during analysis generation.'}), 500
+
 if __name__ == '__main__':
     from dotenv import load_dotenv
     load_dotenv()
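Review note: to make the new naming scheme easier to reason about, here is a condensed restatement of `extract_filename_from_script` with a few sample inputs. It is intended to follow the same steps as the helper in this commit (first non-empty line, speaker label stripped, `[annotations]` removed, cut at the first `.`, `!` or `?`, then sanitized), but it is a sketch for experimentation and not a drop-in replacement. The sample scripts are made up.

```python
import os
import re


def extract_filename_from_script(script_text, extension, max_length=50):
    """Condensed restatement of the helper added to app.py in this commit."""
    first_dialogue = ""
    for line in script_text.strip().split("\n"):
        if not line.strip():
            continue
        # "Speaker: text" keeps only the text; other lines are used as-is.
        match = re.match(r"^\s*([^:]+?)\s*:\s*(.+)$", line)
        first_dialogue = match.group(2).strip() if match else line.strip()
        break

    # Drop [annotations], keep text up to the first '.', '!' or '?', then truncate.
    first_dialogue = re.sub(r"\[.*?\]", "", first_dialogue).strip()
    sentence = re.match(r"^([^.!?]+)", first_dialogue)
    first_sentence = sentence.group(1).strip() if sentence else first_dialogue
    first_sentence = first_sentence[:max_length].strip()

    # Keep word characters only; runs of whitespace/hyphens become one underscore.
    safe_name = re.sub(r"[^\w\s\-]", "", first_sentence)
    safe_name = re.sub(r"[\s\-]+", "_", safe_name).strip("_")
    if not safe_name:
        return f"podcast_{os.urandom(4).hex()}.{extension}"
    return f"{safe_name}.{extension}"


print(extract_filename_from_script(
    "John: [playful] Hello everyone, welcome to the show! Today we talk.", "mp3"))
# Hello_everyone_welcome_to_the_show.mp3

print(extract_filename_from_script("Cyrano: Ah ! Non ! C'est un peu court.", "docx"))
# Ah.docx

print(extract_filename_from_script("10:30 is when we start.", "mp3"))
# 30_is_when_we_start.mp3
```

Two things stand out from these samples. Any text before the first colon is treated as a speaker label, so a line that opens with a time such as `10:30` loses its first part. Scripts that open with the same words also map to the same filename, which the previous UUID-based names avoided. A script with no usable characters (for example `...`) falls back to `podcast_<8 hex chars>.<extension>`.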
diff --git a/docs/README-fr.md b/docs/README-fr.md
@@ -24,11 +24,12 @@ Commencez à créer vos podcasts en quelques minutes.
 
 - **Interface Moderne** : Une interface claire, moderne et réactive construite avec `customtkinter` qui s'adapte au mode clair ou sombre de votre système.
 - **Double Fournisseur TTS** : Choisissez entre les voix de haute qualité de **Google Gemini** ou **ElevenLabs**.
+- **Analyse IA des Scripts** : Générez des documents DOCX avec une analyse IA de vos scripts de podcast, incluant des résumés, des questions de compréhension pour différents niveaux de langue (A1, A2, B1), et des informations pédagogiques clés. Parfait pour les enseignants de langues et les créateurs de contenu.
 - **Démo HTML Synchronisée** : Générez automatiquement une page HTML partageable avec l'audio de votre podcast et une transcription synchronisée et surlignée.
 - **Formats flexibles** : Export en **MP3** (par défaut) ou **WAV**.
 - **Personnalisation** : Sauvegarde des voix et paramètres pour chaque locuteur.
 - **Guides vocaux** : Explorez et écoutez toutes les voix disponibles de Gemini et ElevenLabs directement depuis les réglages. Ajoutez vos voix préférées à votre liste de locuteurs en un seul clic.
-- **Lecture intégrée** : Écoutez et arrêtez vos créations directement depuis l’application (**FFmpeg requis**).
+- **Lecture intégrée** : Écoutez et arrêtez vos créations directement depuis l'application (**FFmpeg requis**).
 - **Stockage sécurisé de la clé API** : Votre clé API Google Gemini est demandée une seule fois et enregistrée de manière sécurisée dans le trousseau du système (`keyring`).
 - **Support des accents et langues** : Créez des podcasts en plusieurs langues avec des voix et des accents distincts pour chaque langue (depuis les réglages des locuteurs avec l'API ElevenLabs ou depuis le prompt avec Gemini).
 - **Support Docker** : Exécutez l'application en tant que service web à l'aide de Docker. Cela simplifie le déploiement, ne nécessite aucune installation supplémentaire et peut fonctionner sur un petit serveur ou localement.
@@ -84,6 +85,59 @@ John (es): Hola a todos, bienvenidos a este nuevo episodio.
 
 ---
 
+## 📝 Analyse IA des Scripts (Interface Web)
+
+L'interface web inclut une fonctionnalité d'analyse IA optionnelle qui génère des documents DOCX professionnels analysant vos scripts de podcast. Cette fonctionnalité est particulièrement utile pour les **enseignants de langues**, les **créateurs de contenu** et les **développeurs de matériel pédagogique**.
+
+### Contenu de l'Analyse
+
+Le document DOCX généré contient :
+- **Résumé** : Un aperçu concis du contenu du podcast
+- **Personnages Principaux** : Les intervenants et personnalités clés mentionnés
+- **Lieux Importants** : Les endroits importants référencés dans le script
+- **Thème Central** : Le message ou sujet principal
+- **Questions de Compréhension** : Questions adaptées à différents niveaux de compétence linguistique :
+  - A1 (Débutant)
+  - A1+/A2 (Élémentaire)
+  - A2+/B1 (Intermédiaire)
+
+### Instructions de Configuration
+
+Pour activer cette fonctionnalité dans l'interface web :
+
+1. **Configurer la Clé API Gemini**
+   Ajoutez votre clé API Gemini au fichier `.env` :
+   ```bash
+   GEMINI_API_KEY=votre_clé_ici
+   ```
+
+2. **Créer le Fichier de Prompt d'Analyse**
+   Copiez l'exemple de configuration du prompt :
+   ```bash
+   cp config/analysis_prompt.txt.example config/analysis_prompt.txt
+   ```
+
+3. **Personnaliser le Prompt (Optionnel)**
+   Éditez `config/analysis_prompt.txt` pour modifier la façon dont l'IA analyse vos scripts. Vous pouvez ajuster :
+   - Les types de questions générées
+   - Les niveaux de langue ciblés
+   - La profondeur et les domaines d'analyse
+   - Les préférences de formatage de sortie
+
+4. **Accéder à la Fonctionnalité**
+   Une fois configuré, un bouton violet "Generate DOCX Analysis" apparaîtra à côté du bouton "Generate Podcast" dans l'interface web.
+
+### Emplacements des Fichiers
+
+- **Docker** : `./config/analysis_prompt.txt`
+- **macOS** : `~/Library/Application Support/PodcastGenerator/analysis_prompt.txt`
+- **Windows** : `%APPDATA%/PodcastGenerator/analysis_prompt.txt`
+- **Linux** : `~/.config/PodcastGenerator/analysis_prompt.txt`
+
+Pour plus de détails, consultez le fichier `config/README.md`.
+
+---
+
 ## 📦 Installation
 
 ### 1. Dépendance externe : FFmpeg (obligatoire)
diff --git a/generate_podcast.py b/generate_podcast.py
@@ -158,7 +158,12 @@ def synthesize(self, script_text: str, speaker_mapping: dict, output_filepath: s
         gemini_script = script_text.replace('[', '(').replace(']', ')')
         logger.info("Converted script annotations from [] to () for Gemini.")
 
-        models_to_try = ["gemini-2.5-pro-preview-tts", "gemini-2.5-flash-preview-tts"]
+        # Get model from environment variable or use defaults
+        primary_model = os.environ.get("GEMINI_TTS_MODEL", "gemini-2.5-flash-preview-tts")
+        models_to_try = [primary_model, "gemini-2.5-pro-preview-tts", "gemini-2.5-flash-preview-tts"]
+        # Remove duplicates while preserving order
+        models_to_try = list(dict.fromkeys(models_to_try))
+
         contents = [types.Content(role="user", parts=[types.Part.from_text(text=gemini_script)])]
 
         num_speakers = len(speaker_mapping)
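Review note: the fallback order now depends on `GEMINI_TTS_MODEL`. The sketch below isolates the list-building logic from the hunk above so the resulting order can be checked for a few settings. `build_models_to_try` is a hypothetical wrapper name, and `my-custom-tts-model` is a made-up model id used only for illustration.

```python
import os


def build_models_to_try(env=None):
    """Build the ordered, de-duplicated model list the same way as the hunk above."""
    env = os.environ if env is None else env
    primary_model = env.get("GEMINI_TTS_MODEL", "gemini-2.5-flash-preview-tts")
    models_to_try = [
        primary_model,
        "gemini-2.5-pro-preview-tts",
        "gemini-2.5-flash-preview-tts",
    ]
    # dict.fromkeys keeps the first occurrence of each key, in insertion order.
    return list(dict.fromkeys(models_to_try))


print(build_models_to_try({}))
# ['gemini-2.5-flash-preview-tts', 'gemini-2.5-pro-preview-tts']

print(build_models_to_try({"GEMINI_TTS_MODEL": "gemini-2.5-pro-preview-tts"}))
# ['gemini-2.5-pro-preview-tts', 'gemini-2.5-flash-preview-tts']

print(build_models_to_try({"GEMINI_TTS_MODEL": "my-custom-tts-model"}))
# ['my-custom-tts-model', 'gemini-2.5-pro-preview-tts', 'gemini-2.5-flash-preview-tts']
```

With the variable unset, the flash model is now tried before the pro model, which reverses the order of the list this hunk replaces. Setting `GEMINI_TTS_MODEL=gemini-2.5-pro-preview-tts` restores the previous order.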
diff --git a/requirements.txt b/requirements.txt
@@ -11,4 +11,5 @@ elevenlabs
 
 # Utility
 requests
-keyring
+keyring
+python-docx
diff --git a/templates/index.html b/templates/index.html