AI-powered sports video auto-editing system that transforms horizontal (16:9) soccer footage into vertical (9:16) short-form content optimized for social media platforms (TikTok, Reels, Shorts).
- Hybrid Highlight Detection: Combines Whisper + Gemini AI analysis for intelligent highlight extraction
- Dynamic Reframing: YOLOv8-based ball and player tracking with multiple detection backends
- Scene-Aware Processing: ResNet18 scene classification for intelligent ROI calculation
- Smooth Camera Movement: Kalman filter, EMA, and adaptive EMA smoothing options
- Temporal Ball Filtering: Savitzky-Golay filter for robust ball trajectory smoothing
- Parallel Processing: CPU-based parallel clip generation for faster processing
- Audio Preservation: Automatic audio extraction and merging
- PySide6 GUI: Full-featured interface with Korean language support
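To illustrate the smoothing options above, here is a minimal sketch of EMA versus adaptive EMA over a 1-D camera-center trajectory. The function names and parameters are hypothetical, not the actual `src/core/smoother.py` API:

```python
# Illustrative EMA vs. adaptive EMA smoothing of a virtual-camera center.
# Names and thresholds here are hypothetical, not the project's smoother API.

def ema(points, alpha=0.2):
    """Standard exponential moving average over 1-D camera positions."""
    out = [points[0]]
    for p in points[1:]:
        out.append(alpha * p + (1 - alpha) * out[-1])
    return out

def adaptive_ema(points, alpha_slow=0.1, alpha_fast=0.6, jump=40.0):
    """Switch to a faster alpha when the target jumps far, so the camera
    stays calm under jitter but can still follow fast plays."""
    out = [points[0]]
    for p in points[1:]:
        alpha = alpha_fast if abs(p - out[-1]) > jump else alpha_slow
        out.append(alpha * p + (1 - alpha) * out[-1])
    return out
```

In practice this would be applied per frame to the x-coordinate of the crop center produced by ball/player detection.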
```bash
git clone <repository-url>
cd shortsgenie

# Create and activate virtual environment
python3 -m venv venv
source venv/bin/activate   # macOS/Linux
# or: venv\Scripts\activate   # Windows

# Install dependencies
pip install -r requirements.txt
```

Models are not included in the repository due to file size. Download them to `resources/models/`:
Required Models:
- `best.pt`: Fine-tuned soccer detection model (default, ~6MB)
- `yolov8n.pt`: YOLOv8 nano (~6MB) [auto-download]
- `scene_classifier/soccer_model_ver2.pth`: ResNet18 scene classifier (~44MB)
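A quick way to check which of these files are still missing, assuming the default `resources/models/` layout (this helper is illustrative, not part of the project):

```python
from pathlib import Path

# Required model files relative to resources/models/ (see the list above).
REQUIRED_MODELS = [
    "best.pt",
    "yolov8n.pt",
    "scene_classifier/soccer_model_ver2.pth",
]

def missing_models(models_dir="resources/models"):
    """Return the required model files that have not been downloaded yet."""
    root = Path(models_dir)
    return [m for m in REQUIRED_MODELS if not (root / m).exists()]
```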
Auto-download (YOLO models):
```bash
python -c "from ultralytics import YOLO; YOLO('yolov8n.pt')"
```

To run the application:

```bash
# GUI Mode (recommended)
python main.py
```

Project layout:

```
shortsgenie/
├── main.py                         # GUI entry point
├── src/
│   ├── core/                       # Detection, ROI, smoothing, cropping
│   │   ├── detector.py             # Generic YOLO detector
│   │   ├── soccernet_detector.py   # SoccerNet fine-tuned detector
│   │   ├── temporal_filter.py      # Savitzky-Golay ball filtering
│   │   ├── roi_calculator.py       # ROI calculation with hysteresis
│   │   ├── smoother.py             # Kalman/EMA smoothing
│   │   └── cropper.py              # Video cropping with audio
│   ├── pipeline/                   # Processing pipelines
│   │   ├── reframing_pipeline.py   # Core reframing (PHASE 2)
│   │   ├── highlight_pipeline.py   # Full highlight generation
│   │   └── pipeline_config.py      # Configuration system
│   ├── audio/                      # Audio analysis and transcription
│   │   ├── whisper_transcriber.py  # Local Whisper STT
│   │   ├── groq_transcriber.py     # Groq API transcription
│   │   ├── highlight_filter.py     # Audio highlight detection
│   │   └── scoreboard_ocr_detector.py  # PaddleOCR goal detection
│   ├── ai/                         # AI integration
│   │   └── transcript_analyzer.py  # Gemini AI analysis
│   ├── scene/                      # Scene processing
│   │   ├── scene_classifier.py     # ResNet18 scene classification
│   │   └── scene_manager.py        # Scene management
│   ├── gui/                        # PySide6 interface
│   │   ├── main_window.py          # Main window
│   │   ├── progress_page.py        # Processing progress
│   │   ├── highlight_selector.py   # Highlight selection
│   │   ├── preview_page.py         # Video preview
│   │   └── output_page.py          # Output settings
│   ├── utils/                      # Utilities
│   │   ├── config.py               # Centralized configuration
│   │   ├── video_utils.py          # Video I/O
│   │   └── quality_presets.py      # Quality presets
│   └── models/                     # Data models
│       └── detection_result.py     # Detection structures
├── resources/models/               # ⚠️ Model files (NOT in git)
├── input/                          # Place input videos here
├── output/                         # Processed videos save here
├── test_*.py                       # Test scripts
├── requirements.txt                # Python dependencies
└── README.md                       # This file
```
The system supports multiple detection backends for ball and player detection:

Generic YOLO backend:
- Model: YOLOv8 (n/s/m variants)
- Training: COCO dataset (universal object detection)
- Use case: General-purpose, non-soccer content

SoccerNet backend:
- Model: YOLOv8 fine-tuned on the SoccerNet dataset
- Training: Soccer-specific footage
- Use case: Soccer footage with standard camera angles
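Selecting between the two backends amounts to choosing a weights file. A hypothetical mapping sketch (the real project wires this through `src/core/detector.py` and `src/core/soccernet_detector.py`, so treat these names as illustrative):

```python
# Hypothetical backend-to-weights mapping, using the model paths listed above.
BACKEND_WEIGHTS = {
    "yolo": "resources/models/yolov8n.pt",    # COCO-trained, general purpose
    "soccernet": "resources/models/best.pt",  # fine-tuned on SoccerNet footage
}

def weights_for(backend):
    """Resolve a detector backend name to its weights file."""
    try:
        return BACKEND_WEIGHTS[backend]
    except KeyError:
        raise ValueError(f"unknown detector backend: {backend!r}")
```

With ultralytics installed, `YOLO(weights_for("soccernet"))` would then load the soccer-specific model.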
Copy `.env.example` to `.env` and configure if needed:

```bash
cp .env.example .env
```

Available variables:
- `GOOGLE_API_KEY`: For Gemini AI integration (recommended for transcript analysis)
- `GROQ_API_KEY`: For the Groq API (alternative to local Whisper)
- `FFMPEG_PATH`: Custom ffmpeg path (optional)

All settings live in `src/utils/config.py` and `src/pipeline/pipeline_config.py`.
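Since both API keys are optional, the pipeline can degrade gracefully when they are absent. A minimal sketch of that fallback logic (illustrative helper names, not the project API; `.env` loading itself would typically be done by python-dotenv):

```python
import os

def transcription_backend():
    """Use Groq API transcription when GROQ_API_KEY is set,
    otherwise fall back to local Whisper."""
    return "groq" if os.getenv("GROQ_API_KEY") else "whisper-local"

def gemini_enabled():
    """Gemini transcript analysis only runs when GOOGLE_API_KEY is set."""
    return os.getenv("GOOGLE_API_KEY") is not None
```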
Example usage:
```python
from src.utils.config import AppConfig
from src.pipeline.reframing_pipeline import ReframingPipeline

config = AppConfig()
config.detection.detector_backend = "soccernet"
config.detection.confidence_threshold = 0.05

pipeline = ReframingPipeline(config)
stats = pipeline.process_goal_clip(
    clip_path="input/goal_clip.mp4",
    output_path="output/goal_clip_vertical.mp4",
)
```

The hybrid pipeline consists of the following modules:
- Whisper STT (local) or Groq API (cloud) for speech-to-text
- Audio excitement detection for highlight segments
- Gemini AI analysis of transcripts for intelligent highlight extraction
- Context-aware highlight generation with descriptions
- PaddleOCR-based goal detection from scoreboard
- Audio boost mode for high accuracy during exciting moments
- ResNet18-based scene type classification
- Detects: wide, close, audience, replay scenes
- Ball and player detection with YOLO/SoccerNet
- Temporal filtering (Savitzky-Golay)
- ROI calculation with hysteresis and scene locking
- Smoothing (Kalman/EMA/Adaptive EMA)
- Parallel clip generation (3+ clips)
- Video cropping and encoding
- Audio preservation and merging
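The "ROI calculation with hysteresis" step above can be sketched in one function: the crop center only moves once the tracked target leaves a dead zone, and then only far enough to put the target back at the zone's edge. This is an illustrative sketch, not `roi_calculator.py`'s actual logic, and the dead-zone width is a made-up parameter:

```python
def update_roi_center(current, target, dead_zone=60):
    """Hysteresis for the crop center (pixels along one axis): ignore the
    target while it stays inside the dead zone, so detection jitter never
    moves the virtual camera; otherwise re-center to the dead-zone edge."""
    if abs(target - current) <= dead_zone:
        return current  # locked: small jitter is ignored
    step = target - current
    # move just enough that the target sits at the edge of the dead zone
    return current + step - dead_zone * (1 if step > 0 else -1)
```

Scene locking would then hold `current` fixed entirely for scene types (e.g. replays or audience shots) where ball tracking is unreliable.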
```bash
# Test SoccerNet reframing pipeline
python test_soccernet_pipeline.py input/test_clip.mp4

# Test scene-aware pipeline
python test_scene_aware_pipeline.py

# Test parallel processing performance
python test_parallel_pipeline.py input/test_video.mp4

# Test OCR scoreboard detection
python test_ocr.py
```

Requirements:
- Python 3.8+
- FFmpeg (system installation required)
- 8GB+ RAM recommended
- GPU optional but recommended (CUDA or Apple Silicon MPS)
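Because the GPU is optional, a device-selection helper that prefers CUDA, then Apple Silicon MPS, then CPU might look like this (an illustrative sketch, not the project's actual startup code):

```python
def pick_device():
    """Prefer CUDA, then Apple Silicon MPS, then CPU; degrade gracefully
    when PyTorch is not installed at all."""
    try:
        import torch
    except ImportError:
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"
```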
Key dependencies include:
- PyTorch: Deep learning framework
- Ultralytics YOLO: Object detection
- OpenCV: Computer vision
- Whisper: Speech-to-text
- PaddleOCR: OCR for goal detection
- PySide6: GUI framework
- Librosa: Audio analysis
- scenedetect: Scene boundary detection
- Processing Speed: ~15-30 FPS on modern CPU/GPU
- Parallel Speedup: 2-4x faster for 3+ clips
- Memory Usage: ~2-4GB per GPU worker
- Ball Detection: 30-50%+ detection rate with temporal filtering
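The parallel-speedup figure above comes from processing independent clips concurrently. A minimal sketch of that pattern, using a thread pool for brevity with a stub worker (the real pipeline uses CPU process workers and does detection, ROI calculation, and cropping per clip):

```python
from concurrent.futures import ThreadPoolExecutor

def process_clip(clip_path):
    """Stand-in for the per-clip reframing work (detect, calc ROI, crop)."""
    return f"{clip_path} -> vertical"

def process_clips_parallel(clip_paths, max_workers=3):
    """Fan clips out to workers; results come back in input order,
    as guaranteed by executor.map."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(process_clip, clip_paths))
```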