
KJCapstone/ShortsGenie


ShortsGenie 🎬⚽

AI-powered sports video auto-editing system that transforms horizontal (16:9) soccer footage into vertical (9:16) short-form content optimized for social media platforms (TikTok, Instagram Reels, YouTube Shorts).

Features

  • Hybrid Highlight Detection: Combines Whisper speech-to-text with Gemini AI transcript analysis for intelligent highlight extraction
  • Dynamic Reframing: YOLOv8-based ball and player tracking with multiple detection backends
  • Scene-Aware Processing: ResNet18 scene classification for intelligent ROI calculation
  • Smooth Camera Movement: Kalman filter, EMA, and adaptive EMA smoothing options
  • Temporal Ball Filtering: Savitzky-Golay filter for robust ball trajectory smoothing
  • Parallel Processing: CPU-based parallel clip generation for faster processing
  • Audio Preservation: Automatic audio extraction and merging
  • PySide6 GUI: Full-featured interface with Korean language support

Quick Start

1. Clone and Setup

git clone <repository-url>
cd shortsgenie

# Create and activate virtual environment
python3 -m venv venv
source venv/bin/activate  # macOS/Linux
# or: venv\Scripts\activate  # Windows

# Install dependencies
pip install -r requirements.txt

2. Download Models

Models are not included in the repository because of their file size. Download them to resources/models/:

Required Models:

  • best.pt - Fine-tuned soccer detection model (default, ~6MB)
  • yolov8n.pt - YOLOv8 nano (~6MB) [Auto-download]
  • scene_classifier/soccer_model_ver2.pth - ResNet18 scene classifier (~44MB)
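A quick check that the files above are in place can save a failed run. The sketch below assumes the paths are relative to resources/models/ exactly as listed; missing_models is a hypothetical helper, not part of ShortsGenie.

```python
from pathlib import Path
from typing import List

# Relative paths taken from the "Required Models" list (hypothetical helper, not ShortsGenie code)
REQUIRED_MODELS = [
    "best.pt",
    "yolov8n.pt",
    "scene_classifier/soccer_model_ver2.pth",
]


def missing_models(models_dir: str = "resources/models") -> List[str]:
    """Return the required model files that are not present under models_dir."""
    root = Path(models_dir)
    return [name for name in REQUIRED_MODELS if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_models()
    if missing:
        print("Missing model files:", ", ".join(missing))
    else:
        print("All required models found.")
```

Run it from the repository root; an empty result means all three files were found.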

Auto-download (YOLO models):

python -c "from ultralytics import YOLO; YOLO('yolov8n.pt')"

3. Run the Application

# GUI Mode (recommended)
python main.py

Project Structure

shortsgenie/
├── main.py                             # GUI entry point
├── src/
│   ├── core/                           # Detection, ROI, smoothing, cropping
│   │   ├── detector.py                 # Generic YOLO detector
│   │   ├── soccernet_detector.py       # SoccerNet fine-tuned detector
│   │   ├── temporal_filter.py          # Savitzky-Golay ball filtering
│   │   ├── roi_calculator.py           # ROI calculation with hysteresis
│   │   ├── smoother.py                 # Kalman/EMA smoothing
│   │   └── cropper.py                  # Video cropping with audio
│   ├── pipeline/                       # Processing pipelines
│   │   ├── reframing_pipeline.py       # Core reframing (PHASE 2)
│   │   ├── highlight_pipeline.py       # Full highlight generation
│   │   └── pipeline_config.py          # Configuration system
│   ├── audio/                          # Audio analysis and transcription
│   │   ├── whisper_transcriber.py      # Local Whisper STT
│   │   ├── groq_transcriber.py         # Groq API transcription
│   │   ├── highlight_filter.py         # Audio highlight detection
│   │   └── scoreboard_ocr_detector.py  # PaddleOCR goal detection
│   ├── ai/                             # AI integration
│   │   └── transcript_analyzer.py      # Gemini AI analysis
│   ├── scene/                          # Scene processing
│   │   ├── scene_classifier.py         # ResNet18 scene classification
│   │   └── scene_manager.py            # Scene management
│   ├── gui/                            # PySide6 interface
│   │   ├── main_window.py              # Main window
│   │   ├── progress_page.py            # Processing progress
│   │   ├── highlight_selector.py       # Highlight selection
│   │   ├── preview_page.py             # Video preview
│   │   └── output_page.py              # Output settings
│   ├── utils/                          # Utilities
│   │   ├── config.py                   # Centralized configuration
│   │   ├── video_utils.py              # Video I/O
│   │   └── quality_presets.py          # Quality presets
│   └── models/                         # Data models
│       └── detection_result.py         # Detection structures
├── resources/models/                   # ⚠️ Model files (NOT in git)
├── input/                              # Place input videos here
├── output/                             # Processed videos are saved here
├── test_*.py                           # Test scripts
├── requirements.txt                    # Python dependencies
└── README.md                           # This file

Detection Backends

The system supports multiple detection backends for ball and player detection:

1. Generic YOLO (detector_backend="yolo")

  • Model: YOLOv8 (n/s/m variants)
  • Training: COCO dataset (general-purpose object detection, including the person and sports ball classes)
  • Use case: General purpose, non-soccer content

2. SoccerNet YOLO (detector_backend="soccernet") [Default]

  • Model: YOLOv8 fine-tuned on SoccerNet dataset
  • Training: Soccer-specific footage
  • Use case: Soccer footage with standard camera angles
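The backend is chosen by the detector_backend string. A minimal sketch of how such a switch might map to weights files, assuming "soccernet" loads best.pt and "yolo" loads yolov8n.pt from resources/models/. This mapping, the SoccerNet class name, and the BackendSpec/resolve_backend names are illustrative assumptions, not the project's actual code.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class BackendSpec:
    weights: Path        # model file to load
    ball_classes: tuple  # class names treated as "ball"


# Hypothetical mapping; real class names depend on each model's training labels.
# "sports ball" is the COCO class name; "ball" for the fine-tuned model is an assumption.
BACKENDS = {
    "yolo": BackendSpec(Path("resources/models/yolov8n.pt"), ("sports ball",)),
    "soccernet": BackendSpec(Path("resources/models/best.pt"), ("ball",)),
}


def resolve_backend(name: str = "soccernet") -> BackendSpec:
    """Look up a backend by name, failing loudly on typos."""
    try:
        return BACKENDS[name]
    except KeyError:
        valid = ", ".join(sorted(BACKENDS))
        raise ValueError(f"Unknown detector_backend {name!r}; expected one of: {valid}") from None
```

The resolved weights path would then be handed to ultralytics.YOLO(...), which accepts a path to a .pt file.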

Configuration

Environment Variables (Optional)

Copy .env.example to .env and configure if needed:

cp .env.example .env

Available variables:

  • GOOGLE_API_KEY - For Gemini AI integration (recommended for transcript analysis)
  • GROQ_API_KEY - For Groq API (alternative to local Whisper)
  • FFMPEG_PATH - Custom ffmpeg path (optional)
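Because every variable is optional, code that reads them needs fallbacks. A minimal sketch, assuming transcription falls back to local Whisper when GROQ_API_KEY is unset and that ffmpeg is otherwise looked up on PATH; RuntimeSettings and load_settings are hypothetical names. If the project loads .env with python-dotenv (also an assumption), load_dotenv() would populate os.environ before this runs.

```python
import os
import shutil
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class RuntimeSettings:  # hypothetical, not ShortsGenie's config class
    gemini_enabled: bool
    transcriber: str            # "groq" or "whisper"
    ffmpeg_path: Optional[str]  # None if ffmpeg cannot be found


def load_settings(env: Mapping[str, str] = os.environ) -> RuntimeSettings:
    """Derive runtime choices from the optional environment variables."""
    # Treat empty or whitespace-only values the same as unset variables
    google_key = env.get("GOOGLE_API_KEY", "").strip()
    groq_key = env.get("GROQ_API_KEY", "").strip()
    # An explicit FFMPEG_PATH wins; otherwise search PATH
    ffmpeg = env.get("FFMPEG_PATH", "").strip() or shutil.which("ffmpeg")
    return RuntimeSettings(
        gemini_enabled=bool(google_key),
        transcriber="groq" if groq_key else "whisper",
        ffmpeg_path=ffmpeg,
    )
```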

Python Configuration

All settings are defined in src/utils/config.py and src/pipeline/pipeline_config.py.

Example usage:

from src.utils.config import AppConfig
from src.pipeline.reframing_pipeline import ReframingPipeline

# Start from the default configuration and select the detection backend
config = AppConfig()
config.detection.detector_backend = "soccernet"
config.detection.confidence_threshold = 0.05  # Low threshold: keeps more candidate detections

# Reframe a single 16:9 clip into a 9:16 vertical clip
pipeline = ReframingPipeline(config)
stats = pipeline.process_goal_clip(
    clip_path="input/goal_clip.mp4",
    output_path="output/goal_clip_vertical.mp4"
)

Processing Pipeline

The hybrid pipeline consists of the following modules:

1. Audio Analysis (Optional)

  • Whisper STT (local) or Groq API (cloud) for speech-to-text
  • Audio excitement detection for highlight segments
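Audio excitement detection generally looks for stretches where loudness (crowd noise, raised commentary) rises well above the match average. A minimal NumPy sketch of that idea, using frame-level RMS energy and a mean + k·std threshold. The frame size, k, minimum duration, and the excited_segments name are assumptions; the project's highlight_filter.py may use a different measure (Librosa, for example, provides librosa.feature.rms).

```python
import numpy as np


def excited_segments(samples, sr, frame_sec=0.5, k=1.5, min_sec=1.0):
    """Return (start_sec, end_sec) spans whose RMS energy exceeds mean + k*std.

    Hypothetical helper. samples: 1-D mono waveform; sr: sample rate in Hz.
    """
    frame = int(sr * frame_sec)
    n_frames = len(samples) // frame
    if n_frames == 0:
        return []
    # Frame-level RMS energy (a trailing partial frame is dropped)
    frames = np.asarray(samples[: n_frames * frame], dtype=np.float64).reshape(n_frames, frame)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    loud = rms > rms.mean() + k * rms.std()

    segments, start = [], None
    # Appending False acts as a sentinel that closes a run reaching the end
    for i, is_loud in enumerate(np.append(loud, False)):
        if is_loud and start is None:
            start = i
        elif not is_loud and start is not None:
            if (i - start) * frame_sec >= min_sec:
                segments.append((start * frame_sec, i * frame_sec))
            start = None
    return segments
```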

2. Transcript Analysis (Enabled by default)

  • Gemini AI analysis of transcripts for intelligent highlight extraction
  • Context-aware highlight generation with descriptions
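Whatever the model returns has to become timestamped segments before clips can be cut. A minimal sketch of that parsing step, assuming Gemini is prompted to reply with a JSON list of objects carrying start and end (in seconds) and a description. This response format and the Highlight/parse_highlights names are hypothetical; the actual prompt in transcript_analyzer.py may differ.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class Highlight:  # hypothetical structure, not ShortsGenie's
    start: float
    end: float
    description: str


def parse_highlights(response_text: str, video_duration: float) -> List[Highlight]:
    """Parse an assumed JSON highlight list, dropping malformed or out-of-range entries."""
    text = response_text.strip()
    # Tolerate prose or code fences around the JSON array
    first, last = text.find("["), text.rfind("]")
    if first == -1 or last < first:
        return []
    try:
        items = json.loads(text[first:last + 1])
    except json.JSONDecodeError:
        return []
    highlights = []
    for item in items:
        try:
            start, end = float(item["start"]), float(item["end"])
        except (KeyError, TypeError, ValueError):
            continue  # skip entries without usable timestamps
        # Clamp to the video and skip empty or inverted ranges
        start, end = max(0.0, start), min(video_duration, end)
        if end > start:
            highlights.append(Highlight(start, end, str(item.get("description", ""))))
    return sorted(highlights, key=lambda h: h.start)
```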

3. Scoreboard OCR (Optional)

  • PaddleOCR-based goal detection from scoreboard
  • Audio boost mode for high accuracy during exciting moments
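Scoreboard OCR output is noisy, so a goal is safer to infer from a score change that persists across several sampled frames than from a single read. A minimal sketch, assuming the OCR step yields one text string per sampled frame with the score written as home - away. The regex, the confirm count, and the detect_goals name are assumptions rather than the logic in scoreboard_ocr_detector.py.

```python
import re
from typing import Iterable, List, Optional, Tuple

# Assumed "home - away" format; ":" is excluded so a match clock is not read as a score
SCORE_RE = re.compile(r"(\d{1,2})\s*-\s*(\d{1,2})")


def parse_score(text: str) -> Optional[Tuple[int, int]]:
    """Extract the first "home - away" score from OCR text, or None."""
    match = SCORE_RE.search(text)
    return (int(match.group(1)), int(match.group(2))) if match else None


def detect_goals(readings: Iterable[Tuple[float, str]], confirm: int = 3) -> List[float]:
    """Hypothetical helper: timestamps where a higher score appeared and held for `confirm` reads.

    readings: (timestamp_sec, ocr_text) pairs in time order.
    The first confirmed score is the baseline and is not reported as a goal.
    """
    goals: List[float] = []
    current: Optional[Tuple[int, int]] = None    # last confirmed score
    candidate: Optional[Tuple[int, int]] = None  # score awaiting confirmation
    seen, first_ts = 0, 0.0
    for ts, text in readings:
        score = parse_score(text)
        if score is None:
            continue  # unreadable frame: neither confirms nor resets
        if score == current:
            candidate, seen = None, 0  # earlier change was a one-off misread
            continue
        if score != candidate:
            candidate, seen, first_ts = score, 0, ts
        seen += 1
        if seen >= confirm:
            if current is not None and sum(score) > sum(current):
                goals.append(first_ts)
            current, candidate, seen = score, None, 0
    return goals
```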

4. Scene Classification (Per-clip)

  • ResNet18-based scene type classification
  • Detects: wide, close, audience, replay scenes
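Classification happens per clip, while an image classifier such as ResNet18 scores individual frames, so frame predictions have to be reduced to one label. A minimal sketch using a majority vote over sampled frames, assuming the four classes listed above; the tie-break order, the "wide" default, and the clip_scene_label name are illustrative assumptions.

```python
from collections import Counter
from typing import Sequence

# Class names from the list above; their order doubles as the tie-break priority (assumption)
SCENE_CLASSES = ("wide", "close", "audience", "replay")


def clip_scene_label(frame_labels: Sequence[str], default: str = "wide") -> str:
    """Hypothetical helper: majority vote over per-frame labels; ties go to the earlier class."""
    counts = Counter(label for label in frame_labels if label in SCENE_CLASSES)
    if not counts:
        return default
    # max() returns the first maximal item, so SCENE_CLASSES order breaks ties
    return max(SCENE_CLASSES, key=lambda c: counts[c])
```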

5. Dynamic Reframing

  • Ball and player detection with YOLO/SoccerNet
  • Temporal filtering (Savitzky-Golay)
  • ROI calculation with hysteresis and scene locking
  • Smoothing (Kalman/EMA/Adaptive EMA)
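Missed detections leave gaps in the ball track, and raw detections jitter from frame to frame. A minimal sketch of the filtering and smoothing steps, assuming one optional ball x-coordinate per frame: gaps are filled by linear interpolation, the track is smoothed with SciPy's savgol_filter, and the camera center then follows it with a plain exponential moving average (EMA). The gap-filling choice, window length, polynomial order, and alpha are illustrative assumptions, not the project's settings.

```python
from typing import List, Optional

import numpy as np
from scipy.signal import savgol_filter


def filter_ball_track(xs: List[Optional[float]], window: int = 9, polyorder: int = 2) -> np.ndarray:
    """Hypothetical helper: fill missing ball positions, then apply a Savitzky-Golay filter."""
    arr = np.array([np.nan if x is None else x for x in xs], dtype=float)
    valid = ~np.isnan(arr)
    if not valid.any():
        raise ValueError("no ball detections to filter")
    idx = np.arange(len(arr))
    # np.interp holds the first/last known value constant beyond the ends
    filled = np.interp(idx, idx[valid], arr[valid])
    # The window must fit inside the signal and stay odd
    window = min(window, len(filled))
    if window % 2 == 0:
        window -= 1
    if window <= polyorder:
        return filled
    return savgol_filter(filled, window_length=window, polyorder=polyorder)


def ema(values: np.ndarray, alpha: float = 0.2) -> np.ndarray:
    """Exponential moving average: out[t] = alpha * x[t] + (1 - alpha) * out[t-1]."""
    out = np.empty_like(values, dtype=float)
    out[0] = values[0]
    for t in range(1, len(values)):
        out[t] = alpha * values[t] + (1 - alpha) * out[t - 1]
    return out
```

A smaller alpha gives a steadier but laggier virtual camera; the adaptive EMA option presumably varies alpha over time (for example with ball speed), which this sketch does not show.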

6. Video Generation

  • Parallel clip generation (3+ clips)
  • Video cropping and encoding
  • Audio preservation and merging
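Cropping 16:9 to 9:16 keeps the full frame height and a window of width frame_h * 9 / 16, placed around the smoothed camera center and clamped to the frame edges. Frame-by-frame writers such as OpenCV's cv2.VideoWriter do not carry audio, so the soundtrack has to be merged back afterwards, for example with FFmpeg. A minimal sketch, assuming the cropped video and the original clip share the same timeline; crop_box and merge_audio_cmd are hypothetical names, not functions from cropper.py.

```python
from typing import List, Tuple


def crop_box(center_x: float, frame_w: int, frame_h: int) -> Tuple[int, int, int, int]:
    """Hypothetical helper: (x, y, w, h) of a full-height 9:16 window centred on center_x."""
    crop_w = min(frame_w, int(round(frame_h * 9 / 16)))
    crop_w -= crop_w % 2  # even width suits yuv420p H.264 encoders
    x = int(round(center_x - crop_w / 2))
    x = max(0, min(x, frame_w - crop_w))  # keep the window inside the frame
    return x, 0, crop_w, frame_h


def merge_audio_cmd(video_path: str, audio_source: str, output_path: str,
                    ffmpeg: str = "ffmpeg") -> List[str]:
    """Hypothetical helper: FFmpeg command copying the video stream and re-attaching audio."""
    return [
        ffmpeg, "-y",
        "-i", video_path,      # input 0: cropped video without audio
        "-i", audio_source,    # input 1: original clip with audio
        "-map", "0:v:0",
        "-map", "1:a:0?",      # trailing '?' makes the audio map optional
        "-c:v", "copy",
        "-c:a", "aac",
        "-shortest",
        output_path,
    ]
```

The command list can be run with subprocess.run(cmd, check=True).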

Testing

# Test SoccerNet reframing pipeline
python test_soccernet_pipeline.py input/test_clip.mp4

# Test scene-aware pipeline
python test_scene_aware_pipeline.py

# Test parallel processing performance
python test_parallel_pipeline.py input/test_video.mp4

# Test OCR scoreboard detection
python test_ocr.py

Requirements

  • Python 3.8+
  • FFmpeg (system installation required)
  • 8GB+ RAM recommended
  • GPU optional but recommended (CUDA or Apple Silicon MPS)

Dependencies

Key dependencies include:

  • PyTorch: Deep learning framework
  • Ultralytics YOLO: Object detection
  • OpenCV: Computer vision
  • Whisper: Speech-to-text
  • PaddleOCR: OCR for goal detection
  • PySide6: GUI framework
  • Librosa: Audio analysis
  • scenedetect: Scene boundary detection

Performance

  • Processing Speed: ~15-30 FPS on modern CPU/GPU
  • Parallel Speedup: 2-4x faster for 3+ clips
  • Memory Usage: ~2-4GB per GPU worker
  • Ball Detection: 30-50%+ detection rate with temporal filtering
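The parallel speed-up comes from clips being independent of one another, so each can be reframed in its own worker process. A minimal sketch with concurrent.futures, assuming a picklable per-clip function; process_clip is a hypothetical stand-in for the real reframing call, and the default worker count is an assumption (the per-worker memory figure above suggests keeping it small on GPU machines).

```python
import os
from concurrent.futures import Executor, ProcessPoolExecutor, as_completed
from typing import Callable, Dict, Optional, Sequence, Type


def generate_clips_parallel(
    clip_paths: Sequence[str],
    process_clip: Callable[[str], str],  # hypothetical: input path -> output path
    max_workers: Optional[int] = None,
    executor_cls: Type[Executor] = ProcessPoolExecutor,
) -> Dict[str, str]:
    """Run process_clip on every clip concurrently; return {input_path: output_path}."""
    if max_workers is None:
        # No more workers than clips; os.cpu_count() can return None
        max_workers = min(len(clip_paths), os.cpu_count() or 1)
    results: Dict[str, str] = {}
    with executor_cls(max_workers=max(1, max_workers)) as pool:
        futures = {pool.submit(process_clip, path): path for path in clip_paths}
        for future in as_completed(futures):
            # .result() re-raises any exception raised inside the worker
            results[futures[future]] = future.result()
    return results
```

With the spawn start method (the default on Windows and macOS), the call must sit behind if __name__ == "__main__": so that worker processes can import the module safely.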
