A system that automatically creates short highlight reels from videos using Ray distributed computing and visual analysis. It identifies and extracts the most interesting moments from videos, using Ray for distributed processing and MobileNetV3 for visual feature extraction.
The easiest way to get started is the Anyscale Platform, which provides a ready-to-use Ray cluster:
- Create a free account at anyscale.com
- Create a workspace - Your Ray cluster will be automatically provisioned and ready to use
- Clone this repository in your workspace
- Start coding - The cluster is already up and running with all necessary Ray resources
This eliminates the need for local setup and gives you immediate access to GPU resources and distributed computing capabilities.
For local development, continue with the Installation section below.
```bash
# Python 3.12 required
python --version

# Install FFmpeg (system requirement)
# macOS:
brew install ffmpeg
# Ubuntu/Debian:
sudo apt-get install ffmpeg

cd video-highlight-generator

# Install dependencies
pip install -r requirements.txt
```

Key Dependencies:
- `ray[default,data]==2.47.0` - Distributed computing
- `torch==2.5.1` - Deep learning
- `opencv-python-headless==4.10.0.84` - Headless video processing
- `torchvision==0.20.1` - Pre-trained models
```bash
# macOS only - for terminal video playback
brew install timg

# Downloads 3 Creative Commons videos (~50MB)
python scripts/download_sample_videos.py

python demo.py
```

The demo will:
- Show menu with video sources (sample/custom/YouTube)
- Preprocess video (extract frames at 1 FPS)
- Extract visual features with MobileNetV3 (distributed)
- Detect highlights using multi-signal analysis
- Generate highlight reel (≤30 seconds)
- Display results (with terminal playback if timg available)
```bash
# Run tests sequentially
python tests/test_01_environment.py    # Ray + device detection
python tests/test_02_video_loading.py  # Parallel video loading
python tests/test_03_features.py       # Feature extraction (63+ FPS)
python tests/test_04_highlights.py     # Highlight detection
python tests/test_05_generation.py     # Video generation
python tests/test_06_pipeline.py       # End-to-end pipeline
```

```
video-highlight-generator/
├── demo.py                        # Interactive CLI (1083 lines)
├── requirements.txt               # Python dependencies
├── src/
│   ├── pipeline.py                # Main orchestrator (380 lines)
│   ├── models/
│   │   └── feature_extractors.py  # Ray actors for ML inference
│   ├── features/
│   │   ├── highlight_detector.py  # Detection algorithms (558 lines)
│   │   └── video_generator.py     # FFmpeg wrapper
│   └── utils/
│       ├── ray_utils.py           # Cluster compatibility (144 lines)
│       ├── timg_video_player.py   # Terminal video playback
│       └── side_by_side_player.py # Comparison viewer
├── scripts/
│   ├── download_sample_videos.py  # Get demo videos
│   ├── preprocess_videos.py       # Batch preprocessing
│   └── cleanup.sh                 # Remove generated files
├── tests/                         # 6 comprehensive tests
└── data/                          # Local storage (or /mnt/cluster_storage on clusters)
```
4-Phase Pipeline:
- Preprocessing - FFmpeg extracts frames (1 FPS) and audio
- Feature Extraction - MobileNetV3 generates 576-dim visual features (distributed via Ray actors)
- Highlight Detection - Multi-signal analysis (variance + novelty + motion) identifies peaks
- Video Generation - FFmpeg extracts clips, adds transitions, concatenates to ≤30s
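Phase 1's frame extraction at 1 FPS maps onto a standard FFmpeg `fps` filter. The sketch below only builds the command line rather than running it; the function name and exact flags are assumptions, not the project's actual invocation.

```python
# Sketch: the kind of FFmpeg command phase 1 implies (flags are assumptions).
def build_frame_extract_cmd(video_path, out_dir, fps=1.0):
    """Return an ffmpeg argv that samples `fps` frames/sec as numbered JPEGs."""
    return [
        "ffmpeg", "-i", video_path,
        "-vf", f"fps={fps}",          # sample at the target frame rate
        f"{out_dir}/frame_%05d.jpg",  # numbered output frames
    ]

cmd = build_frame_extract_cmd("data/raw/demo/video.mp4", "data/frames", fps=1.0)
# Run with: subprocess.run(cmd, check=True)
```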
Detection Algorithm:
- Computes importance scores from visual features
- Uses adaptive thresholds based on video duration
- Detects peaks with SciPy local maxima
- Ranks highlights by importance score
- Enforces 30-second maximum duration
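The steps above can be sketched dependency-free. The project itself uses SciPy peak detection, and its adaptive threshold also depends on video duration; the simple mean threshold and function name here are assumptions made for illustration.

```python
# Dependency-free sketch of the detection steps above.
def select_highlights(scores, clip_duration=3.0, max_duration=30.0):
    # Adaptive threshold (simplified here to the mean score).
    threshold = sum(scores) / len(scores)
    # Local maxima above the threshold.
    peaks = [
        i for i in range(1, len(scores) - 1)
        if scores[i] > scores[i - 1]
        and scores[i] >= scores[i + 1]
        and scores[i] > threshold
    ]
    # Rank by importance score, then enforce the maximum reel duration.
    peaks.sort(key=lambda i: scores[i], reverse=True)
    budget = int(max_duration // clip_duration)
    return sorted(peaks[:budget])

scores = [0.1, 0.9, 0.2, 0.3, 0.8, 0.1, 0.4, 0.7, 0.2]
print(select_highlights(scores))  # frame indices of the chosen peaks
```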
The system runs on both local machines and Ray clusters without code changes.
Automatic Features:
- Environment detection (local vs cluster via `RAY_ADDRESS`)
- Storage path switching (`./data` → `/mnt/cluster_storage`)
- Headless OpenCV for worker nodes
- Graceful degradation (timg fallback to metadata display)
- Resource management (Ray handles CPU/GPU allocation)
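The first two features amount to one environment check. A minimal sketch, assuming detection via the `RAY_ADDRESS` variable as described above (the actual logic in `src/utils/ray_utils.py` may check more than this):

```python
import os

def storage_root():
    """Pick the data directory based on where the code is running."""
    on_cluster = bool(os.environ.get("RAY_ADDRESS"))  # set on Ray clusters
    return "/mnt/cluster_storage" if on_cluster else "./data"
```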
Cluster Test Results (Ray 2.47.0 + Tesla T4 GPUs):

```
✅ test_01_environment.py   - Ray initialization and device detection
✅ test_02_video_loading.py - Parallel video loading with Ray Data
✅ test_03_features.py      - Distributed feature extraction (63+ FPS)
✅ test_04_highlights.py    - Highlight detection with adaptive thresholds
✅ test_05_generation.py    - Video highlight reel generation (11 clips)
✅ test_06_pipeline.py      - End-to-end pipeline (15.1s total)
```
Usage on Cluster:
```bash
# Copy videos to cluster storage
cp video.mp4 /mnt/cluster_storage/raw/demo/

# Run (automatically detects cluster and uses cluster storage)
python demo.py
```

```bash
python demo.py
# Select option 2 (Custom video)
# Enter path: /path/to/video.mp4
```

```bash
# Install yt-dlp first
pip install yt-dlp

python demo.py
# Select option 3 (YouTube URL)
# Enter URL: https://youtube.com/watch?v=...
```

```bash
# Preprocess all videos in data/raw/demo/
python scripts/preprocess_videos.py

bash scripts/cleanup.sh
```

The pipeline uses sensible defaults but can be customized:
Pipeline Parameters:
- `num_actors` - Number of Ray actors for parallel processing (default: 2)
- `target_fps` - Frame extraction rate (default: 1.0 FPS)
- `resolution` - Frame size for ML model (default: 224×224)
Detection Parameters:
- `variance_weight` - Visual diversity score weight (default: 0.4)
- `novelty_weight` - Uniqueness score weight (default: 0.3)
- `motion_weight` - Action intensity score weight (default: 0.3)
Generation Parameters:
- `clip_duration` - Individual clip length (default: 3.0s)
- `fade_duration` - Transition fade time (default: 0.5s)
- `max_duration` - Maximum highlight reel length (default: 30.0s)
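The documented defaults, gathered in one place. The parameter names come from the lists above; collecting them into a single dict (and how the pipeline would consume it) is an assumption of this sketch.

```python
# Defaults from the parameter lists above (dict layout is illustrative only).
DEFAULT_CONFIG = {
    # pipeline
    "num_actors": 2,
    "target_fps": 1.0,
    "resolution": (224, 224),
    # detection (weights sum to 1.0)
    "variance_weight": 0.4,
    "novelty_weight": 0.3,
    "motion_weight": 0.3,
    # generation
    "clip_duration": 3.0,
    "fade_duration": 0.5,
    "max_duration": 30.0,
}
```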
Models:
- MobileNetV3-small (pre-trained on ImageNet)
- 576-dimensional visual features
- Automatic device selection (CUDA > MPS > CPU)
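The preference order above as a tiny helper. The real code presumably queries `torch.cuda.is_available()` and the MPS backend; here availability is passed in as flags so only the selection logic is shown, and the function name is hypothetical.

```python
# CUDA > MPS > CPU preference order (availability flags passed in,
# not queried from torch, so this sketch stays framework-free).
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"
```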
Algorithms:
- Feature variance (visual diversity)
- Feature novelty (cosine distance from mean)
- Motion intensity (frame-to-frame difference)
- SciPy peak detection with adaptive thresholds
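The three signals listed above can be sketched on toy feature vectors. The real features are 576-dim MobileNetV3 outputs; the function name, the lack of normalization, and the use of 3-dim toy vectors are assumptions made for illustration.

```python
import math

def importance_signals(features):
    """Per-frame variance, novelty, and motion from a list of feature vectors."""
    n, dim = len(features), len(features[0])
    mean = [sum(f[d] for f in features) / n for d in range(dim)]

    def cosine_distance(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return 1.0 - dot / (na * nb)

    # Visual diversity: spread of each frame around the mean feature.
    variance = [sum((x - m) ** 2 for x, m in zip(f, mean)) / dim for f in features]
    # Uniqueness: cosine distance from the mean feature.
    novelty = [cosine_distance(f, mean) for f in features]
    # Action intensity: distance to the previous frame (0 for the first).
    motion = [0.0] + [
        math.sqrt(sum((a - b) ** 2 for a, b in zip(features[i], features[i - 1])))
        for i in range(1, n)
    ]
    return variance, novelty, motion
```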
Ray Patterns:
- Actor pool for stateful workers
- Models loaded once per actor
- Distributed batch processing
- Automatic task distribution
- Ray Documentation
- Ray Actors Guide
- Ray Cluster Quickstart
- Module README - Learning path and context