High-Performance Facial Analysis for Video
Forked from cosanlab/py-feat | Maintained by @safhac
This fork introduces a Hybrid Kalman Tracking Engine to py-feat, optimizing it for real-time video processing and low-resource environments (CPU/IoT). By decoupling facial detection (expensive) from tracking (cheap), this version runs 1.2x - 5x faster depending on the workload, with significantly reduced compute costs.
⚡ Key Features
- Hybrid Tracking Architecture: Replaces the naive "detect-every-frame" loop with a smart Kalman Filter tracker.
-
Frame Skipping: Runs heavy face detection only every
$N$ frames (configurable). Intermediate frames are predicted mathematically. - Adaptive Padding: Automatically adjusts crop regions to prevent "drift," ensuring landmarks remain accurate even when the detector is off.
- CPU Optimized: Designed to run real-time facial analysis on standard CPUs, removing the hard dependency on expensive GPUs for video pipelines.
Workload: 50 Frames of Video | Hardware: Standard CPU (Simulated Load)
| Strategy | Detection Interval | Time (s) | FPS | Speedup |
|---|---|---|---|---|
| Baseline (Original) | 1 (Every Frame) | 31.91s | 1.57 | - |
| Hybrid (This Fork) | 5 (Every 5th Frame) | 27.63s | 1.81 | +13.4% |
| Hybrid (Aggressive) | 10 (Every 10th Frame) | Est. | ~2.50 | +40%+ |
> Note: Speedup scales dramatically with heavier models (e.g., RetinaFace/img2pose). The heavier the detector, the more time you save.
🛠️ Installation
You can install this optimized version directly from GitHub:
Bash
pip install git+https://github.com/safhac/py-feat.git@main
Or clone and install in editable mode (recommended for development):
Bash
git clone https://github.com/safhac/py-feat.git
cd py-feat
pip install -e .
🚀 Usage
The new engine works seamlessly with the existing Detector API. Just add the detection_interval argument.
Run the heavy detector only once every 5 frames. The Kalman Filter handles the rest.
Python
from feat.detector import Detector
# Initialize with standard models (CPU friendly configuration)
detector = Detector(
face_model="FaceBoxes",
landmark_model="mobilefacenet",
au_model="rf",
emotion_model="resmasknet",
device="cpu"
)
# Process video with Hybrid Tracking enabled
# detection_interval=5 means we skip the heavy detector 80% of the time!
video_prediction = detector.detect_video(
"input_video.mp4",
detection_interval=5
)
video_prediction.to_csv("output.csv")
For real-time applications (e.g., HRI, Digital Signage) where you need "Smoothness" over "Raw Accuracy":
Python
# Process a live stream or file
# Skip 9 frames (run detector at 3 FPS if video is 30 FPS)
# This significantly frees up CPU for Semantic Analysis
detector.detect_video(
"webcam_stream.mp4",
detection_interval=10
)
🧠 How It Works
Original py-feat treats video as a pile of photos. It runs detect_faces() on Frame 1, Frame 2, Frame 3... regardless of the fact that faces don't teleport. This is a massive waste of compute.
We implemented a stateful tracker that remembers where the face was and predicts where it will be.
- Frame 0 (Search): We run the heavy AI Detector. We find a face at [100, 100]. We initialize a Tracker.
- Frame 1-4 (Track): We skip the AI. We ask the Kalman Filter: "Given the velocity, where is the face now?" It predicts [105, 102]. We crop and analyze that region.
- Frame 5 (Correct): We run the AI Detector again to correct any drift. The Tracker updates its velocity estimates.
- Repeat.
🤝 Contributing
This fork is actively maintained for high-performance use cases. If you have ideas for further optimization (e.g., Quantization, BlazeFace integration), please open an Issue!
Original Library: cosanlab/py-feat