Python: ==3.10.*
detect-face-lib is a modular face detection + JSON augmentation toolkit that attaches face detections to `det-v1` (detections) or `track-v1` (tracked) payloads produced by earlier stages in the Vision Pipeline.
This is the Face Augmentation stage of the Vision Pipeline.
Detectors included:

- `retinaface`: `retinaface-pytorch` backend (named variant registry)
By default, `detect-face-lib` does not write any files. You opt in to saving JSON, frames, or annotated video via flags.
```
Original Video (.mp4)
        │
        ▼
   detect-lib
   (Detection Stage)
        │
        └── detections.json (det-v1)
        │
        ▼
   track-lib
   (Tracking + ReID)
        │
        └── tracked.json (track-v1)
        │
        ▼
   detect-face-lib
   (Face Augmentation)
        │
        └── faces.json (augmented; face-v1 meta)
```
Note: Each stage consumes the original video + the upstream JSON from the previous stage.
Stage 1 (Detection):
- PyPI: https://pypi.org/project/detect-lib/
- GitHub: https://github.com/Surya-Rayala/VideoPipeline-detection
Stage 2 (Tracking + ReID):
- PyPI: https://pypi.org/project/gallery-track-lib/
- GitHub: https://github.com/Surya-Rayala/VisionPipeline-gallery-track
detect-face-lib returns an augmented JSON payload in-memory that preserves the upstream schema (`det-v1` or `track-v1`) and adds:

- `face_augment`: versioned metadata about the face detector and association rules
- `detections[*].faces`: for each frame, matched faces are attached under the matched person detection
Each face entry is intentionally minimal:
- `bbox`: face box `[x1, y1, x2, y2]`
- `score`: face confidence
- `landmarks`: 5-point landmarks `[[x, y] x 5]`, when provided by the backend
This example assumes the upstream JSON is `track-v1`; the same structure applies for `det-v1` (it will simply lack tracker fields).

```json
{
"schema_version": "track-v1",
"parent_schema_version": "det-v1",
"video": {
"path": "in.mp4",
"fps": 30.0,
"frame_count": 120,
"width": 1920,
"height": 1080
},
"tracker": {
"name": "gallery_hybrid"
},
"face_augment": {
"version": "face-v1",
"parent_schema_version": "track-v1",
"detector": {
"name": "retinaface",
"variant": "resnet50_2020-07-20",
"max_size": 1024,
"conf_thresh": 0.5,
"nms_thresh": 0.3,
"device": "mps"
},
"association": {
"associate_class_ids": [0],
"gallery_filter": false,
"iou_thresh": 0.0,
"containment": true,
"inclusive": true,
"kpt_indices": null,
"kpt_conf": 0.3,
"flexible_kpt": null
}
},
"frames": [
{
"frame_index": 0,
"detections": [
{
"bbox": [100.0, 50.0, 320.0, 240.0],
"score": 0.91,
"class_id": 0,
"class_name": "person",
"track_id": "3",
"gallery_id": "person_A",
"faces": [
{
"bbox": [140.0, 70.0, 210.0, 150.0],
"score": 0.98,
"landmarks": [[160.0, 95.0], [195.0, 95.0], [178.0, 110.0], [165.0, 130.0], [192.0, 130.0]]
}
]
}
]
}
]
}
```

- Returned (always): the augmented payload is available as `FaceResult.payload` (Python) and is always produced in-memory.
- Saved (opt-in): nothing is written unless you enable artifacts:
  - `--json` writes `<run>/faces.json`
  - `--frames` writes annotated frames under `<run>/frames/`
  - `--save-video` writes an annotated video under `<run>/...`

When no artifacts are enabled, no output directory/run folder is created.
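For orientation, a minimal sketch of reading faces back out of the augmented payload (assuming a `faces.json` written with `--json`; `FaceResult.payload` has the same shape, and the path below is illustrative):

```python
import json

# Load an augmented payload saved with --json; FaceResult.payload is identical.
with open("out/demo/faces.json") as f:
    payload = json.load(f)

for frame in payload["frames"]:
    for det in frame["detections"]:
        # "faces" appears only on detections that matched at least one face.
        for face in det.get("faces", []):
            print(frame["frame_index"], det.get("track_id"), face["bbox"], face["score"])
```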
Use this if you want to install and use the tool without cloning the repo. Requires Python >= 3.10.

```
pip install detect-face-lib
```

Note: the PyPI package name is `detect-face-lib`, but the Python module/import name remains `detect_face`.
If you want GPU acceleration on NVIDIA CUDA, install a CUDA-matching build of torch and torchvision.
If you installed CPU-only wheels by accident, uninstall and reinstall the correct CUDA wheels (use the official PyTorch selector for your CUDA version).
```
pip uninstall -y torch torchvision
# then install the CUDA-matching wheels for your system
# (see: https://pytorch.org/get-started/locally/)
```
Global help:

```
python -m detect_face.cli.detect_faces -h
```

List detectors:
```
python -m detect_face.cli.detect_faces --list-detectors
```

List variants for a detector:
```
python -m detect_face.cli.detect_faces --detector retinaface --list-variants
```

Basic command (track-v1 input):

```
python -m detect_face.cli.detect_faces \
  --json-in tracked.json \
  --video in.mp4
```

Basic command (det-v1 input):

```
python -m detect_face.cli.detect_faces \
  --json-in detections.json \
  --video in.mp4
```

Save artifacts (opt-in):

```
python -m detect_face.cli.detect_faces \
  --json-in tracked.json \
  --video in.mp4 \
  --json \
  --frames \
  --save-video annotated.mp4 \
  --out-dir out --run-name demo
```

Flags:

- `--json-in <path>`: Path to the det-v1 or track-v1 JSON to augment.
- `--video <path>`: Path to the original source video used to generate the JSON. Frame order must align.
- `--list-detectors`: Print available face detector backends and exit.
- `--list-variants`: Print supported named variants for `--detector` and exit.
- `--detector <name>`: Face backend to use (default: `retinaface`).
- `--variant <name>`: Backend model variant (named variant). For `retinaface`, this is typically `resnet50_2020-07-20`.
- `--max-size <int>`: Cap the longer side for detector input.
  - Lower → faster, may miss small faces.
  - Higher → slower, better for small faces.
- `--device <auto|cpu|mps|cuda|cuda:0|0|1...>`: Compute device. `auto` resolves to `cuda:0` if available, else `mps`, else `cpu`.
- `--conf-thresh <float>`: Minimum face confidence.
  - Increase → fewer false positives (fewer faces).
  - Decrease → more faces (more false positives).
- `--nms-thresh <float>`: Non-maximum suppression threshold inside the face detector.
  - Lower → more aggressive suppression.
- `--associate-classes <ids>`: Class IDs eligible for face association (e.g., `0` for person). If omitted, the tool tries to auto-detect `class_name == 'person'`; if not found, all classes are eligible.
- `--iou-thresh <float>`: IoU threshold for candidate person-face pairing. `0.0` considers any overlap; `0.1`–`0.3` is stricter and can reduce wrong assignments (the sketch after this list illustrates the pairing and keypoint rules).
- `--containment` / `--no-containment`: Require (or not) that the face box lies fully inside the person box.
  - Containment on → safer (fewer bad matches).
  - Containment off → more permissive (may increase mismatches in crowded frames).
- `--gallery-filter`: Track-v1 only: only assign faces where `track_id == gallery_id` (when both exist).
- `--kpt-indices <ids>`: Body keypoint indices to require inside the face box. Only used if the input JSON contains keypoints; if keypoints are missing, a warning is emitted and the rule is skipped.
- `--kpt-conf <float>`: Minimum confidence for keypoints used by `--kpt-indices`.
- `--flexible-kpt <N>`: If set, requires at least `N` keypoints inside the face box when `>= N` survive the confidence threshold; if fewer survive, requires all surviving keypoints inside.
- `--json`: Write augmented JSON to `<run>/faces.json`.
- `--frames`: Save annotated frames under `<run>/frames/` (can be large).
- `--save-video <name.mp4>`: Save an annotated video under `<run>/<name.mp4>`.
- `--out-dir <dir>`: Output root used only when saving artifacts (default: `out`).
- `--run-name <name>`: Run folder name under `--out-dir`. If omitted, artifacts go directly under `--out-dir`.
- `--fourcc <fourcc>`: FourCC codec for saved video (default: `mp4v`).
- `--display`: Show a live annotated preview (press ESC to stop). Does not write files unless saving flags are set.
- `--no-progress`: Disable progress bar.
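To make the pairing and keypoint rules concrete, here is an illustrative sketch (hypothetical helpers, not the library's internals) of how a candidate face box could be tested against a person box under these flags:

```python
def iou(a, b):
    # Intersection-over-union of two [x1, y1, x2, y2] boxes (--iou-thresh).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    if inter == 0.0:
        return 0.0
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def contained(face, person):
    # --containment: the face box must lie fully inside the person box.
    return (face[0] >= person[0] and face[1] >= person[1] and
            face[2] <= person[2] and face[3] <= person[3])

def flexible_kpt_ok(inside, surviving, n):
    # --flexible-kpt N: require N keypoints inside the face box when at least
    # N survive --kpt-conf; otherwise require all surviving keypoints inside.
    return inside >= (n if surviving >= n else surviving)
```

With `--iou-thresh 0.0` any positive overlap passes the IoU test, which is why `--containment` is the main guard against bad matches in crowded frames.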
You can use detect-face-lib as a library after installing it with pip.
python -c "import detect_face; print(detect_face.available_detectors())"Required
json_in: Path to det-v1 or track-v1 JSON.video: Path to the source video.
Detector
detector: Backend name (e.g.,"retinaface").variant: Variant name for the backend.max_size: Cap longer side.device:"auto","cpu","mps","cuda","cuda:0", or an index like"0".conf_thresh,nms_thresh: Detector thresholds.
Association
associate_class_ids: List of class_ids eligible for association.iou_thresh,containment: Pairing rules.gallery_filter: Track-v1 only.
Keypoints (optional)
kpt_indices,kpt_conf,flexible_kpt: Keypoint-based validation (used only when keypoints exist).
Artifacts (all off by default)
save_json_flag: Write<run>/faces.json.save_frames: Write<run>/frames/*.jpg.save_video: Filename for annotated video under the run folder.out_dir,run_name,fourcc,display,no_progress.
Returns a `FaceResult` with `payload` (augmented JSON), `paths` (only populated when saving), and `stats`.
Create `run_faces.py`:

```python
from detect_face import detect_faces_video

res = detect_faces_video(
    json_in="tracked.json",
    video="in.mp4",
    detector="retinaface",
    variant="resnet50_2020-07-20",
    device="auto",
)
print(res.stats)
print("face_augment" in res.payload)
print(res.paths)  # populated only if you enable saving artifacts
```

Run:

```
python run_faces.py
```
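The same call can opt in to artifacts via the parameters listed above; the output paths and names below are illustrative:

```python
from detect_face import detect_faces_video

# Opt in to artifacts; output paths/names here are illustrative.
res = detect_faces_video(
    json_in="tracked.json",
    video="in.mp4",
    detector="retinaface",
    save_json_flag=True,         # <run>/faces.json
    save_frames=True,            # <run>/frames/*.jpg
    save_video="annotated.mp4",  # annotated video under the run folder
    out_dir="out",
    run_name="demo",
)
print(res.paths)  # now populated with the written artifact locations
```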
Use this if you are developing locally or want reproducible project environments.
Install uv: https://docs.astral.sh/uv/getting-started/installation/#standalone-installer
Verify:
```
uv --version
```

```
git clone https://github.com/Surya-Rayala/VisionPipeline-detect-face.git
cd VisionPipeline-detect-face
uv sync
```

For best performance on NVIDIA GPUs, make sure torch and torchvision are installed with a build that matches your CUDA toolkit / driver stack.
If you added CPU-only builds earlier, remove them and add the correct CUDA wheels, then re-sync.
```
uv remove torch torchvision
# then add the CUDA-matching wheels for your system
# (see: https://pytorch.org/get-started/locally/)
uv add <compatible torch torchvision>
uv sync
```

Global help:

```
uv run python -m detect_face.cli.detect_faces -h
```

List detectors / variants:
```
uv run python -m detect_face.cli.detect_faces --list-detectors
uv run python -m detect_face.cli.detect_faces --detector retinaface --list-variants
```

Basic command (track-v1 input):

```
uv run python -m detect_face.cli.detect_faces \
  --json-in tracked.json \
  --video in.mp4
```

Basic command (det-v1 input):

```
uv run python -m detect_face.cli.detect_faces \
  --json-in detections.json \
  --video in.mp4
```

Save artifacts (opt-in):

```
uv run python -m detect_face.cli.detect_faces \
  --json-in tracked.json \
  --video in.mp4 \
  --json --frames --save-video annotated.mp4 \
  --out-dir out --run-name demo
```

This project is licensed under the MIT License. See LICENSE.