This repository presents a comprehensive, production-grade framework for real-time, markerless, multi-camera, and multi-person 3D coordinate capture, specifically tailored for quantitative clinical scoring environments. The system is meticulously designed to support established clinical protocols such as ARAT (Action Research Arm Test), UPDRS (Unified Parkinson's Disease Rating Scale), and Fugl-Meyer, with robust handling of scenarios involving both patients and clinical scorers within the same field of view.
Leveraging state-of-the-art computer vision and deep learning techniques, including YOLOv11 for person detection, DeepSort for multi-person tracking, and MediaPipe Holistic for detailed landmark extraction, this pipeline enables precise, synchronized, and scalable kinematic data acquisition across multiple calibrated cameras. Extensive statistical validation and visualization modules ensure the reliability and interpretability required for clinical and research-grade deployments.
- Set up Calibrated Multi-Cam 3D Coordinate Capture Environment for Clinical Scoring
- Multi-Camera Calibration: Robust routines for intrinsic and extrinsic calibration with chessboard patterns, supporting high-precision 3D triangulation and undistortion (a triangulation sketch follows this list).
- Real-Time, Multi-Person Detection and Tracking: Seamless integration of YOLOv11 and DeepSort enables reliable detection and persistent tracking of multiple individuals (patients, scorers) even during bounding box overlaps.
- Markerless 3D Landmark Extraction: MediaPipe Holistic provides real-time extraction of face, pose, and hand landmarks for each detected individual, across all cameras.
- Parallelized Processing: ThreadPoolExecutor-based multithreading ensures optimal frame rates on multi-core systems, scaling efficiently with the number of persons and cameras.
- Structured Data Output: Per-person, per-camera CSVs for face, pose, and hand coordinates, including frame-level temporal alignment.
- Comprehensive Statistical Validation: Automated error quantification, advanced visualizations (histograms with KDE, CDFs, boxplots, Bland–Altman, regression, 3D scatter), and CSV-based reporting for calibration and tracking quality.
- Clinical Scoring Support: Output data is directly compatible with ARAT, UPDRS, Fugl-Meyer, and other clinical movement scoring protocols.
- Scalable and Extensible: Modular design enables adaptation to additional cameras, scoring protocols, or custom downstream analysis.
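To make the triangulation feature concrete, here is a minimal sketch of recovering 3D points from two calibrated views with OpenCV; P1/P2 and the function name are illustrative, not the pipeline's actual API:

```python
import cv2
import numpy as np

def triangulate_pair(P1, P2, pts_cam1, pts_cam2):
    """Triangulate matched 2D points (N, 2) from two views into 3D (N, 3).

    P1 and P2 are the 3x4 projection matrices (K @ [R|t]) produced by the
    intrinsic/extrinsic calibration step.
    """
    pts1 = np.asarray(pts_cam1, dtype=np.float64).T  # -> shape (2, N)
    pts2 = np.asarray(pts_cam2, dtype=np.float64).T
    points_4d = cv2.triangulatePoints(P1, P2, pts1, pts2)  # homogeneous (4, N)
    return (points_4d[:3] / points_4d[3]).T                # dehomogenize -> (N, 3)
```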
- Input: Multiple synchronized video streams or live camera feeds, with both patient(s) and scorer(s) in frame.
- Calibration: Intrinsic and extrinsic camera calibration using chessboard images, with automatic error metrics and visual feedback.
- Detection & Tracking: YOLOv11 for initial person detection, DeepSort for track consistency across frames and cameras (a minimal sketch follows this list).
- Landmark Extraction: MediaPipe Holistic (face, pose, hands) per detected ROI, executed in parallel threads.
- Data Aggregation: Global 3D coordinates computed for each domain (face, pose, hand) and each tracked individual, stored per camera.
- Statistical Analysis: Automated validation routines for calibration and tracking, with detailed plots and CSV summaries.
- Output: Structured CSVs and visualizations for downstream clinical or research analysis.
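The detection-and-tracking stage can be sketched as below, assuming the ultralytics and deep-sort-realtime packages; the weights file, thresholds, and loop structure are illustrative, and capstone.py's actual wiring may differ:

```python
import cv2
from ultralytics import YOLO
from deep_sort_realtime.deepsort_tracker import DeepSort

model = YOLO("yolo11n.pt")      # any YOLOv11 person-capable weights
tracker = DeepSort(max_age=30)  # keeps IDs alive through brief occlusions

cap = cv2.VideoCapture(0)
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # Detect persons only (COCO class 0) and convert boxes to DeepSort's
    # expected format: ([left, top, width, height], confidence, class).
    detections = []
    for box in model(frame, classes=[0], verbose=False)[0].boxes:
        x1, y1, x2, y2 = box.xyxy[0].tolist()
        detections.append(([x1, y1, x2 - x1, y2 - y1], float(box.conf[0]), "person"))
    # Each confirmed track carries a persistent ID and an ROI that is later
    # handed to MediaPipe Holistic for landmark extraction.
    for track in tracker.update_tracks(detections, frame=frame):
        if track.is_confirmed():
            l, t, r, b = map(int, track.to_ltrb())
            cv2.rectangle(frame, (l, t), (r, b), (0, 255, 0), 2)
cap.release()
```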
- Clone the repository:
git clone https://github.com/aryanbhardwaj24/Set-up-calibrated-multi-cam-3D-coordinate-capture-environment.git
cd Set-up-calibrated-multi-cam-3D-coordinate-capture-environment
- Install dependencies:
pip install -r requirements.txt
.
├── capstone.py
├── utils.py
├── requirements.txt
├── calibration_input_images_cam1/
├── calibration_input_images_cam2/
├── calibration_input_videos/
├── calibration_compare_stats/
│   ├── face/
│   ├── pose/
│   └── hand/
├── multiperson_multithread_multicam/
│   ├── cam1/
│   └── cam2/
└── ...
- capstone.py: Main pipeline and entry point for all operations.
- utils.py: Utility functions (FPS calculation, drawing, etc.).
- requirements.txt: All Python dependencies.
- calibration_input_images_camX/: Chessboard images for camera X calibration.
- calibration_input_videos/: Raw video files for calibration and capture.
- calibration_compare_stats/: Statistical results and plots for calibration quality.
- multiperson_multithread_multicam/: Output CSVs for each person/camera.
- Capture Chessboard Images:
  - Acquire 10–20 high-quality chessboard images per camera from diverse angles.
  - Store images in calibration_input_images_cam1/, calibration_input_images_cam2/, etc.
- Run Calibration (a stand-alone sketch of this step follows this list):
  - Calibration is handled automatically at runtime. The system will:
    - Detect chessboard corners.
    - Calculate intrinsic (camera matrix, distortion) and extrinsic parameters.
    - Save undistorted images and error metrics for verification.
- Review Calibration Quality:
  - Visual and statistical outputs (mean, std, per-axis and Euclidean errors) are generated in calibration_compare_stats/.
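For reference, a stand-alone sketch of what the automatic calibration performs internally with OpenCV; the 9x6 inner-corner pattern and the image glob are assumptions to match to your actual board and layout:

```python
import glob
import cv2
import numpy as np

PATTERN = (9, 6)  # inner corners per row/column; must match your board
objp = np.zeros((PATTERN[0] * PATTERN[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:PATTERN[0], 0:PATTERN[1]].T.reshape(-1, 2)

obj_points, img_points = [], []
for path in glob.glob("calibration_input_images_cam1/*.jpg"):
    gray = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2GRAY)
    found, corners = cv2.findChessboardCorners(gray, PATTERN)
    if found:
        corners = cv2.cornerSubPix(
            gray, corners, (11, 11), (-1, -1),
            (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.001))
        obj_points.append(objp)
        img_points.append(corners)

# Intrinsics (camera matrix, distortion coefficients) plus per-view
# extrinsics (rotation and translation vectors); RMS is the mean
# reprojection error used for the quality report.
rms, mtx, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)
print(f"RMS reprojection error: {rms:.4f} px")
```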
- Single Camera (Single/Multi-Person):
python capstone.py --device 0 --width 960 --height 540
- The pipeline detects, tracks, and extracts landmarks for all persons, saving results to CSV.
- Full Multi-Camera, Multi-Person, Multi-Threaded Mode:
python capstone.py
- By default, main_multiperson_multithreaded_multicam() is invoked.
- The system synchronizes two cameras, detects and tracks multiple persons with DeepSort, and extracts landmarks in parallel for each person and camera (a minimal sketch of this parallel stage follows).
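A minimal sketch of the parallel landmark stage, assuming one ROI crop per confirmed track; function names and the per-call Holistic instance are illustrative simplifications:

```python
from concurrent.futures import ThreadPoolExecutor

import cv2
import mediapipe as mp

mp_holistic = mp.solutions.holistic

def extract_landmarks(roi_bgr):
    """Run MediaPipe Holistic on one person's cropped ROI."""
    # A fresh instance per call keeps the sketch thread-safe; a real
    # pipeline would typically keep one instance per tracked person.
    with mp_holistic.Holistic(min_detection_confidence=0.5,
                              min_tracking_confidence=0.5) as holistic:
        results = holistic.process(cv2.cvtColor(roi_bgr, cv2.COLOR_BGR2RGB))
    return (results.face_landmarks, results.pose_landmarks,
            results.left_hand_landmarks, results.right_hand_landmarks)

def process_people(rois, max_workers=4):
    """Extract landmarks for all person ROIs of one frame in parallel."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(extract_landmarks, rois))
```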
| Argument | Description | Default |
|---|---|---|
| `--device` | Camera device index or video file | 0 |
| `--width` | Frame width | 960 |
| `--height` | Frame height | 540 |
| `--upper_body_only` | Only track upper body | False |
| `--min_detection_confidence` | Min confidence for detection | 0.5 |
| `--min_tracking_confidence` | Min confidence for tracking | 0.5 |
| `--use_brect` | Draw bounding rectangles | False |
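For example, to process a recorded video with a stricter detection threshold (the file name here is a placeholder, and exact flag behavior follows capstone.py's argparse definitions):
python capstone.py --device session.mp4 --min_detection_confidence 0.7 --use_brect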
- CSV Files: For each tracked individual and each camera, CSVs are generated for face, pose, left hand, and right hand. Each row contains frame, x, y, z (a loading example follows this list).
- FPS Logs: Per-frame FPS is logged for performance profiling.
- Calibration Stats: Per-frame and summary statistics for calibration accuracy are output as CSV and plots.
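The per-person CSVs can be consumed directly with pandas. A hypothetical loading example; the exact file naming under multiperson_multithread_multicam/ may differ from the path shown:

```python
import pandas as pd

# Illustrative path: CSVs are written per person and per camera under
# multiperson_multithread_multicam/cam1/, cam2/, etc.
df = pd.read_csv("multiperson_multithread_multicam/cam1/pose_person1.csv")

# Columns are frame, x, y, z; summarize the coordinate ranges per axis.
print(df[["x", "y", "z"]].describe())
```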
- Basic Plots:
- Raw vs. calibrated coordinates (per axis)
- Error vs. frame with mean ± std deviation bands
- Euclidean error vs. frame
- Advanced Plots:
- Histograms with KDE for error distributions
- Cumulative Distribution Function (CDF) plots
- Box-and-whisker plots
- Bland–Altman plots for agreement analysis (sketched after this list)
- Scatter plots with regression and Pearson correlation
- 3D scatter plots of raw vs. calibrated points
- Automated Generation: All plots are auto-generated and organized by domain (face, pose, hand) and camera.
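As an illustration of the agreement analysis, here is a minimal matplotlib Bland–Altman sketch; the function is ours, not the repository's plotting code, and the inputs stand in for matched raw and calibrated coordinate series:

```python
import numpy as np
import matplotlib.pyplot as plt

def bland_altman(raw, calibrated, ax=None):
    """Bland-Altman agreement plot for two matched measurement series."""
    raw, calibrated = np.asarray(raw, float), np.asarray(calibrated, float)
    mean = (raw + calibrated) / 2          # x-axis: average of the two methods
    diff = raw - calibrated                # y-axis: their disagreement
    md, sd = diff.mean(), diff.std()
    ax = ax or plt.gca()
    ax.scatter(mean, diff, s=8)
    ax.axhline(md, color="gray", linestyle="--", label=f"mean diff = {md:.3f}")
    for k in (-1.96, 1.96):                # 95% limits of agreement
        ax.axhline(md + k * sd, color="red", linestyle=":")
    ax.set_xlabel("Mean of raw and calibrated")
    ax.set_ylabel("Difference (raw - calibrated)")
    ax.legend()
```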
- Multi-Person, Multi-Role Support: Designed for real-world clinical environments, robustly handling multiple patients and scorers in the same field of view, even under occlusion and bounding box overlap.
- Protocol Compatibility: Output data is directly applicable to ARAT, UPDRS, Fugl-Meyer, and other movement scoring protocols, supporting both quantitative and qualitative analysis.
- Data Integrity and Reliability: Calibration and error quantification routines ensure high confidence in 3D kinematic data, suitable for clinical research and regulatory requirements.
- Calibration Issues: Ensure clear, well-lit chessboard images with sufficient positional diversity. Confirm checkerboard size matches code configuration.
- Performance Bottlenecks: Adjust the number of threads in ThreadPoolExecutor to match available CPU cores (see the snippet after this list). Reduce video resolution if necessary.
- Tracking Loss: Maintain adequate lighting and minimize occlusions. DeepSort is robust but may be challenged by severe overlaps or rapid movement.
- Landmark Extraction Failures: MediaPipe Holistic may struggle with extreme poses or occlusions. Consider integrating additional models for specialized scenarios.
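For the thread-count adjustment mentioned above, a small sizing heuristic with placeholder person and camera counts:

```python
import os
from concurrent.futures import ThreadPoolExecutor

# Illustrative tuning: cap the pool at the number of concurrent work items
# (persons x cameras) but never above the available CPU cores.
persons, cameras = 2, 2  # placeholder counts for a typical session
max_workers = min(os.cpu_count() or 4, persons * cameras)
pool = ThreadPoolExecutor(max_workers=max_workers)
```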
- OpenCV Camera Calibration: OpenCV Docs
- MediaPipe Holistic: MediaPipe Solutions
- Ultralytics YOLOv11: Ultralytics
- DeepSort-Realtime: DeepSort-Realtime
This project was made possible by the continuous guidance and mentorship of Dr. Mohan Raghavan, who provided the opportunity and support to pursue this work over a dedicated four-month period. The development also builds upon the collective innovations of the open-source community in computer vision, deep learning, and clinical biomechanics.
This project is licensed under the MIT License - see the LICENSE file for details.
Aryan Bhardwaj
For questions, feature requests, or contributions, please open an issue or contact the author directly.
Note:
This repository is intended for advanced users, clinical researchers, and developers seeking a robust, extensible, and high-performance solution for multi-person, multi-camera 3D kinematic capture in clinical scoring environments. For detailed API documentation, refer to the code comments and function docstrings in capstone.py.