This README provides instructions for running the code for the final project of unit COS30028 - Spring 2025.
Clone the repository and install the required dependencies:

```bash
git clone https://github.com/thuanbui1309/action-recognition.git
cd action-recognition
pip install -r requirements.txt
```

Data and models for all tasks are uploaded to this Google Drive folder. Please download them from:

https://drive.google.com/drive/folders/1tElWQyrQ2OA5MMUxUwf_UelbDUpp1czJ?usp=sharing

After downloading, extract the files and move them into the appropriate directories. The correct structure looks like this:
```
action-recognition/
│
├── data/
│   ├── demo/
│   │   ├── test1.mp4
│   │   ├── test2.mp4
│   │   └── test3.mp4
│   ├── HGP/
│   │   ├── images/
│   │   │   ├── annotations/
│   │   │   ├── train/
│   │   │   └── val/
│   │   ├── labels/
│   │   │   ├── train/
│   │   │   └── val/
│   │   └── labels_old/
│   │       ├── train/
│   │       └── val/
│   ├── HGP_phone_hand/
│   │   ├── images/
│   │   │   ├── train/
│   │   │   └── val/
│   │   ├── labels/
│   │   │   ├── train/
│   │   │   └── val/
│   │   └── data.yaml
│   └── UCF101/
│       ├── v_ApplyEyeMakeup_g01_c01.avi
│       ├── v_ApplyEyeMakeup_g01_c02.avi
│       └── ...
├── models/
│   ├── movinet/
│   │   ├── a0/
│   │   ├── trainings/
│   │   └── labels.npy
│   ├── yolo phone hand detection/
│   └── yolo pose/
├── augment.py
├── classify.py
├── pose_estimation.py
├── process_hgp.py
├── requirements.txt
└── README.md
```
To run data preprocessing, run augment.py. This will automatically augment the data and split it into the correct directories. You can customize the augmentation parameters in the file.

```bash
python augment.py
```

Parameters:

- `--input`: Path to the raw videos
- `--output`: Path to the output videos
- `--split_output`: Path to the split output folder
- `--labels`: Augment only the chosen labels
- `--workers`: Number of parallel workers
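The split step can be sketched roughly as follows. This is a minimal illustration of shuffling videos into train/val subfolders, not the actual augment.py logic; the 80/20 ratio, the `.avi` glob, and the `split_videos` helper are assumptions for the sketch:

```python
import random
import shutil
from pathlib import Path

def split_videos(input_dir: str, split_output: str,
                 val_ratio: float = 0.2, seed: int = 42) -> None:
    """Shuffle the videos in input_dir and copy them into train/ and val/ subfolders."""
    videos = sorted(Path(input_dir).glob("*.avi"))
    random.Random(seed).shuffle(videos)  # fixed seed -> reproducible split
    n_val = int(len(videos) * val_ratio)
    for i, video in enumerate(videos):
        subset = "val" if i < n_val else "train"
        dest = Path(split_output) / subset
        dest.mkdir(parents=True, exist_ok=True)
        shutil.copy(video, dest / video.name)
```

Shuffling with a fixed seed keeps the split reproducible across runs, which matters when comparing models trained on the same data.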
The model is trained on Google Colab. You can access the training notebook and saved model in the models/movinet folder.
To run inference on a video, use the classify.py script:

```bash
# Example command
python classify.py --input data/demo/test1.mp4
```

Parameters:

- `--input`: Path to the raw video
- `--augmented`: Set to True to run inference with the model trained on augmented data
- `--labels`: Labels for prediction; must match the training labels
- `--env`: Set to `xvfb` or `xcb` for headless display
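The reason `--labels` must match the training labels is that the classifier outputs an index into the label list, so a reordered list silently mislabels every prediction. A rough sketch of that final step (the `top_prediction` helper is hypothetical, not part of classify.py):

```python
import numpy as np

def top_prediction(logits: np.ndarray, labels: list) -> tuple:
    """Convert raw class scores to (label, probability) via softmax + argmax."""
    exp = np.exp(logits - logits.max())  # subtract max for numerical stability
    probs = exp / exp.sum()
    idx = int(np.argmax(probs))
    return labels[idx], float(probs[idx])
```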
Data preprocessing is needed to fine-tune the object detection model. Run process_hgp.py, which automatically augments and splits the HGP dataset into the correct directories:

```bash
python process_hgp.py
```

The model is trained on Google Colab. You can access the training notebook and saved model in the models/yolo phone hand detection folder.
To run inference, use the pose_estimation.py script:

```bash
# Example command
python pose_estimation.py
```

Parameters:

- `--pose_model`: Path to the pose estimation model
- `--object_detection_model`: Path to the object detection model
- `--cam_idx`: Camera index
- `--env`: Set to `xvfb` or `xcb` for headless display
- `--history_frames`: Number of frames to keep in history for motion analysis
- `--smoothing_window`: Window size for temporal smoothing
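The `--smoothing_window` parameter suggests that per-frame predictions are smoothed over a short history to suppress flicker. One common way to do this is a sliding-window majority vote; the sketch below illustrates that idea under this assumption and is not the script's actual implementation:

```python
from collections import Counter, deque

class TemporalSmoother:
    """Smooth per-frame action labels by majority vote over the last N frames."""

    def __init__(self, smoothing_window: int = 5):
        # deque with maxlen drops the oldest prediction automatically
        self.history = deque(maxlen=smoothing_window)

    def update(self, prediction: str) -> str:
        self.history.append(prediction)
        # Return the most common label in the current window
        return Counter(self.history).most_common(1)[0][0]
```

With a window of 5, a single misclassified frame in the middle of a steady action does not change the reported label, at the cost of a few frames of latency when the action genuinely changes.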
- The MoViNet model works best with videos that contain a single dominant action.
- The pose estimation approach can handle multiple people performing different actions simultaneously.