Long-term Recurrent Convolutional Network (LRCN) model that extracts spatial features from individual video frames via a ResNet backbone and learns temporal dynamics through an LSTM.

dna-witch/video-classification-lrcn

Video Classification with UCF50

This project implements a video classification pipeline using the UCF50 dataset. It leverages a Long-term Recurrent Convolutional Network (LRCN) model that extracts spatial features from individual video frames via a ResNet backbone and learns temporal dynamics through an LSTM. The project includes scripts for preprocessing, training, and testing the model.


Dataset Preparation

Step 0: Download and Unzip Dataset

  1. Download Dataset:
    Download the UCF50 dataset from here. This dataset contains videos of 50 different human action classes.

  2. Unzip and Organize:
    Unzip the downloaded dataset. The expected folder structure should be as follows:

     - UCF50
         - Action_Class1
         - Action_Class2
         ... ... ... ...
         - Action_Class50
    

Each subdirectory represents a different action class.
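As an illustration of how this layout maps to labels, the sketch below builds a class-name-to-index dictionary from the subdirectory names. The helper name `build_class_index` is hypothetical and is not taken from the repository.

```python
from pathlib import Path


def build_class_index(root):
    """Map each class subdirectory name to an integer label (sorted by name)."""
    root = Path(root)
    classes = sorted(p.name for p in root.iterdir() if p.is_dir())
    return {name: idx for idx, name in enumerate(classes)}
```

For a correctly unzipped dataset, `build_class_index("UCF50")` should return 50 entries, one per action class.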


Environment Setup

  1. Python Version:
    This project requires Python 3.7 or higher.

  2. Dependencies:
    Install the required Python packages by running:

pip install -r requirements.txt

Key Libraries

  • PyTorch
  • torchvision
  • OpenCV
  • scikit-learn
  • tqdm
  • numpy
  • Pillow

Hardware Requirements

A CUDA-enabled GPU is recommended for training. The code automatically detects GPU availability.
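GPU detection in PyTorch is commonly written as shown below. This is a generic sketch of the usual pattern, and the helper name `pick_device` is an assumption rather than the repository's code.

```python
import torch


def pick_device():
    """Return the CUDA device when one is available, otherwise the CPU."""
    return torch.device("cuda" if torch.cuda.is_available() else "cpu")


device = pick_device()
print(f"Using device: {device}")
```

Models and input tensors are then moved to that device with `.to(device)` before training or evaluation.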


Preprocessing and Frame Extraction

Before training, the raw video files must be converted into frame sequences. The preprocessing module includes functions for:

Uniform Frame Sampling

  • The get_frames function uses OpenCV to sample a fixed number of frames per video.

Saving Frames to Disk

  • The store_frames function writes the extracted frames as JPEG images.

To run the preprocessing script, run the following from the command line:

python preprocess.py
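Uniform sampling can be sketched as picking one frame index from the middle of each of N equal segments of the video. The functions below are illustrative stand-ins, not the repository's get_frames or store_frames: the names `uniform_indices` and `sample_frames` and the midpoint strategy are assumptions.

```python
def uniform_indices(n_total, n_samples):
    """Return up to n_samples frame indices spread evenly over n_total frames."""
    if n_total <= 0 or n_samples <= 0:
        return []
    if n_samples >= n_total:
        # Short video: keep every frame.
        return list(range(n_total))
    step = n_total / n_samples
    # Take the midpoint of each of the n_samples equal-length segments.
    return [int(step * i + step / 2) for i in range(n_samples)]


def sample_frames(video_path, n_samples=16):
    """Read the uniformly spaced frames from a video with OpenCV."""
    import cv2  # imported lazily so uniform_indices works without OpenCV

    cap = cv2.VideoCapture(str(video_path))
    n_total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    frames = []
    for idx in uniform_indices(n_total, n_samples):
        cap.set(cv2.CAP_PROP_POS_FRAMES, idx)  # seek to the chosen frame
        ok, frame = cap.read()
        if ok:
            frames.append(frame)  # OpenCV returns BGR arrays
    cap.release()
    return frames
```

Each returned frame could then be written to disk as a JPEG with `cv2.imwrite`, which infers the format from the file extension.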

Training the Model

Step 1: Run Training

Configure Training Parameters

The training is managed via a bash script (e.g., train.sh) that calls the main training module.
Important: Update the --frame_dir argument in the script to point to the directory where your preprocessed frame data is stored. You can also adjust other parameters (e.g., number of frames per video, batch size, learning rate) to see how they affect the experiment.

Run the Training Script

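Execute the training command from your terminal. The "| Out-File" redirection shown here is PowerShell syntax; in bash, redirect with "> train_log.txt" instead: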

python run.py --frame_dir UCF50 --train_size 0.75 --test_size 0.15 --model_type lrcn --n_classes 50 --fr_per_vid 16 --batch_size 4 --mode 'train' | Out-File train_log.txt

or

bash train.sh

During Training, the Script Will:

  • Load the frame dataset.
  • Split the dataset into training, validation, and test sets using stratified sampling.
  • Apply data augmentation techniques (resizing, random flips, affine transformations).
  • Create custom PyTorch Datasets and DataLoaders.
  • Initialize the LRCN model using a specified ResNet backbone.
  • Set up the loss function, optimizer, and learning rate scheduler.
  • Run the training loop while tracking loss and accuracy, saving the best model weights.
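The stratified split step can be illustrated with a dependency-free sketch. It assumes that whatever fraction remains after --train_size and --test_size becomes the validation set, which the source does not state explicitly. The function name `stratified_split` is hypothetical; scikit-learn's `train_test_split` with its `stratify` argument is the more usual tool for this, and the repository may well use it.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_size=0.75, test_size=0.15, seed=0):
    """Split sample indices per class so each split keeps the class balance.

    Assumption: the remainder (1 - train_size - test_size) is the validation set.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    train, val, test = [], [], []
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)
        n_train = round(len(idxs) * train_size)
        n_test = round(len(idxs) * test_size)
        train += idxs[:n_train]
        test += idxs[n_train:n_train + n_test]
        val += idxs[n_train + n_test:]
    return train, val, test
```

With 20 videos per class and the fractions from the command above, each class contributes 15 training, 2 validation, and 3 test videos.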

Testing and Evaluation

Step 2: Run Testing

  • Configure Testing Parameters:
    Update the --ckpt argument in your testing script (e.g., test.sh) to point to the saved best model weights generated during training.

  • Run the Testing Script:
    Execute the testing script from your terminal:

python run.py --ckpt models\best_model_wts.pt --model_type lrcn --n_classes 50 --batch_size 4 --mode eval | Out-File test_log.txt

or

bash test.sh

Testing Script Overview

The testing script will:

  • Load the dataset splits (previously saved during training).
  • Create a DataLoader for the test set.
  • Load the trained model checkpoint.
  • Evaluate the model on the test data by computing overall accuracy, generating classification reports, and optionally producing confusion matrices.
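The evaluation metrics can be illustrated in plain Python. This is a sketch for clarity only: the function names are hypothetical, and in practice scikit-learn's `classification_report` and `confusion_matrix` cover the same ground.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def per_class_recall(matrix):
    """Recall per class: diagonal count divided by the row total."""
    return [row[i] / sum(row) if sum(row) else 0.0
            for i, row in enumerate(matrix)]
```

Off-diagonal entries of the matrix show which action classes the model confuses with each other.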

Customization and Hyperparameters

You can modify several parameters to experiment with different settings:

Data Parameters

  • --frame_dir: Path to your preprocessed frames.
  • --fr_per_vid: Number of frames to sample per video.

Model Parameters

  • --model_type: Choose 'lrcn' (default) or another supported model.
  • --cnn_backbone: Options include resnet18, resnet34, resnet50, resnet101, or resnet152.
  • --rnn_hidden_size and --rnn_n_layers: Configure the LSTM network.

Training Parameters

  • --batch_size, --learning_rate, --n_epochs, and --dropout control the training dynamics.
  • --train_size and --test_size set the fractions of the dataset assigned to the training and test sets (0.75 and 0.15 in the training command above).

By tweaking these parameters, you can study their impact on model performance and experiment with different network configurations.
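The flags listed above could be declared with argparse along these lines. The flag names come from this README, but every default value below is a placeholder assumption (apart from 'lrcn', which the README names as the default), and the repository's run.py may define them differently.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags documented in this README."""
    p = argparse.ArgumentParser(description="LRCN video classification")
    # Data parameters
    p.add_argument("--frame_dir", type=str, help="path to preprocessed frames")
    p.add_argument("--fr_per_vid", type=int, default=16)
    p.add_argument("--train_size", type=float, default=0.75)
    p.add_argument("--test_size", type=float, default=0.15)
    # Model parameters (defaults are placeholders, not the project's values)
    p.add_argument("--model_type", type=str, default="lrcn")
    p.add_argument("--cnn_backbone", type=str, default="resnet18",
                   choices=["resnet18", "resnet34", "resnet50",
                            "resnet101", "resnet152"])
    p.add_argument("--rnn_hidden_size", type=int, default=128)
    p.add_argument("--rnn_n_layers", type=int, default=1)
    p.add_argument("--n_classes", type=int, default=50)
    # Training and evaluation parameters
    p.add_argument("--batch_size", type=int, default=4)
    p.add_argument("--learning_rate", type=float, default=1e-4)
    p.add_argument("--n_epochs", type=int, default=10)
    p.add_argument("--dropout", type=float, default=0.1)
    p.add_argument("--mode", type=str, default="train",
                   choices=["train", "eval"])
    p.add_argument("--ckpt", type=str, default=None,
                   help="checkpoint path used in eval mode")
    return p


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(vars(args))
```

Restricting --cnn_backbone and --mode with `choices` makes a mistyped value fail at startup instead of partway through a run.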


Summary of Steps

  • Step 0: Dataset Preparation
    Download and unzip the UCF50 dataset so that each action class has its own subdirectory, then run preprocess.py, which organizes the subdirectories of extracted frames for you.

  • Step 1: Run Training
    Execute train.sh after configuring the --frame_dir and other hyperparameters to train the model.

  • Step 2: Run Testing
    Execute test.sh after updating the --ckpt argument to point to the best model checkpoint to evaluate the model.

Happy Training!
