This project implements a video classification pipeline using the UCF50 dataset. It leverages a Long-term Recurrent Convolutional Network (LRCN) model that extracts spatial features from individual video frames via a ResNet backbone and learns temporal dynamics through an LSTM. The project includes scripts for preprocessing, training, and testing the model.
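Conceptually, the LRCN folds the frame axis into the batch axis so the CNN sees each frame independently, then unfolds the per-frame features into a sequence for the LSTM. The sketch below is shape bookkeeping only, not the project's model code; every name and dimension in it is illustrative.

```python
def lrcn_shapes(batch, frames, feat_dim, hidden, n_classes):
    """Trace tensor shapes through an LRCN-style forward pass.

    The CNN processes frames independently (frame axis folded into the
    batch axis); the LSTM then consumes the per-frame features as a
    sequence, and its last hidden state feeds the classifier.
    """
    cnn_out = (batch * frames, feat_dim)   # one feature vector per frame
    lstm_in = (batch, frames, feat_dim)    # refolded into per-video sequences
    lstm_out = (batch, hidden)             # last hidden state per video
    logits = (batch, n_classes)            # final class scores
    return cnn_out, lstm_in, lstm_out, logits

# e.g. a batch of 4 clips, 16 frames each, 512-d ResNet-18 features
print(lrcn_shapes(4, 16, 512, 256, 50))
# → ((64, 512), (4, 16, 512), (4, 256), (4, 50))
```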
## Table of Contents

- Dataset Preparation
- Environment Setup
- Preprocessing and Frame Extraction
- Training the Model
- Testing and Evaluation
- Project Structure
- Customization and Hyperparameters
## Dataset Preparation

- **Download Dataset:**
  Download the UCF50 dataset from here. This dataset contains videos of 50 different human action classes.
- **Unzip and Organize:**
  Unzip the downloaded dataset. The expected folder structure is as follows:

  ```
  UCF50
  ├── Action_Class1
  ├── Action_Class2
  │   ...
  └── Action_Class50
  ```

  Each subdirectory represents a different action class.
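Since the class label is simply the name of the directory a video sits in, you can sanity-check the layout by pairing each video path with its parent directory. `index_videos` below is a hypothetical helper for illustration, not part of the project's code:

```python
from pathlib import PurePath

def index_videos(paths):
    """Pair each <ActionClass>/<video>.avi path with its class label.

    Anything not matching that two-level .avi layout is skipped, so stray
    files at the dataset root are ignored.
    """
    samples = []
    for p in paths:
        parts = PurePath(p).parts
        if len(parts) == 2 and parts[1].endswith(".avi"):
            samples.append((p, parts[0]))  # label = parent directory name
    return samples
```

Feeding it a few relative paths, e.g. `index_videos(["Basketball/v_01.avi", "Diving/v_02.avi"])`, yields `(path, label)` pairs ready for a dataset split.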
## Environment Setup

- **Python Version:**
  This project requires Python 3.7 or higher.
- **Dependencies:**
  Install the required Python packages by running:

  ```
  pip install -r requirements.txt
  ```

  The key dependencies are:

  - PyTorch
  - torchvision
  - OpenCV
  - scikit-learn
  - tqdm
  - numpy
  - Pillow
A CUDA-enabled GPU is recommended for training. The code automatically detects GPU availability.
## Preprocessing and Frame Extraction

Before training, the raw video files must be converted into frame sequences. The preprocessing module includes two helpers:

- The `get_frames` function uses OpenCV to sample a fixed number of frames per video.
- The `store_frames` function writes the extracted frames as JPEG images.
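The project's `get_frames` itself isn't shown in this README; as a rough illustration of the index arithmetic such a sampler typically performs (read the frame count, then pick evenly spaced frames), here is a stdlib-only sketch. `sample_indices` is a hypothetical helper, not the project's actual code:

```python
def sample_indices(total_frames, n_frames):
    """Return n_frames evenly spaced frame indices over a video.

    Short videos yield repeated indices, which effectively duplicates
    frames so every clip tensor has the same length.
    """
    if total_frames <= 0 or n_frames <= 0:
        return []
    step = (total_frames - 1) / max(n_frames - 1, 1)
    return [round(i * step) for i in range(n_frames)]
```

With OpenCV you would read the frame count via `cv2.CAP_PROP_FRAME_COUNT` and then seek to each returned index; for a 100-frame video and 16 samples, the indices run from 0 to 99.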
To run the preprocessing script, call the following from the command line:

```
python preprocess.py
```

## Training the Model

Training is managed via a bash script (e.g., `train.sh`) that calls the main training module.
**Important:** Update the `--frame_dir` argument in the script to point to the directory where your preprocessed frame data is stored. You can also adjust other parameters (e.g., number of frames per video, batch size, learning rate) to see how they affect the experiment.
Execute the training script from your terminal:
From PowerShell (capturing logs with `Out-File`):

```
python run.py --frame_dir UCF50 --train_size 0.75 --test_size 0.15 --model_type lrcn --n_classes 50 --fr_per_vid 16 --batch_size 4 --mode train | Out-File train_log.txt
```

or, from a bash shell:

```
bash train.sh
```

The training script will:

- Load the frame dataset.
- Split the dataset into training, validation, and test sets using stratified sampling.
- Apply data augmentation techniques (resizing, random flips, affine transformations).
- Create custom PyTorch Datasets and DataLoaders.
- Initialize the LRCN model using a specified ResNet backbone.
- Set up the loss function, optimizer, and learning rate scheduler.
- Run the training loop while tracking loss and accuracy, saving the best model weights.
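The stratified split in the steps above keeps each action class proportionally represented in every split. The training code most likely uses scikit-learn's `train_test_split` with `stratify`; the sketch below shows the same idea with only the standard library, using the 0.75/0.15 ratios from the command above. All names here are illustrative:

```python
import random
from collections import defaultdict

def stratified_split(samples, train_size=0.75, test_size=0.15, seed=0):
    """Split (path, label) pairs per class so each split keeps the class balance.

    Whatever is left after carving out the train and test fractions of each
    class becomes the validation set.
    """
    by_label = defaultdict(list)
    for path, label in samples:
        by_label[label].append(path)
    rng = random.Random(seed)  # fixed seed for a reproducible split
    train, val, test = [], [], []
    for label, paths in by_label.items():
        rng.shuffle(paths)
        n_test = int(len(paths) * test_size)
        n_train = int(len(paths) * train_size)
        test += [(p, label) for p in paths[:n_test]]
        train += [(p, label) for p in paths[n_test:n_test + n_train]]
        val += [(p, label) for p in paths[n_test + n_train:]]
    return train, val, test
```

Because the split is done class by class, a 50-class dataset stays balanced even when some actions have fewer clips than others.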
## Testing and Evaluation

- **Configure Testing Parameters:**
  Update the `--ckpt` argument in your testing script (e.g., `test.sh`) to point to the best model weights saved during training.
- **Run the Testing Script:**
  Execute the testing script from your terminal:
From PowerShell:

```
python run.py --ckpt models\best_model_wts.pt --model_type lrcn --n_classes 50 --batch_size 4 --mode eval | Out-File test_log.txt
```

or, from a bash shell:

```
bash test.sh
```

The testing script will:
- Load the dataset splits (previously saved during training).
- Create a DataLoader for the test set.
- Load the trained model checkpoint.
- Evaluate the model on the test data by computing overall accuracy, generating classification reports, and optionally producing confusion matrices.
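The headline metrics in that last step reduce to simple counting over predicted vs. true labels. The project's script presumably relies on `sklearn.metrics` (which is listed as a dependency); this stdlib-only sketch shows what overall accuracy and a sparse confusion matrix amount to:

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the ground-truth labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def confusion_counts(y_true, y_pred):
    """Counts per (true, predicted) pair — a sparse confusion matrix.

    The off-diagonal entries, e.g. (1, 0), show which classes get
    confused with which.
    """
    return Counter(zip(y_true, y_pred))
```

For 50 classes the dense matrix is 50x50; the sparse counter keeps only the pairs that actually occur, which is usually easier to scan for the worst confusions.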
## Customization and Hyperparameters

You can modify several parameters to experiment with different settings:

- `--frame_dir`: Path to your preprocessed frames.
- `--fr_per_vid`: Number of frames to sample per video.
- `--model_type`: Choose between `'lrcn'` (default) or other supported models.
- `--cnn_backbone`: Options include `resnet18`, `resnet34`, `resnet50`, `resnet101`, or `resnet152`.
- `--rnn_hidden_size` and `--rnn_n_layers`: Configure the LSTM network.
- `--batch_size`, `--learning_rate`, `--n_epochs`, and `--dropout`: Control the training dynamics.
- `--train_size` and `--test_size`: Determine the dataset splits.
By tweaking these parameters, you can study their impact on model performance and experiment with different network configurations.
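As a quick reference for how these flags fit together, here is a hypothetical `argparse` mirror of `run.py`'s interface built only from the flag names listed above. The defaults shown are assumptions for illustration, not the project's actual values:

```python
import argparse

def build_parser():
    """Hypothetical parser mirroring the flags this README documents."""
    p = argparse.ArgumentParser(description="LRCN video classification (sketch)")
    p.add_argument("--frame_dir", default="UCF50")          # preprocessed frames
    p.add_argument("--fr_per_vid", type=int, default=16)    # frames sampled per video
    p.add_argument("--model_type", default="lrcn")
    p.add_argument("--cnn_backbone", default="resnet18",
                   choices=["resnet18", "resnet34", "resnet50",
                            "resnet101", "resnet152"])
    p.add_argument("--rnn_hidden_size", type=int, default=256)  # assumed default
    p.add_argument("--rnn_n_layers", type=int, default=1)       # assumed default
    p.add_argument("--n_classes", type=int, default=50)
    p.add_argument("--batch_size", type=int, default=4)
    p.add_argument("--learning_rate", type=float, default=1e-4) # assumed default
    p.add_argument("--n_epochs", type=int, default=20)          # assumed default
    p.add_argument("--dropout", type=float, default=0.0)        # assumed default
    p.add_argument("--train_size", type=float, default=0.75)
    p.add_argument("--test_size", type=float, default=0.15)
    p.add_argument("--ckpt", default=None)                      # checkpoint for eval
    p.add_argument("--mode", choices=["train", "eval"], default="train")
    return p
```

Parsing `["--batch_size", "8", "--mode", "eval"]`, for instance, overrides just those two flags while the rest keep their defaults.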
## Workflow Summary

- **Step 0: Dataset Preparation**
  Download, unzip, and organize the UCF50 dataset into subdirectories by action class. The `preprocess.py` script will organize the subdirectories for you.
- **Step 1: Run Training**
  Execute `train.sh` after configuring `--frame_dir` and the other hyperparameters to train the model.
- **Step 2: Run Testing**
  Execute `test.sh` after updating the `--ckpt` argument to point to the best model checkpoint to evaluate the model.
Happy Training!