
Action Recognition using Vision Transformer (ViT)

This project implements action recognition on videos using the Vision Transformer (ViT) model. It includes a Streamlit-based web application for uploading videos, predicting actions, and visualizing Grad-CAM heatmaps.


Features

  • Train and evaluate a Vision Transformer (ViT) model for video classification.
  • Web interface for uploading videos and predicting actions.
  • Grad-CAM visualizations for model interpretability.

Project Structure

action-recognition-vit
├── src
│   ├── models
│   │   └── vit.py          # Implementation of the Vision Transformer model
│   ├── training
│   │   ├── train.py        # Training script for the ViT model
│   │   └── dataset.py      # Dataset class for loading and preprocessing video data
│   ├── evaluation
│   │   └── evaluate.py     # Evaluation script for assessing model performance
│   ├── web
│   │   └── app.py      # Web application for user interaction
│   └── utils
│       └── helpers.py      # Utility functions for data processing and visualization
├── requirements.txt         # List of project dependencies
├── README.md                # Project documentation
└── .gitignore               # Files and directories to ignore in Git

Installation

Follow these steps to set up the project on your local machine:

1. Clone the Repository

Clone the repository to your local machine:

git clone https://github.com/your-repo/action-recognition-vit.git
cd action-recognition-vit

2. Set Up a Virtual Environment

Create and activate a virtual environment to manage dependencies:

# Create a virtual environment
python -m venv .venv

# Activate the virtual environment
# On Windows:
.\.venv\Scripts\activate
# On macOS/Linux:
source .venv/bin/activate

3. Install Dependencies

Install the required Python packages:

pip install --upgrade pip
pip install -r requirements.txt

Usage

1. Training the Model

To train the Vision Transformer model on your dataset, run the following command:

python src/training/train.py
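
The exact training interface lives in src/training/train.py. For orientation only, the core of a ViT fine-tuning step commonly looks like the sketch below; it assumes a torchvision vit_b_16 backbone and batches of (frames, labels), and NUM_CLASSES and the hyperparameters are placeholders rather than this repo's actual settings.

import torch
import torch.nn as nn
from torchvision.models import vit_b_16, ViT_B_16_Weights

NUM_CLASSES = 10  # placeholder: set to the number of action classes

# Start from ImageNet weights and swap in a fresh classification head
model = vit_b_16(weights=ViT_B_16_Weights.DEFAULT)
model.heads.head = nn.Linear(model.heads.head.in_features, NUM_CLASSES)

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
criterion = nn.CrossEntropyLoss()

def train_step(frames, labels):
    # One optimization step on a batch of frames shaped (B, 3, 224, 224)
    model.train()
    optimizer.zero_grad()
    loss = criterion(model(frames), labels)
    loss.backward()
    optimizer.step()
    return loss.item()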

2. Evaluating the Model

After training, evaluate the model's performance using:

python src/evaluation/evaluate.py
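
Evaluation typically reduces to top-1 accuracy over a held-out split. A minimal sketch, where `model` and `test_loader` stand in for whatever the script builds internally:

import torch

@torch.no_grad()
def evaluate(model, test_loader, device="cpu"):
    # Top-1 accuracy over (frames, labels) batches
    model.eval().to(device)
    correct = total = 0
    for frames, labels in test_loader:
        preds = model(frames.to(device)).argmax(dim=1)
        correct += (preds == labels.to(device)).sum().item()
        total += labels.size(0)
    return correct / max(total, 1)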

3. Running the Web Interface

The project includes a Streamlit-based web application for uploading videos and predicting actions.

Steps to Run the App:

  1. Ensure the virtual environment is activated:

    # On Windows:
    .\.venv\Scripts\activate
    # On macOS/Linux:
    source .venv/bin/activate
  2. Start the Streamlit app:

    streamlit run src/web/app.py
  3. Open the local URL printed in the terminal (typically http://localhost:8501), or try the hosted app: https://action-recognition-using-vit.streamlit.app/
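
The upload-and-predict flow in src/web/app.py boils down to a few Streamlit calls. A minimal sketch, where `predict_action` is a hypothetical helper rather than the repo's actual function name:

import tempfile
import streamlit as st

st.title("Action Recognition with ViT")
uploaded = st.sidebar.file_uploader("Upload a video", type=["mp4", "avi", "mov"])

if uploaded is not None:
    # Persist the upload to disk so video readers such as OpenCV can open it
    with tempfile.NamedTemporaryFile(delete=False, suffix=".mp4") as tmp:
        tmp.write(uploaded.read())
        video_path = tmp.name
    st.video(video_path)
    # label = predict_action(video_path)  # hypothetical helper in this repo
    # st.success(f"Predicted action: {label}")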


Using the Streamlit App

Features of the App:

  • Upload Videos: Upload a video file in .mp4, .avi, or .mov format.
  • Action Prediction: The app predicts the action in the video using the Vision Transformer model.
  • Grad-CAM Visualizations: Visualize Grad-CAM heatmaps to understand which parts of the video influenced the model's predictions.
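
For readers curious how Grad-CAM carries over to a transformer: the repo's own implementation is not shown here, so the sketch below is an assumption-laden adaptation that hooks the last encoder block of torchvision's vit_b_16, weights patch-token activations by their gradients, and reshapes the 196 patch tokens into a 14x14 heatmap.

import torch
import torch.nn.functional as F

def vit_grad_cam(model, image, class_idx=None):
    # image: (1, 3, 224, 224) tensor; returns a (224, 224) heatmap in [0, 1]
    acts, grads = {}, {}
    layer = model.encoder.layers[-1].ln_1  # layer names follow torchvision's vit_b_16
    h1 = layer.register_forward_hook(lambda m, i, o: acts.update(v=o))
    h2 = layer.register_full_backward_hook(lambda m, gi, go: grads.update(v=go[0]))

    logits = model(image)
    target = int(logits.argmax()) if class_idx is None else class_idx
    logits[0, target].backward()
    h1.remove(); h2.remove()

    a, g = acts["v"][0, 1:], grads["v"][0, 1:]  # drop the [CLS] token
    cam = F.relu((a * g.mean(dim=0)).sum(dim=-1)).reshape(14, 14)
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
    return F.interpolate(cam[None, None], size=(224, 224), mode="bilinear")[0, 0]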

Example Workflow:

  1. Upload a Video:

    • Use the sidebar to upload a video file.
    • Supported formats: .mp4, .avi, .mov.
  2. View Uploaded Video:

    • The uploaded video is displayed in the main interface.
  3. Prediction and Visualization:

    • The app extracts frames from the video and processes them through the ViT model (see the sketch after this list).
    • The predicted action is displayed, and Grad-CAM heatmaps are generated for interpretability.
  4. Interact with Results:

    • View Grad-CAM heatmaps for each frame to understand the model's focus areas.
    • Upload another video to repeat the process.
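
As referenced in step 3, frame sampling plus per-frame inference is the heart of the pipeline. A hedged sketch using OpenCV, where `preprocess` stands in for whatever torchvision transform the repo applies:

import cv2
import torch

def extract_frames(video_path, num_frames=16):
    # Sample num_frames evenly spaced RGB frames from the video
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    indices = torch.linspace(0, max(total - 1, 0), num_frames).long().tolist()
    frames = []
    for idx in indices:
        cap.set(cv2.CAP_PROP_POS_FRAMES, idx)
        ok, frame = cap.read()
        if ok:
            frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    cap.release()
    return frames

@torch.no_grad()
def predict_video(model, preprocess, video_path):
    # Average per-frame logits to get one video-level prediction
    batch = torch.stack([preprocess(f) for f in extract_frames(video_path)])
    logits = model(batch).mean(dim=0)
    return int(logits.argmax())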

Troubleshooting

  • Dependencies Not Installed: Ensure all dependencies are installed:

    pip install -r requirements.txt
  • Streamlit App Not Starting: Ensure the virtual environment is activated and all dependencies are installed.

  • CUDA Issues: If using a GPU, ensure PyTorch is installed with CUDA support:

    pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
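
A quick way to confirm the install actually sees your GPU:

import torch
print(torch.cuda.is_available())  # True if a CUDA device is usable
print(torch.version.cuda)         # CUDA version PyTorch was built against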

Screenshots

Predicted Action

(screenshot: the app displaying the predicted action for an uploaded video)

Grad-CAM Heatmaps

(screenshot: Grad-CAM heatmaps over the sampled video frames)


Contributing

Contributions are welcome! Please feel free to submit a pull request or open an issue for any suggestions or improvements.


License

This project is licensed under the MIT License. See the LICENSE file for details.