A PyTorch-based convolutional neural network for real-time American Sign Language (ASL) recognition using webcam input.
This project implements a CNN model that recognizes 29 different ASL signs: 26 letters (A-Z) plus "space", "delete", and "nothing" gestures. The model processes 64x64 RGB images and provides real-time predictions through a webcam feed.
Install dependencies using:

```bash
pip install -r requirements.txt
```
The CNN consists of (see the sketch after this list):
- 3 convolutional layers (32, 64, 128 filters)
- MaxPooling after each conv layer
- 2 fully connected layers (512, 29 outputs)
- ReLU activations
- Input: 64x64 RGB images
- Output: 29 classes
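
A minimal PyTorch sketch of that stack is below. Kernel sizes, padding, and layer names are assumptions for illustration; the actual definition in the source code may differ.

```python
import torch
import torch.nn as nn

class ASLCNN(nn.Module):
    """Illustrative sketch of the CNN described above."""
    def __init__(self, num_classes: int = 29):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1),   # 64x64 -> 64x64
            nn.ReLU(),
            nn.MaxPool2d(2),                              # -> 32x32
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),                              # -> 16x16
            nn.Conv2d(64, 128, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),                              # -> 8x8
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(128 * 8 * 8, 512),  # flatten size follows from three 2x2 pools on 64x64
            nn.ReLU(),
            nn.Linear(512, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))
```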
If your test images aren't organized by class, run:
```bash
cd src/utils
python organize_test.py
```
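
The script itself is authoritative, but for test sets whose images are loose files named with a class prefix (e.g. A_test.jpg, an assumption about your data), the reorganization amounts to roughly:

```python
from pathlib import Path
import shutil

test_dir = Path("data/asl_alphabet_test")  # assumed dataset location

for img in test_dir.glob("*.jpg"):
    label = img.stem.split("_")[0]         # "A_test.jpg" -> "A"
    class_dir = test_dir / label
    class_dir.mkdir(exist_ok=True)
    shutil.move(str(img), str(class_dir / img.name))
```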
Check for corrupted images:
```bash
python detect_bad_files.py
```
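
One common way such a check works is to ask Pillow to verify each file; a sketch (the actual script may differ):

```python
from pathlib import Path
from PIL import Image

data_dir = Path("data")  # assumed dataset root

for path in data_dir.rglob("*.jpg"):
    try:
        with Image.open(path) as img:
            img.verify()  # raises for truncated or corrupt files
    except Exception as exc:
        print(f"Bad file: {path} ({exc})")
```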
Train the model:
```bash
cd src
python train.py
```
Training parameters (see the sketch after this list):
- Batch size: 64
- Learning rate: 0.001
- Optimizer: Adam
- Loss: CrossEntropyLoss
- Epochs: 10
- Image size: 64x64
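
Put together, a training loop matching these parameters might look like the following; the dataset path and the ASLCNN class (sketched above) are assumptions:

```python
from pathlib import Path

import torch
from torch import nn, optim
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

transform = transforms.Compose([
    transforms.Resize((64, 64)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),
])

train_set = datasets.ImageFolder("data/asl_alphabet_train", transform=transform)  # assumed path
loader = DataLoader(train_set, batch_size=64, shuffle=True)

model = ASLCNN().to(device)   # class sketched in the architecture section
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

Path("models").mkdir(exist_ok=True)
for epoch in range(1, 11):    # 10 epochs
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    torch.save(model.state_dict(), f"models/model_epoch_{epoch}.pth")
```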
Evaluate model performance:
```bash
python test.py
```
This outputs the accuracy percentage on the test set.
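
A sketch of that evaluation, reusing the transform, model, and device from the training sketch (the test path is an assumption):

```python
import torch
from torch.utils.data import DataLoader
from torchvision import datasets

test_set = datasets.ImageFolder("data/asl_alphabet_test", transform=transform)  # assumed path
test_loader = DataLoader(test_set, batch_size=64)

model.eval()
correct = total = 0
with torch.no_grad():
    for images, labels in test_loader:
        images, labels = images.to(device), labels.to(device)
        preds = model(images).argmax(dim=1)
        correct += (preds == labels).sum().item()
        total += labels.size(0)
print(f"Test accuracy: {100.0 * correct / total:.2f}%")
```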
Run webcam prediction (a sketch of the loop follows the controls list):
```bash
python predict_webcam.py
```
Controls:
- Place your hand inside the blue rectangle drawn from (100, 100) to (300, 300)
- Press 'q' to quit
- Close the window to exit
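
The core of such a loop, sketched here with the model, transform, and device from the sections above (overlay details are illustrative, not the script's exact code):

```python
import cv2
import torch
from PIL import Image

class_names = test_set.classes  # alphabetical ImageFolder order, from the evaluation sketch
model.eval()

cap = cv2.VideoCapture(0)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    cv2.rectangle(frame, (100, 100), (300, 300), (255, 0, 0), 2)  # the blue box
    roi = frame[100:300, 100:300]                                 # region the model sees
    rgb = cv2.cvtColor(roi, cv2.COLOR_BGR2RGB)
    tensor = transform(Image.fromarray(rgb)).unsqueeze(0).to(device)
    with torch.no_grad():
        pred = model(tensor).argmax(dim=1).item()
    cv2.putText(frame, class_names[pred], (100, 90),
                cv2.FONT_HERSHEY_SIMPLEX, 1.0, (255, 0, 0), 2)
    cv2.imshow("ASL", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```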
The model saves a checkpoint after each epoch as models/model_epoch_X.pth. The final model (model_epoch_10.pth) is used for inference.
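
Loading that final checkpoint for inference is then (a sketch, assuming the ASLCNN class above):

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = ASLCNN()
model.load_state_dict(torch.load("models/model_epoch_10.pth", map_location=device))
model.to(device)
model.eval()  # switch off training-mode behavior before predicting
```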
Images are preprocessed with (see the torchvision sketch after this list):
- Resize to 64x64 pixels
- Convert to tensor
- Normalize with mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]
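
In torchvision terms, that is the same pipeline used in the training sketch:

```python
from torchvision import transforms

transform = transforms.Compose([
    transforms.Resize((64, 64)),
    transforms.ToTensor(),                     # HWC uint8 -> CHW float in [0, 1]
    transforms.Normalize(mean=[0.5, 0.5, 0.5],
                         std=[0.5, 0.5, 0.5]), # maps [0, 1] to [-1, 1]
])
```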
- GPU: CUDA-compatible GPU recommended for training
- CPU: Falls back to the CPU if CUDA is unavailable (the device check shown in the training sketch above)
- Camera: Webcam for real-time prediction