
Sign Language Recognition using MediaPipe & MLP

📌 Project Overview

This project implements real-time sign language recognition using MediaPipe Hands for hand landmark detection and an MLP (Multi-Layer Perceptron) for character classification. We also experimented with MobileNetV2 for sign recognition, but found that the MediaPipe-based MLP performs better in real-time scenarios. The system captures hand gestures from a webcam, extracts landmark features, predicts sign language letters, and dynamically forms words and sentences.

📂 Project Structure

📂 Sign Language Recognition System/
│── 📂 SIGN_TO_SENTENCE_PROJECT/
│    │── 📂 Asl_Sign_Data/                       # Raw ASL dataset
│    │── 📄 asl_mediapipe_keypoints_dataset.csv  # Preprocessed dataset for MLP model
│    │── 📄 asl_mediapipe_mlp_model.h5           # Trained MLP model
│    │── 📄 sign_language_model_MobileNetV2.h5   # Trained MobileNetV2 model
│    │── 📄 Combined_Architecture.ipynb          # Hybrid model experiments
│    │── 📄 LLM.ipynb                            # Language Model Integration
│    │── 📄 Mediapipe_Training.ipynb             # Training script for MLP model
│    │── 📄 MobileNetV2_Training.ipynb           # Training script for MobileNetV2
│    │── 📄 concluion.txt                        # Summary of results
│    │── 📄 requirements.txt                     # Required dependencies

🏗️ Dataset: ASL Kaggle Dataset

  • The dataset used for training was obtained from the Kaggle ASL Sign Language Dataset.
  • It contains hand gesture images labeled with ASL characters.
  • For MobileNetV2, we used raw images.
  • For MLP (MediaPipe), we extracted landmark keypoints from each image and stored them in CSV format.
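
A minimal sketch of that extraction step is shown below. The CSV layout (label followed by 21 landmarks × (x, y, z) = 63 values) and the file listing are assumptions for illustration; the actual preprocessing lives in the project notebooks.

import csv
import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands

# Hypothetical list of (image_path, label) pairs from the raw dataset folder
labeled_images = [("Asl_Sign_Data/A/A1.jpg", "A")]

# One CSV row per image: label followed by 21 landmarks x (x, y, z) = 63 values
# (this layout is an assumption, not the confirmed dataset schema)
with mp_hands.Hands(static_image_mode=True, max_num_hands=1) as hands, \
        open("asl_mediapipe_keypoints_dataset.csv", "w", newline="") as f:
    writer = csv.writer(f)
    for image_path, label in labeled_images:
        image = cv2.imread(image_path)
        results = hands.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
        if not results.multi_hand_landmarks:
            continue  # skip images where no hand was detected
        row = [label]
        for lm in results.multi_hand_landmarks[0].landmark:
            row.extend([lm.x, lm.y, lm.z])
        writer.writerow(row)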

📌 Model 1: MobileNetV2 Approach

  • We trained MobileNetV2 on raw images for sign classification.
  • However, it struggled with real-time sign recognition, which led us to the MediaPipe-based MLP approach.

Training MobileNetV2

from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten

# Number of output classes (assumed: 26 letters plus SPACE, DELETE, NOTHING)
num_classes = 29

# Load pre-trained MobileNetV2 as a feature extractor (ImageNet weights, no top)
base_model = MobileNetV2(input_shape=(224, 224, 3), include_top=False, weights='imagenet')

# Add custom classification layers on top of the backbone
model = Sequential([
    base_model,
    Flatten(),
    Dense(256, activation='relu'),
    Dense(num_classes, activation='softmax')
])

# Compile model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

📌 Model 2: MediaPipe + MLP Approach

  • Extracted hand landmark coordinates using MediaPipe Hands.
  • Trained an MLP model on the extracted landmark features.
  • This method proved to be faster and more reliable for real-time recognition.
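
Training the MLP on MediaPipe Landmarks

A minimal training sketch over the keypoints CSV is shown below. The layer sizes, CSV layout (label first, 63 features after), and hyperparameters are assumptions; the actual training code is in Mediapipe_Training.ipynb.

import numpy as np
import pandas as pd
import tensorflow as tf

# Load the preprocessed keypoints (assumed layout: label column, then 63 features)
data = pd.read_csv("asl_mediapipe_keypoints_dataset.csv")
X = data.iloc[:, 1:].values.astype("float32")
labels, y = np.unique(data.iloc[:, 0].values, return_inverse=True)

# Small fully connected network on the landmark features
# (layer sizes here are illustrative, not the project's exact architecture)
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(X.shape[1],)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(len(labels), activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(X, y, epochs=50, batch_size=32, validation_split=0.2)
model.save("asl_mediapipe_mlp_model.h5")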

Predicting Signs with the Trained MLP

import mediapipe as mp
import numpy as np
import tensorflow as tf

# Load the trained MLP model
mlp_model = tf.keras.models.load_model("asl_mediapipe_mlp_model.h5")

# Initialize MediaPipe Hands
mp_hands = mp.solutions.hands
hands = mp_hands.Hands(min_detection_confidence=0.7, min_tracking_confidence=0.7)

# Predict a sign from one hand's MediaPipe landmarks
def predict_sign(landmarks):
    # Flatten 21 landmarks x (x, y, z) into a single 63-value feature vector
    input_data = np.array(landmarks).flatten().reshape(1, -1)
    prediction = mlp_model.predict(input_data)
    return np.argmax(prediction)  # index of the most probable class
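
Reusing hands and predict_sign from above, a minimal real-time webcam loop could look like this sketch; the class-name ordering is an assumption about the training labels.

import cv2

# Hypothetical class list in training order: A-Z plus the control signs
CLASS_NAMES = [chr(c) for c in range(ord("A"), ord("Z") + 1)] + ["SPACE", "DELETE", "NOTHING"]

cap = cv2.VideoCapture(0)
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if results.multi_hand_landmarks:
        lm = results.multi_hand_landmarks[0].landmark
        sign_idx = predict_sign([(p.x, p.y, p.z) for p in lm])
        cv2.putText(frame, CLASS_NAMES[sign_idx], (10, 40),
                    cv2.FONT_HERSHEY_SIMPLEX, 1.2, (0, 255, 0), 2)
    cv2.imshow("Sign Language Recognition", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()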

🚀 Running the Sign Language Recognition System

1️⃣ Install Dependencies

pip install -r requirements.txt

2️⃣ Running Model Evaluations

To test the output of individual models, run the last cell in:

  • Mediapipe_Training.ipynb for MLP model evaluation.
  • MobileNetV2_Training.ipynb for MobileNetV2 evaluation.

3️⃣ Running the Combined Architecture

To see MobileNetV2 and the MediaPipe-based MLP working together, run:

jupyter notebook Combined_Architecture.ipynb

4️⃣ Controls & Commands

  • Normal Signs → Letters are appended to the sentence.
  • SPACE Sign → Adds a space.
  • DELETE Sign → Removes the last character.
  • NOTHING → No input detected; the sentence is left unchanged (update logic sketched below).
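
The sentence-building logic behind these controls can be sketched as follows; the label strings are assumed to match the training classes, and debouncing (holding a sign for several frames before committing it) is omitted for brevity.

def apply_prediction(sentence, label):
    # Update the running sentence from one predicted sign label
    if label == "NOTHING":
        return sentence            # no confident input: leave the sentence unchanged
    if label == "SPACE":
        return sentence + " "      # word boundary
    if label == "DELETE":
        return sentence[:-1]       # remove the last character
    return sentence + label        # ordinary letter: append it

# Example: "H" -> "HI" -> "HI " -> "HI"
s = ""
for sign in ["H", "I", "SPACE", "DELETE"]:
    s = apply_prediction(s, sign)
print(repr(s))  # 'HI'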

⚠️ Limitations & Next Steps

  • MobileNetV2 did not perform well in real time, so we moved to the MediaPipe-based MLP.
  • Next Phase → Building a FastAPI backend (SignConnect-Backend) for better integration and mobile app support.

🔜 Future Work: Text-to-Sign Language Generation

As the next phase of development, we aim to implement Text-to-Sign Language Actions, allowing users to input text that is translated into sign language animations. Technologies we plan to explore:

  • AI-generated 3D avatars to perform sign language gestures.
  • Computer Vision & Reinforcement Learning to map text to sign movements.
  • Deep Learning models to generate smooth sign transitions.

💡 Open for Contributions

We welcome contributions from the community for this phase! If you're interested in helping develop Text-to-Sign Language Generation, feel free to open an issue or submit a pull request on our GitHub repository.

🤝 Acknowledgments

  • Uses MediaPipe Hands for landmark detection.
  • Model trained using TensorFlow & Scikit-Learn.
  • Inspired by existing research on gesture recognition & sign language AI.

📜 License

This project is licensed under the MIT License - see the LICENSE file for details.