This project evaluates current Large Language Models (LLMs) on their ability to identify American Sign Language (ASL) alphabet letters from images. The goal is to assess the current state of LLMs in the area of accessibility, specifically their vision capabilities for sign language recognition.
```
ASL/
├── asl_llm_evaluation.ipynb    # Main evaluation notebook
├── requirements.txt            # Python dependencies
├── asl_alphabet_dataset/       # Dataset folder (create this)
│   ├── A/                      # Images for letter A
│   ├── B/                      # Images for letter B
│   └── ...                     # Continue for all letters
└── evaluation_results/         # Results output folder (auto-created)
```
Install the Python dependencies:

```
pip install -r requirements.txt
```

Create the dataset folder structure and add ASL alphabet images:
```
mkdir -p asl_alphabet_dataset/{A..Z}
```

Place ASL hand sign images in their respective letter folders.
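If you are not using a POSIX shell (e.g. on Windows), the same folder structure can be created from Python. This is a small sketch equivalent to the `mkdir -p` command above; the function name is illustrative, not part of the notebook:

```python
import string
from pathlib import Path

def create_dataset_folders(root="asl_alphabet_dataset"):
    """Create one folder per letter A-Z, like `mkdir -p asl_alphabet_dataset/{A..Z}`."""
    root = Path(root)
    for letter in string.ascii_uppercase:
        # parents=True / exist_ok=True mirror `mkdir -p` semantics
        (root / letter).mkdir(parents=True, exist_ok=True)
    return root
```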
Set your API keys as environment variables:
```
export OPENAI_API_KEY="your-openai-api-key"
export ANTHROPIC_API_KEY="your-anthropic-api-key"
export GOOGLE_API_KEY="your-google-api-key"
```

Or update them directly in the notebook's CONFIG section.
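Inside the notebook, the keys can be read from the environment with a fallback check before any API calls are made. This is a hedged sketch; the `CONFIG` field names here are hypothetical and may not match the notebook's actual CONFIG section:

```python
import os

# Hypothetical CONFIG dict mirroring the notebook's CONFIG section;
# the actual variable names in the notebook may differ.
CONFIG = {
    "openai_api_key": os.environ.get("OPENAI_API_KEY", ""),
    "anthropic_api_key": os.environ.get("ANTHROPIC_API_KEY", ""),
    "google_api_key": os.environ.get("GOOGLE_API_KEY", ""),
}

def missing_keys(config):
    """Return the names of any API keys that are still unset."""
    return [name for name, value in config.items() if not value]
```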
Open the Jupyter notebook:
```
jupyter notebook asl_llm_evaluation.ipynb
```

Follow the cells in order to:
- Load your ASL dataset
- Initialize LLM evaluators
- Run the evaluation pipeline
- Analyze results with visualizations
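The steps above can be sketched as a simple loop over the dataset. This is illustrative only (the notebook's actual pipeline may differ); `evaluator` stands in for any callable that maps an image path to a predicted letter:

```python
from pathlib import Path

def run_evaluation(dataset_root, evaluator, extensions=(".jpg", ".jpeg", ".png")):
    """Illustrative evaluation loop: send each image to an evaluator and
    record the predicted letter alongside the ground-truth folder name."""
    records = []
    for letter_dir in sorted(Path(dataset_root).iterdir()):
        if not letter_dir.is_dir():
            continue
        for image_path in sorted(letter_dir.iterdir()):
            if image_path.suffix.lower() not in extensions:
                continue
            prediction = evaluator(image_path)  # e.g. an LLM vision call
            records.append({
                "true_label": letter_dir.name,
                "predicted": prediction,
                "image": str(image_path),
            })
    return records
```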
- Multi-Model Testing: Evaluate GPT-4V, Claude 3, Gemini Pro Vision, and more
- Prompt Engineering: Compare different prompting strategies
- Comprehensive Metrics: Accuracy, confusion matrices, per-class performance
- Response Time Analysis: Measure inference speed
- Error Analysis: Identify common misclassifications
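A minimal sketch of the metrics side, assuming evaluation records with `true_label` and `predicted` fields (the notebook may compute these differently, e.g. with scikit-learn):

```python
from collections import Counter

def summarize(records):
    """Compute overall accuracy, per-class accuracy, and confusion counts."""
    confusion = Counter((r["true_label"], r["predicted"]) for r in records)
    totals = Counter(r["true_label"] for r in records)
    correct = Counter(r["true_label"] for r in records
                      if r["predicted"] == r["true_label"])
    per_class = {label: correct[label] / totals[label] for label in totals}
    accuracy = sum(correct.values()) / len(records) if records else 0.0
    return {"accuracy": accuracy, "per_class": per_class, "confusion": confusion}
```

The `confusion` counter makes common misclassifications easy to spot: the largest off-diagonal entries, e.g. `("M", "N")`, are the letter pairs the model confuses most.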
- Model comparison charts
- Confusion matrices
- Per-class accuracy breakdown
- Response time distributions
- Prompt strategy comparisons
- Supported formats: JPG, PNG, JPEG
- Clear hand signs against contrasting background
- Consistent lighting recommended
- Various hand positions and orientations for robustness
Each letter (A-Z) should have its own folder containing multiple image examples.
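Before running an evaluation, it can be worth checking the dataset against these requirements. A small validator sketch (function name is illustrative):

```python
import string
from pathlib import Path

VALID_EXTENSIONS = {".jpg", ".jpeg", ".png"}

def validate_dataset(root="asl_alphabet_dataset"):
    """Report missing letter folders and count valid images per letter."""
    root = Path(root)
    missing, counts = [], {}
    for letter in string.ascii_uppercase:
        folder = root / letter
        if not folder.is_dir():
            missing.append(letter)
            continue
        counts[letter] = sum(
            1 for p in folder.iterdir() if p.suffix.lower() in VALID_EXTENSIONS
        )
    return missing, counts
```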
Results are automatically saved to CSV files in the evaluation_results/ folder with timestamps. Each evaluation run generates:
- Detailed predictions for each image
- Model performance metrics
- Response times
- Raw LLM responses
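The timestamped CSV output could be produced along these lines (a sketch using only the standard library; the notebook's actual saving code and column names may differ):

```python
import csv
from datetime import datetime
from pathlib import Path

def save_results(records, out_dir="evaluation_results"):
    """Write evaluation records to a timestamped CSV, creating the folder if needed."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    out_path = out_dir / f"evaluation_{stamp}.csv"
    # Union of keys across records, so optional fields are not dropped
    fieldnames = sorted({key for r in records for key in r})
    with out_path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(records)
    return out_path
```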
This evaluation framework can help:
- Assess LLM vision capabilities for accessibility
- Identify areas for improvement in sign language recognition
- Compare different LLM providers and models
- Optimize prompting strategies for ASL classification
- Generate insights for future model development
Extend the evaluator classes in the notebook to add support for additional LLMs.
Edit the create_prompt() method in the BaseLLMEvaluator class to test new prompting strategies.
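The shape of such an override might look like the following. Note that `BaseLLMEvaluator` lives in the notebook; the minimal stand-in below is hypothetical, and only the `create_prompt()` hook is taken from the source:

```python
# Hypothetical minimal stand-in for the notebook's BaseLLMEvaluator,
# showing where a custom prompting strategy would plug in.
class BaseLLMEvaluator:
    def create_prompt(self):
        return ("Which ASL alphabet letter does this hand sign show? "
                "Answer with a single letter A-Z.")

class DescribeThenAnswerEvaluator(BaseLLMEvaluator):
    def create_prompt(self):
        # Example alternative strategy: ask the model to describe the
        # hand shape before committing to a letter.
        return ("Describe the hand shape, finger positions, and orientation "
                "in this image, then state which ASL alphabet letter (A-Z) it shows.")
```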
The framework can be adapted for other sign languages by modifying the class labels and dataset structure.
If you use this evaluation framework in your research, please cite appropriately and acknowledge the accessibility focus of this work.
This project is for research and educational purposes.