
♿ Universal Accessibility Companion

A multimodal AI assistant designed to empower individuals with vision, hearing, and speech impairments. Built with Gradio, Modal, Google Gemini, ElevenLabs, and Hugging Face models.

🌟 Features

👁️ Vision Assistant (For Visually Impaired Users)

  • Scene Description: Detects objects and describes the scene using OwlViT (a minimal sketch follows this list).
  • Smart OCR: Reads text from documents, signs, and screens using Google Gemini 2.5 Flash.
  • Text Simplification: Summarizes complex text into easy-to-understand language.
  • Text-to-Speech (TTS): Reads out the simplified text using ElevenLabs (Bella voice).
  • Haptic Feedback: Maps detected sounds (e.g., "car horn", "siren") to vibration patterns (simulated).
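
A minimal sketch of the zero-shot detection step behind Scene Description, using the Hugging Face pipeline API (the image path, candidate labels, and score threshold are illustrative, not the repo's actual values):

from PIL import Image
from transformers import pipeline

# Zero-shot object detection: the labels are supplied at inference time
detector = pipeline("zero-shot-object-detection", model="google/owlvit-base-patch32")

image = Image.open("scene.jpg")
detections = detector(image, candidate_labels=["person", "car", "door", "chair"])
for d in detections:
    if d["score"] > 0.3:  # keep confident detections only
        print(f"{d['label']} at {d['box']} (score {d['score']:.2f})")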

🗣️ Speech Impaired Assistant (For Users with Hearing or Speech Impairments)

  • Live Captioning: Real-time transcription of speech using Distil-Whisper (see the sketch after this list).
  • Emotion Detection: Identifies the speaker's emotion (e.g., "Happy", "Sad", "Angry") using DistilHuBERT.
  • Speaker Diarization: Identifies who is speaking (e.g., "SPEAKER_00", "SPEAKER_01") using pyannote.audio.
  • Low Latency: Optimized for real-time interaction with parallel processing.
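
A hedged sketch of the captioning and diarization steps, following each library's documented API (the glue code and file names are illustrative, not the repo's actual code):

import os
from transformers import pipeline
from pyannote.audio import Pipeline

# Live captioning: transcribe an audio clip with Distil-Whisper
asr = pipeline("automatic-speech-recognition", model="distil-whisper/distil-large-v2")
text = asr("clip.wav")["text"]

# Speaker diarization: label who speaks when (requires an HF token with access to the model)
diarizer = Pipeline.from_pretrained("pyannote/speaker-diarization-3.1",
                                    use_auth_token=os.environ["HF_TOKEN"])
for turn, _, speaker in diarizer("clip.wav").itertracks(yield_label=True):
    print(f"{speaker}: {turn.start:.1f}s to {turn.end:.1f}s")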

🔗 Integrations

  • MCP (Model Context Protocol): Connects to external tools like Calendar, Email, and Maps (Mock implementation).
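
Since the MCP layer is mocked, here is a minimal illustration of what such a mock tool registry can look like (the tool names and canned replies below are hypothetical, not the repo's actual code):

# Hypothetical mock MCP tools: each maps a query to a canned reply
MOCK_TOOLS = {
    "calendar": lambda query: "Next event: doctor appointment at 3 PM",
    "email": lambda query: "You have 2 unread messages",
    "maps": lambda query: "Nearest pharmacy: 400 m north",
}

def call_tool(name: str, query: str) -> str:
    tool = MOCK_TOOLS.get(name)
    return tool(query) if tool else f"Unknown tool: {name}"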

🛠️ Tech Stack

  • Frontend: Gradio (Python)
  • Backend: Modal (Serverless GPU inference)
  • AI Models:
    • Vision: google/gemini-2.5-flash-image, google/owlvit-base-patch32
    • Hearing: distil-whisper/distil-large-v2
    • Emotion: BilalHasan/distilhubert-finetuned-ravdess (ONNX)
    • Diarization: pyannote/speaker-diarization-3.1
    • TTS: ElevenLabs API

🚀 Setup & Installation

1. Clone the Repository

git clone https://github.com/githubbermoon/disability-assistant.git
cd disability-assistant

2. Install Dependencies

It is recommended to use a virtual environment (Conda or venv).
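
For example, with venv:

python -m venv .venv
source .venv/bin/activate   # on Windows: .venv\Scripts\activate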

pip install -r requirements.txt

Note: ffmpeg must also be installed on your system; it is required for audio decoding.

3. Configure Environment Variables

Create a .env file in the root directory:

GEMINI_API_KEY=your_gemini_key
ELEVENLABS_API_KEY=your_elevenlabs_key
HF_TOKEN=your_huggingface_token
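
At startup these keys can be loaded into the environment; a small sketch assuming python-dotenv is used (a common pattern, not confirmed from the repo):

import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current directory
GEMINI_API_KEY = os.environ["GEMINI_API_KEY"]
ELEVENLABS_API_KEY = os.environ["ELEVENLABS_API_KEY"]
HF_TOKEN = os.environ["HF_TOKEN"]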

4. Deploy Backend to Modal

You need a Modal account. Authenticate first:

modal setup

Then deploy the backend functions:

modal deploy modal_app.py
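
For orientation, a minimal sketch of what a GPU-backed function like those in modal_app.py can look like (the app name, GPU type, and function body are illustrative, not the repo's actual code):

import modal

app = modal.App("accessibility-companion")
image = (modal.Image.debian_slim()
         .apt_install("ffmpeg")
         .pip_install("transformers", "torch"))

@app.function(gpu="T4", image=image)
def transcribe(audio_bytes: bytes) -> str:
    # Runs on a serverless GPU worker; the model is loaded inside the container
    import tempfile
    from transformers import pipeline
    asr = pipeline("automatic-speech-recognition",
                   model="distil-whisper/distil-large-v2", device=0)
    with tempfile.NamedTemporaryFile(suffix=".wav") as f:
        f.write(audio_bytes)
        f.flush()
        return asr(f.name)["text"]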

5. Run the App

python app.py

Open your browser at http://localhost:7860.
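
Internally, app.py can reach the deployed backend by looking up a Modal function and calling it remotely; a hedged sketch of such a Gradio handler (the app and function names are illustrative):

import modal
import gradio as gr

transcribe = modal.Function.lookup("accessibility-companion", "transcribe")

def caption(audio_path):
    with open(audio_path, "rb") as f:
        return transcribe.remote(f.read())  # remote GPU call

demo = gr.Interface(fn=caption, inputs=gr.Audio(type="filepath"), outputs="text")
demo.launch()  # serves on http://localhost:7860 by default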

📂 Project Structure

  • app.py: Main Gradio application (Frontend & Orchestration).
  • modal_app.py: Modal backend definitions (GPU inference).
  • utils.py: Helper functions for TTS and text processing.
  • requirements.txt: Python dependencies.

🤝 Contributing

Pull requests are welcome! Please open an issue first to discuss changes.

📄 License

MIT License
