Clariti

A compassionate memory companion for individuals living with dementia

HackIllinois 2026

Built with Expo (React Native), FastAPI (Python), Supabase, Modal, Google Gemini, and ElevenLabs.

Frequent memory recall can slow the progression of dementia symptoms. Clariti makes that recall effortless, emotional, and safe.

About

Over 55 million people worldwide live with dementia. Research shows that frequent, emotionally positive memory recall can slow cognitive decline and improve quality of life. Yet the existing tools are impersonal and frustrating for the very people they aim to help.

Clariti is a smart memory-sharing and recall application built for individuals experiencing dementia and memory loss. It transforms scattered photos, voice notes, and written memories into a living, searchable memory library that feels like a warm conversation with a trusted friend.


Features

| Feature | Description |
| --- | --- |
| LLM-Augmented Voice Assistant | Full voice conversation pipeline: speak a question → STT → semantic RAG retrieval → LLM answer generation → TTS → hear the response. Powered by ElevenLabs + Modal. |
| Facial Recognition | Automatic face detection and identification using InsightFace (RetinaFace + ArcFace). Family members enroll their faces once; Clariti recognizes them in every future photo. |
| Semantic Memory Search | Vector embeddings (via Supabase pgvector) enable natural-language queries across all memories in a group. Ask "When did we go to the beach?" and Clariti finds the right photo. |
| AI Image Descriptions | Google Gemini 2.5 Flash generates rich, contextual descriptions of uploaded photos — combining what the AI sees with what the family described. |
| Group Memory Sharing | Create family groups with join codes. All members contribute memories; the person with dementia can browse and query everything in one unified library. |
| Accessibility-First Design | Large touch targets, high-contrast UI, voice-first interaction model, and automatic navigation back to home — designed for users with cognitive impairments. |

RAG Pipeline (Retrieval-Augmented Generation)

Clariti's voice and text Q&A follows a two-stage RAG pipeline:

  1. Semantic Retrieval — The user's question is embedded via a Supabase Edge Function, then compared against all memory embeddings in the user's group using cosine similarity (pgvector). The top-matching memory is selected.

  2. LLM Generation — The matched memory's content (user description, AI description, identified people) is assembled into a rich prompt and sent to a model running on Modal (A100-80GB). The LLM generates a warm, second-person response grounded exclusively in the retrieved context.
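The two stages above can be sketched in Python. This is an illustrative reduction, not the production code: the real pipeline embeds the question in a Supabase Edge Function and lets pgvector run the similarity search server-side, and the helper names and prompt wording here are hypothetical.

```python
from math import sqrt

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity — the metric pgvector exposes as the <=> distance (as 1 - distance)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve_top_memory(question_embedding: list[float], memories: list[dict]) -> dict:
    """Stage 1: select the memory whose text_embedding best matches the question."""
    return max(memories,
               key=lambda m: cosine_similarity(question_embedding, m["text_embedding"]))

def build_prompt(memory: dict, question: str) -> str:
    """Stage 2: assemble the retrieved memory into a prompt grounded only in that context."""
    people = ", ".join(memory.get("people", [])) or "no one identified"
    return (
        "Answer warmly, in the second person, using ONLY the memory below.\n"
        f"Family description: {memory['content']}\n"
        f"AI description: {memory['ai_description']}\n"
        f"People in the photo: {people}\n"
        f"Question: {question}"
    )
```

In production the prompt string is sent to the Qwen model on Modal; only the retrieval and prompt-assembly logic is shown here.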

Tables

profiles

Stores user identity and facial recognition data. Linked to auth.users.

| Column | Type | Description |
| --- | --- | --- |
| id | uuid (PK, FK → auth.users) | User's unique identifier |
| full_name | text | Display name |
| avatar_url | text | Profile picture URL |
| bio | text | User bio |
| face_embedding | vector | ArcFace 512-D facial embedding for recognition |
| created_at | timestamptz | Account creation time |
| updated_at | timestamptz | Last profile update |

groups

Family or caregiver groups that share memories.

| Column | Type | Description |
| --- | --- | --- |
| id | uuid (PK) | Group identifier |
| name | text | Group display name |
| join_code | text (unique) | Shareable code for joining the group |
| created_at | timestamptz | Creation time |

group_members

Many-to-many relationship between users and groups.

| Column | Type | Description |
| --- | --- | --- |
| id | uuid (PK) | Membership record ID |
| group_id | uuid (FK → groups) | Group reference |
| user_id | uuid (FK → auth.users) | User reference |
| created_at | timestamptz | Join time |
| role | text | Member role (e.g., admin, member) |

memories

The core content table — each row is a single memory (photo + metadata).

| Column | Type | Description |
| --- | --- | --- |
| id | uuid (PK) | Memory identifier |
| group_id | uuid (FK → groups) | Owning group |
| user_id | uuid (FK → auth.users) | Uploader |
| content | text | Human-written description of the memory |
| image_url | text | URL to the stored image |
| ai_description | text | Gemini-generated image description |
| text_embedding | vector | Semantic embedding for RAG retrieval |
| users_in_image | uuid[] | Profile IDs of recognized faces |
| created_at | timestamptz | Upload time |

Key Database Functions

  • match_profile_face(query_embedding, match_threshold, match_count) — RPC function that performs cosine similarity search against all profiles.face_embedding vectors to identify a face.
  • Supabase Edge Function ragTest — Embeds a text question and performs vector similarity search against memories.text_embedding within a specific group.
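The RPC can be invoked from Python via the supabase-py client. A hedged sketch — the default threshold and count below are illustrative placeholders, not the values the backend actually uses:

```python
import os

def face_match_params(embedding: list[float], threshold: float = 0.35, count: int = 1) -> dict:
    """Arguments for the match_profile_face RPC, matching the signature above.
    The threshold and count defaults here are illustrative only."""
    return {
        "query_embedding": embedding,
        "match_threshold": threshold,
        "match_count": count,
    }

if __name__ == "__main__":
    # Requires `pip install supabase` and the env vars from the setup section below.
    from supabase import create_client
    supabase = create_client(os.environ["SUPABASE_URL"], os.environ["SUPABASE_SERVICE_KEY"])
    # A real call would pass the 512-D ArcFace embedding of a detected face.
    result = supabase.rpc("match_profile_face", face_match_params([0.0] * 512)).execute()
    print(result.data)
```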

Backend API

Voice Endpoints

POST /voice-chat

Full voice conversation turn. Accepts audio, returns audio.

| Parameter | Type | Location | Description |
| --- | --- | --- | --- |
| audio | file | multipart | Recorded audio (m4a, wav, mp3, etc.) |
| group_id | string | form | Group to search memories in |
| user_id | string | form | Current user's profile ID (optional) |

Response: audio/mpeg stream with headers X-Transcript and X-Answer.

POST /voice-chat-text

Same pipeline as /voice-chat, but returns JSON for easier frontend parsing.

Response:

{
  "transcript": "Who was I with at the park?",
  "answer": "You were at the park with Avi and Akash...",
  "audio_base64": "<base64-encoded MP3>"
}
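A client can decode the `audio_base64` field and write it out for playback; a minimal sketch (helper name is mine):

```python
import base64

def save_voice_reply(payload: dict, out_path: str) -> str:
    """Decode the audio_base64 field from a /voice-chat-text response,
    write the MP3 to out_path, and return the text answer."""
    with open(out_path, "wb") as f:
        f.write(base64.b64decode(payload["audio_base64"]))
    return payload["answer"]
```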

Image & Vision Endpoints

POST /analyze-image

Generate an AI description of an image using Google Gemini.

{
  "image_url": "https://...",
  "user_description": "Chad at the beach"
}

POST /enroll-profile-face

Detect and store a user's facial embedding from their profile photo.

{
  "bucket": "profile-image-bucket",
  "path": "avatars/user123.jpg",
  "profile_id": "uuid-of-user"
}

POST /match-memory-faces

Detect all faces in a memory image and match them against enrolled profiles.

{
  "bucket": "memory-images",
  "path": "photos/memory456.jpg",
  "memory_id": "uuid-of-memory"
}
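Both face endpoints take a Supabase Storage location plus the ID of the row to update. A sketch of calling them from Python (the `storage_ref` helper is mine, not from the codebase):

```python
import requests  # pip install requests

def storage_ref(bucket: str, path: str, **row_ids: str) -> dict:
    """JSON body shared by the face endpoints: a Supabase Storage location
    plus the target row ID (profile_id or memory_id)."""
    return {"bucket": bucket, "path": path, **row_ids}

if __name__ == "__main__":
    base = "http://localhost:8000"  # or your ngrok URL for physical devices
    requests.post(f"{base}/enroll-profile-face",
                  json=storage_ref("profile-image-bucket", "avatars/user123.jpg",
                                   profile_id="uuid-of-user"),
                  timeout=60)
    requests.post(f"{base}/match-memory-faces",
                  json=storage_ref("memory-images", "photos/memory456.jpg",
                                   memory_id="uuid-of-memory"),
                  timeout=60)
```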

Debugging & Health

POST /test-modal-rag

Test the RAG pipeline with a specific memory ID.

{
  "question": "Who was at this event?",
  "memory_id": "uuid-of-memory",
  "user_id": "uuid-of-current-user"
}

GET /health

Returns server status and current UTC timestamp.


Setup & Installation

Prerequisites

  • Python 3.12+
  • Node.js 18+ and npm
  • Expo CLI (npx expo --version should return ≥ 54.0)
  • Modal account (for GPU inference)
  • Supabase project (with pgvector enabled)

1. Clone the Repository

git clone https://github.com/ObviAvi/Clariti.git
cd Clariti

2. Backend Setup

cd backend

# Create and activate a virtual environment
python3 -m venv .venv
source .venv/bin/activate

# Install dependencies
python -m pip install --upgrade pip
pip install -r requirements.txt

Configure Environment Variables

Create a backend/.env file:

# Google Gemini — Vision LLM for image descriptions
GEMINI_API_KEY=your_gemini_api_key

# ngrok — Public tunnel for mobile device testing (optional)
NGROK_AUTHTOKEN=your_ngrok_authtoken

# Supabase — Database, Auth, and Storage
SUPABASE_URL=your_supabase_project_url
SUPABASE_SERVICE_KEY=your_supabase_service_role_key

# ElevenLabs — Speech-to-Text / Text-to-Speech
ELEVENLABS_API_KEY=your_elevenlabs_api_key
ELEVENLABS_VOICE_ID=your_elevenlabs_voice_id

3. Modal Setup (GPU Infrastructure)

Modal hosts the two GPU-intensive services: facial recognition and LLM inference. No API keys are needed for the models themselves; they are open-source and run directly on Modal's GPUs.

Install & Authenticate

# Install the Modal client (if not already installed via requirements.txt)
pip install modal

# Authenticate with Modal
modal token new

This opens a browser window to log in. Once authenticated, your token is saved locally at ~/.modal.toml.

Create the Shared Volume

Both Modal apps share a persistent volume for caching model weights (~65 GB for Qwen 2.5 32B, ~1 GB for InsightFace). The volume is created automatically on first deploy, but you can also create it explicitly:

modal volume create clariti-model-cache

Deploy the Face Processor

modal deploy backend/modal_vision.py

This deploys the clariti-face app with:

  • InsightFace buffalo_l (RetinaFace + ArcFace) baked into the container image
  • An A100-80GB GPU with 1 warm container

On first deploy, the container image is built (~2–3 min).

Deploy the RAG LLM

modal deploy backend/modal_rag_output.py

This deploys the clariti-rag-llm app with:

  • Qwen 2.5 32B Instruct loaded in float16
  • An A100-80GB GPU with 1 warm container
  • Model weights stored in the clariti-model-cache volume

First run note: The very first invocation after deploying clariti-rag-llm will download ~65 GB of model weights from Hugging Face. This takes 5–10 minutes. All subsequent calls (even after container restarts) reuse the cached weights from the persistent volume.

Verify Deployments

# List running Modal apps
modal app list

# You should see:
#   clariti-face        (deployed)
#   clariti-rag-llm     (deployed)

Useful Modal Commands

# Redeploy after code changes
modal deploy backend/modal_vision.py
modal deploy backend/modal_rag_output.py

# Stop running apps (to save costs)
modal app stop clariti-face
modal app stop clariti-rag-llm

# Run a Modal file locally for testing (uses Modal's cloud GPUs)
modal run backend/modal_vision.py
modal run backend/modal_rag_output.py

4. Start the Backend Server

cd backend
python main.py

The server starts on http://localhost:8000. API docs available at http://localhost:8000/docs.

5. Frontend Setup

cd frontend

# Install dependencies
npm install

Configure Environment Variables

Create a frontend/.env file:

# Supabase — Client-side connection (use the anon/public key, NOT the service key)
EXPO_PUBLIC_SUPABASE_URL=your_supabase_project_url
EXPO_PUBLIC_SUPABASE_ANON_KEY=your_supabase_anon_key

# Backend API — FastAPI edge server URL (use ngrok URL for physical devices)
EXPO_PUBLIC_BACKEND_API_URL=your_backend_api_url

Start the Expo Dev Server

npx expo start

Scan the QR code with Expo Go (iOS/Android) or press i for iOS simulator / a for Android emulator.

Clariti — Because every memory matters.