Clariti

A compassionate memory companion for individuals living with dementia

HackIllinois 2026

Built with Expo (React Native), FastAPI (Python), Supabase, Modal, Google Gemini, and ElevenLabs.

Frequent memory recall can slow the progression of dementia symptoms. Clariti makes that recall effortless, emotional, and safe.

About

Over 55 million people worldwide live with dementia. Research shows that frequent, emotionally positive memory recall can slow cognitive decline and improve quality of life. Yet the existing tools are impersonal and frustrating for the very people they aim to help.

Clariti is a smart memory-sharing and recall application built for individuals experiencing dementia and memory loss. It transforms scattered photos, voice notes, and written memories into a living, searchable memory library that feels like a warm conversation with a trusted friend.


Features

| Feature | Description |
| --- | --- |
| LLM-Augmented Voice Assistant | Full voice conversation pipeline: speak a question → STT → semantic RAG retrieval → LLM answer generation → TTS → hear the response. Powered by ElevenLabs + Modal. |
| Facial Recognition | Automatic face detection and identification using InsightFace (RetinaFace + ArcFace). Family members enroll their faces once; Clariti recognizes them in every future photo. |
| Semantic Memory Search | Vector embeddings (via Supabase pgvector) enable natural-language queries across all memories in a group. Ask "When did we go to the beach?" and Clariti finds the right photo. |
| AI Image Descriptions | Google Gemini 2.5 Flash generates rich, contextual descriptions of uploaded photos — combining what the AI sees with what the family described. |
| Group Memory Sharing | Create family groups with join codes. All members contribute memories; the person with dementia can browse and query everything in one unified library. |
| Accessibility-First Design | Large touch targets, high-contrast UI, voice-first interaction model, and automatic navigation back to home — designed for users with cognitive impairments. |

RAG Pipeline (Retrieval-Augmented Generation)

Clariti's voice and text Q&A follows a two-stage RAG pipeline:

  1. Semantic Retrieval — The user's question is embedded via a Supabase Edge Function, then compared against all memory embeddings in the user's group using cosine similarity (pgvector). The top-matching memory is selected.

  2. LLM Generation — The matched memory's content (user description, AI description, identified people) is assembled into a rich prompt and sent to a model running on Modal (A100-80GB). The LLM generates a warm, second-person response grounded exclusively in the retrieved context.
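The two stages above can be sketched in Python. This is an illustrative reduction, not the production code: the real pipeline embeds the question in a Supabase Edge Function and lets pgvector run the similarity search server-side, and the helper names and prompt wording here are hypothetical.

```python
from math import sqrt

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity — the metric pgvector exposes as the <=> distance (as 1 - distance)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve_top_memory(question_embedding: list[float], memories: list[dict]) -> dict:
    """Stage 1: select the memory whose text_embedding best matches the question."""
    return max(memories,
               key=lambda m: cosine_similarity(question_embedding, m["text_embedding"]))

def build_prompt(memory: dict, question: str) -> str:
    """Stage 2: assemble the retrieved memory into a prompt grounded only in that context."""
    people = ", ".join(memory.get("people", [])) or "no one identified"
    return (
        "Answer warmly, in the second person, using ONLY the memory below.\n"
        f"Family description: {memory['content']}\n"
        f"AI description: {memory['ai_description']}\n"
        f"People in the photo: {people}\n"
        f"Question: {question}"
    )
```

In production the prompt string is sent to the Qwen model on Modal; only the retrieval and prompt-assembly logic is shown here.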

Tables

profiles

Stores user identity and facial recognition data. Linked to auth.users.

| Column | Type | Description |
| --- | --- | --- |
| id | uuid (PK, FK → auth.users) | User's unique identifier |
| full_name | text | Display name |
| avatar_url | text | Profile picture URL |
| bio | text | User bio |
| face_embedding | vector | ArcFace 512-D facial embedding for recognition |
| created_at | timestamptz | Account creation time |
| updated_at | timestamptz | Last profile update |

groups

Family or caregiver groups that share memories.

| Column | Type | Description |
| --- | --- | --- |
| id | uuid (PK) | Group identifier |
| name | text | Group display name |
| join_code | text (unique) | Shareable code for joining the group |
| created_at | timestamptz | Creation time |

group_members

Many-to-many relationship between users and groups.

| Column | Type | Description |
| --- | --- | --- |
| id | uuid (PK) | Membership record ID |
| group_id | uuid (FK → groups) | Group reference |
| user_id | uuid (FK → auth.users) | User reference |
| created_at | timestamptz | Join time |
| role | text | Member role (e.g., admin, member) |

memories

The core content table — each row is a single memory (photo + metadata).

| Column | Type | Description |
| --- | --- | --- |
| id | uuid (PK) | Memory identifier |
| group_id | uuid (FK → groups) | Owning group |
| user_id | uuid (FK → auth.users) | Uploader |
| content | text | Human-written description of the memory |
| image_url | text | URL to the stored image |
| ai_description | text | Gemini-generated image description |
| text_embedding | vector | Semantic embedding for RAG retrieval |
| users_in_image | uuid[] | Profile IDs of recognized faces |
| created_at | timestamptz | Upload time |

Key Database Functions

  • match_profile_face(query_embedding, match_threshold, match_count) — RPC function that performs cosine similarity search against all profiles.face_embedding vectors to identify a face.
  • Supabase Edge Function ragTest — Embeds a text question and performs vector similarity search against memories.text_embedding within a specific group.
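The RPC can be invoked from Python via the supabase-py client. A hedged sketch — the default threshold and count below are illustrative placeholders, not the values the backend actually uses:

```python
import os

def face_match_params(embedding: list[float], threshold: float = 0.35, count: int = 1) -> dict:
    """Arguments for the match_profile_face RPC, matching the signature above.
    The threshold and count defaults here are illustrative only."""
    return {
        "query_embedding": embedding,
        "match_threshold": threshold,
        "match_count": count,
    }

if __name__ == "__main__":
    # Requires `pip install supabase` and the env vars from the setup section below.
    from supabase import create_client
    supabase = create_client(os.environ["SUPABASE_URL"], os.environ["SUPABASE_SERVICE_KEY"])
    # A real call would pass the 512-D ArcFace embedding of a detected face.
    result = supabase.rpc("match_profile_face", face_match_params([0.0] * 512)).execute()
    print(result.data)
```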

Backend API

Voice Endpoints

POST /voice-chat

Full voice conversation turn. Accepts audio, returns audio.

| Parameter | Type | Location | Description |
| --- | --- | --- | --- |
| audio | file | multipart | Recorded audio (m4a, wav, mp3, etc.) |
| group_id | string | form | Group to search memories in |
| user_id | string | form | Current user's profile ID (optional) |

Response: audio/mpeg stream with headers X-Transcript and X-Answer.

POST /voice-chat-text

Same pipeline as /voice-chat, but returns JSON for easier frontend parsing.

Response:

{
  "transcript": "Who was I with at the park?",
  "answer": "You were at the park with Avi and Akash...",
  "audio_base64": "<base64-encoded MP3>"
}
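A client can decode the `audio_base64` field and write it out for playback; a minimal sketch (helper name is mine):

```python
import base64

def save_voice_reply(payload: dict, out_path: str) -> str:
    """Decode the audio_base64 field from a /voice-chat-text response,
    write the MP3 to out_path, and return the text answer."""
    with open(out_path, "wb") as f:
        f.write(base64.b64decode(payload["audio_base64"]))
    return payload["answer"]
```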

Image & Vision Endpoints

POST /analyze-image

Generate an AI description of an image using Google Gemini.

{
  "image_url": "https://...",
  "user_description": "Chad at the beach"
}

POST /enroll-profile-face

Detect and store a user's facial embedding from their profile photo.

{
  "bucket": "profile-image-bucket",
  "path": "avatars/user123.jpg",
  "profile_id": "uuid-of-user"
}

POST /match-memory-faces

Detect all faces in a memory image and match them against enrolled profiles.

{
  "bucket": "memory-images",
  "path": "photos/memory456.jpg",
  "memory_id": "uuid-of-memory"
}
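Both face endpoints take a Supabase Storage location plus the ID of the row to update. A sketch of calling them from Python (the `storage_ref` helper is mine, not from the codebase):

```python
import requests  # pip install requests

def storage_ref(bucket: str, path: str, **row_ids: str) -> dict:
    """JSON body shared by the face endpoints: a Supabase Storage location
    plus the target row ID (profile_id or memory_id)."""
    return {"bucket": bucket, "path": path, **row_ids}

if __name__ == "__main__":
    base = "http://localhost:8000"  # or your ngrok URL for physical devices
    requests.post(f"{base}/enroll-profile-face",
                  json=storage_ref("profile-image-bucket", "avatars/user123.jpg",
                                   profile_id="uuid-of-user"),
                  timeout=60)
    requests.post(f"{base}/match-memory-faces",
                  json=storage_ref("memory-images", "photos/memory456.jpg",
                                   memory_id="uuid-of-memory"),
                  timeout=60)
```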

Debugging & Health

POST /test-modal-rag

Test the RAG pipeline with a specific memory ID.

{
  "question": "Who was at this event?",
  "memory_id": "uuid-of-memory",
  "user_id": "uuid-of-current-user"
}

GET /health

Returns server status and current UTC timestamp.


Setup & Installation

Prerequisites

  • Python 3.12+
  • Node.js 18+ and npm
  • Expo CLI (npx expo --version should return ≥ 54.0)
  • Modal account (for GPU inference)
  • Supabase project (with pgvector enabled)

1. Clone the Repository

git clone https://github.com/ObviAvi/Clariti.git
cd Clariti

2. Backend Setup

cd backend

# Create and activate a virtual environment
python3 -m venv .venv
source .venv/bin/activate

# Install dependencies
python -m pip install --upgrade pip
pip install -r requirements.txt

Configure Environment Variables

Create a backend/.env file:

# Google Gemini — Vision LLM for image descriptions
GEMINI_API_KEY=your_gemini_api_key

# ngrok — Public tunnel for mobile device testing (optional)
NGROK_AUTHTOKEN=your_ngrok_authtoken

# Supabase — Database, Auth, and Storage
SUPABASE_URL=your_supabase_project_url
SUPABASE_SERVICE_KEY=your_supabase_service_role_key

# ElevenLabs — Speech-to-Text / Text-to-Speech
ELEVENLABS_API_KEY=your_elevenlabs_api_key
ELEVENLABS_VOICE_ID=your_elevenlabs_voice_id

3. Modal Setup (GPU Infrastructure)

Modal hosts the two GPU-intensive services: facial recognition and LLM inference. No API keys are needed for the models themselves; they are open-source and run directly on Modal's GPUs.

Install & Authenticate

# Install the Modal client (if not already installed via requirements.txt)
pip install modal

# Authenticate with Modal
modal token new

This opens a browser window to log in. Once authenticated, your token is saved locally at ~/.modal.toml.

Create the Shared Volume

Both Modal apps share a persistent volume for caching model weights (~65 GB for Qwen 2.5 32B, ~1 GB for InsightFace). The volume is created automatically on first deploy, but you can also create it explicitly:

modal volume create clariti-model-cache

Deploy the Face Processor

modal deploy backend/modal_vision.py

This deploys the clariti-face app with:

  • InsightFace buffalo_l (RetinaFace + ArcFace) baked into the container image
  • An A100-80GB GPU with 1 warm container

On first deploy, the container image is built (~2–3 min).

Deploy the RAG LLM

modal deploy backend/modal_rag_output.py

This deploys the clariti-rag-llm app with:

  • Qwen 2.5 32B Instruct loaded in float16
  • An A100-80GB GPU with 1 warm container
  • Model weights stored in the clariti-model-cache volume

First run note: The very first invocation after deploying clariti-rag-llm will download ~65 GB of model weights from Hugging Face. This takes 5–10 minutes. All subsequent calls (even after container restarts) reuse the cached weights from the persistent volume.

Verify Deployments

# List running Modal apps
modal app list

# You should see:
#   clariti-face        (deployed)
#   clariti-rag-llm     (deployed)

Useful Modal Commands

# Redeploy after code changes
modal deploy backend/modal_vision.py
modal deploy backend/modal_rag_output.py

# Stop running apps (to save costs)
modal app stop clariti-face
modal app stop clariti-rag-llm

# Run a Modal file locally for testing (uses Modal's cloud GPUs)
modal run backend/modal_vision.py
modal run backend/modal_rag_output.py

4. Start the Backend Server

cd backend
python main.py

The server starts on http://localhost:8000. API docs available at http://localhost:8000/docs.

5. Frontend Setup

cd frontend

# Install dependencies
npm install

Configure Environment Variables

Create a frontend/.env file:

# Supabase — Client-side connection (use the anon/public key, NOT the service key)
EXPO_PUBLIC_SUPABASE_URL=your_supabase_project_url
EXPO_PUBLIC_SUPABASE_ANON_KEY=your_supabase_anon_key

# Backend API — FastAPI edge server URL (use ngrok URL for physical devices)
EXPO_PUBLIC_BACKEND_API_URL=your_backend_api_url

Start the Expo Dev Server

npx expo start

Scan the QR code with Expo Go (iOS/Android) or press i for iOS simulator / a for Android emulator.

Clariti — Because every memory matters.