A face recognition memory assistant for MentraOS. Helps you remember people you've met by recognizing faces and storing context about your interactions.
Visage (n.) — a person's face, with reference to the form or features.
```
Meet someone → Say "Hey, what's your name?" → 📸 Photo captured (background)
        ↓
🎤 Records conversation
        ↓
Say "Nice to meet you" OR 20-second timeout
        ↓
🤖 Gemini extracts: name, workplace, context, details
        ↓
🔍 Face detected in photo
        ↓
🧠 Face → 128D vector
        ↓
💾 Stored in database (PostgreSQL + pgvector)
        ↓
See them again → 🎯 Face matched → "That's Sarah! You met at the coffee shop."
```
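As a rough sketch, the same flow in code. Every function name here is a stand-in, not the project's real API (the actual orchestration spans the TypeScript frontend and the FastAPI backend):

```python
# Stub pipeline mirroring the flow above. Each callable is a
# placeholder passed in by the caller, not the project's real API.

def remember(capture_photo, record_conversation, extract_info, encode_face, store):
    """Run the 'meet someone new' flow once."""
    photo = capture_photo()                         # non-blocking in the real app
    transcript = record_conversation(timeout_s=20)  # ends on farewell or timeout
    info = extract_info(transcript)                 # Gemini: name, workplace, context
    store(encode_face(photo), info)                 # 128-D vector + info → pgvector row
    return info
```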
- Voice-activated conversation capture — Say "Hey, what's your name?" to start recording
- Intelligent conversation parsing — Gemini extracts names, workplace, and context automatically
- Non-blocking photo capture — Camera runs in the background while you talk
- Farewell detection — Automatically ends recording when you say "nice to meet you" or "catch you later"
- Face detection & recognition — Uses DeepFace with Facenet model
- Vector similarity search — pgvector finds matching faces in milliseconds
- Relationship memory — Tracks names, conversation context, when you met, how many times
- Audio feedback — Confirms when information is saved successfully
| Component | Technology |
|---|---|
| Runtime | Bun |
| Backend | Python / FastAPI |
| Database | PostgreSQL + pgvector |
| Face Detection | DeepFace (Facenet) |
| AI/LLM | Google Gemini 2.5 Flash |
| ORM | SQLAlchemy + Alembic |
| Device SDK | @mentra/sdk |
- Bun v1.3.3+
- Python 3.11+
- PostgreSQL 17+ with pgvector extension
- MentraOS device with camera/audio
- Google Gemini API key (free tier available)
```bash
git clone https://github.com/michaelnkr808/visage.git
cd visage

# Frontend/SDK
bun install

# Backend
cd backend
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```

```bash
# Create database
createdb visage_db

# Enable the pgvector extension
psql visage_db -c "CREATE EXTENSION IF NOT EXISTS vector;"

# Run migrations
cd backend/app
alembic upgrade head
```

Create `.env` in the project root:
```
PACKAGE_NAME=visage
MENTRAOS_API_KEY=your-mentraos-api-key
GEMINI_API_KEY=your-gemini-api-key
BACKEND_URL=http://localhost:8000
PORT=3000
```

Get your Gemini API key at https://aistudio.google.com/app/apikey
Create `backend/.env`:

```
DATABASE_URL=postgresql+psycopg2://your-username@localhost:5432/visage_db
FACE_MATCH_THRESHOLD=0.6
FACE_CONFIDENCE_MIN=0.9
```

```bash
# Terminal 1: Start the backend
cd backend
source venv/bin/activate
uvicorn app.main:app --reload

# Terminal 2: Start the frontend
bun run index.ts
```

```
┌─────────────┐     ┌──────────────────┐     ┌────────────────┐
│ Photo       │────▶│ DetectedFace     │────▶│ FaceEncoding   │
│             │     │                  │     │                │
│ - image     │     │ - bounding box   │     │ - 128D vector  │
│ - filename  │     │ - cropped face   │     │ - model name   │
│ - timestamp │     │ - confidence     │     └────────────────┘
└─────────────┘     └──────────────────┘
       │                     │
       ▼                     ▼
┌─────────────┐     ┌──────────────────┐
│ Transcript  │     │ PersonInfo       │
│             │     │                  │
│ - raw_text  │     │ - name           │
│ - extracted │     │ - context        │
│   name      │     │ - first_met_at   │
└─────────────┘     │ - last_seen_at   │
                    │ - times_met      │
                    └──────────────────┘
```
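For illustration, the relationships above can be mirrored as plain dataclasses. The real models are SQLAlchemy classes in `backend/app/models/face_scan.py`; `mark_seen` is our own helper, not code from the repo:

```python
from dataclasses import dataclass
from datetime import datetime

# Plain-Python mirror of the diagram above, for illustration only.
# The actual models are SQLAlchemy classes in backend/app/models/face_scan.py.

@dataclass
class FaceEncoding:
    vector: list[float]          # 128-D Facenet embedding
    model_name: str = "Facenet"

@dataclass
class PersonInfo:
    name: str
    context: str
    first_met_at: datetime
    last_seen_at: datetime
    times_met: int = 1

    def mark_seen(self) -> None:
        """Bookkeeping when the person is recognized again (our helper)."""
        self.last_seen_at = datetime.now()
        self.times_met += 1
```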
| Command | Action |
|---|---|
| "Hey, what's your name?" | Starts recording conversation + captures photo in background |
| "Nice to meet you" / "Nice meeting you" / "Catch you later" | Ends recording early and processes the conversation |
Recording automatically stops after 20 seconds if no farewell phrase is detected.
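The farewell check itself is simple string matching. Here is a language-neutral sketch in Python (the real check lives in the TypeScript frontend; the phrase list comes from the table above):

```python
import re

# Phrases from the command table above. Matching is case-insensitive
# and ignores punctuation.
FAREWELL_PHRASES = ("nice to meet you", "nice meeting you", "catch you later")

def is_farewell(transcript: str) -> bool:
    """Return True if a final transcript contains a farewell phrase."""
    normalized = re.sub(r"[^a-z' ]", " ", transcript.lower())
    normalized = " ".join(normalized.split())  # collapse whitespace
    return any(phrase in normalized for phrase in FAREWELL_PHRASES)
```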
- User says the trigger phrase: "Hey, what's your name?"
- Photo capture starts in the background (non-blocking)
- Transcription buffers only final transcriptions (not partials)
- Recording ends when a farewell phrase is detected OR after the 20-second timeout
- Gemini 2.5 Flash extracts structured data:

```json
{
  "name": "John",
  "workplace": "Apple",
  "context": "hackathon",
  "details": "software engineer, working on AI projects"
}
```
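Gemini sometimes wraps its JSON reply in a Markdown fence, so the reply benefits from defensive parsing. A sketch, with a helper name of our own invention:

```python
import json

def parse_extraction(reply: str) -> dict:
    """Parse the model's JSON reply, tolerating a ```json fence.

    Hypothetical helper: the shipped code may differ. Keys match the
    example above; missing fields default to None.
    """
    text = reply.strip()
    if text.startswith("```"):
        text = text.split("\n", 1)[1]      # drop the opening fence line
        text = text.rsplit("```", 1)[0]    # drop the closing fence
    data = json.loads(text)
    return {key: data.get(key) for key in ("name", "workplace", "context", "details")}
```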
- Detection: DeepFace finds faces in the photo and extracts bounding boxes
- Encoding: Each face is converted to a 128-dimensional vector using Facenet
- Storage: Vectors are stored in PostgreSQL using pgvector
- Matching: New faces are compared using L2 distance (Euclidean)
- Threshold: Faces with distance < 0.6 are considered a match
```python
# The magic query
FaceEncoding.encoding.l2_distance(query_encoding) < 0.6
```

- Make sure the `.env` file exists in the project root (not in `backend/`)
- Get your API key at https://aistudio.google.com/app/apikey
- Add it to `.env`: `GEMINI_API_KEY=your-key-here`
```bash
# macOS with Homebrew PostgreSQL
brew install pgvector
brew services restart postgresql@17
```

- Check that `backend/.env` has the correct `DATABASE_URL`
- Ensure PostgreSQL is running: `brew services list`
- Verify the database exists: `psql -l | grep visage`
- Ensure DeepFace models are downloaded (happens on first run)
- Check image quality — needs clear, front-facing faces
- This is expected — the MentraOS emulator doesn't have camera access
- Test on a real MentraOS device with a camera
- For emulator testing, conversation recording still works
- Check MentraOS app has microphone permissions
- Ensure device isn't in mute mode
- Look for `📝 Buffered:` logs in the console to confirm transcription is working
```
visage/
├── src/                      # TypeScript frontend (MentraOS SDK)
│   ├── index.ts              # Main app logic, conversation capture
│   └── config.ts             # Environment config loader
├── backend/
│   ├── .env                  # Backend environment variables
│   └── app/
│       ├── main.py           # FastAPI entry point
│       ├── config.py         # Backend configuration
│       ├── models/
│       │   └── face_scan.py  # SQLAlchemy models (Photo, DetectedFace, etc.)
│       ├── routes/
│       │   └── scan.py       # API endpoints (workflow1, workflow2)
│       ├── services/
│       │   ├── database.py   # Database helper functions
│       │   └── face_detection.py  # DeepFace integration
│       └── alembic/          # Database migrations
│           ├── env.py
│           └── versions/
├── .env                      # Frontend environment variables
├── package.json
└── README.md
```
- Voice trigger: "Hey, what's your name?" starts conversation recording
- Conversation buffering (only final transcriptions)
- Farewell phrase detection ("nice to meet you", "catch you later")
- 20-second timeout for recording
- Gemini AI extraction of name, workplace, context, details
- Non-blocking photo capture
- Face detection and encoding (DeepFace + Facenet)
- Database storage (PostgreSQL + pgvector)
- Audio feedback when information is saved
- Workflow 2: Face Recognition — Recognize people you've already met
- Support for multiple faces in one photo (group conversations)
- Better error handling for no face detected
- Persistent conversation history
- Web dashboard to view stored people
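The API endpoints take form-encoded fields, so a client only needs to base64-encode the image and URL-encode the form. A standard-library sketch (the function name is ours; the actual route paths live in `backend/app/routes/scan.py`):

```python
import base64
from urllib.parse import urlencode

def build_register_payload(image_bytes: bytes, name: str, context: str) -> bytes:
    """Form-encode the registration fields (sketch; field names from the API docs).

    POST the result to the backend (BACKEND_URL from .env) with any HTTP
    client; the route path itself is defined in backend/app/routes/scan.py.
    """
    fields = {
        "image_data": base64.b64encode(image_bytes).decode("ascii"),
        "name": name,
        "conversation_context": context,
    }
    return urlencode(fields).encode("ascii")
```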
Register a new person with photo and conversation context.
Request (form-encoded):

- `image_data`: base64-encoded image
- `name`: extracted name
- `conversation_context`: workplace, context, details

Response:

```json
{
  "success": true,
  "message": "Successfully registered John",
  "data": {
    "photo_id": 1,
    "face_id": 1,
    "person_info_id": 1,
    "name": "John"
  }
}
```

Recognize a person from a photo.
Request (form-encoded):

- `image_data`: base64-encoded image

Response:

```json
{
  "success": true,
  "recognized": true,
  "distance": 0.42,
  "person": {
    "name": "John",
    "conversation_context": "Works at Apple, met at hackathon",
    "first_met_at": "2025-01-01T12:00:00",
    "last_seen_at": "2025-01-01T12:00:00",
    "times_met": 1
  }
}
```

MIT