Created: 2026-03-12
Updated: 2026-03-13 (v8 — Sprints 1-7 COMPLETE)
Status: In Progress
Tagline: "Your body is a lab. Start the discovery."
HealthDecoder pivots from a health event logging app to an experiment-centric discovery platform. Users browse a curated experiment catalog, enroll in experiments measured by their wearable data, and receive AI-powered "Magnitude of Impact" analysis. The app also proactively spots patterns in historical data ("Unenrolled Discoveries") and builds a personal "Playbook" of health levers ranked by impact.
- Wearable sync infrastructure (Whoop, Fitbit, Oura, Libre, Dexcom)
- Database tables: `experiments`, `experiment_metrics`, `experiment_checkins`, `experiment_results`
- Metric registry with vendor-specific extraction (`mobile/src/utils/experiments/metrics.ts`)
- `analyze-experiment` Edge Function (statistical analysis + AI interpretation)
- Auth flow, Supabase backend, push token infrastructure
- All data models, API routes, and sync logic for event logging (hidden, not removed)
- Experiment Catalog — curated library of 4-8 high-impact experiments for v1 (expanded in v2)
- Unenrolled Discovery Engine — AI pattern spotting on historical data ("accidental experiments")
- Magnitude of Impact scoring — replaces success/failure framing
- User Discoveries — formatted insights from completed experiments and pattern detection
- User Playbook — "Your Body's Operating Manual" ranked by impact magnitude
- Community Data — anonymous aggregated stats on experiment cards ("Wisdom of the Lab")
- AI Model Abstraction — Gemini-first with provider-swappable architecture
- Data Learning Pipeline — normalized experiment outcomes feeding recommendation intelligence
- Data Quality Monitoring — wearable sync health scoring and gap detection
- New Navigation — Discover / My Lab / Playbook / Profile (retire Home, History, Insights tabs)
- Home tab (event logging via voice/text/camera)
- History tab (event timeline)
- Insights tab (glucose charts, analytics)
- Create-experiment screen (replaced by catalog enrollment)
Before building the catalog or any AI prompts, establish the Compliance Dictionary. Every engineer, content writer, and AI prompt must use this reference. If clinical terms leak into Sprint 1 seed data, fixing them later is a costly refactor.
| Banned Term | Approved Replacement | Context |
|---|---|---|
| diagnose / diagnosis | identify / observe | Never imply clinical diagnosis |
| treat / treatment | experiment / protocol | We run experiments, not treatments |
| cure | improve / support | No curative claims |
| prevent / prevention | associated with lower / support | No prevention claims |
| disease | — (omit entirely) | Never reference diseases |
| diabetes | blood sugar wellness | If glucose context needed |
| hypertension | heart rate patterns | If BP context needed |
| cardiovascular disease | heart wellness | Never name diseases |
| insulin resistance | glucose response | Correlational framing |
| insulin sensitivity | glucose response efficiency | Correlational framing |
| A1C / HbA1c | long-term glucose patterns | Do not reference clinical biomarkers |
| blood pressure | — (omit unless from device) | Not a wearable metric we track |
| prescribe / prescription | suggest / recommend trying | We are not prescribers |
| dose / dosage | amount / serving | For supplement experiments |
| therapeutic | wellness-focused | No therapeutic claims |
| clinical | — (omit) | We are not clinical |
| patient | user / participant | Users, not patients |
| symptom | experience / observation | Observational framing |
| risk factor | pattern associated with | Correlational only |
| mortality | longevity / lifespan | Only in evidence citations |
| success / failure | magnitude of impact | Core framing rule |
Every experiment card, AI output, and discovery must include appropriate framing:
| Context | Required Phrase |
|---|---|
| All AI outputs | "For informational purposes only. Not medical advice." |
| Supplement experiments | "Consult your healthcare provider before starting any supplement." |
| All discovery results | "associated with" or "correlated with" (never "caused by") |
| Experiment framing | "lifestyle experiment" (never "intervention" or "treatment") |
| App-wide footer | "For general wellness purposes only. Not intended to diagnose, treat, cure, or prevent any disease." |
- Draft all 50 catalog entries using approved terminology
- Run automated scan for banned terms before seed data is committed
- Build a lint/validation function: `validateWellnessCompliance(text: string): { pass: boolean, violations: string[] }`
- This function is used in:
- Catalog seed data validation (CI check)
- AI output post-processing (runtime scan before display)
- Experiment description editing (admin tool, future)
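The validator itself can be a small pure function. A minimal sketch, assuming the banned-term list is maintained alongside the dictionary above (the short list here is illustrative, not the full dictionary):

```typescript
// Illustrative subset of the banned terms from the Compliance Dictionary.
const BANNED_TERMS: string[] = [
  'diagnose', 'diagnosis', 'treat', 'treatment', 'cure', 'prevent',
  'disease', 'prescribe', 'patient', 'symptom', 'clinical',
];

function validateWellnessCompliance(text: string): { pass: boolean; violations: string[] } {
  // Word-boundary match so "treat" does not flag inside e.g. "retreat".
  const violations = BANNED_TERMS.filter((term) =>
    new RegExp(`\\b${term}\\b`, 'i').test(text),
  );
  return { pass: violations.length === 0, violations };
}
```

The same function can back both the CI seed-data check and the runtime scan of AI output, so the two paths can never drift apart.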
CREATE TABLE experiment_catalog (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
slug TEXT UNIQUE NOT NULL,
name TEXT NOT NULL,
category TEXT NOT NULL,
subcategory TEXT,
protocol_summary TEXT NOT NULL, -- 1-2 sentence card description
protocol_detail TEXT NOT NULL, -- Full protocol with instructions
goal TEXT NOT NULL,
why_it_works TEXT NOT NULL,
difficulty TEXT NOT NULL CHECK (difficulty IN ('easy', 'moderate', 'hard')),
default_duration_days INTEGER NOT NULL,
min_duration_days INTEGER NOT NULL DEFAULT 7,
primary_metrics JSONB NOT NULL, -- [{metric_key, metric_label, unit, data_source, higherIsBetter}]
secondary_metrics JSONB NOT NULL DEFAULT '[]',
required_data_sources TEXT[] NOT NULL, -- which provider types needed
confounders TEXT[] NOT NULL DEFAULT '{}',
adherence_detection TEXT NOT NULL DEFAULT 'manual'
CHECK (adherence_detection IN ('auto', 'semi_auto', 'manual')),
-- auto: fully detectable from wearable data (bedtime, steps, activity frequency)
-- semi_auto: partially detectable, confirm with one-tap (walking + vest, workout type)
-- manual: requires user check-in (supplements, food habits, breathing exercises)
auto_detect_config JSONB, -- rules for auto/semi_auto detection (see section 2.4)
evidence_summary TEXT,
evidence_url TEXT,
starter_pack BOOLEAN DEFAULT false, -- true = included in new-user starter pack candidates
starter_pack_priority INTEGER, -- lower = higher priority within starter pack
tags TEXT[] DEFAULT '{}',
is_active BOOLEAN DEFAULT true,
sort_order INTEGER DEFAULT 0,
created_at TIMESTAMPTZ DEFAULT now(),
updated_at TIMESTAMPTZ DEFAULT now()
);
CREATE INDEX idx_experiment_catalog_category ON experiment_catalog(category);
CREATE INDEX idx_experiment_catalog_active ON experiment_catalog(is_active) WHERE is_active = true;
CREATE INDEX idx_experiment_catalog_starter ON experiment_catalog(starter_pack) WHERE starter_pack = true;

-- Each completed experiment produces exactly one normalized outcome record.
-- This table is the foundation of the data learning pipeline and community intelligence.
-- It is intentionally denormalized for fast aggregation queries.
CREATE TABLE experiment_outcomes (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
experiment_id UUID NOT NULL REFERENCES experiments(id) ON DELETE CASCADE,
catalog_experiment_id UUID REFERENCES experiment_catalog(id),
user_id UUID NOT NULL REFERENCES auth.users(id) ON DELETE CASCADE,
-- User baseline profile (anonymized snapshot at enrollment time)
user_baseline_profile JSONB NOT NULL,
-- {age_bucket: "30-39", rhr_bucket: "60-70", hrv_bucket: "40-50", sleep_bucket: "6-7h",
-- connected_providers: ["oura", "fitbit"], baseline_quality: "good"}
-- Experiment metadata
experiment_category TEXT NOT NULL,
experiment_duration_days INTEGER NOT NULL,
actual_duration_days INTEGER NOT NULL,
-- Adherence
protocol_adherence_pct NUMERIC(5,2) NOT NULL,
valid_days INTEGER NOT NULL,
excluded_days INTEGER NOT NULL DEFAULT 0,
-- Confounders
confounders_present TEXT[] DEFAULT '{}',
concurrent_experiments INTEGER DEFAULT 0,
-- Metric changes (the core data)
metric_changes JSONB NOT NULL,
-- [{metric_key, baseline_mean, baseline_stddev, experiment_mean, change_pct,
-- effect_size_cohens_d, direction, data_points_baseline, data_points_experiment}]
-- Scoring
overall_magnitude TEXT NOT NULL
CHECK (overall_magnitude IN ('high', 'moderate', 'low', 'minimal', 'inconclusive')),
confidence TEXT NOT NULL
CHECK (confidence IN ('strong', 'moderate', 'suggestive')),
-- Attribution
attribution_confidence TEXT NOT NULL
CHECK (attribution_confidence IN ('strong', 'moderate', 'low')),
concurrent_experiment_ids UUID[] DEFAULT '{}',
attribution_map JSONB,
-- [{experiment_id, experiment_name, attribution_plausibility: "high"|"moderate"|"low"}]
-- AI metadata
ai_model TEXT,
ai_prompt_version TEXT,
created_at TIMESTAMPTZ DEFAULT now(),
UNIQUE(experiment_id)
);
CREATE INDEX idx_outcomes_catalog ON experiment_outcomes(catalog_experiment_id);
CREATE INDEX idx_outcomes_category ON experiment_outcomes(experiment_category);
CREATE INDEX idx_outcomes_magnitude ON experiment_outcomes(overall_magnitude);
CREATE INDEX idx_outcomes_user ON experiment_outcomes(user_id);

CREATE TABLE community_experiment_stats (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
catalog_experiment_id UUID NOT NULL REFERENCES experiment_catalog(id) ON DELETE CASCADE,
total_participants INTEGER DEFAULT 0,
total_completed INTEGER DEFAULT 0,
avg_impact_by_metric JSONB DEFAULT '{}',
-- {metric_key: {avg_change_pct, median_change_pct, p25, p75}}
pct_high_impact NUMERIC(5,2) DEFAULT 0,
pct_moderate_impact NUMERIC(5,2) DEFAULT 0,
pct_low_impact NUMERIC(5,2) DEFAULT 0,
pct_minimal_impact NUMERIC(5,2) DEFAULT 0,
baseline_segment_stats JSONB DEFAULT '{}',
-- {rhr_60_70: {avg_change_pct: X, count: Y}, hrv_40_50: {...}}
updated_at TIMESTAMPTZ DEFAULT now(),
UNIQUE(catalog_experiment_id)
);

CREATE TABLE user_discoveries (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id UUID NOT NULL REFERENCES auth.users(id) ON DELETE CASCADE,
experiment_id UUID REFERENCES experiments(id) ON DELETE SET NULL,
catalog_experiment_id UUID REFERENCES experiment_catalog(id),
discovery_type TEXT NOT NULL
CHECK (discovery_type IN ('experiment_result', 'unenrolled_pattern')),
title TEXT NOT NULL,
summary TEXT NOT NULL,
detailed_analysis TEXT,
metrics_impact JSONB NOT NULL,
-- [{metric_key, metric_label, baseline_value, observed_value, change_pct, magnitude, unit}]
overall_magnitude TEXT
CHECK (overall_magnitude IN ('high', 'moderate', 'low', 'minimal', 'inconclusive')),
confidence TEXT CHECK (confidence IN ('strong', 'moderate', 'suggestive')),
confounders_noted TEXT[],
suggested_experiment_id UUID REFERENCES experiment_catalog(id),
ai_model TEXT,
ai_prompt_version TEXT,
status TEXT DEFAULT 'new'
CHECK (status IN ('new', 'viewed', 'added_to_playbook', 'eliminated', 'dismissed')),
-- 'eliminated' = user acknowledged a minimal/inconclusive result (Success of Elimination)
created_at TIMESTAMPTZ DEFAULT now()
);
CREATE INDEX idx_user_discoveries_user ON user_discoveries(user_id, created_at DESC);
CREATE INDEX idx_user_discoveries_type ON user_discoveries(user_id, discovery_type);

CREATE TABLE user_playbook (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id UUID NOT NULL REFERENCES auth.users(id) ON DELETE CASCADE,
discovery_id UUID REFERENCES user_discoveries(id) ON DELETE SET NULL,
catalog_experiment_id UUID REFERENCES experiment_catalog(id),
habit_name TEXT NOT NULL,
impact_category TEXT NOT NULL, -- sleep, hrv, rhr, glucose, recovery, metabolic, functional
magnitude TEXT NOT NULL CHECK (magnitude IN ('high', 'moderate', 'low', 'eliminated')),
-- 'eliminated' = Minimal/Inconclusive result, framed as "ruled out" (Success of Elimination)
impact_description TEXT NOT NULL, -- "HRV +16%, Deep Sleep +22 min" or "Not a lever for your sleep"
rank INTEGER, -- 1 = highest impact lever; eliminated entries ranked last
created_at TIMESTAMPTZ DEFAULT now(),
updated_at TIMESTAMPTZ DEFAULT now()
);
CREATE INDEX idx_user_playbook_user ON user_playbook(user_id, rank);

CREATE TABLE device_data_quality (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id UUID NOT NULL REFERENCES auth.users(id) ON DELETE CASCADE,
device_id UUID NOT NULL REFERENCES connected_devices(device_id) ON DELETE CASCADE,
assessment_date DATE NOT NULL,
-- Completeness scoring
data_quality_score NUMERIC(5,2) NOT NULL, -- 0-100
-- 100 = all expected metrics present, no gaps
-- 80+ = minor gaps, usable for experiments
-- 50-79 = significant gaps, experiments may be limited
-- <50 = unreliable, warn user
-- Gap analysis
missing_data_days INTEGER DEFAULT 0, -- days with no data in last 14 days
partial_data_days INTEGER DEFAULT 0, -- days with some but not all expected metrics
total_days_assessed INTEGER NOT NULL,
-- Per-metric availability
metric_availability JSONB NOT NULL,
-- {hrv: {available: true, days_with_data: 12, total_days: 14, quality: "good"},
-- rhr: {available: true, days_with_data: 14, total_days: 14, quality: "excellent"},
-- sleep_stages: {available: false, days_with_data: 0, total_days: 14, quality: "unavailable"}}
-- Sync health
sync_health TEXT NOT NULL CHECK (sync_health IN ('healthy', 'degraded', 'failing', 'stale')),
-- healthy: synced within last 6 hours, <2 missing days in 14
-- degraded: synced within 24h but 2-4 missing days
-- failing: >4 missing days or sync errors
-- stale: no sync in >48 hours
last_successful_sync TIMESTAMPTZ,
sync_error_count_7d INTEGER DEFAULT 0,
created_at TIMESTAMPTZ DEFAULT now(),
UNIQUE(device_id, assessment_date)
);
CREATE INDEX idx_data_quality_user ON device_data_quality(user_id, assessment_date DESC);
CREATE INDEX idx_data_quality_device ON device_data_quality(device_id, assessment_date DESC);

ALTER TABLE experiments
ADD COLUMN catalog_experiment_id UUID REFERENCES experiment_catalog(id),
ADD COLUMN baseline_metrics JSONB, -- auto-computed baseline snapshot
ADD COLUMN baseline_quality TEXT, -- 'excellent' | 'good' | 'limited' | 'insufficient'
ADD COLUMN data_quality_at_enrollment JSONB, -- snapshot of device_data_quality at enrollment
ADD COLUMN concurrent_experiment_ids UUID[], -- IDs of experiments that overlapped (populated at completion)
ADD COLUMN attribution_confidence TEXT -- 'strong' | 'moderate' | 'low' (computed at completion)
CHECK (attribution_confidence IN ('strong', 'moderate', 'low')),
ADD COLUMN is_custom BOOLEAN DEFAULT false;

ALTER TABLE experiment_checkins
ADD COLUMN confounders JSONB DEFAULT '{}',
-- {"alcohol": true, "illness": false, "travel": false, "intense_workout": true, "poor_sleep": false}
ADD COLUMN auto_detected BOOLEAN DEFAULT false,
-- true if adherence was auto-detected from wearable data (not manual check-in)
ADD COLUMN auto_detect_data JSONB;
-- evidence for auto-detection: {"detected_bedtime": "22:15", "target_bedtime": "22:30", "within_threshold": true}

ALTER TABLE experiment_results
ADD COLUMN overall_magnitude TEXT
CHECK (overall_magnitude IN ('high', 'moderate', 'low', 'minimal', 'inconclusive')),
ADD COLUMN ai_model TEXT,
ADD COLUMN ai_prompt_version TEXT;

-- experiment_catalog: public read for authenticated users
ALTER TABLE experiment_catalog ENABLE ROW LEVEL SECURITY;
CREATE POLICY catalog_select ON experiment_catalog FOR SELECT TO authenticated USING (true);
CREATE POLICY catalog_service ON experiment_catalog FOR ALL TO service_role USING (true) WITH CHECK (true);
-- experiment_outcomes: user can read own, service_role aggregates
ALTER TABLE experiment_outcomes ENABLE ROW LEVEL SECURITY;
CREATE POLICY outcomes_select ON experiment_outcomes FOR SELECT USING (auth.uid() = user_id);
CREATE POLICY outcomes_service ON experiment_outcomes FOR ALL TO service_role USING (true) WITH CHECK (true);
-- community_experiment_stats: public read for authenticated users
ALTER TABLE community_experiment_stats ENABLE ROW LEVEL SECURITY;
CREATE POLICY community_stats_select ON community_experiment_stats FOR SELECT TO authenticated USING (true);
CREATE POLICY community_stats_service ON community_experiment_stats FOR ALL TO service_role USING (true) WITH CHECK (true);
-- user_discoveries: user owns their discoveries
ALTER TABLE user_discoveries ENABLE ROW LEVEL SECURITY;
CREATE POLICY discoveries_select ON user_discoveries FOR SELECT USING (auth.uid() = user_id);
CREATE POLICY discoveries_insert ON user_discoveries FOR INSERT WITH CHECK (auth.uid() = user_id);
CREATE POLICY discoveries_update ON user_discoveries FOR UPDATE USING (auth.uid() = user_id);
CREATE POLICY discoveries_delete ON user_discoveries FOR DELETE USING (auth.uid() = user_id);
CREATE POLICY discoveries_service ON user_discoveries FOR ALL TO service_role USING (true) WITH CHECK (true);
-- user_playbook: user owns their playbook
ALTER TABLE user_playbook ENABLE ROW LEVEL SECURITY;
CREATE POLICY playbook_select ON user_playbook FOR SELECT USING (auth.uid() = user_id);
CREATE POLICY playbook_insert ON user_playbook FOR INSERT WITH CHECK (auth.uid() = user_id);
CREATE POLICY playbook_update ON user_playbook FOR UPDATE USING (auth.uid() = user_id);
CREATE POLICY playbook_delete ON user_playbook FOR DELETE USING (auth.uid() = user_id);
CREATE POLICY playbook_service ON user_playbook FOR ALL TO service_role USING (true) WITH CHECK (true);
-- device_data_quality: user reads own, service_role writes
ALTER TABLE device_data_quality ENABLE ROW LEVEL SECURITY;
CREATE POLICY data_quality_select ON device_data_quality FOR SELECT USING (auth.uid() = user_id);
CREATE POLICY data_quality_service ON device_data_quality FOR ALL TO service_role USING (true) WITH CHECK (true);

For v1, we ship a focused catalog of the highest-signal experiments — the ones most likely to produce a measurable "wow" moment for new users. The full ~50-experiment library is a v2 expansion.
All descriptions MUST pass the Wellness Terminology Audit (Section 1.1) before commit.
v1 Catalog (8 experiments):
| # | Experiment | Category | Duration | Adherence | Why v1 |
|---|---|---|---|---|---|
| 1 | Alcohol Elimination | Sleep | 14 days | manual | Highest probability of dramatic, measurable change |
| 2 | Early Bedtime | Sleep | 14 days | auto | High signal, easy, auto-detectable via sleep timestamps |
| 3 | Post-Meal Walk | RHR / Sleep | 14 days | auto | Low friction, auto-detectable, strong multi-metric signal |
| 4 | Caffeine Curfew | Sleep | 14 days | manual | High signal for sleep metrics, relatable protocol |
| 5 | Consistent Wake Time | Sleep | 14 days | auto | Easy, auto-detectable, strong sleep consistency signal |
| 6 | Magnesium Before Bed | RHR / Sleep | 14 days | manual | Accessible supplement, strong RHR + sleep signal |
| 7 | Morning Sunlight | Sleep | 10 days | semi_auto | Easy, well-known (Huberman audience), moderate signal |
| 8 | Digital Sunset | Sleep | 14 days | manual | Moderate signal, highly relatable, no equipment needed |
Why these 8: They target the metrics most users have (sleep, RHR, HRV), require no special equipment or CGM, have strong published evidence, and include a mix of auto/semi_auto/manual adherence. 5 of 8 target sleep — deliberately, because sleep is the metric with the most consistent measurable signal from wearables and is universally relevant.
v2 Catalog Expansion (~50 experiments):
| Category | v2 Additions |
|---|---|
| Glucose | ACV, Food Sequencing, Cinnamon, Paired Carb Rule, Resistance Training |
| HRV | Resonant Breathing, Cold Exposure, Nasal Walking, Movement Snacks |
| RHR | Legs Up Wall, Zone 2, Hydration Load, Sauna |
| Metabolic | 30g Protein Breakfast, 8PM Curfew, 3-Hour Buffer, TRE 10h, Mediterranean Trial, UPF Elimination, etc. |
| Body Composition | Protein Pacing, Weighted Vest Walking, Creatine |
| VO2 Max | Norwegian 4x4, Fasted Zone 2 |
| Recovery | Sauna 3x/wk, Afternoon Nap |
| Behavior | Nature Exposure, No News |
| Exercise | Strength Training 3x/wk, HIIT 2x/wk |
| Functional | Dead Hang Challenge, Floor Sitting |
Adherence detection classification:
| Detection Type | v1 Experiments | How Detected |
|---|---|---|
| `auto` | Early Bedtime, Consistent Wake Time, Post-Meal Walk | Sleep timestamps, step counts, activity logs from wearable |
| `semi_auto` | Morning Sunlight | Activity/location detected, one-tap confirm |
| `manual` | Alcohol Elimination, Caffeine Curfew, Magnesium, Digital Sunset | Cannot be detected from wearable data |
Auto-detect config examples:
// Early Bedtime: auto
{"type": "sleep_start_time", "target": "relative_to_baseline", "offset_minutes": -45, "threshold_minutes": 15}
// Post-Meal Walk: auto (via evening steps spike)
{"type": "activity_after_time", "window_start": "18:00", "window_end": "21:00", "min_duration_minutes": 10, "activity_types": ["walk"]}
// Consistent Wake Time: auto
{"type": "sleep_end_time_variance", "max_variance_minutes": 30}

Each catalog entry includes:
- `primary_metrics`: The 2-3 metrics most likely to show impact
- `secondary_metrics`: Additional metrics to track
- `required_data_sources`: Which wearable data is needed
- `confounders`: Known confounders to flag during check-ins
- `adherence_detection` + `auto_detect_config`: How adherence is tracked
- `evidence_summary` + `evidence_url`: Scientific backing (use wellness-compliant language)
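The auto_detect_config examples above suggest a discriminated union on `type`. A hedged TypeScript sketch — field names mirror the JSON examples; the Consistent Wake Time evaluator is an illustrative assumption:

```typescript
// Shape of auto_detect_config, one variant per example rule above.
type AutoDetectConfig =
  | { type: 'sleep_start_time'; target: 'relative_to_baseline'; offset_minutes: number; threshold_minutes: number }
  | { type: 'activity_after_time'; window_start: string; window_end: string; min_duration_minutes: number; activity_types: string[] }
  | { type: 'sleep_end_time_variance'; max_variance_minutes: number };

// Example: the Consistent Wake Time rule as a typed config.
const wakeTimeRule: AutoDetectConfig = { type: 'sleep_end_time_variance', max_variance_minutes: 30 };

// Wake times in minutes after midnight across the assessed days;
// adherence holds if the spread stays within the configured variance.
function meetsWakeTimeVariance(wakeTimes: number[], cfg: { max_variance_minutes: number }): boolean {
  const spread = Math.max(...wakeTimes) - Math.min(...wakeTimes);
  return spread <= cfg.max_variance_minutes;
}
```

A discriminated union lets the detection engine switch on `cfg.type` with exhaustive type-checking as new rule types are added.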
A new Supabase Edge Function that supports multiple AI providers with a single interface.
supabase/functions/ai-engine/
├── index.ts -- Router: /analyze, /spot-patterns, /recommend, /starter-pack
├── providers/
│ ├── types.ts -- Provider interface
│ ├── gemini.ts -- Google Gemini (Gemini 2.0 Flash / Pro)
│ ├── openai.ts -- OpenAI (GPT-4o-mini) — fallback
│ └── factory.ts -- Provider selection based on config
├── engines/
│ ├── experiment-analyst.ts -- Experiment analysis (replaces analyze-experiment)
│ ├── pattern-spotter.ts -- Unenrolled discovery detection
│ ├── recommender.ts -- Experiment recommendations
│ └── starter-pack.ts -- Personalized first-experiment recommendation
├── prompts/
│ ├── shared-guidelines.ts -- FDA compliance rules, wellness language
│ ├── experiment-analysis.ts -- Analysis prompt template
│ ├── pattern-detection.ts -- Pattern spotting prompt template
│ └── recommendation.ts -- Recommendation prompt template
└── compliance/
├── banned-words.ts -- Banned/required terms from Section 1.1
└── output-validator.ts -- Scan AI output for compliance violations
Provider Interface:
interface AIProvider {
name: string;
chat(params: {
systemPrompt: string;
userPrompt: string;
temperature?: number;
maxTokens?: number;
responseFormat?: 'json';
}): Promise<{ content: string; model: string; usage: { input: number; output: number } }>;
}

Provider Selection:
- Environment variable `AI_PROVIDER=gemini|openai` (default: `gemini`)
- Model-specific env vars: `GEMINI_API_KEY`, `GEMINI_MODEL`, `OPENAI_API_KEY`, `OPENAI_CHAT_MODEL`
- Fallback chain: if the primary provider fails, try the secondary
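The fallback chain can be sketched as a loop over providers in priority order. `AIProvider` is trimmed from the interface above; `chatWithFallback` itself is an assumed helper (something `factory.ts` might export), not confirmed API:

```typescript
// Trimmed provider interface (see the full version in this section).
interface AIProvider {
  name: string;
  chat(params: { systemPrompt: string; userPrompt: string; temperature?: number }): Promise<{ content: string; model: string }>;
}

// Try each provider in order; return the first success, rethrow the last failure.
async function chatWithFallback(
  providers: AIProvider[],
  params: { systemPrompt: string; userPrompt: string },
): Promise<{ content: string; model: string }> {
  let lastError: unknown = new Error('no providers configured');
  for (const provider of providers) {
    try {
      return await provider.chat(params);
    } catch (err) {
      lastError = err; // remember and fall through to the next provider
    }
  }
  throw lastError;
}
```

Because every provider conforms to the same interface, the engines never need to know whether Gemini or the OpenAI fallback answered.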
Output Compliance Validation:
Every AI response is passed through output-validator.ts before being stored or displayed:
- Scan for banned terms from the dictionary
- Verify required disclaimers are present
- If violations found: auto-correct where possible, log violation, flag for review
- This is a runtime safety net — the prompts should prevent violations, but validation catches edge cases
The existing METRIC_REGISTRY in mobile/src/utils/experiments/metrics.ts already handles vendor-specific extraction for Whoop, Oura, and Fitbit. Extend it for:
- Apple Health — add `'apple_health'` to `requires` arrays where applicable
- Google Fit — add `'google_fit'` to `requires` arrays
- New metrics (if needed by catalog experiments): `respiratory_rate` (Oura, Whoop), `sleep_latency` (Oura, Whoop), `spo2` (Oura, Fitbit, Apple Health)
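For illustration, a hypothetical registry entry for one of the new metrics — only the `requires` array is named in the text above; the other fields are assumptions about the registry's shape, not its actual definition:

```typescript
// Assumed shape of a METRIC_REGISTRY entry; only `requires` is spec'd above.
interface MetricDefinition {
  key: string;
  label: string;
  unit: string;
  requires: string[]; // provider types that can supply this metric
}

const respiratoryRate: MetricDefinition = {
  key: 'respiratory_rate',
  label: 'Respiratory Rate',
  unit: 'breaths/min',
  requires: ['oura', 'whoop'], // per the new-metrics list above
};
```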
Create a shared utility (used by both mobile and Edge Function):
interface BaselineResult {
metric_key: string;
period_start: string;
period_end: string;
mean: number;
median: number;
std_dev: number;
min: number;
max: number;
typical_range: [number, number]; // mean ± 1 std_dev
data_points: number;
quality: 'excellent' | 'good' | 'limited' | 'insufficient';
// excellent: 14+ days, low variance
// good: 7-13 days
// limited: 3-6 days
// insufficient: <3 days
}

Logic:
- Query connected_devices for user's active providers
- Determine available metrics via `getAvailableMetrics()`
- Look back up to 30 days for historical data
- Compute stats per metric
- Return baseline snapshot + quality assessment
- If insufficient data for a metric, flag it but don't block enrollment
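The per-metric stats step can be sketched as follows, following the `BaselineResult` shape above (period fields omitted; the "low variance" condition for `excellent` is left out for brevity, so only the day-count thresholds apply):

```typescript
type BaselineQuality = 'excellent' | 'good' | 'limited' | 'insufficient';

// Compute baseline stats for one metric from its daily values.
function computeBaselineStats(metricKey: string, values: number[]) {
  const n = values.length;
  const sorted = [...values].sort((a, b) => a - b);
  const mean = values.reduce((a, b) => a + b, 0) / n;
  const median = n % 2 ? sorted[(n - 1) / 2] : (sorted[n / 2 - 1] + sorted[n / 2]) / 2;
  const stdDev = Math.sqrt(values.reduce((a, b) => a + (b - mean) ** 2, 0) / n);
  // Quality thresholds per the BaselineResult comments (variance check omitted).
  const quality: BaselineQuality =
    n >= 14 ? 'excellent' : n >= 7 ? 'good' : n >= 3 ? 'limited' : 'insufficient';
  return {
    metric_key: metricKey,
    mean,
    median,
    std_dev: stdDev,
    min: sorted[0],
    max: sorted[n - 1],
    typical_range: [mean - stdDev, mean + stdDev] as [number, number],
    data_points: n,
    quality,
  };
}
```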
After each sync-all-devices or sync-cgm-devices run, assess data quality:
1. Per-device assessment: For each connected device, check:
   - Days with data in last 14 days
   - Which metrics are present vs expected for that provider
   - Time since last successful sync
   - Error count in last 7 days (from `ingestion_log`)
2. Score computation (0-100):
- 100: All expected metrics, no gaps, synced within 6 hours
- 80+: Minor gaps (1-2 days), usable for experiments
- 50-79: Significant gaps (3-5 days), experiments may be limited
- <50: Unreliable, warn user before enrollment
3. Upsert to `device_data_quality` table daily
4. User-facing indicators:
- Green badge on Profile > Devices: "Healthy" sync
- Amber badge: "Degraded" — missing recent data
- Red badge: "Needs attention" — failing sync or stale data
- Shown on experiment enrollment if quality is low
- At enrollment: snapshot `device_data_quality` into `experiments.data_quality_at_enrollment`
- During experiment: if data quality drops below 50 for more than 3 consecutive days, notify the user
- At analysis: factor data quality into confidence scoring (more missing days = lower confidence)
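The sync-health rules above can be sketched as a small classifier; the precedence where the rules overlap (stale, then failing, then healthy, then degraded) is an assumption, since the rules as written don't order themselves:

```typescript
type SyncHealth = 'healthy' | 'degraded' | 'failing' | 'stale';

function classifySyncHealth(
  hoursSinceSync: number,
  missingDays14: number, // missing days in the last 14
  syncErrors7d: number,  // sync errors in the last 7 days
): SyncHealth {
  if (hoursSinceSync > 48) return 'stale';                     // no sync in >48 hours
  if (missingDays14 > 4 || syncErrors7d > 0) return 'failing'; // >4 missing days or sync errors
  if (hoursSinceSync <= 6 && missingDays14 < 2) return 'healthy';
  return 'degraded'; // synced recently but 2-4 missing days
}
```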
New Tab Bar:
| Tab | Icon | Route | Purpose |
|---|---|---|---|
| Discover | Compass | `(tabs)/discover` | Experiment library, recommendations, unenrolled discoveries |
| My Lab | FlaskConical | `(tabs)/lab` | Active experiments, check-ins, progress |
| Playbook | BookOpen | `(tabs)/playbook` | Personal discoveries, impact rankings |
| Profile | User | `(tabs)/profile` | Settings, devices, account |
Hidden but accessible routes:
- `(tabs)/home` — Event logging (hidden from tab bar, accessible via Profile > "Event Logger")
- `(tabs)/history` — Event history (same)
- `(tabs)/insights` — Glucose insights (same)
When a user connects their first wearable and has 14+ days of historical data, the app immediately runs two processes:
- Metric Gap Analysis — Score each metric against published wellness ranges
- Unenrolled Discovery Scan — Find patterns in historical data (Section 3.1)
Wellness Reference Ranges (NOT clinical — derived from published wearable population data):
| Metric | Optimal Range | Source |
|---|---|---|
| RHR | 50-65 bpm | General fitness literature |
| HRV (RMSSD) | Age-adjusted: 20s: 50-100ms, 30s: 40-80ms, 40s: 30-60ms, 50+: 20-50ms | Population wearable data |
| Sleep Duration | 7-9 hours | Sleep foundation guidelines |
| Deep Sleep % | 15-25% of total sleep | Sleep stage research |
| REM Sleep % | 20-25% of total sleep | Sleep stage research |
| Steps | 8,000-12,000/day | Activity research |
Gap Scoring Algorithm:
interface MetricGap {
metric_key: string;
user_value: number;
optimal_low: number;
optimal_high: number;
gap_severity: 'within_optimal' | 'slightly_below' | 'below' | 'well_below';
improvement_potential: number; // 0-100, higher = more room to improve
}

- Compute user's 14-day average for each available metric
- Compare against age-adjusted wellness ranges
- Score gap severity (how far below optimal)
- Rank metrics by improvement potential
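A sketch of the severity step, assuming cutoffs at 25% and 75% of the optimal range's width (the cutoffs are illustrative, not spec'd above). For metrics where higher is worse (e.g. RHR), "below" reads as "outside optimal in the unfavorable direction":

```typescript
type GapSeverity = 'within_optimal' | 'slightly_below' | 'below' | 'well_below';

function scoreGapSeverity(userValue: number, optimalLow: number, optimalHigh: number): GapSeverity {
  if (userValue >= optimalLow && userValue <= optimalHigh) return 'within_optimal';
  const width = optimalHigh - optimalLow;
  // Distance outside the optimal range, in the metric's own units.
  const shortfall = userValue < optimalLow ? optimalLow - userValue : userValue - optimalHigh;
  if (shortfall <= width * 0.25) return 'slightly_below';
  if (shortfall <= width * 0.75) return 'below';
  return 'well_below';
}
```

With the RHR range of 50-65 bpm, the onboarding example of 68 bpm falls 3 bpm outside a 15 bpm range and scores as a slight gap, matching the "slightly above optimal" framing in the mock below.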
From the 8 starter pack candidates, select 4-8 based on:
- Metric gap targeting (40% weight): Prioritize experiments that target the user's weakest metrics
- Difficulty for first-timers (20% weight): Favor "easy" experiments
- Community impact data (20% weight): Favor experiments with high community impact rates
- Data measurability (20% weight): Only include experiments whose metrics the user can actually track
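The weighted selection can be sketched as one score per candidate; each factor is assumed pre-normalized to 0-1, and treating measurability as a hard filter is an interpretation of "only include":

```typescript
interface StarterPackFactors {
  metricGapTargeting: number; // 0-1: how well it targets the user's weakest metrics
  firstTimerEase: number;     // 0-1: 1 = "easy" difficulty
  communityImpact: number;    // 0-1: community high-impact rate
  measurability: number;      // 0 or 1: can the user actually track its metrics
}

function starterPackScore(f: StarterPackFactors): number {
  if (f.measurability === 0) return 0; // hard filter: unmeasurable experiments are excluded
  return (
    0.4 * f.metricGapTargeting + // weights per the four factors above
    0.2 * f.firstTimerEase +
    0.2 * f.communityImpact +
    0.2 * f.measurability
  );
}
```

Candidates are then sorted by score and the top 4-8 become the starter pack.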
Presentation — "Your First Experiment" hero:
┌─────────────────────────────────────────┐
│ Based on your data, here's where │
│ you have the most room to improve: │
│ │
│ Resting Heart Rate: 68 bpm │
│ ████████████░░░░░░ slightly above │
│ optimal (50-65 bpm) │
│ │
│ Recommended first experiment: │
│ ┌─────────────────────────────────┐ │
│ │ 🚶 Post-Meal Walk │ │
│ │ 14 days · Easy · Auto-tracked │ │
│ │ │ │
│ │ 78% of participants with a │ │
│ │ similar RHR saw a meaningful │ │
│ │ reduction in resting heart │ │
│ │ rate. │ │
│ │ │ │
│ │ [Start This Experiment] │ │
│ └─────────────────────────────────┘ │
│ │
│ Other recommended experiments: │
│ • Zone 2 Cardio (targets RHR) │
│ • Earlier Bedtime (targets sleep) │
│ • Caffeine Curfew (targets sleep) │
└─────────────────────────────────────────┘
The Discover tab is insight-first, not catalog-first. The most magical moment is: "We noticed something interesting in your data." That must appear before any experiment catalog. The product should feel like a system that understands your body, not a library of health hacks.
Layout (ordered by priority — insights first, catalog last):
1. Insights Hero (ALWAYS first — the magic moment):
- If unenrolled discoveries exist: Full-width card(s) showing AI-detected patterns
- "We noticed something in your data..."
- Pattern description with metric visualization
- "Want to confirm this? Start a 7-day experiment →"
- If no discoveries yet but data is loading: "Analyzing your data... We're looking for patterns."
- If no wearable connected: "Connect a wearable to unlock your first discovery."
- If wearable connected but <14 days data: "We're collecting data. Your first insight is coming soon."
- This section is never empty — it always communicates what's happening
2. Your First Experiment (for new users with no active/completed experiments):
- Personalized recommendation from Metric Gap Analysis (Section 2.2)
- Shows the single best experiment with personalized hook
- "Based on your data, this experiment has the highest likelihood of impact for you."
3. Recommended for You (for returning users with experiment history):
- AI-powered recommendations based on data profile, past experiments, playbook
- Includes "Confirm the Driver" suggestions when attribution is ambiguous (Section 5.3)
- Horizontal scroll of experiment cards
4. Experiment Catalog:
- Full list of available experiments (4-8 in v1)
- Category badges, difficulty, adherence type
- Community data on each card
Each card displays:
- Experiment name + category badge
- Duration (e.g., "14 days")
- Difficulty badge (Easy / Moderate / Hard)
- Adherence type indicator (Auto-tracked / One-tap / Daily check-in)
- Primary metrics icons
- Community data: "84% of 1,200 participants saw a 5%+ increase in HRV"
- Data availability indicator (green check if user has required data, amber warning if not)
Sections:
- Hero: Name, category, difficulty, duration, adherence type
- Protocol: Full description of what to do
- Goal: What we're testing
- Why It Works: Science explanation (plain language, wellness-compliant)
- Metrics Tracked: Primary + secondary with data availability check
- Wisdom of the Lab (Community Data):
- "84% of 1,200 participants saw a 5%+ increase in HRV using this protocol."
- "Users with a similar baseline RHR to yours typically see a 'High' magnitude of impact."
- Evidence: Link to study
- Data Availability Warning (if applicable):
- "This experiment tracks HRV, which requires an Oura or Whoop device. You don't currently have one connected. You can still run this experiment, but impact measurement will be limited."
- [Start Anyway] [Connect a Device]
- Start Experiment CTA
Steps:
1. Baseline Preview: "Based on your last 14 days of data, here's your baseline:"
- Show computed baseline for each primary metric
- Quality indicators (excellent/good/limited/insufficient)
   - Data quality score from `device_data_quality`
   - If no historical data: "We'll collect baseline data for 7 days before your experiment starts"
2. Duration Selection: Default from catalog; user can adjust (min: `min_duration_days`)
3. Adherence Method Explanation:
   - If `auto`: "We'll automatically track your adherence using your wearable data. No daily check-ins needed."
   - If `semi_auto`: "We'll detect your activity and ask a quick confirmation question."
   - If `manual`: "We'll ask you a simple yes/no question each day. Takes less than 5 seconds."
4. Concurrent Experiment Check:
- If user has active experiments, show them
- "Running multiple experiments simultaneously may make it harder to attribute changes to a specific experiment. We will account for this in the analysis."
-
Confirm & Start:
- Creates
experimentsrow withcatalog_experiment_idFK - Copies
primary_metrics+secondary_metricstoexperiment_metrics - Stores auto-computed baseline in
baseline_metricsJSONB - Snapshots
device_data_qualityintodata_quality_at_enrollment - Sets experiment_start = now (or baseline_end if baseline collection needed)
- Creates
Layout:
- Active Experiments section:
  - Cards showing each active experiment with:
    - Progress bar (day X of Y)
    - Today's check-in prompt (only for `manual` adherence experiments, or when `semi_auto` needs confirmation)
    - Auto-detected adherence badge for `auto` experiments ("Adherence auto-detected today ✓")
    - Mid-experiment teaser ("Early signal: HRV trending 8% higher than baseline")
    - Tap → Active experiment detail
- Pending Baseline section (if any):
  - Experiments waiting for baseline data collection
  - Progress toward sufficient data
- Recently Completed section:
  - Experiments awaiting or showing analysis results
  - "Discovery Found!" badge for experiments with results
- Empty State: "Start your first experiment to begin discovering what works for your body."
`auto` experiments (Early Bedtime, Steps, Zone 2, etc.):
- No manual check-in required. The app auto-detects adherence from wearable data.
- After sync, the app checks `auto_detect_config` rules against the day's data.
- Creates `experiment_checkins` row with `auto_detected = true` and `auto_detect_data` containing evidence.
- User sees: "Day 8 of 14 — Adherence auto-detected ✓ (Bedtime: 10:22 PM, target: before 10:30 PM)"
- If auto-detection can't determine adherence (e.g., missing data), fall back to manual prompt.
- Confounder prompt still appears (briefly): "Anything unusual today?" → toggles for alcohol, illness, etc.
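The auto-detection step above can be sketched as a pure rule check. The rule shape, field names, and `evaluateBedtimeRule` helper are illustrative assumptions, not the actual `auto_detect_config` schema:

```typescript
// Hypothetical shape of an auto_detect_config rule for a bedtime experiment.
// The real schema may differ; this only illustrates the evaluation flow.
interface BedtimeRule {
  metric: "bedtime";
  targetBefore: string; // "22:30": adherent if bedtime is at or before this
}

interface DaySummary {
  bedtime: string | null; // "22:22", or null if the wearable had no data
}

type AdherenceResult =
  | { status: "adherent" | "not_adherent"; evidence: Record<string, string> }
  | { status: "needs_manual" }; // fall back to the manual prompt

function toMinutes(hhmm: string): number {
  const [h, m] = hhmm.split(":").map(Number);
  return h * 60 + m;
}

function evaluateBedtimeRule(rule: BedtimeRule, day: DaySummary): AdherenceResult {
  if (day.bedtime === null) return { status: "needs_manual" }; // missing data
  const adherent = toMinutes(day.bedtime) <= toMinutes(rule.targetBefore);
  return {
    status: adherent ? "adherent" : "not_adherent",
    evidence: { bedtime: day.bedtime, target: rule.targetBefore },
  };
}
```

The evidence object is what would land in `auto_detect_data`; the `needs_manual` branch is the fallback described above for days with missing data. (Bedtimes after midnight would need extra handling not shown here.)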
`semi_auto` experiments (Weighted Vest Walking, Morning Sunlight):
- App detects the activity (e.g., a walk was logged).
- Sends one-tap confirmation push notification: "We detected a 25-minute walk at 7:15 PM. Did you wear the weighted vest? [Yes] [No]"
- Creates check-in with `auto_detected = true` + user confirmation.
`manual` experiments (Supplements, dietary changes, breathing):
- Traditional check-in: "Did you follow the protocol today?" → [Yes] [Mostly] [No]
- Confounder toggles
- Optional note
- Design principle: <10 seconds. This is not journaling.
Sections:
- Progress: Day X of Y, adherence rate, progress timeline
- Metric Trends: Small charts showing primary metrics over baseline + experiment period
- Mid-Experiment Teasers: Hints about emerging patterns
- Only shown after day 5+ with sufficient data
- "Early signal detected: Your deep sleep is trending 15% higher than your baseline average"
- Check-in History: Calendar view with adherence indicators (auto/manual/missed)
- Data Quality Indicator: Current sync health for relevant devices
- Actions: Pause, Extend, Complete Early, Abandon
When an experiment ends (duration reached or user completes early):
1. Status updates to `completed`, `experiment_end` set
2. AI Analysis triggered via `ai-engine/analyze`:
   - Fetches baseline vs experiment period data
   - Computes statistical comparisons (existing logic)
   - Gemini generates narrative with Magnitude of Impact framing
   - Creates `user_discoveries` row
   - Creates `experiment_outcomes` row (normalized for data learning pipeline)
   - Updates `experiment_results` with magnitude scoring
   - Generates playbook suggestion
   - Generates "What's Next?" recommendations
   - Runs compliance validation on AI output
3. Discovery Presentation Screen (`discovery/[id].tsx`):
┌─────────────────────────────────────┐
│ Discovery Found!                    │
│                                     │
│ Post-Meal Walk                      │
│ 14-day experiment                   │
│                                     │
│ ┌─────────────────────────────┐     │
│ │ Magnitude of Impact: HIGH   │     │
│ └─────────────────────────────┘     │
│                                     │
│ Resting HR   -4 bpm (62→58)         │
│ ████████████████████░░    -6.5%     │
│                                     │
│ Deep Sleep  +18 min (52→70)         │
│ ████████████████████░░   +34.6%     │
│                                     │
│ HRV          +8 ms (44→52)          │
│ ████████████████░░░░░░   +18.2%     │
│                                     │
│ Confidence: Moderate                │
│ Attribution: Strong                 │
│ 12 valid days, 2 excluded           │
│ (1 alcohol, 1 illness)              │
│                                     │
│ "Walking after dinner was           │
│ associated with meaningful          │
│ improvements in your recovery       │
│ metrics. Your resting heart rate    │
│ and deep sleep showed the           │
│ strongest response."                │
│                                     │
│ [Add to Playbook]                   │
│                                     │
│ ─── What's Next? ───                │
│ Based on your results:              │
│ • Earlier Dinner (builds on this)   │
│ • Consistent Wake Time              │
│                                     │
│ For informational purposes only.    │
│ Not medical advice.                 │
└─────────────────────────────────────┘
**When Attribution is Moderate or Low**, the discovery screen additionally shows:
┌─────────────────────────────────────┐
│ Attribution: Moderate               │
│                                     │
│ Possible contributors:              │
│ ├── Post-Meal Walk        Moderate  │
│ └── Magnesium             Moderate  │
│                                     │
│ ─── Confirm the Driver ───          │
│ Try pausing magnesium for 7 days    │
│ while keeping the walk.             │
│ [Start Isolation Experiment]        │
└─────────────────────────────────────┘
Key framing rules:
- NEVER "Success" / "Failure"
- ALWAYS "Magnitude of Impact": High / Moderate / Low / Minimal / Inconclusive
- Each metric shows: label, absolute change, baseline→observed, bar chart, percentage
- Confounders are noted transparently
- FDA disclaimer at bottom
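A sketch of these framing rules as code. The magnitude thresholds below are invented placeholders (the real scoring lives in `mobile/src/utils/experiments/magnitudeScoring.ts` and may differ), and `formatMetricRow` is a hypothetical helper mirroring the mockup's metric rows:

```typescript
type Magnitude = "High" | "Moderate" | "Low" | "Minimal" | "Inconclusive";

// Illustrative thresholds only; never "Success"/"Failure", always a magnitude label.
function magnitudeFromChange(changePct: number | null, validDays: number): Magnitude {
  if (changePct === null || validDays < 7) return "Inconclusive";
  const abs = Math.abs(changePct);
  if (abs >= 15) return "High";
  if (abs >= 8) return "Moderate";
  if (abs >= 3) return "Low";
  return "Minimal";
}

// Each metric row shows: label, absolute change, baseline→observed, percentage.
function formatMetricRow(label: string, baseline: number, observed: number, unit: string): string {
  const delta = observed - baseline;
  const pct = ((delta / baseline) * 100).toFixed(1);
  const sign = delta >= 0 ? "+" : "";
  return `${label} ${sign}${delta} ${unit} (${baseline}→${observed}) ${sign}${pct}%`;
}
```

For example, `formatMetricRow("HRV", 44, 52, "ms")` reproduces the mockup's "HRV +8 ms (44→52) +18.2%" line.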
Many experiments will produce Minimal or Inconclusive magnitude — effectively 0% impact. If the UX treats this as a letdown, the user feels they wasted 14 days. Instead, frame null results as a valuable discovery: you've eliminated a variable and narrowed the search.
Discovery screen when magnitude is Minimal/Inconclusive:
┌─────────────────────────────────────┐
│ Magnesium Before Bed                │
│ 14 days • 12 valid days             │
│                                     │
│ Magnitude of Impact: Minimal        │
│                                     │
│ RHR        -0.3 bpm (61→60.7)       │
│ ░░░░░░░░░░░░░░░░░░░░░     -0.5%     │
│                                     │
│ Deep Sleep   +2 min (48→50)         │
│ ░░░░░░░░░░░░░░░░░░░░░     +4.2%     │
│                                     │
│ ─── Discovery ───                   │
│                                     │
│ ✓ You've eliminated a variable.     │
│                                     │
│ "Magnesium doesn't appear to be     │
│ a meaningful lever for your         │
│ sleep or recovery. That's a         │
│ valuable finding — you just         │
│ narrowed the search for what        │
│ actually works for your body."      │
│                                     │
│ 💰 Estimated savings: ~$30/month    │
│                                     │
│ ─── What's Next? ───                │
│ These experiments target the same   │
│ metrics with higher community       │
│ impact rates:                       │
│ • Caffeine Curfew (72% saw impact)  │
│ • Earlier Bedtime (68% saw impact)  │
│                                     │
│ For informational purposes only.    │
│ Not medical advice.                 │
└─────────────────────────────────────┘
Framing principles for null results:
- Lead with affirmation: "You've eliminated a variable" — this IS progress
- Reframe the value: "You just narrowed the search for what actually works for your body"
- Show concrete savings (when applicable): supplement cost, time saved, effort redirected
- Immediately pivot to what's next: Recommend experiments with higher community impact rates for the same metrics — the user's momentum should carry forward, not stall
- Playbook entry: Null results are recorded in the playbook as "Eliminated" with a strikethrough-style badge, visually showing progress through the search space
- AI narrative tone: Curious and encouraging, never apologetic. "Your body didn't respond to X" is a finding, not a failure
The Playbook is not just a list — it's a progression system that gives users a clear reason to run more experiments. Each category has a discovery count that fills up, creating a sense of exploration and completeness.
Layout:
- Header: "Your Body's Operating Manual"
- Category Progression Cards (the key engagement driver):
  ┌─────────────────────────────────┐
  │ Sleep Playbook      2 / 5 ████░ │
  │ Recovery Playbook   1 / 4 ██░░░ │
  │ Metabolic Playbook  0 / 3 ░░░░  │
  │ HRV Playbook        0 / 2 ░░░░  │
  │ RHR Playbook        1 / 3 ██░░░ │
  └─────────────────────────────────┘
  - Each category maps to experiment categories in the catalog
  - Denominator = number of experiments available in that category (from catalog)
  - Numerator = number of completed experiments with discoveries in that category
  - Tap a category → see discoveries for that category + available experiments to fill gaps
  - Categories with 0 discoveries show: "Run your first [category] experiment →"
- Top Health Levers (ranked by magnitude):
  - Ranked list of all discovered health levers across categories
  - Each entry: rank, habit name, magnitude badge, impact summary, category icon
  - Example: "#1 — Post-Dinner Walk | HIGH | RHR -6.5%, Deep Sleep +34.6%"
- Eliminated Variables section:
  - Experiments that produced Minimal/Inconclusive magnitude
  - Displayed with strikethrough style and "Eliminated" badge
  - Shows what was ruled out: "Magnesium — not a lever for your sleep"
  - Reinforces progress: "3 eliminated, 2 confirmed — your search is narrowing"
  - These count toward category progression (denominator explored, not just successes)
- Unconfirmed Patterns section:
  - Patterns spotted by the Unenrolled Discovery Engine but not yet confirmed via formal experiment
  - "Unconfirmed" badge + "Confirm with an experiment →" CTA
- Empty State: "Your body has stories to tell. Run your first experiment to start building your playbook."
The category/denominator counts are derived from the experiment catalog:
- v1 (8 experiments): Sleep: 5, RHR: 2, Sleep/RHR overlap: 1 → adjust to avoid double-counting
- As catalog expands in v2, denominators grow — users always have more to explore
- Both confirmed levers AND eliminated variables count toward progression — running an experiment always moves you forward
- Numerator display: "3 explored (2 confirmed, 1 eliminated)" to show both types of progress
- When a user completes all experiments in a category: "Category Complete! You've mapped your [category] levers."
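The progression display described above could be computed roughly like this. Field and function names are illustrative, not the actual schema:

```typescript
// Both confirmed levers and eliminated variables count as "explored",
// so running an experiment always moves the category forward.
interface OutcomeSummary {
  category: string;
  result: "confirmed" | "eliminated";
}

function progressLabel(category: string, catalogCount: number, outcomes: OutcomeSummary[]): string {
  const inCat = outcomes.filter((o) => o.category === category);
  const confirmed = inCat.filter((o) => o.result === "confirmed").length;
  const eliminated = inCat.length - confirmed;
  if (inCat.length === catalogCount) {
    return `Category Complete! You've mapped your ${category} levers.`;
  }
  return `${inCat.length} explored (${confirmed} confirmed, ${eliminated} eliminated)`;
}
```

With two confirmed and one eliminated sleep outcome against a catalog of five, this yields the "3 explored (2 confirmed, 1 eliminated)" display above.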
This is in v1 and is built in Sprint 2. The AI engine analyzes historical wearable data to find "accidental experiments" — patterns the user didn't intentionally create. This is our competitive advantage: users see a discovery before they even pick an experiment.
- Trigger: Runs when:
  - User first connects a wearable with 14+ days of history (immediate value)
  - Weekly cron job for users with active data
  - On-demand when user visits Discover tab (if last scan >7 days ago)
- Data Collection: Edge Function `ai-engine/spot-patterns` gathers:
  - Last 30-90 days of daily_summary, sleep_sessions, glucose_data, activities
  - Looks for natural variation in behaviors (walking frequency, sleep timing, activity patterns)
- Statistical Pre-Filtering (BEFORE AI):
  The AI should only see patterns that meet strict statistical thresholds. This prevents hallucinated correlations.
  Minimum requirements to surface a pattern:
  - 20+ data points in each comparison group (e.g., 20 days with the behavior, 20 without)
  - Effect size >10% difference between groups
  - Consistency across weeks: the pattern must hold across at least 3 separate weeks (not a one-time cluster)
  - Statistical significance: p-value < 0.05 using the Mann-Whitney U test (non-parametric, handles non-normal wearable data)
  - Not explainable by day-of-week effects: control for weekend vs weekday patterns
  Pre-filter pipeline:
  Raw data → Behavioral segmentation → Statistical comparison → Filter by thresholds → AI narrative generation
  The statistical engine (not AI) identifies candidate patterns. The AI only generates the user-facing narrative for patterns that pass all filters.
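The threshold gate (everything before AI narrative generation) can be sketched as a single predicate. Here the Mann-Whitney p-value is assumed to be precomputed by the statistical engine, and the `CandidatePattern` shape is illustrative:

```typescript
// Hypothetical candidate shape; the p-value would come from the Mann-Whitney U
// test (mobile/src/utils/experiments/mannWhitneyU.ts in this plan).
interface CandidatePattern {
  withBehavior: number[];    // metric values on days with the behavior
  withoutBehavior: number[]; // metric values on days without it
  weeksObserved: number;     // distinct weeks in which the pattern held
  pValue: number;            // from the Mann-Whitney U test
}

const mean = (xs: number[]) => xs.reduce((a, b) => a + b, 0) / xs.length;

function passesPreFilter(c: CandidatePattern): boolean {
  if (c.withBehavior.length < 20 || c.withoutBehavior.length < 20) return false; // 20+ points per group
  const effectPct =
    (Math.abs(mean(c.withBehavior) - mean(c.withoutBehavior)) / Math.abs(mean(c.withoutBehavior))) * 100;
  if (effectPct <= 10) return false;     // effect size must exceed 10%
  if (c.weeksObserved < 3) return false; // multi-week consistency
  return c.pValue < 0.05;                // statistical significance
}
```

Only candidates for which `passesPreFilter` returns true would ever reach Gemini for narrative generation. (The day-of-week control is omitted here for brevity.)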
- Pattern Detection Categories:
  - Activity → Recovery: "Days with 8,000+ steps correlate with 15% higher next-night HRV"
  - Sleep timing → Sleep quality: "Nights with bedtime before 10:30 PM show 22 min more deep sleep"
  - Exercise frequency → RHR: "Weeks with 3+ workouts show 5 bpm lower average RHR"
  - Temporal patterns: "Your HRV has been trending upward over the last 3 weeks"
- AI Narrative Generation (Gemini):
  - Only runs on statistically validated patterns
  - Generates user-friendly description using wellness-compliant language
  - Maps pattern to a catalog experiment that could confirm it
  - Must use correlational language only (FDA compliance)
- Output: Creates `user_discoveries` rows with:
  - `discovery_type: 'unenrolled_pattern'`
  - `suggested_experiment_id`: links to catalog experiment that could confirm the pattern
  - Title: "We noticed that on days you walk 8,000+ steps, your overnight HRV is 15% higher."
  - CTA: "Want to turn this into a formal 7-day experiment to confirm it?"
- Conversion Flow: User taps "Start Experiment" on an unenrolled discovery →
  - Pre-fills enrollment with the suggested catalog experiment
  - Notes the discovery that inspired it
- Anti-spam rules:
  - Max 3 unenrolled discoveries surfaced at a time
  - Don't resurface dismissed discoveries
  - Only patterns meeting ALL statistical thresholds (20+ points, >10%, multi-week consistency)
  - Don't surface patterns that contradict existing playbook entries
  - Rate limit: max 2 new discoveries per week per user
Runs after each completed experiment and periodically.
Inputs:
- User's completed experiments + results (from `experiment_outcomes`)
- Current playbook entries
- Available metrics (connected devices)
- Current active experiments
- Catalog of available experiments
- User baseline profile (from most recent `experiment_outcomes`)
Logic:
- Complementary experiments: If earlier bedtime showed high impact on sleep, recommend Post-Dinner Walk or Caffeine Curfew
- Unexplored categories: If user has only done sleep experiments, suggest HRV or glucose experiments
- High-signal experiments: Prioritize experiments with high community impact rates
- Device-aware: Only recommend experiments the user can actually measure
- Personalized: "Users with a similar baseline RHR to yours typically see a 'High' magnitude of impact from this experiment."
Every completed experiment feeds a pipeline that makes the system smarter over time.
1. Experiment completes
└─→ 2. Normalized outcome record created (experiment_outcomes table)
└─→ 3. Community stats aggregation triggered
└─→ 4. Cohort-level effect sizes recomputed
└─→ 5. Recommendation engine weights updated
└─→ 6. Starter pack priorities recalculated
Step 1-2: Outcome Normalization
When an experiment completes, the analysis engine creates an experiment_outcomes row:
- User baseline profile is bucketed (age range, metric ranges) for anonymous aggregation
- All metric changes are stored with effect sizes
- Adherence, confounders, and concurrent experiments are captured
- This is the atomic unit of the learning pipeline
Step 3: Community Stats Aggregation
Runs as a batch job (daily cron or triggered on outcome creation):
-- Example aggregation query
SELECT
  catalog_experiment_id,
  COUNT(*) AS total_completed,
  AVG((metric_changes->0->>'change_pct')::numeric) AS avg_primary_metric_change,
  PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY (metric_changes->0->>'change_pct')::numeric) AS median_change,
  COUNT(*) FILTER (WHERE overall_magnitude = 'high') * 100.0 / COUNT(*) AS pct_high_impact
FROM experiment_outcomes
WHERE confidence IN ('strong', 'moderate')
GROUP BY catalog_experiment_id;

Step 4: Cohort Effect Estimation
Group outcomes by user baseline profile buckets:
- "Users with RHR 60-70 who ran Post-Meal Walk" → average effect
- "Users with HRV 30-40 who ran Alcohol Elimination" → average effect
- Stored in `community_experiment_stats.baseline_segment_stats`
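Baseline bucketing for cohort grouping might look like the following sketch; the 10-unit bucket width and label format are assumptions, not the production scheme:

```typescript
// Bucket a baseline metric into the range labels used for cohort grouping
// (e.g. "rhr_60-70"). Bucket width and label format are illustrative only.
function bucketBaseline(metric: "rhr" | "hrv", value: number): string {
  const width = 10;
  const lower = Math.floor(value / width) * width;
  return `${metric}_${lower}-${lower + width}`;
}
```

A user with a baseline RHR of 64 would land in the `rhr_60-70` cohort, whose historical average effect then informs their personalized recommendations.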
Step 5: Recommendation Engine Update
The recommender uses cohort-level data to personalize:
- Match user's current baseline to closest cohort
- Weight recommendations by that cohort's historical outcomes
- v1: Simple heuristic matching; v2+: ML-based collaborative filtering
Step 6: Starter Pack Recalculation
As community data grows, starter pack priorities may shift:
- If Post-Meal Walk shows consistently higher impact than Alcohol Elimination for RHR-focused users, reorder
- Initially manual review; later automated with guardrails
Until we have sufficient real user data (target: 100+ completed outcomes per experiment):
- Seed `community_experiment_stats` with estimates from published research
- Mark seeded data with `source: 'research_estimate'` in the JSONB
- Blend: as real data accumulates, weight shifts from research estimates to actual outcomes
- Transition threshold: when 50+ real outcomes exist for an experiment, deprecate the research estimate
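One possible blending rule consistent with these thresholds. The linear ramp is an assumption; the plan only fixes the endpoints (seed-only at zero outcomes, real-only at 50+):

```typescript
// Blend a research-seeded impact rate with the observed rate from real
// outcomes. The weight on real data ramps linearly to the 50-outcome
// deprecation threshold; the ramp shape is an assumption.
function blendedImpactRate(researchEstimate: number, realRate: number | null, realCount: number): number {
  if (realRate === null || realCount === 0) return researchEstimate;
  if (realCount >= 50) return realRate; // deprecate the research estimate
  const w = realCount / 50;             // weight on real data grows with sample size
  return w * realRate + (1 - w) * researchEstimate;
}
```

For example, at 25 real outcomes the displayed rate would sit halfway between the research estimate and the observed rate.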
Aggregation Job (runs daily via cron or on experiment completion):
- Query `experiment_outcomes` grouped by `catalog_experiment_id`
- For each catalog experiment:
  - Count total participants, total completed
  - Compute average change_pct per metric across all users
  - Compute magnitude distribution (% high, moderate, low, minimal)
  - Segment by baseline ranges (e.g., users with baseline HRV 30-40 vs 40-50 vs 50+)
- Update `community_experiment_stats` table
- All data is anonymized — no user IDs in the aggregated output
Display on Experiment Cards:
- "84% of 1,200 participants saw a 5%+ increase in HRV using this protocol."
- "Users with a similar baseline RHR to yours typically see a 'High' magnitude of impact."
After day 5 of an active experiment:
- Compare experiment-period-so-far metrics against baseline
- If a primary metric is trending >5% different from baseline, surface a teaser
- "Early signal detected: Your deep sleep is trending 15% higher than your baseline average"
- Updates daily
- Uses simple statistical comparison, not full AI analysis (save that for completion)
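The teaser logic above amounts to a simple percent-difference check against the baseline mean; the function name and signature here are illustrative:

```typescript
// Simple statistical comparison for mid-experiment teasers; not the full
// AI analysis. Returns null when no teaser should be shown.
function midExperimentTeaser(
  metricLabel: string,
  baselineMean: number,
  experimentValues: number[], // daily values so far
  dayOfExperiment: number,
): string | null {
  if (dayOfExperiment < 5 || experimentValues.length === 0) return null; // only after day 5+
  const current = experimentValues.reduce((a, b) => a + b, 0) / experimentValues.length;
  const diffPct = ((current - baselineMean) / baselineMean) * 100;
  if (Math.abs(diffPct) <= 5) return null; // must trend >5% from baseline
  const dir = diffPct > 0 ? "higher" : "lower";
  return `Early signal detected: Your ${metricLabel} is trending ${Math.abs(Math.round(diffPct))}% ${dir} than your baseline average`;
}
```

Running this daily after sync is cheap enough to avoid invoking the AI engine until the experiment actually completes.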
- Rename current `(tabs)/experiments.tsx` → incorporate into new `(tabs)/lab.tsx`
- Create new `(tabs)/discover.tsx` and `(tabs)/playbook.tsx`
- Modify `(tabs)/_layout.tsx`:
  - New tab order: Discover, My Lab, Playbook, Profile
  - Hide: Home, History, Insights (remove from tab bar but keep route files)
  - Add "Event Logger" and "Glucose Insights" links in Profile for backward access
Users with existing custom experiments (from old create-experiment flow):
- Keep them in the `experiments` table with `is_custom = true`, `catalog_experiment_id = null`
- Display in My Lab under a "Custom Experiments" section
- Can still be completed and analyzed
- Remove create-experiment screen from primary navigation
Either extend the existing function or redirect to the new `ai-engine/analyze`:
- Add `overall_magnitude` computation
- Switch AI provider to Gemini (with OpenAI fallback)
- Add concurrent experiment awareness
- Generate `user_discoveries` row on completion
- Generate `experiment_outcomes` row (normalized for data learning pipeline)
- Generate playbook suggestion
- Use Magnitude of Impact framing in all prompts
- Run compliance validation on all AI output
Apple Health and Google Fit integrations are moved to v2. For v1, users connect Whoop, Fitbit, Oura, Libre, or Dexcom (existing sync infrastructure). See v2 Roadmap section for details.
Users must be able to opt out of having their anonymized experiment outcomes included in community aggregations.
Implementation:
- Add `community_data_opt_in BOOLEAN DEFAULT true` to `user_profiles`
- During onboarding (or in Profile > Privacy settings), explain:
  "Your experiment results help the community by contributing to anonymous statistics like '78% of participants saw improvement.' No personal data is ever shared — only anonymized, aggregated numbers. You can opt out at any time."
- If opted out:
  - Their `experiment_outcomes` rows are excluded from community aggregation queries
  - They can still see community stats (they just don't contribute)
  - Opt-out is retroactive: existing outcomes are excluded from the next aggregation run
If we ever plan to use the dataset for published research or share with partners:
Implementation:
- Add `research_consent BOOLEAN DEFAULT false` to `user_profiles`
- Separate, explicit consent screen (not bundled with community opt-in):
  "Would you like to contribute to health research? If you consent, your fully anonymized experiment data may be used in aggregate research studies. Your identity is never associated with research data. You can withdraw consent at any time."
- Consent must be affirmative (opt-in, not opt-out)
- Consent timestamp and version tracked: `research_consent_at`, `research_consent_version`
Define and display clear data retention rules:
| Data Type | Retention Period | Rationale |
|---|---|---|
| Raw wearable data (daily_summary, sleep_sessions, etc.) | Indefinite (user-controlled) | Users need historical data for baselines and pattern detection |
| Experiment records | Indefinite (user-controlled) | Users need their experiment history |
| Experiment outcomes (anonymized) | Indefinite | Core to community intelligence |
| Check-in data | Indefinite (user-controlled) | Part of experiment record |
| AI analysis outputs | Indefinite (user-controlled) | Part of discovery/playbook |
| Device tokens (OAuth) | Until device disconnected or user deletes account | Required for sync |
| Account deletion | Full deletion within 30 days of request | GDPR/CCPA compliance |
Account deletion must:
- Delete all PII (user_profiles, experiment records, discoveries, playbook)
- Remove user from all community aggregations (re-aggregate without their data)
- Revoke all OAuth tokens
- Delete push tokens
- Provide confirmation
How experiment outcomes are anonymized for community use:
- No PII in aggregated data: `community_experiment_stats` contains only counts, averages, and percentiles — no user IDs, no individual records
- Baseline profiles are bucketed: age ranges (20-29, 30-39, etc.), metric ranges (RHR 50-60, 60-70), never exact values
- Minimum aggregation threshold: Community stats only shown when 10+ completed outcomes exist for an experiment (prevents small-group identification)
- No temporal correlation: Aggregated stats are not timestamped to individual users
- Differential privacy (future): For very small cohorts, consider adding noise to aggregated values
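The minimum aggregation threshold can be expressed as a simple filter applied before display; the types here are illustrative:

```typescript
// Only surface community stats for experiments with enough completed
// outcomes to prevent small-group identification (10+ per the rule above).
interface AggregatedStat {
  catalogExperimentId: string;
  completedOutcomes: number;
  pctSawImpact: number; // fraction of participants who saw impact
}

function displayableStats(stats: AggregatedStat[], minCohort = 10): AggregatedStat[] {
  return stats.filter((s) => s.completedOutcomes >= minCohort);
}
```

Applying the gate at read time (rather than at aggregation time) means stats appear automatically once an experiment crosses the 10-outcome threshold.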
Create an in-app "Data & Privacy" section (accessible from Profile):
- How Your Data Is Used: Plain-language explanation of data flow
- Community Data: Explanation of anonymization + opt-out toggle
- Research Consent: Separate consent flow
- Data Retention: What we keep and for how long
- Delete My Data: Account deletion request flow
- Export My Data: Download all personal data (GDPR right of portability)
- Primary: Google Gemini 2.0 Flash (fast, cost-effective for most analysis)
- Upgrade: Gemini 2.0 Pro (for complex pattern detection, recommendations)
- Fallback: OpenAI GPT-4o-mini (existing infrastructure, proven reliability)
Why Gemini first:
- Competitive pricing for high-volume analysis
- Strong structured output support
- Good at pattern detection in numerical data
- Swappable via provider abstraction if performance doesn't meet needs
Environment Variables:
AI_PROVIDER=gemini
GEMINI_API_KEY=<key>
GEMINI_FLASH_MODEL=gemini-2.0-flash
GEMINI_PRO_MODEL=gemini-2.0-pro
# Fallback
OPENAI_API_KEY=<existing>
OPENAI_CHAT_MODEL=gpt-4o-mini
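Provider selection from these variables might look like the following sketch. The `AIProvider` shape and the stub `generate` implementations are assumptions about `providers/factory.ts`, not its actual contents:

```typescript
// Minimal sketch of env-var-driven provider selection with OpenAI fallback.
interface AIProvider {
  name: string;
  generate(prompt: string): Promise<string>;
}

function createProvider(env: Record<string, string | undefined>): AIProvider {
  const choice = env.AI_PROVIDER ?? "gemini"; // Gemini-first by default
  if (choice === "gemini" && env.GEMINI_API_KEY) {
    return { name: "gemini", generate: async (p) => `gemini:${p}` }; // stub body
  }
  // Fall back to the existing OpenAI infrastructure when Gemini is unavailable.
  if (env.OPENAI_API_KEY) {
    return { name: "openai", generate: async (p) => `openai:${p}` }; // stub body
  }
  throw new Error("No AI provider configured");
}
```

Keeping selection behind one factory is what makes the provider swappable if Gemini's performance doesn't meet needs.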
Users can run multiple experiments simultaneously. Since this makes causality ambiguous, we use an Attribution Confidence Model that is honest about uncertainty and converts ambiguity into follow-up experiment opportunities.
Instead of trying to determine causality, classify how confident the attribution is:
| Situation | Attribution Confidence | Label |
|---|---|---|
| 1 experiment active during period | Strong | "This experiment was the primary variable during this period." |
| 2 experiments active | Moderate | "Multiple experiments were active. Improvements may be associated with more than one habit." |
| 3+ experiments active | Low | "Several experiments were active simultaneously. Individual attribution is uncertain." |
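The table above maps directly to a small classifier, consistent with the `computeAttributionConfidence()` helper referenced elsewhere in this plan (the exact signature is an assumption):

```typescript
type AttributionConfidence = "strong" | "moderate" | "low";

// Classification by concurrent experiment count, per the table above.
function attributionConfidence(activeExperimentCount: number): AttributionConfidence {
  if (activeExperimentCount <= 1) return "strong";   // single variable
  if (activeExperimentCount === 2) return "moderate"; // two candidate drivers
  return "low";                                       // 3+ concurrent experiments
}
```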
Attribution confidence is surfaced on every discovery:
Post-Meal Walk Experiment
Magnitude of Impact: High
Attribution Confidence: Moderate
Multiple experiments were active during this period.
Improvements may be associated with more than one habit.
When attribution confidence is Moderate or Low, show an Attribution Map — all experiments that were active during the period, ranked by plausibility:
Your recovery improved during this experiment period.
Possible contributors:
├── Post-Dinner Walk Confidence: Moderate
├── Magnesium Before Bed Confidence: Moderate
└── Earlier Bedtime Confidence: Low (started mid-period)
Plausibility ranking factors:
- Temporal overlap: Experiments active for the full period rank higher than those that started mid-way
- Protocol relevance: Experiments whose primary metrics match the improved metrics rank higher
- Adherence: Higher adherence = higher attribution plausibility
When attribution is ambiguous, the system converts uncertainty into the next experiment opportunity:
Your sleep improved during the last 14 days, but multiple habits changed.
Suggested next experiment:
🔬 Confirm the driver
Try pausing magnesium for 7 days while keeping everything else constant.
If your sleep stays improved, the Post-Meal Walk was likely the primary driver.
This creates a natural experiment chain:
- Run multiple experiments → see improvement → ambiguous attribution
- System suggests isolation experiment → user runs it
- Clear attribution → discovery confirmed with strong confidence
Implementation:
- After analysis with Moderate/Low attribution, the AI generates a "confirm the driver" suggestion
- Suggestion stored as a special recommendation type in `user_discoveries`
- If user accepts, creates a new experiment that is a modified version (e.g., "Magnesium Pause" = keep everything else, remove one variable)
- The follow-up experiment references the parent discovery for context
- Each experiment maintains its own baseline (computed at enrollment time)
- AI analysis prompt includes awareness of ALL concurrent experiments with their protocols
- `experiment_outcomes` records all concurrent experiment IDs for pipeline analysis
- Add to `experiments` table: `concurrent_experiment_ids UUID[]` — populated at completion time with IDs of all experiments that overlapped
At check-in (user-reported):
- Alcohol, illness, travel, intense workout, poor sleep, significant stress
Automated (from wearable data):
- Sleep duration outlier (< 4 hours)
- Unusual activity level (>2 std dev from baseline)
- New supplement/medication (from health events, if logged)
Excluded days: Days with reported confounders are flagged and optionally excluded from analysis. AI is told about excluded days and why.
For experiment analysis, a "valid day" must:
- Have check-in data (adherence = yes or mostly, or auto-detected = true)
- Not be flagged with major confounders (illness, travel)
- Have metric data available from wearable
- Minimum 60% valid days required for analysis; otherwise confidence = 'suggestive'
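The valid-day rules and the 60% threshold, sketched as predicates; field names are illustrative, not the actual check-in schema:

```typescript
interface DayRecord {
  adherence: "yes" | "mostly" | "no" | null; // user-reported check-in, if any
  autoDetected: boolean;                     // adherence detected from wearable data
  majorConfounders: string[];                // e.g. ["illness"], ["travel"]
  hasMetricData: boolean;                    // wearable metric data present
}

function isValidDay(d: DayRecord): boolean {
  const adherent = d.autoDetected || d.adherence === "yes" || d.adherence === "mostly";
  if (!adherent) return false;
  if (d.majorConfounders.length > 0) return false; // illness, travel, etc.
  return d.hasMetricData;
}

// Below 60% valid days, analysis confidence drops to 'suggestive'.
function analysisConfidence(validDays: number, totalDays: number): "normal" | "suggestive" {
  return validDays / totalDays >= 0.6 ? "normal" : "suggestive";
}
```

The 12-valid-of-14-day example in the discovery mockup clears the 60% bar comfortably; 7 of 14 would not.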
See Section 1.1 for the complete Banned Words / Required Phrases dictionary.
Enforced via:
- Wellness Terminology Audit during Sprint 1 (before any content is written)
- `validateWellnessCompliance()` function used in:
  - Catalog seed data CI validation
  - AI output post-processing (runtime)
  - All user-facing text review
- System prompt in all AI calls (shared-guidelines.ts)
- Output validation — scan AI responses for banned terms before displaying
- App-wide disclaimer: "For general wellness purposes only. Not intended to diagnose, treat, cure, or prevent any disease. Consult a healthcare professional before making health decisions."
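A minimal sketch of `validateWellnessCompliance()`. The real dictionary in `banned-words.ts` has 51 banned terms and 5 required phrases, so the entries below are placeholders only:

```typescript
// Placeholder subset of the banned-terms dictionary; not the real list.
const BANNED_TERMS = ["diagnose", "cure", "treat", "prevent disease"];
const REQUIRED_PHRASE = "For informational purposes only"; // placeholder required phrase

function validateWellnessCompliance(text: string): { ok: boolean; violations: string[] } {
  const lower = text.toLowerCase();
  const violations = BANNED_TERMS.filter((t) => lower.includes(t));
  if (!text.includes(REQUIRED_PHRASE)) violations.push(`missing: "${REQUIRED_PHRASE}"`);
  return { ok: violations.length === 0, violations };
}
```

The same check runs at three points per the list above: on catalog seed data in CI, on AI output before display, and during manual review of user-facing text. (The production version also auto-corrects with case preservation, which is omitted here.)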
Experiments involving supplements (Magnesium, Creatine, ACV, Cinnamon):
- Frame as "lifestyle experiments" not "therapeutic interventions"
- Never claim dosing recommendations — use "amount" or "serving"
- Include: "Consult your healthcare provider before starting any supplement"
- Focus results on wearable metrics, not clinical outcomes
- Wellness Terminology Audit: create banned words dictionary + `validateWellnessCompliance()` function
  - `supabase/functions/_shared/compliance/banned-words.ts` (51 banned terms, 5 required phrases)
  - `supabase/functions/_shared/compliance/output-validator.ts` (validate + auto-correct with case preservation)
  - 17/17 Deno tests pass, including seed data compliance scan
- Migration: `experiment_catalog` table (with `adherence_detection`, `auto_detect_config` fields)
- Migration: `experiment_outcomes` table (normalized outcome records)
- Migration: `user_discoveries`, `user_playbook`, `community_experiment_stats` tables
- Migration: `device_data_quality` table
- Migration: ALTER `experiments` (add `catalog_experiment_id`, `baseline_metrics`, `baseline_quality`, `data_quality_at_enrollment`, `concurrent_experiment_ids`, `attribution_confidence`, `is_custom`)
- Migration: ALTER `experiment_checkins` (add `confounders`, `auto_detected`, `auto_detect_data`)
- Migration: ALTER `experiment_results` (add `overall_magnitude`, `ai_model`, `ai_prompt_version`)
- RLS policies for all new tables
- Seed experiment catalog (8 high-impact experiments) — all descriptions pass compliance validation
  - `supabase/migrations/20260312100000_seed_experiment_catalog.sql` (8 experiments applied to remote Supabase)
- Update TypeScript types (database.types.ts, experiment types)
  - `mobile/src/types/database.types.ts` — 6 new table types + updated existing tables
  - `mobile/src/utils/experiments/types.ts` — 20+ new interfaces
- CI check: automated compliance scan on catalog seed data
- Create `ai-engine` Edge Function with provider abstraction
  - `supabase/functions/ai-engine/index.ts` — router with CORS, JWT auth, 4 routes
  - `supabase/functions/ai-engine/providers/types.ts` — AIProvider interface
  - `supabase/functions/ai-engine/providers/factory.ts` — provider factory (env-var selection)
- Implement Gemini provider
  - `supabase/functions/ai-engine/providers/gemini.ts`
- Implement OpenAI fallback provider
  - `supabase/functions/ai-engine/providers/openai.ts`
- Compliance module: `banned-words.ts` + `output-validator.ts` (completed in Sprint 1)
- Implement pattern-spotter engine (unenrolled discoveries)
  - `supabase/functions/ai-engine/engines/pattern-spotter.ts`
  - Statistical pre-filtering (20+ data points, >10% effect, multi-week consistency, p<0.05)
    - `mobile/src/utils/experiments/mannWhitneyU.ts` — Mann-Whitney U test (7 tests)
    - `mobile/src/utils/experiments/patternFilters.ts` — 5 filter functions (11 tests)
  - Behavioral segmentation (steps, sleep timing, activity frequency)
  - AI narrative generation for validated patterns
    - `supabase/functions/ai-engine/prompts/pattern-detection.ts`
  - Anti-spam rules (max 3, dismissed tracking, rate limiting)
    - `mobile/src/utils/experiments/antiSpam.ts` (12 tests)
- Implement experiment-analyst engine (extends analyze-experiment logic)
  - `supabase/functions/ai-engine/engines/experiment-analyst.ts`
  - Magnitude of Impact scoring
    - `mobile/src/utils/experiments/magnitudeScoring.ts` (32 tests)
  - `experiment_outcomes` record creation
  - Attribution Confidence computation (strong/moderate/low based on concurrent experiment count)
  - Attribution Map generation for moderate/low confidence
  - "Confirm the Driver" follow-up experiment suggestions
- Implement recommender engine
  - `supabase/functions/ai-engine/engines/recommender.ts`
  - `supabase/functions/ai-engine/prompts/recommendation.ts`
- Implement starter-pack engine (metric gap analysis + personalized recommendations)
  - `supabase/functions/ai-engine/engines/starter-pack.ts`
  - `mobile/src/utils/experiments/metricGapAnalysis.ts` (41 tests)
  - `mobile/src/utils/experiments/starterPackScoring.ts`
  - `mobile/src/utils/experiments/baselineComputation.ts` (13 tests)
- Mobile client: `experimentAIClient.ts`
  - `mobile/src/utils/experiments/experimentAIClient.ts` (13 tests)
- Total: 132 new tests, all passing. Full suite: 2075 tests, 0 regressions.
- New
(tabs)/discover.tsx— insight-first layout (discoveries before catalog)mobile/src/app/(tabs)/discover.tsx— Insights Hero + Personalized/Static Starter Pack + Full Catalog sections
- Insights Hero section (unenrolled discoveries, loading states, empty states)
- Placeholder states: "Connect a Wearable" / "Analyzing Your Data"
- Experiment card component with community data + adherence type indicator
mobile/src/components/Experiments/CatalogExperimentCard.tsx- Shows: name, category, difficulty, duration, adherence type, primary metrics, data availability
- Community data display pending Sprint 2 aggregation pipeline
- Experiment detail screen (
catalog-experiment/[slug].tsx)mobile/src/app/catalog-experiment/[slug].tsx— protocol, goal, why it works, metrics, evidence, confounders
- Metric availability detection per experiment
  - Checks user's connected devices against `required_data_sources`
- Data availability warnings + data quality indicators
- Warning card with missing source count + "Connect Device" CTA
- Personalized Starter Pack for new users (powered by Sprint 2 starter-pack engine)
  - `mobile/src/hooks/usePersonalizedStarterPack.ts` — React Query hook calling AI engine `/starter-pack`
  - Discover tab shows "Recommended For You" with hero experiment, personalized reasons, metric gap summary
- Falls back to static "Start Here" section when AI is unavailable or no wearable connected
  - `mobile/__tests__/hooks/usePersonalizedStarterPack.test.ts` (10 tests)
  - `mobile/__tests__/components/DiscoverPersonalized.test.tsx` (10 tests)
- Total: 20 new tests for Sprint 3 personalization. Full suite: 2095 tests, 0 regressions.
- Enrollment flow with auto-baseline computation
  - `mobile/src/app/enroll-experiment/[slug].tsx` — 3-step wizard (Protocol Review → Baseline Preview → Confirm & Start)
  - `mobile/src/utils/experiments/enrollment.ts` — `validateEnrollment()`, `buildEnrollmentPayload()`, `computeBaselinePeriodDates()`, `checkConcurrentConflicts()`
  - `mobile/src/hooks/useEnrollExperiment.ts` — orchestrates baseline fetch, validation, and enrollment mutation
  - `mobile/src/utils/experiments/__tests__/enrollment.test.ts` (24 tests)
- Adherence method explanation in enrollment
  - `mobile/src/components/Experiments/AdherenceMethodExplainer.tsx` — explains auto/semi_auto/manual detection methods
- Concurrent experiment handling + attribution warnings
  - `mobile/src/components/Experiments/ConcurrentWarning.tsx` — warning card with attribution confidence badge
  - Reuses `computeAttributionConfidence()` from `magnitudeScoring.ts`
- New `(tabs)/lab.tsx` replacing old experiments tab
  - `mobile/src/app/(tabs)/lab.tsx` — active experiments, completed section, adopted habits, empty state
  - `mobile/src/app/(tabs)/_layout.tsx` — lab tab with FlaskConical icon, old experiments tab hidden via `href: null`
  - `mobile/__tests__/components/TabBar.test.tsx` updated for 7 tabs (6 visible + 1 hidden)
- Active experiment cards with progress
  - `mobile/src/components/Experiments/ActiveExperimentLabCard.tsx` — progress bar, adherence badge, check-in CTA, teaser snippet
- Adaptive check-in flow (auto / semi_auto / manual based on `adherence_detection`)
  - `mobile/src/components/Experiments/AdaptiveCheckin.tsx` — switches display based on adherence mode
  - Integrated into `mobile/src/app/experiment/[id].tsx`
- Auto-adherence detection logic (check `auto_detect_config` against daily wearable data)
  - `mobile/src/utils/experiments/autoAdherence.ts` — `evaluateAdherence()` dispatcher + 4 evaluators (sleep start, activity after time, wake variance, morning activity)
  - `mobile/src/hooks/useAutoAdherence.ts` — fetches wearable data, auto-creates check-ins for `auto` mode
  - `mobile/src/utils/experiments/__tests__/autoAdherence.test.ts` (25 tests)
- Semi-auto one-tap confirmation flow
  - `useAutoAdherence` hook returns evaluation + `confirmCheckin` mutation for `semi_auto` mode
  - `AdaptiveCheckin` component renders one-tap confirm/override UI
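The dispatcher pattern above can be sketched as follows. The config shape, evaluator names, and cutoff semantics here are hypothetical; the real `evaluateAdherence()` in `autoAdherence.ts` covers four evaluators and its own config schema.

```typescript
// Illustrative evaluateAdherence()-style dispatcher over a discriminated
// config union. Returns true (adhered), false (not adhered), or null when
// the day's wearable data is insufficient to decide.
type AutoDetectConfig =
  | { kind: "sleep_start_before"; cutoffMinutes: number }  // minutes since midnight
  | { kind: "no_activity_after"; cutoffMinutes: number };

interface DayData {
  sleepStartMinutes?: number;       // e.g. 22:30 → 1350
  lastActivityEndMinutes?: number;
}

function evaluateAdherence(config: AutoDetectConfig, day: DayData): boolean | null {
  if (config.kind === "sleep_start_before") {
    if (day.sleepStartMinutes === undefined) return null;
    return day.sleepStartMinutes <= config.cutoffMinutes;
  }
  // "no_activity_after": adhered if the last activity ended by the cutoff
  if (day.lastActivityEndMinutes === undefined) return null;
  return day.lastActivityEndMinutes <= config.cutoffMinutes;
}
```

In `auto` mode the result would create the check-in silently; in `semi_auto` mode it seeds the one-tap confirm/override UI.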
- Confounder tracking in check-ins
  - `mobile/src/utils/experiments/confounderCheckin.ts` — `CONFOUNDER_LABELS`, `getConfounderOptions()`, `formatConfounderRecord()`, `parseConfounderRecord()`, `countActiveConfounders()`
  - `mobile/src/components/Experiments/ConfounderCheckboxes.tsx` — horizontal-wrap toggle chips with icons
  - `mobile/src/utils/experiments/__tests__/confounderCheckin.test.ts` (16 tests)
- Mid-experiment teaser insights
  - `mobile/src/utils/experiments/teaserInsights.ts` — `computeTeaserInsights()`, `computeSingleTeaser()`, `classifyTeaserDirection()`
  - `mobile/src/hooks/useTeaserInsights.ts` — fetches metric data with 6-hour stale time
  - `mobile/src/components/Experiments/TeaserInsightsCard.tsx` — direction indicators (↑/↓/→) with change percentages
  - `mobile/src/utils/experiments/__tests__/teaserInsights.test.ts` (18 tests)
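The direction classification can be sketched as a pure function. The 5% noise band is an assumed threshold, not necessarily what `teaserInsights.ts` uses, and `formatTeaser` is a hypothetical helper showing how the ↑/↓/→ indicator pairs with the change percentage.

```typescript
// Maps a percent change vs. baseline to the teaser direction indicator.
type TeaserDirection = "up" | "down" | "flat";

function classifyTeaserDirection(percentChange: number, noiseBand = 5): TeaserDirection {
  if (percentChange > noiseBand) return "up";
  if (percentChange < -noiseBand) return "down";
  return "flat"; // within the noise band, treat as no signal
}

function formatTeaser(metricLabel: string, percentChange: number): string {
  const arrow = { up: "↑", down: "↓", flat: "→" }[classifyTeaserDirection(percentChange)];
  const sign = percentChange > 0 ? "+" : "";
  return `${arrow} ${metricLabel} ${sign}${percentChange.toFixed(0)}% vs. baseline`;
}
```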
- Experiment completion trigger
  - `mobile/src/utils/experiments/completionDetection.ts` — `shouldCompleteExperiment()`, `assessCompletionQuality()`, `getCompletionAction()`
  - `mobile/src/hooks/useCompletionCheck.ts` — completion readiness + `triggerCompletion` mutation
  - `mobile/src/components/Experiments/CompletionModal.tsx` — bottom sheet adapting to action type (complete/extend/low quality)
  - `mobile/src/utils/experiments/__tests__/completionDetection.test.ts` (19 tests)
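The complete/extend/low-quality decision can be sketched like this. All thresholds are illustrative assumptions; the shipped `completionDetection.ts` encodes its own rules.

```typescript
// Sketch of getCompletionAction(): complete on schedule when data quality is
// adequate, suggest extending when it is borderline, otherwise complete with
// a low-quality caveat.
type CompletionAction = "complete" | "extend" | "complete_low_quality";

interface CompletionInput {
  daysElapsed: number;
  durationDays: number;
  adherenceRate: number;    // 0..1, share of days with confirmed adherence
  dataCoverage: number;     // 0..1, share of days with wearable data
}

function getCompletionAction(input: CompletionInput): CompletionAction | null {
  if (input.daysElapsed < input.durationDays) return null; // still running
  const qualityOk = input.adherenceRate >= 0.7 && input.dataCoverage >= 0.6;
  if (qualityOk) return "complete";
  // A short extension can rescue a borderline experiment; below that,
  // surface the result with a low-quality caveat instead.
  return input.dataCoverage >= 0.4 ? "extend" : "complete_low_quality";
}
```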
- Data quality monitoring integration (warn on degraded sync during experiment)
  - `mobile/src/hooks/useDataQualityMonitor.ts` — fetches `device_data_quality`, surfaces warnings for failing/degraded sync or quality_score < 60
  - Integrated as a banner in `mobile/src/app/experiment/[id].tsx`
- Phase 1 (Pure Functions TDD): 5 modules, 102 new tests — all passing
- Phase 2 (Hooks): 5 React Query hooks orchestrating Phase 1 functions
- Phase 3 (UI): 3-step enrollment wizard, My Lab tab, 7 new components, experiment detail integration
- Modified existing files: `_layout.tsx` (tab rename), `catalog-experiment/[slug].tsx` (CTA → enrollment), `experiment/[id].tsx` (Sprint 4 integrations), `TabBar.test.tsx` (updated assertions)
- Total: 102 new tests for Sprint 4. Full suite: 2198 tests, 0 regressions.
- Discovery presentation screen (Magnitude of Impact + Attribution Confidence)
  - `mobile/src/app/discovery/[id].tsx` — full discovery screen with magnitude badge, metric cards, AI summary
  - `mobile/src/components/Experiments/MagnitudeBadge.tsx` — colored badge per magnitude level
  - `mobile/src/components/Experiments/DiscoveryMetricCard.tsx` — metric label + absolute change + baseline→observed
- Attribution Map display for moderate/low confidence discoveries
  - `mobile/src/components/Experiments/AttributionMapCard.tsx` — tree-style concurrent experiments + plausibility
  - `mobile/src/utils/experiments/discoveryPresentation.ts` — `shouldShowAttributionMap()`, `formatAttributionConfidence()`
- "Confirm the Driver" follow-up suggestion on discovery screen
  - `mobile/src/components/Experiments/ConfirmDriverCard.tsx` — suggestion + isolation experiment CTA
- "Add to Playbook" flow
  - `mobile/src/utils/experiments/addToPlaybook.ts` — `buildPlaybookInsert()`, `determinePlaybookMagnitude()`, `computeNextRank()`
  - `mobile/src/hooks/useAddToPlaybook.ts` — mutation hook with cache invalidation
- `(tabs)/playbook.tsx` — progression system (category progress bars + ranked health levers)
  - `mobile/src/utils/experiments/playbookProgression.ts` — `computeCategoryProgression()`, `rankHealthLevers()`, `classifyPlaybookEntries()`
  - `mobile/src/hooks/usePlaybook.ts` — React Query hook computing progression, ranking, classification
  - `mobile/src/components/Experiments/PlaybookCategoryCard.tsx` — category progress bar + summary
  - `mobile/src/components/Experiments/PlaybookEntryRow.tsx` — ranked lever with magnitude badge
  - `mobile/src/components/Experiments/EliminatedVariableRow.tsx` — strikethrough + "Eliminated" badge
- "What's Next?" recommendations on discovery screen
  - `mobile/src/utils/experiments/whatsNextRecommendation.ts` — `selectWhatsNextExperiments()` scoring engine
  - `mobile/src/hooks/useWhatsNext.ts` — React Query hook fetching catalog + community stats
  - `mobile/src/components/Experiments/WhatsNextCard.tsx` — recommendation cards with community impact %
- Playbook empty state
- Discovery + Playbook hooks: `useDiscovery`, `usePlaybook`, `useAddToPlaybook`, `useWhatsNext`
- Null result framing: `isNullResult()`, `getNullResultFraming()` for minimal/inconclusive outcomes
- Total: 73 new pure function tests for Sprint 5. Full suite: 2302 tests, 0 regressions.
- Restructure tab bar: Discover, My Lab, Playbook, Profile
  - `mobile/src/app/(tabs)/_layout.tsx` — 4 visible + 4 hidden tabs
  - `mobile/__tests__/components/TabBar.test.tsx` — 20 tests (updated for new nav)
- Hide event logging tabs (keep routes accessible via Profile)
  - Home, History, Insights hidden with `href: null`
- Update _layout.tsx with new tab order
- Order: Discover → My Lab → Playbook → Profile
- Playbook tab placeholder (Sprint 5 will flesh out)
  - `mobile/src/app/(tabs)/playbook.tsx` — empty state with BookOpen icon
- Community data bootstrap (seed from research estimates for 8 v1 experiments)
  - `supabase/migrations/20260312200000_seed_community_stats_bootstrap.sql`
  - Research-sourced impact stats for all 8 v1 catalog experiments
- Data learning pipeline: aggregation utility (experiment_outcomes → community_experiment_stats)
  - `mobile/src/utils/experiments/communityStatsAggregation.ts` — `aggregateOutcomesToCommunityStats()`
  - `mobile/__tests__/utils/communityStatsAggregation.test.ts` — 11 tests
  - Computes: distinct participants, metric percentiles (p25/median/p75), impact distribution, baseline segment stratification
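The percentile step inside the aggregation can be sketched as below (linear interpolation between sorted samples). This is only the p25/median/p75 piece; the real `aggregateOutcomesToCommunityStats()` also stratifies by baseline segment and counts distinct participants, and its interpolation method may differ.

```typescript
// Percentile of a sorted sample via linear interpolation between neighbors.
function percentile(sorted: number[], p: number): number {
  if (sorted.length === 0) throw new Error("empty sample");
  const idx = (sorted.length - 1) * p;
  const lo = Math.floor(idx);
  const hi = Math.ceil(idx);
  const frac = idx - lo;
  return sorted[lo] * (1 - frac) + sorted[hi] * frac;
}

// Summarize per-user percent changes for one metric of one catalog experiment.
function summarizeImpacts(percentChanges: number[]) {
  const sorted = [...percentChanges].sort((a, b) => a - b);
  return {
    participants: sorted.length,
    p25: percentile(sorted, 0.25),
    median: percentile(sorted, 0.5),
    p75: percentile(sorted, 0.75),
  };
}
```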
- Community stats hook: `useCommunityStats` for catalog cards
  - `mobile/src/hooks/useCommunityStats.ts` — React Query hook with 15-min staleTime
  - `mobile/__tests__/hooks/useCommunityStats.test.ts` — 8 tests
- Data quality assessment hook: `useDataQuality` for sync health monitoring
  - `mobile/src/hooks/useDataQuality.ts` — React Query hook with `overallSyncHealth` helper
  - `mobile/__tests__/hooks/useDataQuality.test.ts` — 9 tests
- Update onboarding hints
  - `mobile/src/hooks/useOnboardingHints.ts` — AsyncStorage-backed first-run hint system
  - `mobile/__tests__/hooks/useOnboardingHints.test.ts` — 12 tests
  - Sequential progression: Discover → My Lab → Playbook → First Experiment
- Dismiss all, reset, corrupted data recovery
- Data learning pipeline cron: aggregation edge function + pg_cron
  - `supabase/functions/aggregate-community-stats/index.ts` — daily cron (3 AM UTC)
  - Fetches outcomes, groups by catalog ID, upserts aggregated stats
- Data quality assessment cron: quality scoring edge function + pg_cron
  - `supabase/functions/assess-data-quality/index.ts` — hourly cron (:15 past, after sync)
  - Scores devices 0-100, classifies sync health, tracks metric availability
  - `supabase/migrations/20260313000000_add_data_pipeline_cron_jobs.sql` — pg_cron setup for both
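A 0-100 device score combining sync recency and metric coverage could look like the sketch below. The weights, cutoffs, and classification bands are assumptions; `assess-data-quality` computes its own score, though the healthy/degraded/failing split mirrors the classifications used elsewhere in this plan.

```typescript
// Illustrative device quality score: 60% sync recency, 40% metric coverage.
type SyncHealth = "healthy" | "degraded" | "failing";

function scoreDeviceQuality(hoursSinceSync: number, metricCoverage: number): number {
  // Recency: full credit within 6h of last sync, no credit past 48h.
  const recency = Math.max(0, Math.min(1, (48 - hoursSinceSync) / 42));
  const coverage = Math.max(0, Math.min(1, metricCoverage));
  return Math.round((0.6 * recency + 0.4 * coverage) * 100);
}

function classifySyncHealth(score: number): SyncHealth {
  if (score >= 60) return "healthy";
  if (score >= 30) return "degraded";
  return "failing";
}
```

The score < 60 band lines up with the `useDataQualityMonitor` warning threshold noted in Sprint 4.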
- Migrate existing custom experiments to new schema (deferred — no custom experiments in production yet)
- Total: 60 new tests for Sprint 6. Full suite: 2396 tests, 0 regressions.
Phase 1 — Pure Functions (TDD):
- Privacy types + validation (`mobile/src/utils/privacy/types.ts`, `privacyValidation.ts`) — deletion request validation, retention days validation, consent change detection, account deletion summary, community exclusion logic (21 tests)
- Compliance text constants (`complianceText.ts`) — FDA disclaimers, medical disclaimers, experiment disclaimers, AI disclaimers, community data disclaimers, data deletion warnings; context-based disclaimer selector (15 tests)
- Community opt-out filtering (`communityOptOut.ts`) — filters outcomes by opted-out user IDs, computes opt-out impact on data sufficiency (9 tests)
- Attribution model validation — 18-test validation suite for `magnitudeScoring.ts` with simulated concurrent experiment scenarios (0-3+ concurrent, overlap/adherence/metric relevance scoring, deterministic ordering, magnitude independence)
- Performance benchmarks — 10 benchmark tests ensuring core functions scale linearly (`computeOverallMagnitude`, `computeAttributionMap`, `computeTeaserInsights`, `validateEnrollment`, `filterOutcomes` at 10k scale, `selectStarterPack` at 50 entries, `computeBaselineFromValues` at 1k points)
- AI response schema validation — 9 contract tests validating ExperimentAnalysisResult, StarterPackResult, RecommendedExperiment shapes match mobile client expectations
Phase 2 — Database Migration:
- `supabase/migrations/20260313000000_privacy_consent_and_account_deletion.sql`
  - `user_privacy_settings` table (community_data_opt_in, research_consent, data_retention_days with CHECK >= 30)
  - `consent_audit_log` table (immutable audit trail with consent_version, consent_type)
  - `account_deletion_requests` table (pending → processing → completed lifecycle)
  - `community_data_opt_in` column on `experiment_outcomes`
  - `delete_account(p_user_id)` RPC — SECURITY DEFINER, cascades through all user tables, deletes auth.users row
  - RLS policies and indexes for all new tables
Phase 3 — Hooks:
- `usePrivacySettings` hook — React Query fetch + upsert + consent audit logging
- `useAccountDeletion` hook — validates deletion request, calls `delete_account` RPC, clears SecureStore + Zustand auth state
Phase 4 — UI Screens + Components:
- `MedicalDisclaimer` component — context-aware disclaimer text (experiment_result, teaser, discovery, ai_recommendation, community_stats), compact mode
- Privacy & Data screen (`mobile/src/app/privacy.tsx`) — community data toggle, research consent toggle, data retention picker (30/60/90/180/365/Indefinite), delete account button, footer links
- Account Deletion screen (`mobile/src/app/delete-account.tsx`) — multi-step flow: Summary → Reason (optional) → Type "DELETE" confirmation → Processing → Done
- Profile screen updates — wired "Privacy & Security" and "Account Settings" to `/privacy`, added MedicalDisclaimer footer
- Disclaimer additions — MedicalDisclaimer added to Discovery detail, Playbook tab, Discover tab
Phase 5 — Integration Tests:
- Experiment lifecycle E2E (`mobile/__tests__/integration/experiment/lifecycle.test.ts`) — 8 tests: catalog creation, enrollment, check-ins, completion with outcome + discovery, add to playbook, cancellation, concurrent attribution, low adherence (requires local Supabase)
- Privacy integration tests (`mobile/__tests__/integration/privacy/account-deletion.test.ts`) — 5 tests: privacy settings CRUD, retention constraint enforcement, consent audit logging, account deletion cascade, community opt-out flag on outcomes (requires local Supabase)
- Total: 94 new tests for Sprint 7 (82 unit + 12 integration). Full suite: 2396 tests passing, 0 regressions.
- Expand experiment catalog from 8 to ~50 experiments (full CSV)
- Apple Health integration (read-only: HRV, RHR, sleep, steps, workouts, SpO2)
- Google Fit integration (REST API, OAuth flow, sync function)
- New experiment categories: Glucose, Metabolic, Body Composition, VO2 Max, Functional, Exercise
- Category-specific wellness ranges for Metric Gap Analysis
Generate beautiful, shareable images from experiment results and playbook entries.
Experiment Discovery Card:
```
┌─────────────────────────────────────┐
│ MY BODY EXPERIMENT                  │
│                                     │
│ Alcohol Elimination                 │
│ 10 days                             │
│                                     │
│ Deep Sleep: +31%                    │
│ HRV: +24%                           │
│ Resting HR: -5 bpm                  │
│                                     │
│ Magnitude of Impact: HIGH           │
│                                     │
│ Your body is a lab.                 │
│ Start the discovery.                │
│ Health Decoder                      │
└─────────────────────────────────────┘
```
Top Health Levers Card:
```
┌─────────────────────────────────────┐
│ MY BODY'S TOP HEALTH LEVERS         │
│                                     │
│ 1️⃣ Earlier bedtime                  │
│    HRV +22%                         │
│                                     │
│ 2️⃣ No alcohol                       │
│    Deep sleep +31%                  │
│                                     │
│ 3️⃣ Evening walk                     │
│    Resting HR −4 bpm                │
│                                     │
│ Your body is a lab.                 │
│ Start the discovery.                │
│ Health Decoder                      │
└─────────────────────────────────────┘
```
Implementation:
- Generate card as a rendered React Native view → export to image via `react-native-view-shot`
- Share via native share sheet (iOS/Android)
- Include app download link / deep link
- Available from: Discovery result screen, Playbook screen
- Watermarked with "Health Decoder" branding + tagline
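Before the view is rendered and captured with `react-native-view-shot`, the card's text content can be assembled by a pure, testable helper. The field names and formatting below are hypothetical sketches, not the shipped card layout.

```typescript
// Builds the display lines for a discovery share card from experiment results.
interface CardMetric {
  label: string;
  change: number;
  unit: "%" | "bpm" | "min";
}

function formatCardLine(m: CardMetric): string {
  const sign = m.change > 0 ? "+" : m.change < 0 ? "-" : "";
  const value = Math.abs(m.change);
  const suffix = m.unit === "%" ? "%" : ` ${m.unit}`;
  return `${m.label}: ${sign}${value}${suffix}`;
}

function buildDiscoveryCard(title: string, days: number, metrics: CardMetric[]): string[] {
  return [
    "MY BODY EXPERIMENT",
    title,
    `${days} days`,
    ...metrics.map(formatCardLine),
    "Your body is a lab. Start the discovery.",
  ];
}
```

Keeping content assembly separate from the captured view keeps the share feature unit-testable without rendering.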
Design principle: every notification must feel like help, not spam. No generic reminders.
Notification Types:
| Trigger | Notification | Value |
|---|---|---|
| Daily check-in due (manual experiments only) | "Quick check-in: Did you follow the Caffeine Curfew protocol today? [Yes] [Mostly] [No]" | Actionable one-tap |
| Semi-auto activity detected | "We detected a 25-min walk at 7:15 PM. Did you wear the weighted vest? [Yes] [No]" | One-tap confirmation |
| Mid-experiment teaser (day 5+) | "Early signal: Your deep sleep is trending 18% higher than baseline." | Motivation |
| Experiment completion | "Your Alcohol Elimination experiment is complete! Tap to see your discovery." | Excitement |
| Unenrolled discovery found | "We noticed a pattern in your data. Tap to see what we found." | Magic moment |
| Data quality degraded | "Your Oura hasn't synced in 48 hours. This may affect your active experiment." | Helpful warning |
| Playbook milestone | "You've completed 3 Sleep experiments! Your Sleep Playbook is 60% complete." | Progression |
Anti-Annoyance Rules:
- Max 2 notifications per day
- Never send between 10 PM and 7 AM (respect sleep experiments!)
- Group related notifications
- User can mute per-experiment or globally
- If user ignores 3 consecutive notifications, reduce frequency
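The rules above can be combined into a single send gate. The 24-hour backoff interval after 3 ignored notifications is an assumption; the other cutoffs come straight from the list.

```typescript
// Gate that enforces the anti-annoyance rules before any notification is sent.
interface NotificationState {
  sentToday: number;
  consecutiveIgnored: number;
  hoursSinceLastSent: number;
}

function canSendNotification(state: NotificationState, hourOfDay: number): boolean {
  if (state.sentToday >= 2) return false;              // max 2 notifications per day
  if (hourOfDay >= 22 || hourOfDay < 7) return false;  // quiet hours: 10 PM - 7 AM
  if (state.consecutiveIgnored >= 3) {
    // Reduced frequency: require a 24h gap once 3 in a row were ignored.
    return state.hoursSinceLastSent >= 24;
  }
  return true;
}
```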
Extend auto-adherence beyond simple threshold checks:
- Detect specific activity types from wearable data (walk, run, strength training)
- Cross-reference with experiment protocols
- For `semi_auto` experiments, detect the activity and prompt one-tap confirmation
- For `auto` experiments, silently verify and mark adherence
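Cross-referencing a detected activity against a protocol could look like the sketch below. The types and field names are hypothetical; a real matcher would extend the evaluators in `autoAdherence.ts`.

```typescript
// Checks whether a detected wearable activity satisfies an experiment
// protocol's requirement (activity type, minimum duration, time window).
interface DetectedActivity {
  type: "walk" | "run" | "strength";
  startMinutes: number;       // minutes since midnight
  durationMinutes: number;
}

interface ProtocolRequirement {
  type: DetectedActivity["type"];
  minDurationMinutes: number;
  windowStartMinutes: number;
  windowEndMinutes: number;
}

function matchesProtocol(a: DetectedActivity, req: ProtocolRequirement): boolean {
  return (
    a.type === req.type &&
    a.durationMinutes >= req.minDurationMinutes &&
    a.startMinutes >= req.windowStartMinutes &&
    a.startMinutes <= req.windowEndMinutes
  );
}
```

A match would mark adherence silently for `auto` experiments, or trigger the one-tap confirmation prompt for `semi_auto`.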
Allow users to report how they're subjectively feeling throughout the day. This data becomes a first-class metric in experiment analysis, complementing objective wearable data.
Prompt: A small, non-intrusive floating card that appears at configurable times:
```
┌─────────────────────────────┐
│ How are you feeling?        │
│                             │
│ Energy                      │
│ 😴 😐 🙂 😊 🔥              │
│                             │
│ Mood                        │
│ 😞 😐 🙂 😊 😄              │
│                             │
│ Focus                       │
│ 🌫️ 😐 🙂 😊 🎯              │
│                             │
│ Physical Comfort            │
│ 😣 😐 🙂 😊 💪              │
│                             │
│ [Skip]            [Save]    │
└─────────────────────────────┘
```
Key Design Decisions:
- 4 dimensions: Energy, Mood, Focus, Physical Comfort
- 5-point scale per dimension (1-5, displayed as emoji faces for instant comprehension)
- One-tap per dimension: Tap the emoji, done. Entire check-in <5 seconds.
- 3 prompts per day at configurable times:
- Morning (default: 30 min after wake time detected from wearable)
- Afternoon (default: 2 PM)
- Evening (default: 8 PM)
- Not mandatory: Users can skip or dismiss. No guilt mechanics.
- Adaptive timing: If wearable detects wake time, adjust morning prompt accordingly
```sql
CREATE TABLE subjective_checkins (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id UUID NOT NULL REFERENCES auth.users(id) ON DELETE CASCADE,
  checkin_time TIMESTAMPTZ NOT NULL,
  time_of_day TEXT NOT NULL CHECK (time_of_day IN ('morning', 'afternoon', 'evening')),
  energy INTEGER NOT NULL CHECK (energy BETWEEN 1 AND 5),
  mood INTEGER NOT NULL CHECK (mood BETWEEN 1 AND 5),
  focus INTEGER NOT NULL CHECK (focus BETWEEN 1 AND 5),
  physical_comfort INTEGER NOT NULL CHECK (physical_comfort BETWEEN 1 AND 5),
  created_at TIMESTAMPTZ DEFAULT now()
);

CREATE INDEX idx_subjective_user_time ON subjective_checkins(user_id, checkin_time DESC);
```

Subjective data becomes a metric in experiment analysis:
- New metric registry entries:
  - `avg_energy` — Average daily energy score (source: `subjective_checkins`)
  - `avg_mood` — Average daily mood score
  - `avg_focus` — Average daily focus score
  - `avg_comfort` — Average daily physical comfort score
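Deriving the daily `avg_*` metrics from raw check-in rows is a straightforward aggregation; a sketch (rounding to one decimal is an assumption):

```typescript
// Collapses one day's subjective_checkins rows into the avg_* daily metrics.
interface SubjectiveCheckin {
  energy: number;            // 1-5, per the table's CHECK constraints
  mood: number;
  focus: number;
  physical_comfort: number;
}

function averageDailyScores(checkins: SubjectiveCheckin[]) {
  if (checkins.length === 0) return null; // no avg_* metrics for the day
  const sum = { energy: 0, mood: 0, focus: 0, comfort: 0 };
  for (const c of checkins) {
    sum.energy += c.energy;
    sum.mood += c.mood;
    sum.focus += c.focus;
    sum.comfort += c.physical_comfort;
  }
  const n = checkins.length;
  const round1 = (total: number) => Math.round((total / n) * 10) / 10;
  return {
    avg_energy: round1(sum.energy),
    avg_mood: round1(sum.mood),
    avg_focus: round1(sum.focus),
    avg_comfort: round1(sum.comfort),
  };
}
```

Returning `null` for days with no check-ins keeps missing subjective data distinct from a genuinely neutral score, matching the "not mandatory, no guilt mechanics" design.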
- Experiment results include subjective data:
  - Post-Meal Walk Experiment — 14 days
  - Objective Metrics: RHR: -4 bpm (62→58) -6.5%; Deep Sleep: +18 min +34.6%
  - Subjective Metrics: Afternoon Energy: +0.8 (3.2→4.0); Evening Mood: +0.5 (3.5→4.0)
- Pattern detection uses subjective data:
- "Your energy is 35% higher on days following 7+ hours of sleep"
- "Your focus score drops 0.8 points on days after alcohol consumption"
- These become unenrolled discoveries
- Subjective data resolves "objective ambiguity": sometimes wearable metrics show modest change while subjective improvement is dramatic. The discovery can note: "While your HRV showed a modest 5% improvement, your self-reported energy increased 40% during this experiment."
- Subjective data is deeply personal
- Included in data export (GDPR)
- Excluded from community aggregation by default (user must explicitly opt in)
- Never shared in share cards
- ABAB Experiment Design — Advanced mode for power users to run alternating phases (A=normal, B=intervention, A=normal, B=intervention) for stronger evidence
- Community Experiment Cohorts — Users running the same experiment see anonymized group progress
- Counterfactual Estimation — "What would have happened without this experiment?" using baseline trend projection
- Custom Experiment Creation — Allow users to design their own experiments
- ML-Based Recommendation — Replace heuristic recommender with collaborative filtering as dataset grows
- Data Export (GDPR) — Download all personal data in machine-readable format
- Differential Privacy — Add noise to small-cohort aggregations to prevent re-identification