
Experiment-Centric App Pivot — Implementation Plan

Created: 2026-03-12
Updated: 2026-03-13 (v8 — Sprints 1-7 COMPLETE)
Status: In Progress
Tagline: "Your body is a lab. Start the discovery."


Executive Summary

HealthDecoder pivots from a health event logging app to an experiment-centric discovery platform. Users browse a curated experiment catalog, enroll in experiments measured by their wearable data, and receive AI-powered "Magnitude of Impact" analysis. The app also proactively spots patterns in historical data ("Unenrolled Discoveries") and builds a personal "Playbook" of health levers ranked by impact.

What We Keep

  • Wearable sync infrastructure (Whoop, Fitbit, Oura, Libre, Dexcom)
  • Database tables: experiments, experiment_metrics, experiment_checkins, experiment_results
  • Metric registry with vendor-specific extraction (mobile/src/utils/experiments/metrics.ts)
  • analyze-experiment Edge Function (statistical analysis + AI interpretation)
  • Auth flow, Supabase backend, push token infrastructure
  • All data models, API routes, and sync logic for event logging (hidden, not removed)

What We Add

  • Experiment Catalog — curated library of 4-8 high-impact experiments for v1 (expanded in v2)
  • Unenrolled Discovery Engine — AI pattern spotting on historical data ("accidental experiments")
  • Magnitude of Impact scoring — replaces success/failure framing
  • User Discoveries — formatted insights from completed experiments and pattern detection
  • User Playbook — "Your Body's Operating Manual" ranked by impact magnitude
  • Community Data — anonymous aggregated stats on experiment cards ("Wisdom of the Lab")
  • AI Model Abstraction — Gemini-first with provider-swappable architecture
  • Data Learning Pipeline — normalized experiment outcomes feeding recommendation intelligence
  • Data Quality Monitoring — wearable sync health scoring and gap detection
  • New Navigation — Discover / My Lab / Playbook / Profile (retire Home, History, Insights tabs)

What We Retire (Hide, Not Remove)

  • Home tab (event logging via voice/text/camera)
  • History tab (event timeline)
  • Insights tab (glucose charts, analytics)
  • Create-experiment screen (replaced by catalog enrollment)

Phase 1: Foundation (Database + AI + Metrics)

1.1 Wellness Terminology Audit (Sprint 1 — Do First)

Before building the catalog or any AI prompts, establish the Compliance Dictionary. Every engineer, content writer, and AI prompt must use this reference. If clinical terms leak into Sprint 1 seed data, fixing them later is a costly refactor.

Banned Words Dictionary

| Banned Term | Approved Replacement | Context |
| --- | --- | --- |
| diagnose / diagnosis | identify / observe | Never imply clinical diagnosis |
| treat / treatment | experiment / protocol | We run experiments, not treatments |
| cure | improve / support | No curative claims |
| prevent / prevention | associated with lower / support | No prevention claims |
| disease | — (omit entirely) | Never reference diseases |
| diabetes | blood sugar wellness | If glucose context needed |
| hypertension | heart rate patterns | If BP context needed |
| cardiovascular disease | heart wellness | Never name diseases |
| insulin resistance | glucose response | Correlational framing |
| insulin sensitivity | glucose response efficiency | Correlational framing |
| A1C / HbA1c | long-term glucose patterns | Do not reference clinical biomarkers |
| blood pressure | — (omit unless from device) | Not a wearable metric we track |
| prescribe / prescription | suggest / recommend trying | We are not prescribers |
| dose / dosage | amount / serving | For supplement experiments |
| therapeutic | wellness-focused | No therapeutic claims |
| clinical | — (omit) | We are not clinical |
| patient | user / participant | Users, not patients |
| symptom | experience / observation | Observational framing |
| risk factor | pattern associated with | Correlational only |
| mortality | longevity / lifespan | Only in evidence citations |
| success / failure | magnitude of impact | Core framing rule |

Required Phrases

Every experiment card, AI output, and discovery must include appropriate framing:

| Context | Required Phrase |
| --- | --- |
| All AI outputs | "For informational purposes only. Not medical advice." |
| Supplement experiments | "Consult your healthcare provider before starting any supplement." |
| All discovery results | "associated with" or "correlated with" (never "caused by") |
| Experiment framing | "lifestyle experiment" (never "intervention" or "treatment") |
| App-wide footer | "For general wellness purposes only. Not intended to diagnose, treat, cure, or prevent any disease." |

Audit Process

  1. Draft all catalog entries (8 in v1, ~50 by v2) using approved terminology
  2. Run automated scan for banned terms before seed data is committed
  3. Build a lint/validation function: validateWellnessCompliance(text: string): { pass: boolean, violations: string[] }
  4. This function is used in:
    • Catalog seed data validation (CI check)
    • AI output post-processing (runtime scan before display)
    • Experiment description editing (admin tool, future)
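The validator named in step 3 might be sketched as follows. This is a minimal illustration, not the final implementation; the banned-term list here is a small subset of the Section 1.1 dictionary.

```typescript
// Illustrative subset of the Section 1.1 banned-terms dictionary.
const BANNED_TERMS = [
  "diagnose", "diagnosis", "treat", "treatment", "cure",
  "prevent", "disease", "prescribe", "clinical", "patient", "symptom",
];

interface ComplianceResult {
  pass: boolean;
  violations: string[];
}

// Case-insensitive whole-word scan; "treatment" is listed separately
// because \btreat\b does not match it.
function validateWellnessCompliance(text: string): ComplianceResult {
  const lower = text.toLowerCase();
  const violations = BANNED_TERMS.filter((term) =>
    new RegExp(`\\b${term}\\b`).test(lower)
  );
  return { pass: violations.length === 0, violations };
}
```

The same function can back both the CI seed-data check and the runtime AI-output scan, keeping the two enforcement points in sync.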

1.2 New Database Tables

experiment_catalog — The Library

CREATE TABLE experiment_catalog (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  slug TEXT UNIQUE NOT NULL,
  name TEXT NOT NULL,
  category TEXT NOT NULL,
  subcategory TEXT,
  protocol_summary TEXT NOT NULL,       -- 1-2 sentence card description
  protocol_detail TEXT NOT NULL,        -- Full protocol with instructions
  goal TEXT NOT NULL,
  why_it_works TEXT NOT NULL,
  difficulty TEXT NOT NULL CHECK (difficulty IN ('easy', 'moderate', 'hard')),
  default_duration_days INTEGER NOT NULL,
  min_duration_days INTEGER NOT NULL DEFAULT 7,
  primary_metrics JSONB NOT NULL,       -- [{metric_key, metric_label, unit, data_source, higherIsBetter}]
  secondary_metrics JSONB NOT NULL DEFAULT '[]',
  required_data_sources TEXT[] NOT NULL, -- which provider types needed
  confounders TEXT[] NOT NULL DEFAULT '{}',
  adherence_detection TEXT NOT NULL DEFAULT 'manual'
    CHECK (adherence_detection IN ('auto', 'semi_auto', 'manual')),
  -- auto: fully detectable from wearable data (bedtime, steps, activity frequency)
  -- semi_auto: partially detectable, confirm with one-tap (walking + vest, workout type)
  -- manual: requires user check-in (supplements, food habits, breathing exercises)
  auto_detect_config JSONB,            -- rules for auto/semi_auto detection (see section 2.4)
  evidence_summary TEXT,
  evidence_url TEXT,
  starter_pack BOOLEAN DEFAULT false,  -- true = included in new-user starter pack candidates
  starter_pack_priority INTEGER,       -- lower = higher priority within starter pack
  tags TEXT[] DEFAULT '{}',
  is_active BOOLEAN DEFAULT true,
  sort_order INTEGER DEFAULT 0,
  created_at TIMESTAMPTZ DEFAULT now(),
  updated_at TIMESTAMPTZ DEFAULT now()
);

CREATE INDEX idx_experiment_catalog_category ON experiment_catalog(category);
CREATE INDEX idx_experiment_catalog_active ON experiment_catalog(is_active) WHERE is_active = true;
CREATE INDEX idx_experiment_catalog_starter ON experiment_catalog(starter_pack) WHERE starter_pack = true;

experiment_outcomes — Normalized Outcome Records (Strategic Asset)

-- Each completed experiment produces exactly one normalized outcome record.
-- This table is the foundation of the data learning pipeline and community intelligence.
-- It is intentionally denormalized for fast aggregation queries.
CREATE TABLE experiment_outcomes (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  experiment_id UUID NOT NULL REFERENCES experiments(id) ON DELETE CASCADE,
  catalog_experiment_id UUID REFERENCES experiment_catalog(id),
  user_id UUID NOT NULL REFERENCES auth.users(id) ON DELETE CASCADE,

  -- User baseline profile (anonymized snapshot at enrollment time)
  user_baseline_profile JSONB NOT NULL,
  -- {age_bucket: "30-39", rhr_bucket: "60-70", hrv_bucket: "40-50", sleep_bucket: "6-7h",
  --  connected_providers: ["oura", "fitbit"], baseline_quality: "good"}

  -- Experiment metadata
  experiment_category TEXT NOT NULL,
  experiment_duration_days INTEGER NOT NULL,
  actual_duration_days INTEGER NOT NULL,

  -- Adherence
  protocol_adherence_pct NUMERIC(5,2) NOT NULL,
  valid_days INTEGER NOT NULL,
  excluded_days INTEGER NOT NULL DEFAULT 0,

  -- Confounders
  confounders_present TEXT[] DEFAULT '{}',
  concurrent_experiments INTEGER DEFAULT 0,

  -- Metric changes (the core data)
  metric_changes JSONB NOT NULL,
  -- [{metric_key, baseline_mean, baseline_stddev, experiment_mean, change_pct,
  --   effect_size_cohens_d, direction, data_points_baseline, data_points_experiment}]

  -- Scoring
  overall_magnitude TEXT NOT NULL
    CHECK (overall_magnitude IN ('high', 'moderate', 'low', 'minimal', 'inconclusive')),
  confidence TEXT NOT NULL
    CHECK (confidence IN ('strong', 'moderate', 'suggestive')),

  -- Attribution
  attribution_confidence TEXT NOT NULL
    CHECK (attribution_confidence IN ('strong', 'moderate', 'low')),
  concurrent_experiment_ids UUID[] DEFAULT '{}',
  attribution_map JSONB,
  -- [{experiment_id, experiment_name, attribution_plausibility: "high"|"moderate"|"low"}]

  -- AI metadata
  ai_model TEXT,
  ai_prompt_version TEXT,

  created_at TIMESTAMPTZ DEFAULT now(),
  UNIQUE(experiment_id)
);

CREATE INDEX idx_outcomes_catalog ON experiment_outcomes(catalog_experiment_id);
CREATE INDEX idx_outcomes_category ON experiment_outcomes(experiment_category);
CREATE INDEX idx_outcomes_magnitude ON experiment_outcomes(overall_magnitude);
CREATE INDEX idx_outcomes_user ON experiment_outcomes(user_id);

community_experiment_stats — Wisdom of the Lab

CREATE TABLE community_experiment_stats (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  catalog_experiment_id UUID NOT NULL REFERENCES experiment_catalog(id) ON DELETE CASCADE,
  total_participants INTEGER DEFAULT 0,
  total_completed INTEGER DEFAULT 0,
  avg_impact_by_metric JSONB DEFAULT '{}',
  -- {metric_key: {avg_change_pct, median_change_pct, p25, p75}}
  pct_high_impact NUMERIC(5,2) DEFAULT 0,
  pct_moderate_impact NUMERIC(5,2) DEFAULT 0,
  pct_low_impact NUMERIC(5,2) DEFAULT 0,
  pct_minimal_impact NUMERIC(5,2) DEFAULT 0,
  baseline_segment_stats JSONB DEFAULT '{}',
  -- {rhr_60_70: {avg_change_pct: X, count: Y}, hrv_40_50: {...}}
  updated_at TIMESTAMPTZ DEFAULT now(),
  UNIQUE(catalog_experiment_id)
);

user_discoveries — Discovery Output

CREATE TABLE user_discoveries (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id UUID NOT NULL REFERENCES auth.users(id) ON DELETE CASCADE,
  experiment_id UUID REFERENCES experiments(id) ON DELETE SET NULL,
  catalog_experiment_id UUID REFERENCES experiment_catalog(id),
  discovery_type TEXT NOT NULL
    CHECK (discovery_type IN ('experiment_result', 'unenrolled_pattern')),
  title TEXT NOT NULL,
  summary TEXT NOT NULL,
  detailed_analysis TEXT,
  metrics_impact JSONB NOT NULL,
  -- [{metric_key, metric_label, baseline_value, observed_value, change_pct, magnitude, unit}]
  overall_magnitude TEXT
    CHECK (overall_magnitude IN ('high', 'moderate', 'low', 'minimal', 'inconclusive')),
  confidence TEXT CHECK (confidence IN ('strong', 'moderate', 'suggestive')),
  confounders_noted TEXT[],
  suggested_experiment_id UUID REFERENCES experiment_catalog(id),
  ai_model TEXT,
  ai_prompt_version TEXT,
  status TEXT DEFAULT 'new'
    CHECK (status IN ('new', 'viewed', 'added_to_playbook', 'eliminated', 'dismissed')),
    -- 'eliminated' = user acknowledged a minimal/inconclusive result (Success of Elimination)
  created_at TIMESTAMPTZ DEFAULT now()
);

CREATE INDEX idx_user_discoveries_user ON user_discoveries(user_id, created_at DESC);
CREATE INDEX idx_user_discoveries_type ON user_discoveries(user_id, discovery_type);

user_playbook — Your Body's Operating Manual

CREATE TABLE user_playbook (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id UUID NOT NULL REFERENCES auth.users(id) ON DELETE CASCADE,
  discovery_id UUID REFERENCES user_discoveries(id) ON DELETE SET NULL,
  catalog_experiment_id UUID REFERENCES experiment_catalog(id),
  habit_name TEXT NOT NULL,
  impact_category TEXT NOT NULL,  -- sleep, hrv, rhr, glucose, recovery, metabolic, functional
  magnitude TEXT NOT NULL CHECK (magnitude IN ('high', 'moderate', 'low', 'eliminated')),
  -- 'eliminated' = Minimal/Inconclusive result, framed as "ruled out" (Success of Elimination)
  impact_description TEXT NOT NULL,  -- "HRV +16%, Deep Sleep +22 min" or "Not a lever for your sleep"
  rank INTEGER,  -- 1 = highest impact lever; eliminated entries ranked last
  created_at TIMESTAMPTZ DEFAULT now(),
  updated_at TIMESTAMPTZ DEFAULT now()
);

CREATE INDEX idx_user_playbook_user ON user_playbook(user_id, rank);

device_data_quality — Wearable Sync Health Monitoring

CREATE TABLE device_data_quality (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id UUID NOT NULL REFERENCES auth.users(id) ON DELETE CASCADE,
  device_id UUID NOT NULL REFERENCES connected_devices(device_id) ON DELETE CASCADE,
  assessment_date DATE NOT NULL,

  -- Completeness scoring
  data_quality_score NUMERIC(5,2) NOT NULL,  -- 0-100
  -- 100 = all expected metrics present, no gaps
  -- 80+ = minor gaps, usable for experiments
  -- 50-79 = significant gaps, experiments may be limited
  -- <50 = unreliable, warn user

  -- Gap analysis
  missing_data_days INTEGER DEFAULT 0,       -- days with no data in last 14 days
  partial_data_days INTEGER DEFAULT 0,       -- days with some but not all expected metrics
  total_days_assessed INTEGER NOT NULL,

  -- Per-metric availability
  metric_availability JSONB NOT NULL,
  -- {hrv: {available: true, days_with_data: 12, total_days: 14, quality: "good"},
  --  rhr: {available: true, days_with_data: 14, total_days: 14, quality: "excellent"},
  --  sleep_stages: {available: false, days_with_data: 0, total_days: 14, quality: "unavailable"}}

  -- Sync health
  sync_health TEXT NOT NULL CHECK (sync_health IN ('healthy', 'degraded', 'failing', 'stale')),
  -- healthy: synced within last 6 hours, <2 missing days in 14
  -- degraded: synced within 24h but 2-4 missing days
  -- failing: >4 missing days or sync errors
  -- stale: no sync in >48 hours

  last_successful_sync TIMESTAMPTZ,
  sync_error_count_7d INTEGER DEFAULT 0,

  created_at TIMESTAMPTZ DEFAULT now(),
  UNIQUE(device_id, assessment_date)
);

CREATE INDEX idx_data_quality_user ON device_data_quality(user_id, assessment_date DESC);
CREATE INDEX idx_data_quality_device ON device_data_quality(device_id, assessment_date DESC);

1.3 Modify Existing Tables

experiments — Add Catalog Reference + Data Quality + Attribution

ALTER TABLE experiments
  ADD COLUMN catalog_experiment_id UUID REFERENCES experiment_catalog(id),
  ADD COLUMN baseline_metrics JSONB,           -- auto-computed baseline snapshot
  ADD COLUMN baseline_quality TEXT,             -- 'excellent' | 'good' | 'limited' | 'insufficient'
  ADD COLUMN data_quality_at_enrollment JSONB,  -- snapshot of device_data_quality at enrollment
  ADD COLUMN concurrent_experiment_ids UUID[],  -- IDs of experiments that overlapped (populated at completion)
  ADD COLUMN attribution_confidence TEXT        -- 'strong' | 'moderate' | 'low' (computed at completion)
    CHECK (attribution_confidence IN ('strong', 'moderate', 'low')),
  ADD COLUMN is_custom BOOLEAN DEFAULT false;

experiment_checkins — Add Confounder Tracking + Auto-Detection

ALTER TABLE experiment_checkins
  ADD COLUMN confounders JSONB DEFAULT '{}',
  -- {"alcohol": true, "illness": false, "travel": false, "intense_workout": true, "poor_sleep": false}
  ADD COLUMN auto_detected BOOLEAN DEFAULT false,
  -- true if adherence was auto-detected from wearable data (not manual check-in)
  ADD COLUMN auto_detect_data JSONB;
  -- evidence for auto-detection: {"detected_bedtime": "22:15", "target_bedtime": "22:30", "within_threshold": true}

experiment_results — Add Magnitude Scoring

ALTER TABLE experiment_results
  ADD COLUMN overall_magnitude TEXT
    CHECK (overall_magnitude IN ('high', 'moderate', 'low', 'minimal', 'inconclusive')),
  ADD COLUMN ai_model TEXT,
  ADD COLUMN ai_prompt_version TEXT;

1.4 RLS Policies

-- experiment_catalog: public read for authenticated users
ALTER TABLE experiment_catalog ENABLE ROW LEVEL SECURITY;
CREATE POLICY catalog_select ON experiment_catalog FOR SELECT TO authenticated USING (true);
CREATE POLICY catalog_service ON experiment_catalog FOR ALL TO service_role USING (true) WITH CHECK (true);

-- experiment_outcomes: user can read own, service_role aggregates
ALTER TABLE experiment_outcomes ENABLE ROW LEVEL SECURITY;
CREATE POLICY outcomes_select ON experiment_outcomes FOR SELECT USING (auth.uid() = user_id);
CREATE POLICY outcomes_service ON experiment_outcomes FOR ALL TO service_role USING (true) WITH CHECK (true);

-- community_experiment_stats: public read for authenticated users
ALTER TABLE community_experiment_stats ENABLE ROW LEVEL SECURITY;
CREATE POLICY community_stats_select ON community_experiment_stats FOR SELECT TO authenticated USING (true);
CREATE POLICY community_stats_service ON community_experiment_stats FOR ALL TO service_role USING (true) WITH CHECK (true);

-- user_discoveries: user owns their discoveries
ALTER TABLE user_discoveries ENABLE ROW LEVEL SECURITY;
CREATE POLICY discoveries_select ON user_discoveries FOR SELECT USING (auth.uid() = user_id);
CREATE POLICY discoveries_insert ON user_discoveries FOR INSERT WITH CHECK (auth.uid() = user_id);
CREATE POLICY discoveries_update ON user_discoveries FOR UPDATE USING (auth.uid() = user_id);
CREATE POLICY discoveries_delete ON user_discoveries FOR DELETE USING (auth.uid() = user_id);
CREATE POLICY discoveries_service ON user_discoveries FOR ALL TO service_role USING (true) WITH CHECK (true);

-- user_playbook: user owns their playbook
ALTER TABLE user_playbook ENABLE ROW LEVEL SECURITY;
CREATE POLICY playbook_select ON user_playbook FOR SELECT USING (auth.uid() = user_id);
CREATE POLICY playbook_insert ON user_playbook FOR INSERT WITH CHECK (auth.uid() = user_id);
CREATE POLICY playbook_update ON user_playbook FOR UPDATE USING (auth.uid() = user_id);
CREATE POLICY playbook_delete ON user_playbook FOR DELETE USING (auth.uid() = user_id);
CREATE POLICY playbook_service ON user_playbook FOR ALL TO service_role USING (true) WITH CHECK (true);

-- device_data_quality: user reads own, service_role writes
ALTER TABLE device_data_quality ENABLE ROW LEVEL SECURITY;
CREATE POLICY data_quality_select ON device_data_quality FOR SELECT USING (auth.uid() = user_id);
CREATE POLICY data_quality_service ON device_data_quality FOR ALL TO service_role USING (true) WITH CHECK (true);

1.5 Seed Data — Experiment Catalog

v1: 4-8 High-Impact Experiments Only

For v1, we ship a focused catalog of the highest-signal experiments — the ones most likely to produce a measurable "wow" moment for new users. The full ~50 experiment library is a v2 expansion.

All descriptions MUST pass the Wellness Terminology Audit (Section 1.1) before commit.

v1 Catalog (8 experiments):

| # | Experiment | Category | Duration | Adherence | Why v1 |
| --- | --- | --- | --- | --- | --- |
| 1 | Alcohol Elimination | Sleep | 14 days | manual | Highest probability of dramatic, measurable change |
| 2 | Early Bedtime | Sleep | 14 days | auto | High signal, easy, auto-detectable via sleep timestamps |
| 3 | Post-Meal Walk | RHR / Sleep | 14 days | auto | Low friction, auto-detectable, strong multi-metric signal |
| 4 | Caffeine Curfew | Sleep | 14 days | manual | High signal for sleep metrics, relatable protocol |
| 5 | Consistent Wake Time | Sleep | 14 days | auto | Easy, auto-detectable, strong sleep consistency signal |
| 6 | Magnesium Before Bed | RHR / Sleep | 14 days | manual | Accessible supplement, strong RHR + sleep signal |
| 7 | Morning Sunlight | Sleep | 10 days | semi_auto | Easy, well-known (Huberman audience), moderate signal |
| 8 | Digital Sunset | Sleep | 14 days | manual | Moderate signal, highly relatable, no equipment needed |

Why these 8: They target the metrics most users have (sleep, RHR, HRV), require no special equipment or CGM, have strong published evidence, and include a mix of auto/semi_auto/manual adherence. Six of the eight are categorized under sleep — deliberately, because sleep is the metric with the most consistent measurable signal from wearables and is universally relevant.

v2 Catalog Expansion (~50 experiments):

| Category | v2 Additions |
| --- | --- |
| Glucose | ACV, Food Sequencing, Cinnamon, Paired Carb Rule, Resistance Training |
| HRV | Resonant Breathing, Cold Exposure, Nasal Walking, Movement Snacks |
| RHR | Legs Up Wall, Zone 2, Hydration Load, Sauna |
| Metabolic | 30g Protein Breakfast, 8PM Curfew, 3-Hour Buffer, TRE 10h, Mediterranean Trial, UPF Elimination, etc. |
| Body Composition | Protein Pacing, Weighted Vest Walking, Creatine |
| VO2 Max | Norwegian 4x4, Fasted Zone 2 |
| Recovery | Sauna 3x/wk, Afternoon Nap |
| Behavior | Nature Exposure, No News |
| Exercise | Strength Training 3x/wk, HIIT 2x/wk |
| Functional | Dead Hang Challenge, Floor Sitting |

Adherence detection classification:

| Detection Type | v1 Experiments | How Detected |
| --- | --- | --- |
| auto | Early Bedtime, Consistent Wake Time, Post-Meal Walk | Sleep timestamps, step counts, activity logs from wearable |
| semi_auto | Morning Sunlight | Activity/location detected, one-tap confirm |
| manual | Alcohol Elimination, Caffeine Curfew, Magnesium, Digital Sunset | Cannot be detected from wearable data |

Auto-detect config examples:

// Early Bedtime: auto
{"type": "sleep_start_time", "target": "relative_to_baseline", "offset_minutes": -45, "threshold_minutes": 15}

// Post-Meal Walk: auto (via evening steps spike)
{"type": "activity_after_time", "window_start": "18:00", "window_end": "21:00", "min_duration_minutes": 10, "activity_types": ["walk"]}

// Consistent Wake Time: auto
{"type": "sleep_end_time_variance", "max_variance_minutes": 30}
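To make the config concrete, here is one way the Early Bedtime rule could be evaluated against a night of wearable data. The field names mirror the config example above; the time-normalization helper and its midnight cutoff are assumptions of this sketch.

```typescript
// Shape of the Early Bedtime auto_detect_config example above.
interface SleepStartRule {
  type: "sleep_start_time";
  target: "relative_to_baseline";
  offset_minutes: number;     // e.g. -45 = 45 minutes earlier than baseline
  threshold_minutes: number;  // tolerance around the target bedtime
}

// Minutes on a continuous evening axis: times before noon are treated as
// "after midnight" so 00:30 sorts later than 23:30 (assumed convention).
function minutesOnEveningAxis(hhmm: string): number {
  const [h, m] = hhmm.split(":").map(Number);
  const mins = h * 60 + m;
  return mins < 12 * 60 ? mins + 24 * 60 : mins;
}

// A night counts as adherent if detected bedtime is within the threshold
// of (baseline bedtime + offset).
function isAdherent(
  rule: SleepStartRule,
  baselineBedtime: string,
  detectedBedtime: string
): boolean {
  const target = minutesOnEveningAxis(baselineBedtime) + rule.offset_minutes;
  const detected = minutesOnEveningAxis(detectedBedtime);
  return Math.abs(detected - target) <= rule.threshold_minutes;
}
```

With a 23:15 baseline, a -45 offset, and a 15-minute threshold, a detected 22:20 bedtime counts as adherent while 23:10 does not; the evidence would be written to `experiment_checkins.auto_detect_data`.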

Each catalog entry includes:

  • primary_metrics: The 2-3 metrics most likely to show impact
  • secondary_metrics: Additional metrics to track
  • required_data_sources: Which wearable data is needed
  • confounders: Known confounders to flag during check-ins
  • adherence_detection + auto_detect_config: How adherence is tracked
  • evidence_summary + evidence_url: Scientific backing (use wellness-compliant language)

1.6 AI Model Abstraction Layer

Edge Function: ai-engine

A new Supabase Edge Function that supports multiple AI providers with a single interface.

supabase/functions/ai-engine/
├── index.ts              -- Router: /analyze, /spot-patterns, /recommend, /starter-pack
├── providers/
│   ├── types.ts          -- Provider interface
│   ├── gemini.ts         -- Google Gemini (Gemini 2.0 Flash / Pro)
│   ├── openai.ts         -- OpenAI (GPT-4o-mini) — fallback
│   └── factory.ts        -- Provider selection based on config
├── engines/
│   ├── experiment-analyst.ts    -- Experiment analysis (replaces analyze-experiment)
│   ├── pattern-spotter.ts       -- Unenrolled discovery detection
│   ├── recommender.ts           -- Experiment recommendations
│   └── starter-pack.ts          -- Personalized first-experiment recommendation
├── prompts/
│   ├── shared-guidelines.ts     -- FDA compliance rules, wellness language
│   ├── experiment-analysis.ts   -- Analysis prompt template
│   ├── pattern-detection.ts     -- Pattern spotting prompt template
│   └── recommendation.ts        -- Recommendation prompt template
└── compliance/
    ├── banned-words.ts          -- Banned/required terms from Section 1.1
    └── output-validator.ts      -- Scan AI output for compliance violations

Provider Interface:

interface AIProvider {
  name: string;
  chat(params: {
    systemPrompt: string;
    userPrompt: string;
    temperature?: number;
    maxTokens?: number;
    responseFormat?: 'json';
  }): Promise<{ content: string; model: string; usage: { input: number; output: number } }>;
}

Provider Selection:

  • Environment variable AI_PROVIDER=gemini|openai (default: gemini)
  • Model-specific env vars: GEMINI_API_KEY, GEMINI_MODEL, OPENAI_API_KEY, OPENAI_CHAT_MODEL
  • Fallback chain: if primary provider fails, try secondary
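The fallback chain could be implemented along these lines. The `AIProvider` interface repeats the one above for self-containment; the error-handling details are a sketch, not the final design.

```typescript
type ChatParams = {
  systemPrompt: string;
  userPrompt: string;
  temperature?: number;
  maxTokens?: number;
  responseFormat?: "json";
};

interface AIProvider {
  name: string;
  chat(params: ChatParams): Promise<{
    content: string;
    model: string;
    usage: { input: number; output: number };
  }>;
}

// Try providers in configured order (e.g. [gemini, openai]); return the
// first successful response, or rethrow the last failure.
async function chatWithFallback(providers: AIProvider[], params: ChatParams) {
  let lastError: unknown;
  for (const provider of providers) {
    try {
      return await provider.chat(params);
    } catch (err) {
      lastError = err; // in production: log the failure, then fall through
    }
  }
  throw lastError ?? new Error("No AI providers configured");
}
```

The factory would assemble the provider array from `AI_PROVIDER` and the model env vars, so engines only ever call `chatWithFallback`.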

Output Compliance Validation: Every AI response is passed through output-validator.ts before being stored or displayed:

  1. Scan for banned terms from the dictionary
  2. Verify required disclaimers are present
  3. If violations found: auto-correct where possible, log violation, flag for review
  4. This is a runtime safety net — the prompts should prevent violations, but validation catches edge cases
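A sketch of the post-processing step, under the assumption that the banned-term scan runs first and its violations are passed in. The auto-correct policy shown (append a missing disclaimer, flag banned terms rather than rewrite them) is one possible choice, not a settled decision.

```typescript
// Disclaimer required on all AI outputs (Section 1.1 Required Phrases).
const REQUIRED_DISCLAIMER =
  "For informational purposes only. Not medical advice.";

interface ValidationOutcome {
  text: string;             // possibly auto-corrected output
  flaggedForReview: boolean;
}

function postProcessAIOutput(
  raw: string,
  bannedViolations: string[]
): ValidationOutcome {
  // Auto-correct where possible: append the disclaimer if the model omitted it.
  const text = raw.includes(REQUIRED_DISCLAIMER)
    ? raw
    : `${raw}\n\n${REQUIRED_DISCLAIMER}`;
  // Banned terms cannot be safely auto-corrected; flag the output instead.
  return { text, flaggedForReview: bannedViolations.length > 0 };
}
```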

1.7 Metric Normalization — Extend Existing Registry

The existing METRIC_REGISTRY in mobile/src/utils/experiments/metrics.ts already handles vendor-specific extraction for Whoop, Oura, and Fitbit. Extend it for:

  1. Apple Health — add 'apple_health' to requires arrays where applicable
  2. Google Fit — add 'google_fit' to requires arrays
  3. New metrics (if needed by catalog experiments):
    • respiratory_rate (Oura, Whoop)
    • sleep_latency (Oura, Whoop)
    • spo2 (Oura, Fitbit, Apple Health)

1.8 Baseline Auto-Computation

Create a shared utility (used by both mobile and Edge Function):

interface BaselineResult {
  metric_key: string;
  period_start: string;
  period_end: string;
  mean: number;
  median: number;
  std_dev: number;
  min: number;
  max: number;
  typical_range: [number, number];  // mean ± 1 std_dev
  data_points: number;
  quality: 'excellent' | 'good' | 'limited' | 'insufficient';
  // excellent: 14+ days, low variance
  // good: 7-13 days
  // limited: 3-6 days
  // insufficient: <3 days
}

Logic:

  1. Query connected_devices for user's active providers
  2. Determine available metrics via getAvailableMetrics()
  3. Look back up to 30 days for historical data
  4. Compute stats per metric
  5. Return baseline snapshot + quality assessment
  6. If insufficient data for a metric, flag it but don't block enrollment
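The stats-and-quality step (4 and 5 above) might look like this for a single metric. The day-count thresholds come from the `BaselineResult` comments; the coefficient-of-variation cutoff for "excellent" is an assumption.

```typescript
type BaselineQuality = "excellent" | "good" | "limited" | "insufficient";

// Grade one metric's baseline from its daily values in the lookback window.
function gradeBaseline(values: number[]): {
  mean: number;
  stdDev: number;
  quality: BaselineQuality;
} {
  const n = values.length;
  if (n < 3) return { mean: NaN, stdDev: NaN, quality: "insufficient" };
  const mean = values.reduce((a, b) => a + b, 0) / n;
  const stdDev = Math.sqrt(
    values.reduce((a, v) => a + (v - mean) ** 2, 0) / n
  );
  let quality: BaselineQuality;
  if (n >= 14 && stdDev / mean < 0.15) {
    quality = "excellent"; // 14+ days, low variance (CV < 15% is an assumed cutoff)
  } else if (n >= 7) {
    quality = "good";      // 7-13 days (or 14+ days with high variance)
  } else {
    quality = "limited";   // 3-6 days
  }
  return { mean, stdDev, quality };
}
```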

1.9 Data Quality Monitoring

Sync Health Assessment (runs after each sync cycle)

After each sync-all-devices or sync-cgm-devices run, assess data quality:

  1. Per-device assessment: For each connected device, check:

    • Days with data in last 14 days
    • Which metrics are present vs expected for that provider
    • Time since last successful sync
    • Error count in last 7 days (from ingestion_log)
  2. Score computation (0-100):

    • 100: All expected metrics, no gaps, synced within 6 hours
    • 80+: Minor gaps (1-2 days), usable for experiments
    • 50-79: Significant gaps (3-5 days), experiments may be limited
    • <50: Unreliable, warn user before enrollment
  3. Upsert to device_data_quality table daily

  4. User-facing indicators:

    • Green badge on Profile > Devices: "Healthy" sync
    • Amber badge: "Degraded" — missing recent data
    • Red badge: "Needs attention" — failing sync or stale data
    • Shown on experiment enrollment if quality is low
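The scoring and classification above could be sketched as follows. The per-day deductions are assumptions chosen to land in the tier bands from step 2; the `sync_health` branches follow the definitions in the `device_data_quality` table comments.

```typescript
// Score 0-100 from gap counts (14-day window) and sync recency.
// Deduction sizes are illustrative assumptions.
function computeDataQualityScore(
  missingDays: number,
  partialDays: number,
  hoursSinceSync: number
): number {
  let score = 100;
  score -= missingDays * 10; // fully missing days cost the most
  score -= partialDays * 4;  // partial days cost less
  if (hoursSinceSync > 48) score -= 30;
  else if (hoursSinceSync > 6) score -= 10;
  return Math.max(0, score);
}

type SyncHealth = "healthy" | "degraded" | "failing" | "stale";

function classifySyncHealth(
  missingDays: number,
  hoursSinceSync: number,
  errorCount7d: number
): SyncHealth {
  if (hoursSinceSync > 48) return "stale";                       // no sync in >48h
  if (missingDays > 4 || errorCount7d > 0) return "failing";     // big gaps or errors
  if (hoursSinceSync <= 6 && missingDays < 2) return "healthy";  // fresh, near-complete
  return "degraded";                                             // recent sync, some gaps
}
```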

Impact on Experiments

  • At enrollment: snapshot device_data_quality into experiments.data_quality_at_enrollment
  • During experiment: if data quality drops below 50 for >3 consecutive days, notify user
  • At analysis: factor data quality into confidence scoring (more missing days = lower confidence)

Phase 2: Core Experiment Experience (UI + Flow)

2.1 Navigation Restructure

New Tab Bar:

| Tab | Icon | Route | Purpose |
| --- | --- | --- | --- |
| Discover | Compass | (tabs)/discover | Experiment library, recommendations, unenrolled discoveries |
| My Lab | FlaskConical | (tabs)/lab | Active experiments, check-ins, progress |
| Playbook | BookOpen | (tabs)/playbook | Personal discoveries, impact rankings |
| Profile | User | (tabs)/profile | Settings, devices, account |

Hidden but accessible routes:

  • (tabs)/home — Event logging (hidden from tab bar, accessible via Profile > "Event Logger")
  • (tabs)/history — Event history (same)
  • (tabs)/insights — Glucose insights (same)

2.2 Personalized Starter Pack & Metric Gap Analysis

New User Experience

When a user connects their first wearable and has 14+ days of historical data, the app immediately runs two processes:

  1. Metric Gap Analysis — Score each metric against published wellness ranges
  2. Unenrolled Discovery Scan — Find patterns in historical data (Section 3.1)

Metric Gap Analysis — "Where You Stand"

Wellness Reference Ranges (NOT clinical — derived from published wearable population data):

| Metric | Optimal Range | Source |
| --- | --- | --- |
| RHR | 50-65 bpm | General fitness literature |
| HRV (RMSSD) | Age-adjusted: 20s: 50-100ms, 30s: 40-80ms, 40s: 30-60ms, 50+: 20-50ms | Population wearable data |
| Sleep Duration | 7-9 hours | Sleep foundation guidelines |
| Deep Sleep % | 15-25% of total sleep | Sleep stage research |
| REM Sleep % | 20-25% of total sleep | Sleep stage research |
| Steps | 8,000-12,000/day | Activity research |

Gap Scoring Algorithm:

interface MetricGap {
  metric_key: string;
  user_value: number;
  optimal_low: number;
  optimal_high: number;
  gap_severity: 'within_optimal' | 'slightly_below' | 'below' | 'well_below';
  improvement_potential: number;  // 0-100, higher = more room to improve
}

  1. Compute user's 14-day average for each available metric
  2. Compare against age-adjusted wellness ranges
  3. Score gap severity (how far below optimal)
  4. Rank metrics by improvement potential
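One possible implementation of steps 2-4 for a single metric. The severity cutoffs (fractions of the optimal-range width) and the improvement-potential formula are assumptions; "below" is read as "worse than optimal" in whichever direction applies, so an RHR above its range also scores a gap.

```typescript
type GapSeverity = "within_optimal" | "slightly_below" | "below" | "well_below";

// Matches the MetricGap interface above.
interface MetricGap {
  metric_key: string;
  user_value: number;
  optimal_low: number;
  optimal_high: number;
  gap_severity: GapSeverity;
  improvement_potential: number; // 0-100
}

function scoreGap(
  metric_key: string,
  user_value: number,
  optimal_low: number,
  optimal_high: number
): MetricGap {
  const width = optimal_high - optimal_low;
  // Distance outside the optimal range, in either direction.
  const distance =
    user_value < optimal_low ? optimal_low - user_value
    : user_value > optimal_high ? user_value - optimal_high
    : 0;
  const ratio = distance / width; // gap relative to range width
  const gap_severity: GapSeverity =
    ratio === 0 ? "within_optimal"
    : ratio < 0.25 ? "slightly_below"
    : ratio < 0.75 ? "below"
    : "well_below";
  const improvement_potential = Math.min(100, Math.round(ratio * 100));
  return { metric_key, user_value, optimal_low, optimal_high, gap_severity, improvement_potential };
}
```

For the hero example below, an RHR of 68 bpm against a 50-65 range scores "slightly_below" (3 bpm over a 15 bpm range).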

Starter Pack Selection

From the 8 starter pack candidates, select 4-8 based on:

  1. Metric gap targeting (40% weight): Prioritize experiments that target the user's weakest metrics
  2. Difficulty for first-timers (20% weight): Favor "easy" experiments
  3. Community impact data (20% weight): Favor experiments with high community impact rates
  4. Data measurability (20% weight): Only include experiments whose metrics the user can actually track
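The weighted selection above can be sketched as a simple scoring pass. The 40/20/20/20 weights mirror the list; the assumption that each factor is pre-normalized to 0-1, and that unmeasurable experiments are excluded outright rather than down-weighted, are choices of this sketch.

```typescript
interface StarterPackCandidate {
  slug: string;
  gapTargeting: number;     // 0-1: how well it targets the user's weakest metrics
  easeForBeginners: number; // 0-1: 1 = "easy" difficulty
  communityImpact: number;  // 0-1: share of participants with high/moderate impact
  measurability: number;    // 0 or 1: can the user's devices track its metrics?
}

// Rank starter pack candidates by weighted score; return top `limit` slugs.
function rankStarterPack(
  candidates: StarterPackCandidate[],
  limit: number
): string[] {
  return candidates
    .filter((c) => c.measurability > 0) // drop untrackable experiments entirely
    .map((c) => ({
      slug: c.slug,
      score:
        0.4 * c.gapTargeting +
        0.2 * c.easeForBeginners +
        0.2 * c.communityImpact +
        0.2 * c.measurability,
    }))
    .sort((a, b) => b.score - a.score)
    .slice(0, limit)
    .map((c) => c.slug);
}
```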

Presentation — "Your First Experiment" hero:

┌─────────────────────────────────────────┐
│  Based on your data, here's where       │
│  you have the most room to improve:     │
│                                         │
│  Resting Heart Rate: 68 bpm             │
│  ████████████░░░░░░  slightly above     │
│                 optimal (50-65 bpm)     │
│                                         │
│  Recommended first experiment:          │
│  ┌─────────────────────────────────┐   │
│  │  🚶 Post-Meal Walk              │   │
│  │  14 days · Easy · Auto-tracked  │   │
│  │                                  │   │
│  │  78% of participants with a     │   │
│  │  similar RHR saw a meaningful   │   │
│  │  reduction in resting heart     │   │
│  │  rate.                          │   │
│  │                                  │   │
│  │  [Start This Experiment]        │   │
│  └─────────────────────────────────┘   │
│                                         │
│  Other recommended experiments:         │
│  • Zone 2 Cardio (targets RHR)         │
│  • Earlier Bedtime (targets sleep)      │
│  • Caffeine Curfew (targets sleep)      │
└─────────────────────────────────────────┘

2.3 Discover Tab — Insight-First Design

The Discover tab is insight-first, not catalog-first. The most magical moment is: "We noticed something interesting in your data." That must appear before any experiment catalog. The product should feel like a system that understands your body, not a library of health hacks.

Main Screen (discover.tsx)

Layout (ordered by priority — insights first, catalog last):

  1. Insights Hero (ALWAYS first — the magic moment):

    • If unenrolled discoveries exist: Full-width card(s) showing AI-detected patterns
      • "We noticed something in your data..."
      • Pattern description with metric visualization
      • "Want to confirm this? Start a 7-day experiment →"
    • If no discoveries yet but data is loading: "Analyzing your data... We're looking for patterns."
    • If no wearable connected: "Connect a wearable to unlock your first discovery."
    • If wearable connected but <14 days data: "We're collecting data. Your first insight is coming soon."
    • This section is never empty — it always communicates what's happening
  2. Your First Experiment (for new users with no active/completed experiments):

    • Personalized recommendation from Metric Gap Analysis (Section 2.2)
    • Shows the single best experiment with personalized hook
    • "Based on your data, this experiment has the highest likelihood of impact for you."
  3. Recommended for You (for returning users with experiment history):

    • AI-powered recommendations based on data profile, past experiments, playbook
    • Includes "Confirm the Driver" suggestions when attribution is ambiguous (Section 5.3)
    • Horizontal scroll of experiment cards
  4. Experiment Catalog:

    • Full list of available experiments (4-8 in v1)
    • Category badges, difficulty, adherence type
    • Community data on each card

Experiment Card Component

Each card displays:

  • Experiment name + category badge
  • Duration (e.g., "14 days")
  • Difficulty badge (Easy / Moderate / Hard)
  • Adherence type indicator (Auto-tracked / One-tap / Daily check-in)
  • Primary metrics icons
  • Community data: "84% of 1,200 participants saw a 5%+ increase in HRV"
  • Data availability indicator (green check if user has required data, amber warning if not)

Experiment Detail Screen (experiment/[slug].tsx)

Sections:

  1. Hero: Name, category, difficulty, duration, adherence type
  2. Protocol: Full description of what to do
  3. Goal: What we're testing
  4. Why It Works: Science explanation (plain language, wellness-compliant)
  5. Metrics Tracked: Primary + secondary with data availability check
  6. Wisdom of the Lab (Community Data):
    • "84% of 1,200 participants saw a 5%+ increase in HRV using this protocol."
    • "Users with a similar baseline RHR to yours typically see a 'High' magnitude of impact."
  7. Evidence: Link to study
  8. Data Availability Warning (if applicable):
    • "This experiment tracks HRV, which requires an Oura or Whoop device. You don't currently have one connected. You can still run this experiment, but impact measurement will be limited."
    • [Start Anyway] [Connect a Device]
  9. Start Experiment CTA

2.4 Enrollment Flow

Steps:

  1. Baseline Preview: "Based on your last 14 days of data, here's your baseline:"

    • Show computed baseline for each primary metric
    • Quality indicators (excellent/good/limited/insufficient)
    • Data quality score from device_data_quality
    • If no historical data: "We'll collect baseline data for 7 days before your experiment starts"
  2. Duration Selection: Default from catalog, user can adjust (min: min_duration_days)

  3. Adherence Method Explanation:

    • If auto: "We'll automatically track your adherence using your wearable data. No daily check-ins needed."
    • If semi_auto: "We'll detect your activity and ask a quick confirmation question."
    • If manual: "We'll ask you a simple yes/no question each day. Takes less than 5 seconds."
  4. Concurrent Experiment Check:

    • If user has active experiments, show them
    • "Running multiple experiments simultaneously may make it harder to attribute changes to a specific experiment. We will account for this in the analysis."
  5. Confirm & Start:

    • Creates experiments row with catalog_experiment_id FK
    • Copies primary_metrics + secondary_metrics to experiment_metrics
    • Stores auto-computed baseline in baseline_metrics JSONB
    • Snapshots device_data_quality into data_quality_at_enrollment
    • Sets experiment_start = now (or baseline_end if baseline collection needed)
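The record created in step 5 can be sketched as a small helper. This is a minimal sketch, assuming simplified shapes for the catalog row and data-quality snapshot; `buildEnrollmentRecord` is a hypothetical name, and the real flow writes through Supabase rather than returning a plain object.

```typescript
// Hypothetical sketch of the enrollment payload described in step 5.
// Column names mirror the plan (catalog_experiment_id, baseline_metrics,
// data_quality_at_enrollment, experiment_start).

interface CatalogExperiment {
  id: string;
  primary_metrics: string[];
  secondary_metrics: string[];
}

interface EnrollmentInput {
  catalog: CatalogExperiment;
  baseline: Record<string, number> | null; // null → baseline collection needed
  dataQuality: Record<string, unknown>;
  baselineCollectionDays?: number;         // defaults to the 7 days in step 1
}

function buildEnrollmentRecord(input: EnrollmentInput, now = new Date()) {
  const needsBaseline = input.baseline === null;
  // experiment_start = now, or baseline_end when baseline collection is needed.
  const start = needsBaseline
    ? new Date(now.getTime() + (input.baselineCollectionDays ?? 7) * 86_400_000)
    : now;
  return {
    experiment: {
      catalog_experiment_id: input.catalog.id,
      status: needsBaseline ? "pending_baseline" : "active",
      baseline_metrics: input.baseline ?? {},
      data_quality_at_enrollment: input.dataQuality,
      experiment_start: start.toISOString(),
    },
    // Copied into experiment_metrics rows, per step 5.
    metrics: [...input.catalog.primary_metrics, ...input.catalog.secondary_metrics],
  };
}
```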

2.5 My Lab Tab

Main Screen (lab.tsx)

Layout:

  1. Active Experiments section:

    • Cards showing each active experiment with:
      • Progress bar (day X of Y)
      • Today's check-in prompt (only for manual adherence experiments, or when semi_auto needs confirmation)
      • Auto-detected adherence badge for auto experiments ("Adherence auto-detected today ✓")
      • Mid-experiment teaser ("Early signal: HRV trending 8% higher than baseline")
    • Tap → Active experiment detail
  2. Pending Baseline section (if any):

    • Experiments waiting for baseline data collection
    • Progress toward sufficient data
  3. Recently Completed section:

    • Experiments awaiting or showing analysis results
    • "Discovery Found!" badge for experiments with results
  4. Empty State: "Start your first experiment to begin discovering what works for your body."

Daily Check-in Flow — Adaptive by Adherence Type

auto experiments (Early Bedtime, Steps, Zone 2, etc.):

  • No manual check-in required. The app auto-detects adherence from wearable data.
  • After sync, the app checks auto_detect_config rules against the day's data.
  • Creates experiment_checkins row with auto_detected = true and auto_detect_data containing evidence.
  • User sees: "Day 8 of 14 — Adherence auto-detected ✓ (Bedtime: 10:22 PM, target: before 10:30 PM)"
  • If auto-detection can't determine adherence (e.g., missing data), fall back to manual prompt.
  • Confounder prompt still appears (briefly): "Anything unusual today?" → toggles for alcohol, illness, etc.

semi_auto experiments (Weighted Vest Walking, Morning Sunlight):

  • App detects the activity (e.g., a walk was logged).
  • Sends one-tap confirmation push notification: "We detected a 25-minute walk at 7:15 PM. Did you wear the weighted vest? [Yes] [No]"
  • Creates check-in with auto_detected = true + user confirmation.

manual experiments (Supplements, dietary changes, breathing):

  • Traditional check-in: "Did you follow the protocol today?" → [Yes] [Mostly] [No]
  • Confounder toggles
  • Optional note
  • Design principle: <10 seconds. This is not journaling.

Active Experiment Detail (experiment/active/[id].tsx)

Sections:

  1. Progress: Day X of Y, adherence rate, progress timeline
  2. Metric Trends: Small charts showing primary metrics over baseline + experiment period
  3. Mid-Experiment Teasers: Hints about emerging patterns
    • Only shown after day 5+ with sufficient data
    • "Early signal detected: Your deep sleep is trending 15% higher than your baseline average"
  4. Check-in History: Calendar view with adherence indicators (auto/manual/missed)
  5. Data Quality Indicator: Current sync health for relevant devices
  6. Actions: Pause, Extend, Complete Early, Abandon

2.6 Experiment Completion & Discovery

When an experiment ends (duration reached or user completes early):

  1. Status updates to completed, experiment_end set

  2. AI Analysis triggered via ai-engine/analyze:

    • Fetches baseline vs experiment period data
    • Computes statistical comparisons (existing logic)
    • Gemini generates narrative with Magnitude of Impact framing
    • Creates user_discoveries row
    • Creates experiment_outcomes row (normalized for data learning pipeline)
    • Updates experiment_results with magnitude scoring
    • Generates playbook suggestion
    • Generates "What's Next?" recommendations
    • Runs compliance validation on AI output
  3. Discovery Presentation Screen (discovery/[id].tsx):

┌─────────────────────────────────────┐
│         Discovery Found!            │
│                                     │
│    Post-Meal Walk                   │
│    14-day experiment                │
│                                     │
│  ┌─────────────────────────────┐   │
│  │ Magnitude of Impact: HIGH   │   │
│  └─────────────────────────────┘   │
│                                     │
│  Resting HR    -4 bpm  (62→58)     │
│  ████████████████████░░  -6.5%     │
│                                     │
│  Deep Sleep    +18 min (52→70)     │
│  ████████████████████░░  +34.6%    │
│                                     │
│  HRV           +8 ms  (44→52)     │
│  ████████████████░░░░░░  +18.2%    │
│                                     │
│  Confidence: Moderate               │
│  Attribution: Strong                │
│  12 valid days, 2 excluded          │
│  (1 alcohol, 1 illness)             │
│                                     │
│  "Walking after dinner was          │
│   associated with meaningful        │
│   improvements in your recovery     │
│   metrics. Your resting heart rate  │
│   and deep sleep showed the         │
│   strongest response."              │
│                                     │
│  [Add to Playbook]                  │
│                                     │
│  ─── What's Next? ───              │
│  Based on your results:             │
│  • Earlier Dinner (builds on this)  │
│  • Consistent Wake Time             │
│                                     │
│  For informational purposes only.   │
│  Not medical advice.                │
└─────────────────────────────────────┘

**When Attribution is Moderate or Low**, the discovery screen additionally shows:

┌─────────────────────────────────────┐
│  Attribution: Moderate              │
│                                     │
│  Possible contributors:             │
│  ├── Post-Meal Walk      Moderate   │
│  └── Magnesium           Moderate   │
│                                     │
│  ─── Confirm the Driver ───         │
│  Try pausing magnesium for 7 days   │
│  while keeping the walk.            │
│  [Start Isolation Experiment]       │
└─────────────────────────────────────┘


Key framing rules:

  • NEVER "Success" / "Failure"
  • ALWAYS "Magnitude of Impact": High / Moderate / Low / Minimal / Inconclusive
  • Each metric shows: label, absolute change, baseline→observed, bar chart, percentage
  • Confounders are noted transparently
  • FDA disclaimer at bottom

Null Results — "The Success of Elimination"

Many experiments will produce Minimal or Inconclusive magnitude — effectively 0% impact. If the UX treats this as a letdown, the user feels they wasted 14 days. Instead, frame null results as a valuable discovery: you've eliminated a variable and narrowed the search.

Discovery screen when magnitude is Minimal/Inconclusive:

┌─────────────────────────────────────┐
│  Magnesium Before Bed               │
│  14 days • 12 valid days            │
│                                     │
│  Magnitude of Impact: Minimal       │
│                                     │
│  RHR           -0.3 bpm (61→60.7)  │
│  ░░░░░░░░░░░░░░░░░░░░░  -0.5%     │
│                                     │
│  Deep Sleep    +2 min (48→50)      │
│  ░░░░░░░░░░░░░░░░░░░░░  +4.2%     │
│                                     │
│  ─── Discovery ───                  │
│                                     │
│  ✓ You've eliminated a variable.    │
│                                     │
│  "Magnesium doesn't appear to be    │
│   a meaningful lever for your       │
│   sleep or recovery. That's a       │
│   valuable finding — you just       │
│   narrowed the search for what      │
│   actually works for your body."    │
│                                     │
│  💰 Estimated savings: ~$30/month   │
│                                     │
│  ─── What's Next? ───              │
│  These experiments target the same  │
│  metrics with higher community      │
│  impact rates:                      │
│  • Caffeine Curfew (72% saw impact) │
│  • Earlier Bedtime (68% saw impact) │
│                                     │
│  For informational purposes only.   │
│  Not medical advice.                │
└─────────────────────────────────────┘

Framing principles for null results:

  • Lead with affirmation: "You've eliminated a variable" — this IS progress
  • Reframe the value: "You just narrowed the search for what actually works for your body"
  • Show concrete savings (when applicable): supplement cost, time saved, effort redirected
  • Immediately pivot to what's next: Recommend experiments with higher community impact rates for the same metrics — the user's momentum should carry forward, not stall
  • Playbook entry: Null results are recorded in the playbook as "Eliminated" with a strikethrough-style badge, visually showing progress through the search space
  • AI narrative tone: Curious and encouraging, never apologetic. "Your body didn't respond to X" is a finding, not a failure

2.7 Playbook Tab — Progression System

The Playbook is not just a list — it's a progression system that gives users a clear reason to run more experiments. Each category has a discovery count that fills up, creating a sense of exploration and completeness.

Main Screen (playbook.tsx)

Layout:

  1. Header: "Your Body's Operating Manual"

  2. Category Progression Cards (the key engagement driver):

    ┌─────────────────────────────────┐
    │  Sleep Playbook    2 / 5 ████░  │
    │  Recovery Playbook 1 / 4 ██░░░  │
    │  Metabolic Playbook 0 / 3 ░░░░  │
    │  HRV Playbook      0 / 2 ░░░░  │
    │  RHR Playbook      1 / 3 ██░░░  │
    └─────────────────────────────────┘
    
    • Each category maps to experiment categories in the catalog
    • Denominator = number of experiments available in that category (from catalog)
    • Numerator = number of completed experiments with discoveries in that category
    • Tap a category → see discoveries for that category + available experiments to fill gaps
    • Categories with 0 discoveries show: "Run your first [category] experiment →"
  3. Top Health Levers (ranked by magnitude):

    • Ranked list of all discovered health levers across categories
    • Each entry: rank, habit name, magnitude badge, impact summary, category icon
    • Example: "#1 — Post-Dinner Walk | HIGH | RHR -6.5%, Deep Sleep +34.6%"
  4. Eliminated Variables section:

    • Experiments that produced Minimal/Inconclusive magnitude
    • Displayed with strikethrough style and "Eliminated" badge
    • Shows what was ruled out: "Magnesium — not a lever for your sleep"
    • Reinforces progress: "3 eliminated, 2 confirmed — your search is narrowing"
    • These count toward category progression (denominator explored, not just successes)
  5. Unconfirmed Patterns section:

    • Patterns spotted by the Unenrolled Discovery Engine but not yet confirmed via formal experiment
    • "Unconfirmed" badge + "Confirm with an experiment →" CTA
  6. Empty State: "Your body has stories to tell. Run your first experiment to start building your playbook."

Progression Logic

The category/denominator counts are derived from the experiment catalog:

  • v1 (8 experiments): Sleep: 5, RHR: 2, Sleep/RHR overlap: 1 → adjust to avoid double-counting
  • As catalog expands in v2, denominators grow — users always have more to explore
  • Both confirmed levers AND eliminated variables count toward progression — running an experiment always moves you forward
  • Numerator display: "3 explored (2 confirmed, 1 eliminated)" to show both types of progress
  • When a user completes all experiments in a category: "Category Complete! You've mapped your [category] levers."
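The progression math above can be sketched directly: both confirmed levers and eliminated variables advance the numerator, and the catalog supplies the denominator. This is a minimal sketch; `categoryProgress` and the input shapes are hypothetical, and the overlap de-duplication mentioned for v1 is omitted.

```typescript
type Magnitude = "high" | "moderate" | "low" | "minimal" | "inconclusive";

interface CompletedExperiment {
  category: string;
  magnitude: Magnitude; // overall_magnitude from the experiment's analysis
}

function categoryProgress(
  category: string,
  catalogCountForCategory: number, // denominator, from the experiment catalog
  completed: CompletedExperiment[],
) {
  const inCategory = completed.filter((e) => e.category === category);
  // Minimal/Inconclusive outcomes count as "eliminated", everything else as
  // "confirmed" — both move progression forward.
  const eliminated = inCategory.filter(
    (e) => e.magnitude === "minimal" || e.magnitude === "inconclusive",
  ).length;
  const confirmed = inCategory.length - eliminated;
  return {
    explored: inCategory.length,
    denominator: catalogCountForCategory,
    label: `${inCategory.length} explored (${confirmed} confirmed, ${eliminated} eliminated)`,
    complete: inCategory.length >= catalogCountForCategory,
  };
}
```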

Phase 3: Intelligence Layer (AI-Powered Features)

3.1 Unenrolled Discovery Engine (Pattern Spotting)

This is in v1 and is built in Sprint 2. The AI engine analyzes historical wearable data to find "accidental experiments" — patterns the user didn't intentionally create. This is our competitive advantage: users see a discovery before they even pick an experiment.

How It Works

  1. Trigger: Runs when:

    • User first connects a wearable with 14+ days of history (immediate value)
    • Weekly cron job for users with active data
    • On-demand when user visits Discover tab (if last scan >7 days ago)
  2. Data Collection: Edge Function ai-engine/spot-patterns gathers:

    • Last 30-90 days of daily_summary, sleep_sessions, glucose_data, activities
    • Looks for natural variation in behaviors (walking frequency, sleep timing, activity patterns)
  3. Statistical Pre-Filtering (BEFORE AI):

    The AI should only see patterns that meet strict statistical thresholds. This prevents hallucinated correlations.

    Minimum requirements to surface a pattern:

    • 20+ data points in each comparison group (e.g., 20 days with the behavior, 20 without)
    • Effect size >10% difference between groups
    • Consistency across weeks: The pattern must hold across at least 3 separate weeks (not a one-time cluster)
    • Statistical significance: p-value < 0.05 using Mann-Whitney U test (non-parametric, handles non-normal wearable data)
    • Not explainable by day-of-week effects: Control for weekend vs weekday patterns

    Pre-filter pipeline:

    Raw data → Behavioral segmentation → Statistical comparison → Filter by thresholds → AI narrative generation
    

    The statistical engine (not AI) identifies candidate patterns. The AI only generates the user-facing narrative for patterns that pass all filters.

  4. Pattern Detection Categories:

    • Activity → Recovery: "Days with 8,000+ steps correlate with 15% higher next-night HRV"
    • Sleep timing → Sleep quality: "Nights with bedtime before 10:30 PM show 22 min more deep sleep"
    • Exercise frequency → RHR: "Weeks with 3+ workouts show 5 bpm lower average RHR"
    • Temporal patterns: "Your HRV has been trending upward over the last 3 weeks"
  5. AI Narrative Generation (Gemini):

    • Only runs on statistically validated patterns
    • Generates user-friendly description using wellness-compliant language
    • Maps pattern to a catalog experiment that could confirm it
    • Must use correlational language only (FDA compliance)
  6. Output: Creates user_discoveries rows with:

    • discovery_type: 'unenrolled_pattern'
    • suggested_experiment_id: links to catalog experiment that could confirm the pattern
    • Title: "We noticed that on days you walk 8,000+ steps, your overnight HRV is 15% higher."
    • CTA: "Want to turn this into a formal 7-day experiment to confirm it?"
  7. Conversion Flow: User taps "Start Experiment" on an unenrolled discovery →

    • Pre-fills enrollment with the suggested catalog experiment
    • Notes the discovery that inspired it

Anti-Spam Rules

  • Max 3 unenrolled discoveries surfaced at a time
  • Don't resurface dismissed discoveries
  • Only patterns meeting ALL statistical thresholds (20+ points, >10%, multi-week consistency)
  • Don't surface patterns that contradict existing playbook entries
  • Rate limit: max 2 new discoveries per week per user
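The core of the statistical pre-filter can be sketched as follows: minimum group size (20+), effect size (>10%), and a two-sided Mann-Whitney U test using the normal approximation, which is adequate at these sample sizes. The multi-week consistency and day-of-week checks are omitted for brevity, and `Candidate` is a hypothetical shape rather than the real pipeline type.

```typescript
interface Candidate {
  withBehavior: number[]; // e.g. next-night HRV on days with 8,000+ steps
  without: number[];      // same metric on days without the behavior
}

// 1-based ranks with average ranks across ties.
function averageRanks(values: number[]): number[] {
  const order = values.map((v, i) => [v, i] as const).sort((a, b) => a[0] - b[0]);
  const r = new Array<number>(values.length).fill(0);
  let i = 0;
  while (i < order.length) {
    let j = i;
    while (j + 1 < order.length && order[j + 1][0] === order[i][0]) j++;
    const avg = (i + j + 2) / 2;
    for (let k = i; k <= j; k++) r[order[k][1]] = avg;
    i = j + 1;
  }
  return r;
}

function mannWhitneyTwoSidedP(a: number[], b: number[]): number {
  const r = averageRanks(a.concat(b));
  const r1 = r.slice(0, a.length).reduce((s, x) => s + x, 0);
  const u1 = r1 - (a.length * (a.length + 1)) / 2;
  const mean = (a.length * b.length) / 2;
  const sd = Math.sqrt((a.length * b.length * (a.length + b.length + 1)) / 12);
  const x = Math.abs(u1 - mean) / sd / Math.SQRT2;
  // Abramowitz-Stegun erf approximation; two-sided p = 2*(1 - Phi(|z|)) = 1 - erf(x).
  const t = 1 / (1 + 0.3275911 * x);
  const poly = ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t -
    0.284496736) * t + 0.254829592) * t;
  return poly * Math.exp(-x * x);
}

function passesPreFilter(c: Candidate): boolean {
  if (c.withBehavior.length < 20 || c.without.length < 20) return false;
  const mean = (xs: number[]) => xs.reduce((s, v) => s + v, 0) / xs.length;
  const effect = Math.abs(mean(c.withBehavior) - mean(c.without)) / mean(c.without);
  if (effect <= 0.10) return false;
  return mannWhitneyTwoSidedP(c.withBehavior, c.without) < 0.05;
}
```

Only candidates for which `passesPreFilter` returns true would reach AI narrative generation.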

3.2 Experiment Recommender

Runs after each completed experiment and periodically.

Inputs:

  • User's completed experiments + results (from experiment_outcomes)
  • Current playbook entries
  • Available metrics (connected devices)
  • Current active experiments
  • Catalog of available experiments
  • User baseline profile (from most recent experiment_outcomes)

Logic:

  1. Complementary experiments: If earlier bedtime showed high impact on sleep, recommend Post-Dinner Walk or Caffeine Curfew
  2. Unexplored categories: If user has only done sleep experiments, suggest HRV or glucose experiments
  3. High-signal experiments: Prioritize experiments with high community impact rates
  4. Device-aware: Only recommend experiments the user can actually measure
  5. Personalized: "Users with a similar baseline RHR to yours typically see a 'High' magnitude of impact from this experiment."

3.3 Data Learning Pipeline

Every completed experiment feeds a pipeline that makes the system smarter over time.

Pipeline Steps

1. Experiment completes
   └─→ 2. Normalized outcome record created (experiment_outcomes table)
        └─→ 3. Community stats aggregation triggered
             └─→ 4. Cohort-level effect sizes recomputed
                  └─→ 5. Recommendation engine weights updated
                       └─→ 6. Starter pack priorities recalculated

Step Details

Step 1-2: Outcome Normalization

When an experiment completes, the analysis engine creates an experiment_outcomes row:

  • User baseline profile is bucketed (age range, metric ranges) for anonymous aggregation
  • All metric changes are stored with effect sizes
  • Adherence, confounders, and concurrent experiments are captured
  • This is the atomic unit of the learning pipeline

Step 3: Community Stats Aggregation

Runs as a batch job (daily cron or triggered on outcome creation):

-- Example aggregation query
SELECT
  catalog_experiment_id,
  COUNT(*) as total_completed,
  AVG((metric_changes->0->>'change_pct')::numeric) as avg_primary_metric_change,
  PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY (metric_changes->0->>'change_pct')::numeric) as median_change,
  COUNT(*) FILTER (WHERE overall_magnitude = 'high') * 100.0 / COUNT(*) as pct_high_impact
FROM experiment_outcomes
WHERE confidence IN ('strong', 'moderate')
GROUP BY catalog_experiment_id;

Step 4: Cohort Effect Estimation

Group outcomes by user baseline profile buckets:

  • "Users with RHR 60-70 who ran Post-Meal Walk" → average effect
  • "Users with HRV 30-40 who ran Alcohol Elimination" → average effect
  • Stored in community_experiment_stats.baseline_segment_stats

Step 5: Recommendation Engine Update

The recommender uses cohort-level data to personalize:

  • Match user's current baseline to closest cohort
  • Weight recommendations by that cohort's historical outcomes
  • v1: Simple heuristic matching; v2+: ML-based collaborative filtering

Step 6: Starter Pack Recalculation

As community data grows, starter pack priorities may shift:

  • If Post-Meal Walk shows consistently higher impact than Alcohol Elimination for RHR-focused users, reorder
  • Initially manual review; later automated with guardrails

Bootstrap Strategy (Pre-Community Data)

Until we have sufficient real user data (target: 100+ completed outcomes per experiment):

  1. Seed community_experiment_stats with estimates from published research
  2. Mark seeded data with source: 'research_estimate' in the JSONB
  3. Blend: as real data accumulates, weight shifts from research estimates to actual outcomes
  4. Transition threshold: when 50+ real outcomes exist for an experiment, deprecate research estimate
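The blend in step 3 can be sketched as a linear ramp toward the 50-outcome threshold. The linear schedule is an assumption; any monotone weighting that reaches 1.0 at the threshold satisfies the plan.

```typescript
// Blend a research-seeded estimate with the real-outcome average.
// Weight shifts linearly from the research estimate to real data as
// realCount approaches the deprecation threshold.
function blendedStat(
  researchEstimate: number,
  realAvg: number | null,
  realCount: number,
  threshold = 50,
): number {
  if (realAvg === null || realCount === 0) return researchEstimate;
  if (realCount >= threshold) return realAvg; // research estimate deprecated
  const w = realCount / threshold;
  return researchEstimate * (1 - w) + realAvg * w;
}
```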

3.4 Community Data Pipeline

Aggregation Job (runs daily via cron or on experiment completion):

  1. Query experiment_outcomes grouped by catalog_experiment_id
  2. For each catalog experiment:
    • Count total participants, total completed
    • Compute average change_pct per metric across all users
    • Compute magnitude distribution (% high, moderate, low, minimal)
    • Segment by baseline ranges (e.g., users with baseline HRV 30-40 vs 40-50 vs 50+)
  3. Update community_experiment_stats table
  4. All data is anonymized — no user IDs in the aggregated output

Display on Experiment Cards:

  • "84% of 1,200 participants saw a 5%+ increase in HRV using this protocol."
  • "Users with a similar baseline RHR to yours typically see a 'High' magnitude of impact."

3.5 Mid-Experiment Teasers

After day 5 of an active experiment:

  • Compare experiment-period-so-far metrics against baseline
  • If a primary metric is trending >5% different from baseline, surface a teaser
  • "Early signal detected: Your deep sleep is trending 15% higher than your baseline average"
  • Updates daily
  • Uses simple statistical comparison, not full AI analysis (save that for completion)
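The teaser comparison above is simple enough to sketch in full: mean of the experiment days so far versus the stored baseline, surfaced only after day 5 and only when the drift exceeds 5%. The function name and signature are illustrative.

```typescript
// Returns a teaser string, or null when no teaser should be shown.
function teaser(
  metricLabel: string,
  baselineAvg: number,
  experimentValues: number[], // daily values for the experiment period so far
  dayNumber: number,
): string | null {
  if (dayNumber < 5 || experimentValues.length === 0) return null;
  const avg = experimentValues.reduce((s, x) => s + x, 0) / experimentValues.length;
  const driftPct = ((avg - baselineAvg) / baselineAvg) * 100;
  if (Math.abs(driftPct) <= 5) return null; // below the surfacing threshold
  const dir = driftPct > 0 ? "higher" : "lower";
  return `Early signal detected: Your ${metricLabel} is trending ${Math.abs(Math.round(driftPct))}% ${dir} than your baseline average`;
}
```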

Phase 4: Migration & Polish

4.1 Navigation Migration

  1. Fold the current (tabs)/experiments.tsx into the new (tabs)/lab.tsx
  2. Create new (tabs)/discover.tsx and (tabs)/playbook.tsx
  3. Modify (tabs)/_layout.tsx:
    • New tab order: Discover, My Lab, Playbook, Profile
    • Hide: Home, History, Insights (remove from tab bar but keep route files)
  4. Add "Event Logger" and "Glucose Insights" links in Profile for backward access

4.2 Existing Experiment Migration

Users with existing custom experiments (from old create-experiment flow):

  • Keep them in experiments table with is_custom = true, catalog_experiment_id = null
  • Display in My Lab under "Custom Experiments" section
  • Can still be completed and analyzed
  • Remove create-experiment screen from primary navigation

4.3 Update analyze-experiment Edge Function

Either extend existing or redirect to new ai-engine/analyze:

  • Add overall_magnitude computation
  • Switch AI provider to Gemini (with OpenAI fallback)
  • Add concurrent experiment awareness
  • Generate user_discoveries row on completion
  • Generate experiment_outcomes row (normalized for data learning pipeline)
  • Generate playbook suggestion
  • Use Magnitude of Impact framing in all prompts
  • Run compliance validation on all AI output

4.4 Apple Health + Google Fit — Deferred to v2

Apple Health and Google Fit integrations are moved to v2. For v1, users connect Whoop, Fitbit, Oura, Libre, or Dexcom (existing sync infrastructure). See v2 Roadmap section for details.


Phase 5: Privacy, Consent & Data Governance

5.1 Community Data Opt-Out

Users must be able to opt out of having their anonymized experiment outcomes included in community aggregations.

Implementation:

  • Add community_data_opt_in BOOLEAN DEFAULT true to user_profiles
  • During onboarding (or in Profile > Privacy settings), explain:

    "Your experiment results help the community by contributing to anonymous statistics like '78% of participants saw improvement.' No personal data is ever shared — only anonymized, aggregated numbers. You can opt out at any time."

  • If opted out:
    • Their experiment_outcomes rows are excluded from community aggregation queries
    • They can still see community stats (they just don't contribute)
    • Opt-out is retroactive: existing outcomes are excluded from next aggregation run

5.2 Research Usage Consent

If we ever plan to use the dataset for published research or share with partners:

Implementation:

  • Add research_consent BOOLEAN DEFAULT false to user_profiles
  • Separate, explicit consent screen (not bundled with community opt-in):

    "Would you like to contribute to health research? If you consent, your fully anonymized experiment data may be used in aggregate research studies. Your identity is never associated with research data. You can withdraw consent at any time."

  • Consent must be affirmative (opt-in, not opt-out)
  • Consent timestamp and version tracked: research_consent_at, research_consent_version

5.3 Data Retention Policy

Define and display clear data retention rules:

| Data Type | Retention Period | Rationale |
| --- | --- | --- |
| Raw wearable data (daily_summary, sleep_sessions, etc.) | Indefinite (user-controlled) | Users need historical data for baselines and pattern detection |
| Experiment records | Indefinite (user-controlled) | Users need their experiment history |
| Experiment outcomes (anonymized) | Indefinite | Core to community intelligence |
| Check-in data | Indefinite (user-controlled) | Part of experiment record |
| AI analysis outputs | Indefinite (user-controlled) | Part of discovery/playbook |
| Device tokens (OAuth) | Until device disconnected or user deletes account | Required for sync |
| Account deletion | Full deletion within 30 days of request | GDPR/CCPA compliance |

Account deletion must:

  • Delete all PII (user_profiles, experiment records, discoveries, playbook)
  • Remove user from all community aggregations (re-aggregate without their data)
  • Revoke all OAuth tokens
  • Delete push tokens
  • Provide confirmation

5.4 Anonymization Policy

How experiment outcomes are anonymized for community use:

  1. No PII in aggregated data: community_experiment_stats contains only counts, averages, and percentiles — no user IDs, no individual records
  2. Baseline profiles are bucketed: Age ranges (20-29, 30-39, etc.), metric ranges (RHR 50-60, 60-70), never exact values
  3. Minimum aggregation threshold: Community stats only shown when 10+ completed outcomes exist for an experiment (prevents small-group identification)
  4. No temporal correlation: Aggregated stats are not timestamped to individual users
  5. Differential privacy (future): For very small cohorts, consider adding noise to aggregated values

5.5 User-Facing Privacy Documentation

Create an in-app "Data & Privacy" section (accessible from Profile):

  • How Your Data Is Used: Plain-language explanation of data flow
  • Community Data: Explanation of anonymization + opt-out toggle
  • Research Consent: Separate consent flow
  • Data Retention: What we keep and for how long
  • Delete My Data: Account deletion request flow
  • Export My Data: Download all personal data (GDPR right of portability)

Technical Architecture Decisions

AI Model Strategy

  • Primary: Google Gemini 2.0 Flash (fast, cost-effective for most analysis)
  • Upgrade: Gemini 2.0 Pro (for complex pattern detection, recommendations)
  • Fallback: OpenAI GPT-4o-mini (existing infrastructure, proven reliability)

Why Gemini first:

  • Competitive pricing for high-volume analysis
  • Strong structured output support
  • Good at pattern detection in numerical data
  • Swappable via provider abstraction if performance doesn't meet needs

Environment Variables:

AI_PROVIDER=gemini
GEMINI_API_KEY=<key>
GEMINI_FLASH_MODEL=gemini-2.0-flash
GEMINI_PRO_MODEL=gemini-2.0-pro
# Fallback
OPENAI_API_KEY=<existing>
OPENAI_CHAT_MODEL=gpt-4o-mini
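The provider abstraction implied by these variables can be sketched as a narrow interface plus a fallback wrapper. This is a minimal sketch: the interface shape is an assumption, and the provider objects below stand in for real Gemini/OpenAI HTTP clients.

```typescript
// Narrow surface so providers are swappable via AI_PROVIDER.
interface AIProvider {
  name: string;
  generate(prompt: string): Promise<string>;
}

// Gemini-first chain: try the primary, fall back on any error.
function withFallback(primary: AIProvider, fallback: AIProvider): AIProvider {
  return {
    name: `${primary.name}->${fallback.name}`,
    async generate(prompt) {
      try {
        return await primary.generate(prompt);
      } catch {
        return fallback.generate(prompt); // e.g. GPT-4o-mini when Gemini errors
      }
    },
  };
}
```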

Concurrent Experiment Handling & Attribution Confidence

Users can run multiple experiments simultaneously. Since this makes causality ambiguous, we use an Attribution Confidence Model that is honest about uncertainty and converts ambiguity into follow-up experiment opportunities.

Attribution Confidence Model

Instead of trying to determine causality, classify how confident the attribution is:

| Situation | Attribution Confidence | Label |
| --- | --- | --- |
| 1 experiment active during period | Strong | "This experiment was the primary variable during this period." |
| 2 experiments active | Moderate | "Multiple experiments were active. Improvements may be associated with more than one habit." |
| 3+ experiments active | Low | "Several experiments were active simultaneously. Individual attribution is uncertain." |
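The classification is a direct mapping from concurrent experiment count to a label:

```typescript
type Attribution = "strong" | "moderate" | "low";

// Count of experiments active during the analysis period → confidence label.
function attributionConfidence(activeExperimentCount: number): Attribution {
  if (activeExperimentCount <= 1) return "strong";
  if (activeExperimentCount === 2) return "moderate";
  return "low";
}
```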

Attribution confidence is surfaced on every discovery:

Post-Meal Walk Experiment
Magnitude of Impact: High
Attribution Confidence: Moderate

Multiple experiments were active during this period.
Improvements may be associated with more than one habit.

Attribution Map

When attribution confidence is Moderate or Low, show an Attribution Map — all experiments that were active during the period, ranked by plausibility:

Your recovery improved during this experiment period.

Possible contributors:
├── Post-Dinner Walk       Confidence: Moderate
├── Magnesium Before Bed   Confidence: Moderate
└── Earlier Bedtime        Confidence: Low (started mid-period)

Plausibility ranking factors:

  • Temporal overlap: Experiments active for the full period rank higher than those that started mid-way
  • Protocol relevance: Experiments whose primary metrics match the improved metrics rank higher
  • Adherence: Higher adherence = higher attribution plausibility
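The three factors can be combined into a plausibility score. The weights below are illustrative assumptions; the plan fixes the factors, not their relative weight.

```typescript
interface Contributor {
  name: string;
  overlapFraction: number; // 0-1: share of the period the experiment was active
  metricsMatch: boolean;   // primary metrics overlap the improved metrics
  adherenceRate: number;   // 0-1
}

// Sort contributors by a weighted plausibility score (weights are assumed).
function rankContributors(contributors: Contributor[]): Contributor[] {
  const score = (c: Contributor) =>
    0.4 * c.overlapFraction + 0.3 * (c.metricsMatch ? 1 : 0) + 0.3 * c.adherenceRate;
  return [...contributors].sort((a, b) => score(b) - score(a));
}
```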

"Confirm the Driver" Follow-Up Experiments

When attribution is ambiguous, the system converts uncertainty into the next experiment opportunity:

Your sleep improved during the last 14 days, but multiple habits changed.

Suggested next experiment:
🔬 Confirm the driver
Try pausing magnesium for 7 days while keeping everything else constant.
If your sleep stays improved, the Post-Meal Walk was likely the primary driver.

This creates a natural experiment chain:

  1. Run multiple experiments → see improvement → ambiguous attribution
  2. System suggests isolation experiment → user runs it
  3. Clear attribution → discovery confirmed with strong confidence

Implementation:

  • After analysis with Moderate/Low attribution, the AI generates a "confirm the driver" suggestion
  • Suggestion stored as a special recommendation type in user_discoveries
  • If user accepts, creates a new experiment that is a modified version (e.g., "Magnesium Pause" = keep everything else, remove one variable)
  • The follow-up experiment references the parent discovery for context

Technical Details

  1. Each experiment maintains its own baseline (computed at enrollment time)
  2. AI analysis prompt includes awareness of ALL concurrent experiments with their protocols
  3. experiment_outcomes records all concurrent experiment IDs for pipeline analysis
  4. Add to experiments table: concurrent_experiment_ids UUID[] — populated at completion time with IDs of all experiments that overlapped

Confounder Detection Strategy

At check-in (user-reported):

  • Alcohol, illness, travel, intense workout, poor sleep, significant stress

Automated (from wearable data):

  • Sleep duration outlier (< 4 hours)
  • Unusual activity level (>2 std dev from baseline)
  • New supplement/medication (from health events, if logged)

Excluded days: Days with reported confounders are flagged and optionally excluded from analysis. AI is told about excluded days and why.

Valid Day Computation

For experiment analysis, a "valid day" must:

  1. Have check-in data (adherence = yes or mostly, or auto-detected = true)
  2. Not be flagged with major confounders (illness, travel)
  3. Have metric data available from wearable

A minimum of 60% valid days is required for full analysis; otherwise confidence is downgraded to 'suggestive'.
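As a minimal sketch of the rule above (the day-record shape is an assumption for illustration):

```typescript
// Classifies days as valid and applies the 60% threshold for analysis confidence.

interface DayRecord {
  adherence: "yes" | "mostly" | "no" | null;
  autoDetected: boolean;
  majorConfounders: string[]; // e.g. ["illness", "travel"]
  hasMetricData: boolean;
}

function isValidDay(d: DayRecord): boolean {
  const adhered = d.autoDetected || d.adherence === "yes" || d.adherence === "mostly";
  return adhered && d.majorConfounders.length === 0 && d.hasMetricData;
}

// 'normal' when at least 60% of days are valid, else downgrade to 'suggestive'.
function analysisConfidenceTier(days: DayRecord[]): "normal" | "suggestive" {
  const valid = days.filter(isValidDay).length;
  return valid / days.length >= 0.6 ? "normal" : "suggestive";
}
```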

FDA Compliance Architecture

Language Rules (Enforced in AI Prompts)

See Section 1.1 for the complete Banned Words / Required Phrases dictionary.

Enforced via:

  1. Wellness Terminology Audit during Sprint 1 (before any content is written)
  2. validateWellnessCompliance() function used in:
    • Catalog seed data CI validation
    • AI output post-processing (runtime)
    • All user-facing text review
  3. System prompt in all AI calls (shared-guidelines.ts)
  4. Output validation — scan AI responses for banned terms before displaying
  5. App-wide disclaimer: "For general wellness purposes only. Not intended to diagnose, treat, cure, or prevent any disease. Consult a healthcare professional before making health decisions."
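To make the runtime output-validation step concrete, here is an illustrative reimplementation of a `validateWellnessCompliance()`-style check. The real dictionary lives in `banned-words.ts` (51 terms); the three entries below are placeholders:

```typescript
// Scans AI output for banned clinical-claim terms before display.
// BANNED_TERMS is a placeholder subset, not the production dictionary.

const BANNED_TERMS = ["cure", "treat", "diagnose"];

interface ComplianceResult {
  compliant: boolean;
  violations: string[];
}

function validateWellnessText(text: string): ComplianceResult {
  // Word-boundary prefix match so "treat" and "treatment" both flag,
  // but "retreat" does not.
  const violations = BANNED_TERMS.filter((t) =>
    new RegExp(`\\b${t}`, "i").test(text),
  );
  return { compliant: violations.length === 0, violations };
}
```

In the pipeline described above, a non-compliant result would either trigger auto-correction or block the text from rendering.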

Supplement/Medication Experiments — Extra Care

Experiments involving supplements (Magnesium, Creatine, ACV, Cinnamon):

  • Frame as "lifestyle experiments" not "therapeutic interventions"
  • Never claim dosing recommendations — use "amount" or "serving"
  • Include: "Consult your healthcare provider before starting any supplement"
  • Focus results on wearable metrics, not clinical outcomes

Implementation Order

Sprint 1: Foundation — Database + Compliance + Catalog (1-2 weeks) ✅ COMPLETE

  • Wellness Terminology Audit: Create banned words dictionary + validateWellnessCompliance() function
    • supabase/functions/_shared/compliance/banned-words.ts (51 banned terms, 5 required phrases)
    • supabase/functions/_shared/compliance/output-validator.ts (validate + auto-correct with case preservation)
    • 17/17 Deno tests pass including seed data compliance scan
  • Migration: experiment_catalog table (with adherence_detection, auto_detect_config fields)
  • Migration: experiment_outcomes table (normalized outcome records)
  • Migration: user_discoveries, user_playbook, community_experiment_stats tables
  • Migration: device_data_quality table
  • Migration: ALTER experiments (add catalog_experiment_id, baseline_metrics, baseline_quality, data_quality_at_enrollment, concurrent_experiment_ids, attribution_confidence, is_custom)
  • Migration: ALTER experiment_checkins (add confounders, auto_detected, auto_detect_data)
  • Migration: ALTER experiment_results (add overall_magnitude, ai_model, ai_prompt_version)
  • RLS policies for all new tables
  • Seed experiment catalog (8 high-impact experiments) — all descriptions pass compliance validation
    • supabase/migrations/20260312100000_seed_experiment_catalog.sql (8 experiments applied to remote Supabase)
  • Update TypeScript types (database.types.ts, experiment types)
    • mobile/src/types/database.types.ts — 6 new table types + updated existing tables
    • mobile/src/utils/experiments/types.ts — 20+ new interfaces
  • CI check: automated compliance scan on catalog seed data

Sprint 2: AI Engine + Pattern Spotting (1-2 weeks) ✅ COMPLETE

  • Create ai-engine Edge Function with provider abstraction
    • supabase/functions/ai-engine/index.ts — Router with CORS, JWT auth, 4 routes
    • supabase/functions/ai-engine/providers/types.ts — AIProvider interface
    • supabase/functions/ai-engine/providers/factory.ts — Provider factory (env-var selection)
  • Implement Gemini provider
    • supabase/functions/ai-engine/providers/gemini.ts
  • Implement OpenAI fallback provider
    • supabase/functions/ai-engine/providers/openai.ts
  • Compliance module: banned-words.ts + output-validator.ts (completed in Sprint 1)
  • Implement pattern-spotter engine (unenrolled discoveries)
    • supabase/functions/ai-engine/engines/pattern-spotter.ts
    • Statistical pre-filtering (20+ data points, >10% effect, multi-week consistency, p<0.05)
      • mobile/src/utils/experiments/mannWhitneyU.ts — Mann-Whitney U test (7 tests)
      • mobile/src/utils/experiments/patternFilters.ts — 5 filter functions (11 tests)
    • Behavioral segmentation (steps, sleep timing, activity frequency)
    • AI narrative generation for validated patterns
      • supabase/functions/ai-engine/prompts/pattern-detection.ts
    • Anti-spam rules (max 3, dismissed tracking, rate limiting)
      • mobile/src/utils/experiments/antiSpam.ts (12 tests)
  • Implement experiment-analyst engine (extends analyze-experiment logic)
    • supabase/functions/ai-engine/engines/experiment-analyst.ts
    • Magnitude of Impact scoring
      • mobile/src/utils/experiments/magnitudeScoring.ts (32 tests)
    • experiment_outcomes record creation
    • Attribution Confidence computation (strong/moderate/low based on concurrent experiment count)
    • Attribution Map generation for moderate/low confidence
    • "Confirm the Driver" follow-up experiment suggestions
  • Implement recommender engine
    • supabase/functions/ai-engine/engines/recommender.ts
    • supabase/functions/ai-engine/prompts/recommendation.ts
  • Implement starter-pack engine (metric gap analysis + personalized recommendations)
    • supabase/functions/ai-engine/engines/starter-pack.ts
    • mobile/src/utils/experiments/metricGapAnalysis.ts (41 tests)
    • mobile/src/utils/experiments/starterPackScoring.ts
    • mobile/src/utils/experiments/baselineComputation.ts (13 tests)
  • Mobile client: experimentAIClient.ts
    • mobile/src/utils/experiments/experimentAIClient.ts (13 tests)
  • Total: 132 new tests, all passing. Full suite: 2075 tests, 0 regressions.

Sprint 3: Discover Tab (Insight-First) + Catalog UI (1-2 weeks) ✅ COMPLETE

  • New (tabs)/discover.tsx — insight-first layout (discoveries before catalog)
    • mobile/src/app/(tabs)/discover.tsx — Insights Hero + Personalized/Static Starter Pack + Full Catalog sections
  • Insights Hero section (unenrolled discoveries, loading states, empty states)
    • Placeholder states: "Connect a Wearable" / "Analyzing Your Data"
  • Experiment card component with community data + adherence type indicator
    • mobile/src/components/Experiments/CatalogExperimentCard.tsx
    • Shows: name, category, difficulty, duration, adherence type, primary metrics, data availability
    • Community data display pending Sprint 2 aggregation pipeline
  • Experiment detail screen (catalog-experiment/[slug].tsx)
    • mobile/src/app/catalog-experiment/[slug].tsx — protocol, goal, why it works, metrics, evidence, confounders
  • Metric availability detection per experiment
    • Checks user's connected devices against required_data_sources
  • Data availability warnings + data quality indicators
    • Warning card with missing source count + "Connect Device" CTA
  • Personalized Starter Pack for new users (powered by Sprint 2 starter-pack engine)
    • mobile/src/hooks/usePersonalizedStarterPack.ts — React Query hook calling AI engine /starter-pack
    • Discover tab shows "Recommended For You" with hero experiment, personalized reasons, metric gap summary
    • Falls back to static "Start Here" section when AI is unavailable or no wearable connected
    • mobile/__tests__/hooks/usePersonalizedStarterPack.test.ts (10 tests)
    • mobile/__tests__/components/DiscoverPersonalized.test.tsx (10 tests)
  • Total: 20 new tests for Sprint 3 personalization. Full suite: 2095 tests, 0 regressions.

Sprint 4: Enrollment + My Lab (1-2 weeks) ✅ COMPLETE

  • Enrollment flow with auto-baseline computation
    • mobile/src/app/enroll-experiment/[slug].tsx — 3-step wizard (Protocol Review → Baseline Preview → Confirm & Start)
    • mobile/src/utils/experiments/enrollment.ts — validateEnrollment(), buildEnrollmentPayload(), computeBaselinePeriodDates(), checkConcurrentConflicts()
    • mobile/src/hooks/useEnrollExperiment.ts — orchestrates baseline fetch, validation, and enrollment mutation
    • mobile/src/utils/experiments/__tests__/enrollment.test.ts (24 tests)
  • Adherence method explanation in enrollment
    • mobile/src/components/Experiments/AdherenceMethodExplainer.tsx — explains auto/semi_auto/manual detection methods
  • Concurrent experiment handling + attribution warnings
    • mobile/src/components/Experiments/ConcurrentWarning.tsx — warning card with attribution confidence badge
    • Reuses computeAttributionConfidence() from magnitudeScoring.ts
  • New (tabs)/lab.tsx replacing old experiments tab
    • mobile/src/app/(tabs)/lab.tsx — active experiments, completed section, adopted habits, empty state
    • mobile/src/app/(tabs)/_layout.tsx — lab tab with FlaskConical icon, old experiments tab hidden via href: null
    • mobile/__tests__/components/TabBar.test.tsx updated for 7 tabs (6 visible + 1 hidden)
  • Active experiment cards with progress
    • mobile/src/components/Experiments/ActiveExperimentLabCard.tsx — progress bar, adherence badge, check-in CTA, teaser snippet
  • Adaptive check-in flow (auto / semi_auto / manual based on adherence_detection)
    • mobile/src/components/Experiments/AdaptiveCheckin.tsx — switches display based on adherence mode
    • Integrated into mobile/src/app/experiment/[id].tsx
  • Auto-adherence detection logic (check auto_detect_config against daily wearable data)
    • mobile/src/utils/experiments/autoAdherence.ts — evaluateAdherence() dispatcher + 4 evaluators (sleep start, activity after time, wake variance, morning activity)
    • mobile/src/hooks/useAutoAdherence.ts — fetches wearable data, auto-creates checkins for auto mode
    • mobile/src/utils/experiments/__tests__/autoAdherence.test.ts (25 tests)
  • Semi-auto one-tap confirmation flow
    • useAutoAdherence hook returns evaluation + confirmCheckin mutation for semi_auto mode
    • AdaptiveCheckin component renders one-tap confirm/override UI
  • Confounder tracking in check-ins
    • mobile/src/utils/experiments/confounderCheckin.ts — CONFOUNDER_LABELS, getConfounderOptions(), formatConfounderRecord(), parseConfounderRecord(), countActiveConfounders()
    • mobile/src/components/Experiments/ConfounderCheckboxes.tsx — horizontal-wrap toggle chips with icons
    • mobile/src/utils/experiments/__tests__/confounderCheckin.test.ts (16 tests)
  • Mid-experiment teaser insights
    • mobile/src/utils/experiments/teaserInsights.ts — computeTeaserInsights(), computeSingleTeaser(), classifyTeaserDirection()
    • mobile/src/hooks/useTeaserInsights.ts — fetches metric data with 6-hour stale time
    • mobile/src/components/Experiments/TeaserInsightsCard.tsx — direction indicators (↑/↓/→) with change percentages
    • mobile/src/utils/experiments/__tests__/teaserInsights.test.ts (18 tests)
  • Experiment completion trigger
    • mobile/src/utils/experiments/completionDetection.ts — shouldCompleteExperiment(), assessCompletionQuality(), getCompletionAction()
    • mobile/src/hooks/useCompletionCheck.ts — completion readiness + triggerCompletion mutation
    • mobile/src/components/Experiments/CompletionModal.tsx — bottom sheet adapting to action type (complete/extend/low quality)
    • mobile/src/utils/experiments/__tests__/completionDetection.test.ts (19 tests)
  • Data quality monitoring integration (warn on degraded sync during experiment)
    • mobile/src/hooks/useDataQualityMonitor.ts — fetches device_data_quality, surfaces warnings for failing/degraded sync or quality_score < 60
    • Integrated as banner in mobile/src/app/experiment/[id].tsx
  • Phase 1 (Pure Functions TDD): 5 modules, 102 new tests — all passing
  • Phase 2 (Hooks): 5 React Query hooks orchestrating Phase 1 functions
  • Phase 3 (UI): 3-step enrollment wizard, My Lab tab, 7 new components, experiment detail integration
  • Modified existing files: _layout.tsx (tab rename), catalog-experiment/[slug].tsx (CTA → enrollment), experiment/[id].tsx (Sprint 4 integrations), TabBar.test.tsx (updated assertions)
  • Total: 102 new tests for Sprint 4. Full suite: 2198 tests, 0 regressions.

Sprint 5: Discovery + Playbook (1 week) ✅ COMPLETE

  • Discovery presentation screen (Magnitude of Impact + Attribution Confidence)
    • mobile/src/app/discovery/[id].tsx — full discovery screen with magnitude badge, metric cards, AI summary
    • mobile/src/components/Experiments/MagnitudeBadge.tsx — colored badge per magnitude level
    • mobile/src/components/Experiments/DiscoveryMetricCard.tsx — metric label + absolute change + baseline→observed
  • Attribution Map display for moderate/low confidence discoveries
    • mobile/src/components/Experiments/AttributionMapCard.tsx — tree-style concurrent experiments + plausibility
    • mobile/src/utils/experiments/discoveryPresentation.ts — shouldShowAttributionMap(), formatAttributionConfidence()
  • "Confirm the Driver" follow-up suggestion on discovery screen
    • mobile/src/components/Experiments/ConfirmDriverCard.tsx — suggestion + isolation experiment CTA
  • "Add to Playbook" flow
    • mobile/src/utils/experiments/addToPlaybook.ts — buildPlaybookInsert(), determinePlaybookMagnitude(), computeNextRank()
    • mobile/src/hooks/useAddToPlaybook.ts — mutation hook with cache invalidation
  • (tabs)/playbook.tsx — progression system (category progress bars + ranked health levers)
    • mobile/src/utils/experiments/playbookProgression.ts — computeCategoryProgression(), rankHealthLevers(), classifyPlaybookEntries()
    • mobile/src/hooks/usePlaybook.ts — React Query hook computing progression, ranking, classification
    • mobile/src/components/Experiments/PlaybookCategoryCard.tsx — category progress bar + summary
    • mobile/src/components/Experiments/PlaybookEntryRow.tsx — ranked lever with magnitude badge
    • mobile/src/components/Experiments/EliminatedVariableRow.tsx — strikethrough + "Eliminated" badge
  • "What's Next?" recommendations on discovery screen
    • mobile/src/utils/experiments/whatsNextRecommendation.ts — selectWhatsNextExperiments() scoring engine
    • mobile/src/hooks/useWhatsNext.ts — React Query hook fetching catalog + community stats
    • mobile/src/components/Experiments/WhatsNextCard.tsx — recommendation cards with community impact %
  • Playbook empty state
  • Discovery + Playbook hooks: useDiscovery, usePlaybook, useAddToPlaybook, useWhatsNext
  • Null result framing: isNullResult(), getNullResultFraming() for minimal/inconclusive outcomes

Total: 73 new pure function tests for Sprint 5. Full suite: 2302 tests, 0 regressions.

Sprint 6: Navigation Migration + Data Pipeline (1 week) ✅ COMPLETE

  • Restructure tab bar: Discover, My Lab, Playbook, Profile
    • mobile/src/app/(tabs)/_layout.tsx — 4 visible + 4 hidden tabs
    • mobile/__tests__/components/TabBar.test.tsx — 20 tests (updated for new nav)
  • Hide event logging tabs (keep routes accessible via Profile)
    • Home, History, Insights hidden with href: null
  • Update _layout.tsx with new tab order
    • Order: Discover → My Lab → Playbook → Profile
  • Playbook tab placeholder (Sprint 5 will flesh out)
    • mobile/src/app/(tabs)/playbook.tsx — empty state with BookOpen icon
  • Community data bootstrap (seed from research estimates for 8 v1 experiments)
    • supabase/migrations/20260312200000_seed_community_stats_bootstrap.sql
    • Research-sourced impact stats for all 8 v1 catalog experiments
  • Data learning pipeline: aggregation utility (experiment_outcomes → community_experiment_stats)
    • mobile/src/utils/experiments/communityStatsAggregation.ts — aggregateOutcomesToCommunityStats()
    • mobile/__tests__/utils/communityStatsAggregation.test.ts — 11 tests
    • Computes: distinct participants, metric percentiles (p25/median/p75), impact distribution, baseline segment stratification
  • Community stats hook: useCommunityStats for catalog cards
    • mobile/src/hooks/useCommunityStats.ts — React Query hook with 15-min staleTime
    • mobile/__tests__/hooks/useCommunityStats.test.ts — 8 tests
  • Data quality assessment hook: useDataQuality for sync health monitoring
    • mobile/src/hooks/useDataQuality.ts — React Query hook with overallSyncHealth helper
    • mobile/__tests__/hooks/useDataQuality.test.ts — 9 tests
  • Update onboarding hints
    • mobile/src/hooks/useOnboardingHints.ts — AsyncStorage-backed first-run hint system
    • mobile/__tests__/hooks/useOnboardingHints.test.ts — 12 tests
    • Sequential progression: Discover → My Lab → Playbook → First Experiment
    • Dismiss all, reset, corrupted data recovery
  • Data learning pipeline cron: aggregation edge function + pg_cron
    • supabase/functions/aggregate-community-stats/index.ts — daily cron (3 AM UTC)
    • Fetches outcomes, groups by catalog ID, upserts aggregated stats
  • Data quality assessment cron: quality scoring edge function + pg_cron
    • supabase/functions/assess-data-quality/index.ts — hourly cron (:15 past, after sync)
    • Scores devices 0-100, classifies sync health, tracks metric availability
    • supabase/migrations/20260313000000_add_data_pipeline_cron_jobs.sql — pg_cron setup for both
  • Migrate existing custom experiments to new schema (deferred — no custom experiments in production yet)

Total: 60 new tests for Sprint 6. Full suite: 2396 tests, 0 regressions.

Sprint 7: Privacy + Testing + Polish (1-2 weeks) ✅ COMPLETE

Phase 1 — Pure Functions (TDD):

  • Privacy types + validation (mobile/src/utils/privacy/types.ts, privacyValidation.ts) — deletion request validation, retention days validation, consent change detection, account deletion summary, community exclusion logic (21 tests)
  • Compliance text constants (complianceText.ts) — FDA disclaimers, medical disclaimers, experiment disclaimers, AI disclaimers, community data disclaimers, data deletion warnings; context-based disclaimer selector (15 tests)
  • Community opt-out filtering (communityOptOut.ts) — filters outcomes by opted-out user IDs, computes opt-out impact on data sufficiency (9 tests)
  • Attribution model validation — 18-test validation suite for magnitudeScoring.ts with simulated concurrent experiment scenarios (0-3+ concurrent, overlap/adherence/metric relevance scoring, deterministic ordering, magnitude independence)
  • Performance benchmarks — 10 benchmark tests ensuring core functions scale linearly (computeOverallMagnitude, computeAttributionMap, computeTeaserInsights, validateEnrollment, filterOutcomes at 10k scale, selectStarterPack at 50 entries, computeBaselineFromValues at 1k points)
  • AI response schema validation — 9 contract tests validating ExperimentAnalysisResult, StarterPackResult, RecommendedExperiment shapes match mobile client expectations

Phase 2 — Database Migration:

  • supabase/migrations/20260313000000_privacy_consent_and_account_deletion.sql
    • user_privacy_settings table (community_data_opt_in, research_consent, data_retention_days with CHECK >= 30)
    • consent_audit_log table (immutable audit trail with consent_version, consent_type)
    • account_deletion_requests table (pending → processing → completed lifecycle)
    • community_data_opt_in column on experiment_outcomes
    • delete_account(p_user_id) RPC — SECURITY DEFINER, cascades through all user tables, deletes auth.users row
    • RLS policies and indexes for all new tables

Phase 3 — Hooks:

  • usePrivacySettings hook — React Query fetch + upsert + consent audit logging
  • useAccountDeletion hook — validates deletion request, calls delete_account RPC, clears SecureStore + Zustand auth state

Phase 4 — UI Screens + Components:

  • MedicalDisclaimer component — context-aware disclaimer text (experiment_result, teaser, discovery, ai_recommendation, community_stats), compact mode
  • Privacy & Data screen (mobile/src/app/privacy.tsx) — community data toggle, research consent toggle, data retention picker (30/60/90/180/365/Indefinite), delete account button, footer links
  • Account Deletion screen (mobile/src/app/delete-account.tsx) — multi-step flow: Summary → Reason (optional) → Type "DELETE" confirmation → Processing → Done
  • Profile screen updates — wired "Privacy & Security" and "Account Settings" to /privacy, added MedicalDisclaimer footer
  • Disclaimer additions — MedicalDisclaimer added to Discovery detail, Playbook tab, Discover tab

Phase 5 — Integration Tests:

  • Experiment lifecycle E2E (mobile/__tests__/integration/experiment/lifecycle.test.ts) — 8 tests: catalog creation, enrollment, check-ins, completion with outcome + discovery, add to playbook, cancellation, concurrent attribution, low adherence (requires local Supabase)
  • Privacy integration tests (mobile/__tests__/integration/privacy/account-deletion.test.ts) — 5 tests: privacy settings CRUD, retention constraint enforcement, consent audit logging, account deletion cascade, community opt-out flag on outcomes (requires local Supabase)

Total: 94 new tests for Sprint 7 (82 unit + 12 integration). Full suite: 2396 tests passing, 0 regressions.


v2 Roadmap

v2.1: Expanded Catalog + Wearable Integrations

  • Expand experiment catalog from 8 to ~50 experiments (full CSV)
  • Apple Health integration (read-only: HRV, RHR, sleep, steps, workouts, SpO2)
  • Google Fit integration (REST API, OAuth flow, sync function)
  • New experiment categories: Glucose, Metabolic, Body Composition, VO2 Max, Functional, Exercise
  • Category-specific wellness ranges for Metric Gap Analysis

v2.2: Shareable Discovery Cards (Virality Engine)

Generate beautiful, shareable images from experiment results and playbook entries.

Experiment Discovery Card:

┌─────────────────────────────────────┐
│         MY BODY EXPERIMENT          │
│                                     │
│      Alcohol Elimination            │
│           10 days                   │
│                                     │
│  Deep Sleep:      +31%             │
│  HRV:             +24%             │
│  Resting HR:      -5 bpm           │
│                                     │
│  Magnitude of Impact: HIGH          │
│                                     │
│  Your body is a lab.                │
│  Start the discovery.              │
│             Health Decoder          │
└─────────────────────────────────────┘

Top Health Levers Card:

┌─────────────────────────────────────┐
│   MY BODY'S TOP HEALTH LEVERS      │
│                                     │
│  1️⃣  Earlier bedtime               │
│     HRV +22%                        │
│                                     │
│  2️⃣  No alcohol                    │
│     Deep sleep +31%                 │
│                                     │
│  3️⃣  Evening walk                  │
│     Resting HR −4 bpm               │
│                                     │
│  Your body is a lab.                │
│  Start the discovery.              │
│             Health Decoder          │
└─────────────────────────────────────┘

Implementation:

  • Generate card as a rendered React Native view → export to image via react-native-view-shot
  • Share via native share sheet (iOS/Android)
  • Include app download link / deep link
  • Available from: Discovery result screen, Playbook screen
  • Watermarked with "Health Decoder" branding + tagline
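Before any rendering or capture, the card content can be assembled as plain data. A hypothetical sketch of that step — the payload shape and function name are assumptions, and the rendered view would then be exported via react-native-view-shot as described above:

```typescript
// Assembles the text payload for a discovery share card prior to rendering.
// All names here are illustrative, not the app's actual API.

interface MetricLine {
  label: string;
  change: string; // preformatted, e.g. "+31%"
}

interface ShareCardPayload {
  title: string;
  subtitle: string;
  lines: string[];
  footer: string;
}

function buildDiscoveryShareCard(
  experimentName: string,
  durationDays: number,
  metrics: MetricLine[],
  magnitude: string,
): ShareCardPayload {
  return {
    title: "MY BODY EXPERIMENT",
    subtitle: `${experimentName} — ${durationDays} days`,
    lines: [
      ...metrics.map((m) => `${m.label}: ${m.change}`),
      `Magnitude of Impact: ${magnitude.toUpperCase()}`,
    ],
    footer: "Your body is a lab. Start the discovery. — Health Decoder",
  };
}
```

Separating content assembly from rendering keeps the card testable without a React Native environment.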

v2.3: Smart Notifications (Experiment-Aware Push)

Design principle: every notification must feel like help, not spam. No generic reminders.

Notification Types:

| Trigger | Notification | Value |
| --- | --- | --- |
| Daily check-in due (manual experiments only) | "Quick check-in: Did you follow the Caffeine Curfew protocol today? [Yes] [Mostly] [No]" | Actionable one-tap |
| Semi-auto activity detected | "We detected a 25-min walk at 7:15 PM. Did you wear the weighted vest? [Yes] [No]" | One-tap confirmation |
| Mid-experiment teaser (day 5+) | "Early signal: Your deep sleep is trending 18% higher than baseline." | Motivation |
| Experiment completion | "Your Alcohol Elimination experiment is complete! Tap to see your discovery." | Excitement |
| Unenrolled discovery found | "We noticed a pattern in your data. Tap to see what we found." | Magic moment |
| Data quality degraded | "Your Oura hasn't synced in 48 hours. This may affect your active experiment." | Helpful warning |
| Playbook milestone | "You've completed 3 Sleep experiments! Your Sleep Playbook is 60% complete." | Progression |

Anti-Annoyance Rules:

  • Max 2 notifications per day
  • Never send between 10 PM and 7 AM (respect sleep experiments!)
  • Group related notifications
  • User can mute per-experiment or globally
  • If user ignores 3 consecutive notifications, reduce frequency
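The rules above amount to a simple send-gate. A minimal sketch, assuming a local hour and a small per-user state record; for brevity, "reduce frequency" after 3 ignores is modeled here as skipping entirely:

```typescript
// Gate applied before any push notification is dispatched.

interface NotificationState {
  sentToday: number;          // notifications already sent this calendar day
  consecutiveIgnores: number; // notifications ignored in a row
}

function shouldSendNotification(state: NotificationState, hourLocal: number): boolean {
  if (state.sentToday >= 2) return false;             // max 2 per day
  if (hourLocal >= 22 || hourLocal < 7) return false; // quiet hours 10 PM - 7 AM
  if (state.consecutiveIgnores >= 3) return false;    // back off after 3 ignores
  return true;
}
```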

v2.4: Automated Verification (Wearable Activity Detection)

Extend auto-adherence beyond simple threshold checks:

  • Detect specific activity types from wearable data (walk, run, strength training)
  • Cross-reference with experiment protocols
  • For semi_auto experiments, detect the activity and prompt one-tap confirmation
  • For auto experiments, silently verify and mark adherence
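The cross-referencing step can be sketched as matching a detected activity against a protocol rule, then branching on the experiment's adherence mode. The shapes below are assumptions for illustration:

```typescript
// Matches a wearable-detected activity against an experiment protocol rule.

interface DetectedActivity {
  type: "walk" | "run" | "strength";
  startHour: number;  // local hour, 0-23
  durationMin: number;
}

interface ProtocolRule {
  activityType: "walk" | "run" | "strength";
  minDurationMin: number;
  afterHour?: number; // e.g. a post-dinner walk must start after 18:00
  mode: "auto" | "semi_auto";
}

type AdherenceAction = "record_silently" | "prompt_confirmation" | "none";

function matchActivityToProtocol(a: DetectedActivity, rule: ProtocolRule): AdherenceAction {
  const matches =
    a.type === rule.activityType &&
    a.durationMin >= rule.minDurationMin &&
    (rule.afterHour === undefined || a.startHour >= rule.afterHour);
  if (!matches) return "none";
  // auto: silently verify; semi_auto: surface a one-tap confirmation
  return rule.mode === "auto" ? "record_silently" : "prompt_confirmation";
}
```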

v2.5: Subjective Wellness Check-ins

Allow users to report how they're subjectively feeling throughout the day. This data becomes a first-class metric in experiment analysis, complementing objective wearable data.

UX Design

Prompt: A small, non-intrusive floating card that appears at configurable times:

┌─────────────────────────────┐
│  How are you feeling?       │
│                             │
│  Energy                     │
│  😴  😐  🙂  😊  🔥       │
│                             │
│  Mood                       │
│  😞  😐  🙂  😊  😄       │
│                             │
│  Focus                      │
│  🌫️  😐  🙂  😊  🎯       │
│                             │
│  Physical Comfort           │
│  😣  😐  🙂  😊  💪       │
│                             │
│  [Skip]          [Save]     │
└─────────────────────────────┘

Key Design Decisions:

  • 4 dimensions: Energy, Mood, Focus, Physical Comfort
  • 5-point scale per dimension (1-5, displayed as emoji faces for instant comprehension)
  • One-tap per dimension: Tap the emoji, done. Entire check-in <5 seconds.
  • 3 prompts per day at configurable times:
    • Morning (default: 30 min after wake time detected from wearable)
    • Afternoon (default: 2 PM)
    • Evening (default: 8 PM)
  • Not mandatory: Users can skip or dismiss. No guilt mechanics.
  • Adaptive timing: If wearable detects wake time, adjust morning prompt accordingly

Data Model

CREATE TABLE subjective_checkins (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id UUID NOT NULL REFERENCES auth.users(id) ON DELETE CASCADE,
  checkin_time TIMESTAMPTZ NOT NULL,
  time_of_day TEXT NOT NULL CHECK (time_of_day IN ('morning', 'afternoon', 'evening')),
  energy INTEGER NOT NULL CHECK (energy BETWEEN 1 AND 5),
  mood INTEGER NOT NULL CHECK (mood BETWEEN 1 AND 5),
  focus INTEGER NOT NULL CHECK (focus BETWEEN 1 AND 5),
  physical_comfort INTEGER NOT NULL CHECK (physical_comfort BETWEEN 1 AND 5),
  created_at TIMESTAMPTZ DEFAULT now()
);

CREATE INDEX idx_subjective_user_time ON subjective_checkins(user_id, checkin_time DESC);

Integration with Experiments

Subjective data becomes a metric in experiment analysis:

  1. New metric registry entries:

    • avg_energy — Average daily energy score (source: subjective_checkins)
    • avg_mood — Average daily mood score
    • avg_focus — Average daily focus score
    • avg_comfort — Average daily physical comfort score
  2. Experiment results include subjective data:

    Post-Meal Walk Experiment — 14 days
    
    Objective Metrics:
    RHR: -4 bpm (62→58)    -6.5%
    Deep Sleep: +18 min      +34.6%
    
    Subjective Metrics:
    Afternoon Energy: +0.8   (3.2→4.0)
    Evening Mood: +0.5       (3.5→4.0)
    
  3. Pattern detection uses subjective data:

    • "Your energy is 35% higher on days following 7+ hours of sleep"
    • "Your focus score drops 0.8 points on days after alcohol consumption"
    • These become unenrolled discoveries
  4. Subjective data resolves "objective ambiguity": Sometimes wearable metrics show modest change but subjective improvement is dramatic. The discovery can note: "While your HRV showed a modest 5% improvement, your self-reported energy increased 40% during this experiment."
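Deriving the daily metrics in step 1 is a per-day average over the check-in rows. A minimal sketch for avg_energy (column names follow the table definition above; the aggregation shape is an assumption):

```typescript
// Computes the avg_energy daily metric from subjective check-in rows.

interface SubjectiveCheckin {
  checkinDate: string; // "YYYY-MM-DD", derived from checkin_time
  energy: number;      // 1-5
}

function dailyAvgEnergy(rows: SubjectiveCheckin[]): Map<string, number> {
  const sums = new Map<string, { total: number; n: number }>();
  for (const r of rows) {
    const s = sums.get(r.checkinDate) ?? { total: 0, n: 0 };
    s.total += r.energy;
    s.n += 1;
    sums.set(r.checkinDate, s);
  }
  const out = new Map<string, number>();
  for (const [date, s] of sums) out.set(date, s.total / s.n);
  return out;
}
```

avg_mood, avg_focus, and avg_comfort would follow the same pattern over their respective columns.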

Privacy Consideration

  • Subjective data is deeply personal
  • Included in data export (GDPR)
  • Excluded from community aggregation by default (user must explicitly opt in)
  • Never shared in share cards

v2.6: Additional Features

  • ABAB Experiment Design — Advanced mode for power users to run alternating phases (A=normal, B=intervention, A=normal, B=intervention) for stronger evidence
  • Community Experiment Cohorts — Users running the same experiment see anonymized group progress
  • Counterfactual Estimation — "What would have happened without this experiment?" using baseline trend projection
  • Custom Experiment Creation — Allow users to design their own experiments
  • ML-Based Recommendation — Replace heuristic recommender with collaborative filtering as dataset grows
  • Data Export (GDPR) — Download all personal data in machine-readable format
  • Differential Privacy — Add noise to small-cohort aggregations to prevent re-identification