
Experiment-Centric App Pivot — Implementation Plan

Created: 2026-03-12
Updated: 2026-03-13 (v8 — Sprints 1-7 COMPLETE)
Status: In Progress
Tagline: "Your body is a lab. Start the discovery."


Executive Summary

HealthDecoder pivots from a health event logging app to an experiment-centric discovery platform. Users browse a curated experiment catalog, enroll in experiments measured by their wearable data, and receive AI-powered "Magnitude of Impact" analysis. The app also proactively spots patterns in historical data ("Unenrolled Discoveries") and builds a personal "Playbook" of health levers ranked by impact.

What We Keep

  • Wearable sync infrastructure (Whoop, Fitbit, Oura, Libre, Dexcom)
  • Database tables: experiments, experiment_metrics, experiment_checkins, experiment_results
  • Metric registry with vendor-specific extraction (mobile/src/utils/experiments/metrics.ts)
  • analyze-experiment Edge Function (statistical analysis + AI interpretation)
  • Auth flow, Supabase backend, push token infrastructure
  • All data models, API routes, and sync logic for event logging (hidden, not removed)

What We Add

  • Experiment Catalog — curated library of 4-8 high-impact experiments for v1 (expanded in v2)
  • Unenrolled Discovery Engine — AI pattern spotting on historical data ("accidental experiments")
  • Magnitude of Impact scoring — replaces success/failure framing
  • User Discoveries — formatted insights from completed experiments and pattern detection
  • User Playbook — "Your Body's Operating Manual" ranked by impact magnitude
  • Community Data — anonymous aggregated stats on experiment cards ("Wisdom of the Lab")
  • AI Model Abstraction — Gemini-first with provider-swappable architecture
  • Data Learning Pipeline — normalized experiment outcomes feeding recommendation intelligence
  • Data Quality Monitoring — wearable sync health scoring and gap detection
  • New Navigation — Discover / My Lab / Playbook / Profile (retire Home, History, Insights tabs)

What We Retire (Hide, Not Remove)

  • Home tab (event logging via voice/text/camera)
  • History tab (event timeline)
  • Insights tab (glucose charts, analytics)
  • Create-experiment screen (replaced by catalog enrollment)

Phase 1: Foundation (Database + AI + Metrics)

1.1 Wellness Terminology Audit (Sprint 1 — Do First)

Before building the catalog or any AI prompts, establish the Compliance Dictionary. Every engineer, content writer, and AI prompt must use this reference. If clinical terms leak into Sprint 1 seed data, fixing them later is a costly refactor.

Banned Words Dictionary

| Banned Term | Approved Replacement | Context |
| --- | --- | --- |
| diagnose / diagnosis | identify / observe | Never imply clinical diagnosis |
| treat / treatment | experiment / protocol | We run experiments, not treatments |
| cure | improve / support | No curative claims |
| prevent / prevention | associated with lower / support | No prevention claims |
| disease | — (omit entirely) | Never reference diseases |
| diabetes | blood sugar wellness | If glucose context needed |
| hypertension | heart rate patterns | If BP context needed |
| cardiovascular disease | heart wellness | Never name diseases |
| insulin resistance | glucose response | Correlational framing |
| insulin sensitivity | glucose response efficiency | Correlational framing |
| A1C / HbA1c | long-term glucose patterns | Do not reference clinical biomarkers |
| blood pressure | — (omit unless from device) | Not a wearable metric we track |
| prescribe / prescription | suggest / recommend trying | We are not prescribers |
| dose / dosage | amount / serving | For supplement experiments |
| therapeutic | wellness-focused | No therapeutic claims |
| clinical | — (omit) | We are not clinical |
| patient | user / participant | Users, not patients |
| symptom | experience / observation | Observational framing |
| risk factor | pattern associated with | Correlational only |
| mortality | longevity / lifespan | Only in evidence citations |
| success / failure | magnitude of impact | Core framing rule |

Required Phrases

Every experiment card, AI output, and discovery must include appropriate framing:

| Context | Required Phrase |
| --- | --- |
| All AI outputs | "For informational purposes only. Not medical advice." |
| Supplement experiments | "Consult your healthcare provider before starting any supplement." |
| All discovery results | "associated with" or "correlated with" (never "caused by") |
| Experiment framing | "lifestyle experiment" (never "intervention" or "treatment") |
| App-wide footer | "For general wellness purposes only. Not intended to diagnose, treat, cure, or prevent any disease." |

Audit Process

  1. Draft all catalog entries (8 in v1, ~50 by v2) using approved terminology
  2. Run automated scan for banned terms before seed data is committed
  3. Build a lint/validation function: validateWellnessCompliance(text: string): { pass: boolean, violations: string[] }
  4. This function is used in:
    • Catalog seed data validation (CI check)
    • AI output post-processing (runtime scan before display)
    • Experiment description editing (admin tool, future)
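The validator named in step 3 might be sketched as follows. This is a minimal illustration, not the final implementation; the banned-term list here is a small subset of the Section 1.1 dictionary.

```typescript
// Illustrative subset of the Section 1.1 banned-terms dictionary.
const BANNED_TERMS = [
  "diagnose", "diagnosis", "treat", "treatment", "cure",
  "prevent", "disease", "prescribe", "clinical", "patient", "symptom",
];

interface ComplianceResult {
  pass: boolean;
  violations: string[];
}

// Case-insensitive whole-word scan; "treatment" is listed separately
// because \btreat\b does not match it.
function validateWellnessCompliance(text: string): ComplianceResult {
  const lower = text.toLowerCase();
  const violations = BANNED_TERMS.filter((term) =>
    new RegExp(`\\b${term}\\b`).test(lower)
  );
  return { pass: violations.length === 0, violations };
}
```

The same function can back both the CI seed-data check and the runtime AI-output scan, keeping the two enforcement points in sync.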

1.2 New Database Tables

experiment_catalog — The Library

CREATE TABLE experiment_catalog (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  slug TEXT UNIQUE NOT NULL,
  name TEXT NOT NULL,
  category TEXT NOT NULL,
  subcategory TEXT,
  protocol_summary TEXT NOT NULL,       -- 1-2 sentence card description
  protocol_detail TEXT NOT NULL,        -- Full protocol with instructions
  goal TEXT NOT NULL,
  why_it_works TEXT NOT NULL,
  difficulty TEXT NOT NULL CHECK (difficulty IN ('easy', 'moderate', 'hard')),
  default_duration_days INTEGER NOT NULL,
  min_duration_days INTEGER NOT NULL DEFAULT 7,
  primary_metrics JSONB NOT NULL,       -- [{metric_key, metric_label, unit, data_source, higherIsBetter}]
  secondary_metrics JSONB NOT NULL DEFAULT '[]',
  required_data_sources TEXT[] NOT NULL, -- which provider types needed
  confounders TEXT[] NOT NULL DEFAULT '{}',
  adherence_detection TEXT NOT NULL DEFAULT 'manual'
    CHECK (adherence_detection IN ('auto', 'semi_auto', 'manual')),
  -- auto: fully detectable from wearable data (bedtime, steps, activity frequency)
  -- semi_auto: partially detectable, confirm with one-tap (walking + vest, workout type)
  -- manual: requires user check-in (supplements, food habits, breathing exercises)
  auto_detect_config JSONB,            -- rules for auto/semi_auto detection (see section 2.4)
  evidence_summary TEXT,
  evidence_url TEXT,
  starter_pack BOOLEAN DEFAULT false,  -- true = included in new-user starter pack candidates
  starter_pack_priority INTEGER,       -- lower = higher priority within starter pack
  tags TEXT[] DEFAULT '{}',
  is_active BOOLEAN DEFAULT true,
  sort_order INTEGER DEFAULT 0,
  created_at TIMESTAMPTZ DEFAULT now(),
  updated_at TIMESTAMPTZ DEFAULT now()
);

CREATE INDEX idx_experiment_catalog_category ON experiment_catalog(category);
CREATE INDEX idx_experiment_catalog_active ON experiment_catalog(is_active) WHERE is_active = true;
CREATE INDEX idx_experiment_catalog_starter ON experiment_catalog(starter_pack) WHERE starter_pack = true;

experiment_outcomes — Normalized Outcome Records (Strategic Asset)

-- Each completed experiment produces exactly one normalized outcome record.
-- This table is the foundation of the data learning pipeline and community intelligence.
-- It is intentionally denormalized for fast aggregation queries.
CREATE TABLE experiment_outcomes (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  experiment_id UUID NOT NULL REFERENCES experiments(id) ON DELETE CASCADE,
  catalog_experiment_id UUID REFERENCES experiment_catalog(id),
  user_id UUID NOT NULL REFERENCES auth.users(id) ON DELETE CASCADE,

  -- User baseline profile (anonymized snapshot at enrollment time)
  user_baseline_profile JSONB NOT NULL,
  -- {age_bucket: "30-39", rhr_bucket: "60-70", hrv_bucket: "40-50", sleep_bucket: "6-7h",
  --  connected_providers: ["oura", "fitbit"], baseline_quality: "good"}

  -- Experiment metadata
  experiment_category TEXT NOT NULL,
  experiment_duration_days INTEGER NOT NULL,
  actual_duration_days INTEGER NOT NULL,

  -- Adherence
  protocol_adherence_pct NUMERIC(5,2) NOT NULL,
  valid_days INTEGER NOT NULL,
  excluded_days INTEGER NOT NULL DEFAULT 0,

  -- Confounders
  confounders_present TEXT[] DEFAULT '{}',
  concurrent_experiments INTEGER DEFAULT 0,

  -- Metric changes (the core data)
  metric_changes JSONB NOT NULL,
  -- [{metric_key, baseline_mean, baseline_stddev, experiment_mean, change_pct,
  --   effect_size_cohens_d, direction, data_points_baseline, data_points_experiment}]

  -- Scoring
  overall_magnitude TEXT NOT NULL
    CHECK (overall_magnitude IN ('high', 'moderate', 'low', 'minimal', 'inconclusive')),
  confidence TEXT NOT NULL
    CHECK (confidence IN ('strong', 'moderate', 'suggestive')),

  -- Attribution
  attribution_confidence TEXT NOT NULL
    CHECK (attribution_confidence IN ('strong', 'moderate', 'low')),
  concurrent_experiment_ids UUID[] DEFAULT '{}',
  attribution_map JSONB,
  -- [{experiment_id, experiment_name, attribution_plausibility: "high"|"moderate"|"low"}]

  -- AI metadata
  ai_model TEXT,
  ai_prompt_version TEXT,

  created_at TIMESTAMPTZ DEFAULT now(),
  UNIQUE(experiment_id)
);

CREATE INDEX idx_outcomes_catalog ON experiment_outcomes(catalog_experiment_id);
CREATE INDEX idx_outcomes_category ON experiment_outcomes(experiment_category);
CREATE INDEX idx_outcomes_magnitude ON experiment_outcomes(overall_magnitude);
CREATE INDEX idx_outcomes_user ON experiment_outcomes(user_id);

community_experiment_stats — Wisdom of the Lab

CREATE TABLE community_experiment_stats (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  catalog_experiment_id UUID NOT NULL REFERENCES experiment_catalog(id) ON DELETE CASCADE,
  total_participants INTEGER DEFAULT 0,
  total_completed INTEGER DEFAULT 0,
  avg_impact_by_metric JSONB DEFAULT '{}',
  -- {metric_key: {avg_change_pct, median_change_pct, p25, p75}}
  pct_high_impact NUMERIC(5,2) DEFAULT 0,
  pct_moderate_impact NUMERIC(5,2) DEFAULT 0,
  pct_low_impact NUMERIC(5,2) DEFAULT 0,
  pct_minimal_impact NUMERIC(5,2) DEFAULT 0,
  baseline_segment_stats JSONB DEFAULT '{}',
  -- {rhr_60_70: {avg_change_pct: X, count: Y}, hrv_40_50: {...}}
  updated_at TIMESTAMPTZ DEFAULT now(),
  UNIQUE(catalog_experiment_id)
);

user_discoveries — Discovery Output

CREATE TABLE user_discoveries (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id UUID NOT NULL REFERENCES auth.users(id) ON DELETE CASCADE,
  experiment_id UUID REFERENCES experiments(id) ON DELETE SET NULL,
  catalog_experiment_id UUID REFERENCES experiment_catalog(id),
  discovery_type TEXT NOT NULL
    CHECK (discovery_type IN ('experiment_result', 'unenrolled_pattern')),
  title TEXT NOT NULL,
  summary TEXT NOT NULL,
  detailed_analysis TEXT,
  metrics_impact JSONB NOT NULL,
  -- [{metric_key, metric_label, baseline_value, observed_value, change_pct, magnitude, unit}]
  overall_magnitude TEXT
    CHECK (overall_magnitude IN ('high', 'moderate', 'low', 'minimal', 'inconclusive')),
  confidence TEXT CHECK (confidence IN ('strong', 'moderate', 'suggestive')),
  confounders_noted TEXT[],
  suggested_experiment_id UUID REFERENCES experiment_catalog(id),
  ai_model TEXT,
  ai_prompt_version TEXT,
  status TEXT DEFAULT 'new'
    CHECK (status IN ('new', 'viewed', 'added_to_playbook', 'eliminated', 'dismissed')),
    -- 'eliminated' = user acknowledged a minimal/inconclusive result (Success of Elimination)
  created_at TIMESTAMPTZ DEFAULT now()
);

CREATE INDEX idx_user_discoveries_user ON user_discoveries(user_id, created_at DESC);
CREATE INDEX idx_user_discoveries_type ON user_discoveries(user_id, discovery_type);

user_playbook — Your Body's Operating Manual

CREATE TABLE user_playbook (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id UUID NOT NULL REFERENCES auth.users(id) ON DELETE CASCADE,
  discovery_id UUID REFERENCES user_discoveries(id) ON DELETE SET NULL,
  catalog_experiment_id UUID REFERENCES experiment_catalog(id),
  habit_name TEXT NOT NULL,
  impact_category TEXT NOT NULL,  -- sleep, hrv, rhr, glucose, recovery, metabolic, functional
  magnitude TEXT NOT NULL CHECK (magnitude IN ('high', 'moderate', 'low', 'eliminated')),
  -- 'eliminated' = Minimal/Inconclusive result, framed as "ruled out" (Success of Elimination)
  impact_description TEXT NOT NULL,  -- "HRV +16%, Deep Sleep +22 min" or "Not a lever for your sleep"
  rank INTEGER,  -- 1 = highest impact lever; eliminated entries ranked last
  created_at TIMESTAMPTZ DEFAULT now(),
  updated_at TIMESTAMPTZ DEFAULT now()
);

CREATE INDEX idx_user_playbook_user ON user_playbook(user_id, rank);

device_data_quality — Wearable Sync Health Monitoring

CREATE TABLE device_data_quality (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id UUID NOT NULL REFERENCES auth.users(id) ON DELETE CASCADE,
  device_id UUID NOT NULL REFERENCES connected_devices(device_id) ON DELETE CASCADE,
  assessment_date DATE NOT NULL,

  -- Completeness scoring
  data_quality_score NUMERIC(5,2) NOT NULL,  -- 0-100
  -- 100 = all expected metrics present, no gaps
  -- 80+ = minor gaps, usable for experiments
  -- 50-79 = significant gaps, experiments may be limited
  -- <50 = unreliable, warn user

  -- Gap analysis
  missing_data_days INTEGER DEFAULT 0,       -- days with no data in last 14 days
  partial_data_days INTEGER DEFAULT 0,       -- days with some but not all expected metrics
  total_days_assessed INTEGER NOT NULL,

  -- Per-metric availability
  metric_availability JSONB NOT NULL,
  -- {hrv: {available: true, days_with_data: 12, total_days: 14, quality: "good"},
  --  rhr: {available: true, days_with_data: 14, total_days: 14, quality: "excellent"},
  --  sleep_stages: {available: false, days_with_data: 0, total_days: 14, quality: "unavailable"}}

  -- Sync health
  sync_health TEXT NOT NULL CHECK (sync_health IN ('healthy', 'degraded', 'failing', 'stale')),
  -- healthy: synced within last 6 hours, <2 missing days in 14
  -- degraded: synced within 24h but 2-4 missing days
  -- failing: >4 missing days or sync errors
  -- stale: no sync in >48 hours

  last_successful_sync TIMESTAMPTZ,
  sync_error_count_7d INTEGER DEFAULT 0,

  created_at TIMESTAMPTZ DEFAULT now(),
  UNIQUE(device_id, assessment_date)
);

CREATE INDEX idx_data_quality_user ON device_data_quality(user_id, assessment_date DESC);
CREATE INDEX idx_data_quality_device ON device_data_quality(device_id, assessment_date DESC);

1.3 Modify Existing Tables

experiments — Add Catalog Reference + Data Quality + Attribution

ALTER TABLE experiments
  ADD COLUMN catalog_experiment_id UUID REFERENCES experiment_catalog(id),
  ADD COLUMN baseline_metrics JSONB,           -- auto-computed baseline snapshot
  ADD COLUMN baseline_quality TEXT,             -- 'excellent' | 'good' | 'limited' | 'insufficient'
  ADD COLUMN data_quality_at_enrollment JSONB,  -- snapshot of device_data_quality at enrollment
  ADD COLUMN concurrent_experiment_ids UUID[],  -- IDs of experiments that overlapped (populated at completion)
  ADD COLUMN attribution_confidence TEXT        -- 'strong' | 'moderate' | 'low' (computed at completion)
    CHECK (attribution_confidence IN ('strong', 'moderate', 'low')),
  ADD COLUMN is_custom BOOLEAN DEFAULT false;

experiment_checkins — Add Confounder Tracking + Auto-Detection

ALTER TABLE experiment_checkins
  ADD COLUMN confounders JSONB DEFAULT '{}',
  -- {"alcohol": true, "illness": false, "travel": false, "intense_workout": true, "poor_sleep": false}
  ADD COLUMN auto_detected BOOLEAN DEFAULT false,
  -- true if adherence was auto-detected from wearable data (not manual check-in)
  ADD COLUMN auto_detect_data JSONB;
  -- evidence for auto-detection: {"detected_bedtime": "22:15", "target_bedtime": "22:30", "within_threshold": true}

experiment_results — Add Magnitude Scoring

ALTER TABLE experiment_results
  ADD COLUMN overall_magnitude TEXT
    CHECK (overall_magnitude IN ('high', 'moderate', 'low', 'minimal', 'inconclusive')),
  ADD COLUMN ai_model TEXT,
  ADD COLUMN ai_prompt_version TEXT;

1.4 RLS Policies

-- experiment_catalog: public read for authenticated users
ALTER TABLE experiment_catalog ENABLE ROW LEVEL SECURITY;
CREATE POLICY catalog_select ON experiment_catalog FOR SELECT TO authenticated USING (true);
CREATE POLICY catalog_service ON experiment_catalog FOR ALL TO service_role USING (true) WITH CHECK (true);

-- experiment_outcomes: user can read own, service_role aggregates
ALTER TABLE experiment_outcomes ENABLE ROW LEVEL SECURITY;
CREATE POLICY outcomes_select ON experiment_outcomes FOR SELECT USING (auth.uid() = user_id);
CREATE POLICY outcomes_service ON experiment_outcomes FOR ALL TO service_role USING (true) WITH CHECK (true);

-- community_experiment_stats: public read for authenticated users
ALTER TABLE community_experiment_stats ENABLE ROW LEVEL SECURITY;
CREATE POLICY community_stats_select ON community_experiment_stats FOR SELECT TO authenticated USING (true);
CREATE POLICY community_stats_service ON community_experiment_stats FOR ALL TO service_role USING (true) WITH CHECK (true);

-- user_discoveries: user owns their discoveries
ALTER TABLE user_discoveries ENABLE ROW LEVEL SECURITY;
CREATE POLICY discoveries_select ON user_discoveries FOR SELECT USING (auth.uid() = user_id);
CREATE POLICY discoveries_insert ON user_discoveries FOR INSERT WITH CHECK (auth.uid() = user_id);
CREATE POLICY discoveries_update ON user_discoveries FOR UPDATE USING (auth.uid() = user_id);
CREATE POLICY discoveries_delete ON user_discoveries FOR DELETE USING (auth.uid() = user_id);
CREATE POLICY discoveries_service ON user_discoveries FOR ALL TO service_role USING (true) WITH CHECK (true);

-- user_playbook: user owns their playbook
ALTER TABLE user_playbook ENABLE ROW LEVEL SECURITY;
CREATE POLICY playbook_select ON user_playbook FOR SELECT USING (auth.uid() = user_id);
CREATE POLICY playbook_insert ON user_playbook FOR INSERT WITH CHECK (auth.uid() = user_id);
CREATE POLICY playbook_update ON user_playbook FOR UPDATE USING (auth.uid() = user_id);
CREATE POLICY playbook_delete ON user_playbook FOR DELETE USING (auth.uid() = user_id);
CREATE POLICY playbook_service ON user_playbook FOR ALL TO service_role USING (true) WITH CHECK (true);

-- device_data_quality: user reads own, service_role writes
ALTER TABLE device_data_quality ENABLE ROW LEVEL SECURITY;
CREATE POLICY data_quality_select ON device_data_quality FOR SELECT USING (auth.uid() = user_id);
CREATE POLICY data_quality_service ON device_data_quality FOR ALL TO service_role USING (true) WITH CHECK (true);

1.5 Seed Data — Experiment Catalog

v1: 4-8 High-Impact Experiments Only

For v1, we ship a focused catalog of the highest-signal experiments — the ones most likely to produce a measurable "wow" moment for new users. The full ~50 experiment library is a v2 expansion.

All descriptions MUST pass the Wellness Terminology Audit (Section 1.1) before commit.

v1 Catalog (8 experiments):

| # | Experiment | Category | Duration | Adherence | Why v1 |
| --- | --- | --- | --- | --- | --- |
| 1 | Alcohol Elimination | Sleep | 14 days | manual | Highest probability of dramatic, measurable change |
| 2 | Early Bedtime | Sleep | 14 days | auto | High signal, easy, auto-detectable via sleep timestamps |
| 3 | Post-Meal Walk | RHR / Sleep | 14 days | auto | Low friction, auto-detectable, strong multi-metric signal |
| 4 | Caffeine Curfew | Sleep | 14 days | manual | High signal for sleep metrics, relatable protocol |
| 5 | Consistent Wake Time | Sleep | 14 days | auto | Easy, auto-detectable, strong sleep consistency signal |
| 6 | Magnesium Before Bed | RHR / Sleep | 14 days | manual | Accessible supplement, strong RHR + sleep signal |
| 7 | Morning Sunlight | Sleep | 10 days | semi_auto | Easy, well-known (Huberman audience), moderate signal |
| 8 | Digital Sunset | Sleep | 14 days | manual | Moderate signal, highly relatable, no equipment needed |

Why these 8: They target the metrics most users have (sleep, RHR, HRV), require no special equipment or CGM, have strong published evidence, and include a mix of auto/semi_auto/manual adherence. Six of the eight are categorized under sleep — deliberately, because sleep is the metric with the most consistent measurable signal from wearables and is universally relevant.

v2 Catalog Expansion (~50 experiments):

| Category | v2 Additions |
| --- | --- |
| Glucose | ACV, Food Sequencing, Cinnamon, Paired Carb Rule, Resistance Training |
| HRV | Resonant Breathing, Cold Exposure, Nasal Walking, Movement Snacks |
| RHR | Legs Up Wall, Zone 2, Hydration Load, Sauna |
| Metabolic | 30g Protein Breakfast, 8PM Curfew, 3-Hour Buffer, TRE 10h, Mediterranean Trial, UPF Elimination, etc. |
| Body Composition | Protein Pacing, Weighted Vest Walking, Creatine |
| VO2 Max | Norwegian 4x4, Fasted Zone 2 |
| Recovery | Sauna 3x/wk, Afternoon Nap |
| Behavior | Nature Exposure, No News |
| Exercise | Strength Training 3x/wk, HIIT 2x/wk |
| Functional | Dead Hang Challenge, Floor Sitting |

Adherence detection classification:

| Detection Type | v1 Experiments | How Detected |
| --- | --- | --- |
| auto | Early Bedtime, Consistent Wake Time, Post-Meal Walk | Sleep timestamps, step counts, activity logs from wearable |
| semi_auto | Morning Sunlight | Activity/location detected, one-tap confirm |
| manual | Alcohol Elimination, Caffeine Curfew, Magnesium, Digital Sunset | Cannot be detected from wearable data |

Auto-detect config examples:

// Early Bedtime: auto
{"type": "sleep_start_time", "target": "relative_to_baseline", "offset_minutes": -45, "threshold_minutes": 15}

// Post-Meal Walk: auto (via evening steps spike)
{"type": "activity_after_time", "window_start": "18:00", "window_end": "21:00", "min_duration_minutes": 10, "activity_types": ["walk"]}

// Consistent Wake Time: auto
{"type": "sleep_end_time_variance", "max_variance_minutes": 30}
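To make the config concrete, here is one way the Early Bedtime rule could be evaluated against a night of wearable data. The field names mirror the config example above; the time-normalization helper and its midnight cutoff are assumptions of this sketch.

```typescript
// Shape of the Early Bedtime auto_detect_config example above.
interface SleepStartRule {
  type: "sleep_start_time";
  target: "relative_to_baseline";
  offset_minutes: number;     // e.g. -45 = 45 minutes earlier than baseline
  threshold_minutes: number;  // tolerance around the target bedtime
}

// Minutes on a continuous evening axis: times before noon are treated as
// "after midnight" so 00:30 sorts later than 23:30 (assumed convention).
function minutesOnEveningAxis(hhmm: string): number {
  const [h, m] = hhmm.split(":").map(Number);
  const mins = h * 60 + m;
  return mins < 12 * 60 ? mins + 24 * 60 : mins;
}

// A night counts as adherent if detected bedtime is within the threshold
// of (baseline bedtime + offset).
function isAdherent(
  rule: SleepStartRule,
  baselineBedtime: string,
  detectedBedtime: string
): boolean {
  const target = minutesOnEveningAxis(baselineBedtime) + rule.offset_minutes;
  const detected = minutesOnEveningAxis(detectedBedtime);
  return Math.abs(detected - target) <= rule.threshold_minutes;
}
```

With a 23:15 baseline, a -45 offset, and a 15-minute threshold, a detected 22:20 bedtime counts as adherent while 23:10 does not; the evidence would be written to `experiment_checkins.auto_detect_data`.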

Each catalog entry includes:

  • primary_metrics: The 2-3 metrics most likely to show impact
  • secondary_metrics: Additional metrics to track
  • required_data_sources: Which wearable data is needed
  • confounders: Known confounders to flag during check-ins
  • adherence_detection + auto_detect_config: How adherence is tracked
  • evidence_summary + evidence_url: Scientific backing (use wellness-compliant language)

1.6 AI Model Abstraction Layer

Edge Function: ai-engine

A new Supabase Edge Function that supports multiple AI providers with a single interface.

supabase/functions/ai-engine/
├── index.ts              -- Router: /analyze, /spot-patterns, /recommend, /starter-pack
├── providers/
│   ├── types.ts          -- Provider interface
│   ├── gemini.ts         -- Google Gemini (Gemini 2.0 Flash / Pro)
│   ├── openai.ts         -- OpenAI (GPT-4o-mini) — fallback
│   └── factory.ts        -- Provider selection based on config
├── engines/
│   ├── experiment-analyst.ts    -- Experiment analysis (replaces analyze-experiment)
│   ├── pattern-spotter.ts       -- Unenrolled discovery detection
│   ├── recommender.ts           -- Experiment recommendations
│   └── starter-pack.ts          -- Personalized first-experiment recommendation
├── prompts/
│   ├── shared-guidelines.ts     -- FDA compliance rules, wellness language
│   ├── experiment-analysis.ts   -- Analysis prompt template
│   ├── pattern-detection.ts     -- Pattern spotting prompt template
│   └── recommendation.ts        -- Recommendation prompt template
└── compliance/
    ├── banned-words.ts          -- Banned/required terms from Section 1.1
    └── output-validator.ts      -- Scan AI output for compliance violations

Provider Interface:

interface AIProvider {
  name: string;
  chat(params: {
    systemPrompt: string;
    userPrompt: string;
    temperature?: number;
    maxTokens?: number;
    responseFormat?: 'json';
  }): Promise<{ content: string; model: string; usage: { input: number; output: number } }>;
}

Provider Selection:

  • Environment variable AI_PROVIDER=gemini|openai (default: gemini)
  • Model-specific env vars: GEMINI_API_KEY, GEMINI_MODEL, OPENAI_API_KEY, OPENAI_CHAT_MODEL
  • Fallback chain: if primary provider fails, try secondary
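The fallback chain could be implemented along these lines. The `AIProvider` interface repeats the one above for self-containment; the error-handling details are a sketch, not the final design.

```typescript
type ChatParams = {
  systemPrompt: string;
  userPrompt: string;
  temperature?: number;
  maxTokens?: number;
  responseFormat?: "json";
};

interface AIProvider {
  name: string;
  chat(params: ChatParams): Promise<{
    content: string;
    model: string;
    usage: { input: number; output: number };
  }>;
}

// Try providers in configured order (e.g. [gemini, openai]); return the
// first successful response, or rethrow the last failure.
async function chatWithFallback(providers: AIProvider[], params: ChatParams) {
  let lastError: unknown;
  for (const provider of providers) {
    try {
      return await provider.chat(params);
    } catch (err) {
      lastError = err; // in production: log the failure, then fall through
    }
  }
  throw lastError ?? new Error("No AI providers configured");
}
```

The factory would assemble the provider array from `AI_PROVIDER` and the model env vars, so engines only ever call `chatWithFallback`.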

Output Compliance Validation: Every AI response is passed through output-validator.ts before being stored or displayed:

  1. Scan for banned terms from the dictionary
  2. Verify required disclaimers are present
  3. If violations found: auto-correct where possible, log violation, flag for review
  4. This is a runtime safety net — the prompts should prevent violations, but validation catches edge cases
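A sketch of the post-processing step, under the assumption that the banned-term scan runs first and its violations are passed in. The auto-correct policy shown (append a missing disclaimer, flag banned terms rather than rewrite them) is one possible choice, not a settled decision.

```typescript
// Disclaimer required on all AI outputs (Section 1.1 Required Phrases).
const REQUIRED_DISCLAIMER =
  "For informational purposes only. Not medical advice.";

interface ValidationOutcome {
  text: string;             // possibly auto-corrected output
  flaggedForReview: boolean;
}

function postProcessAIOutput(
  raw: string,
  bannedViolations: string[]
): ValidationOutcome {
  // Auto-correct where possible: append the disclaimer if the model omitted it.
  const text = raw.includes(REQUIRED_DISCLAIMER)
    ? raw
    : `${raw}\n\n${REQUIRED_DISCLAIMER}`;
  // Banned terms cannot be safely auto-corrected; flag the output instead.
  return { text, flaggedForReview: bannedViolations.length > 0 };
}
```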

1.7 Metric Normalization — Extend Existing Registry

The existing METRIC_REGISTRY in mobile/src/utils/experiments/metrics.ts already handles vendor-specific extraction for Whoop, Oura, and Fitbit. Extend it for:

  1. Apple Health — add 'apple_health' to requires arrays where applicable
  2. Google Fit — add 'google_fit' to requires arrays
  3. New metrics (if needed by catalog experiments):
    • respiratory_rate (Oura, Whoop)
    • sleep_latency (Oura, Whoop)
    • spo2 (Oura, Fitbit, Apple Health)

1.8 Baseline Auto-Computation

Create a shared utility (used by both mobile and Edge Function):

interface BaselineResult {
  metric_key: string;
  period_start: string;
  period_end: string;
  mean: number;
  median: number;
  std_dev: number;
  min: number;
  max: number;
  typical_range: [number, number];  // mean ± 1 std_dev
  data_points: number;
  quality: 'excellent' | 'good' | 'limited' | 'insufficient';
  // excellent: 14+ days, low variance
  // good: 7-13 days
  // limited: 3-6 days
  // insufficient: <3 days
}

Logic:

  1. Query connected_devices for user's active providers
  2. Determine available metrics via getAvailableMetrics()
  3. Look back up to 30 days for historical data
  4. Compute stats per metric
  5. Return baseline snapshot + quality assessment
  6. If insufficient data for a metric, flag it but don't block enrollment
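The stats-and-quality step (4 and 5 above) might look like this for a single metric. The day-count thresholds come from the `BaselineResult` comments; the coefficient-of-variation cutoff for "excellent" is an assumption.

```typescript
type BaselineQuality = "excellent" | "good" | "limited" | "insufficient";

// Grade one metric's baseline from its daily values in the lookback window.
function gradeBaseline(values: number[]): {
  mean: number;
  stdDev: number;
  quality: BaselineQuality;
} {
  const n = values.length;
  if (n < 3) return { mean: NaN, stdDev: NaN, quality: "insufficient" };
  const mean = values.reduce((a, b) => a + b, 0) / n;
  const stdDev = Math.sqrt(
    values.reduce((a, v) => a + (v - mean) ** 2, 0) / n
  );
  let quality: BaselineQuality;
  if (n >= 14 && stdDev / mean < 0.15) {
    quality = "excellent"; // 14+ days, low variance (CV < 15% is an assumed cutoff)
  } else if (n >= 7) {
    quality = "good";      // 7-13 days (or 14+ days with high variance)
  } else {
    quality = "limited";   // 3-6 days
  }
  return { mean, stdDev, quality };
}
```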

1.9 Data Quality Monitoring

Sync Health Assessment (runs after each sync cycle)

After each sync-all-devices or sync-cgm-devices run, assess data quality:

  1. Per-device assessment: For each connected device, check:

    • Days with data in last 14 days
    • Which metrics are present vs expected for that provider
    • Time since last successful sync
    • Error count in last 7 days (from ingestion_log)
  2. Score computation (0-100):

    • 100: All expected metrics, no gaps, synced within 6 hours
    • 80+: Minor gaps (1-2 days), usable for experiments
    • 50-79: Significant gaps (3-5 days), experiments may be limited
    • <50: Unreliable, warn user before enrollment
  3. Upsert to device_data_quality table daily

  4. User-facing indicators:

    • Green badge on Profile > Devices: "Healthy" sync
    • Amber badge: "Degraded" — missing recent data
    • Red badge: "Needs attention" — failing sync or stale data
    • Shown on experiment enrollment if quality is low
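The scoring and classification above could be sketched as follows. The per-day deductions are assumptions chosen to land in the tier bands from step 2; the `sync_health` branches follow the definitions in the `device_data_quality` table comments.

```typescript
// Score 0-100 from gap counts (14-day window) and sync recency.
// Deduction sizes are illustrative assumptions.
function computeDataQualityScore(
  missingDays: number,
  partialDays: number,
  hoursSinceSync: number
): number {
  let score = 100;
  score -= missingDays * 10; // fully missing days cost the most
  score -= partialDays * 4;  // partial days cost less
  if (hoursSinceSync > 48) score -= 30;
  else if (hoursSinceSync > 6) score -= 10;
  return Math.max(0, score);
}

type SyncHealth = "healthy" | "degraded" | "failing" | "stale";

function classifySyncHealth(
  missingDays: number,
  hoursSinceSync: number,
  errorCount7d: number
): SyncHealth {
  if (hoursSinceSync > 48) return "stale";                       // no sync in >48h
  if (missingDays > 4 || errorCount7d > 0) return "failing";     // big gaps or errors
  if (hoursSinceSync <= 6 && missingDays < 2) return "healthy";  // fresh, near-complete
  return "degraded";                                             // recent sync, some gaps
}
```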

Impact on Experiments

  • At enrollment: snapshot device_data_quality into experiments.data_quality_at_enrollment
  • During experiment: if data quality drops below 50 for >3 consecutive days, notify user
  • At analysis: factor data quality into confidence scoring (more missing days = lower confidence)

Phase 2: Core Experiment Experience (UI + Flow)

2.1 Navigation Restructure

New Tab Bar:

| Tab | Icon | Route | Purpose |
| --- | --- | --- | --- |
| Discover | Compass | (tabs)/discover | Experiment library, recommendations, unenrolled discoveries |
| My Lab | FlaskConical | (tabs)/lab | Active experiments, check-ins, progress |
| Playbook | BookOpen | (tabs)/playbook | Personal discoveries, impact rankings |
| Profile | User | (tabs)/profile | Settings, devices, account |

Hidden but accessible routes:

  • (tabs)/home — Event logging (hidden from tab bar, accessible via Profile > "Event Logger")
  • (tabs)/history — Event history (same)
  • (tabs)/insights — Glucose insights (same)

2.2 Personalized Starter Pack & Metric Gap Analysis

New User Experience

When a user connects their first wearable and has 14+ days of historical data, the app immediately runs two processes:

  1. Metric Gap Analysis — Score each metric against published wellness ranges
  2. Unenrolled Discovery Scan — Find patterns in historical data (Section 3.1)

Metric Gap Analysis — "Where You Stand"

Wellness Reference Ranges (NOT clinical — derived from published wearable population data):

| Metric | Optimal Range | Source |
| --- | --- | --- |
| RHR | 50-65 bpm | General fitness literature |
| HRV (RMSSD) | Age-adjusted: 20s: 50-100ms, 30s: 40-80ms, 40s: 30-60ms, 50+: 20-50ms | Population wearable data |
| Sleep Duration | 7-9 hours | Sleep foundation guidelines |
| Deep Sleep % | 15-25% of total sleep | Sleep stage research |
| REM Sleep % | 20-25% of total sleep | Sleep stage research |
| Steps | 8,000-12,000/day | Activity research |

Gap Scoring Algorithm:

interface MetricGap {
  metric_key: string;
  user_value: number;
  optimal_low: number;
  optimal_high: number;
  gap_severity: 'within_optimal' | 'slightly_below' | 'below' | 'well_below';
  improvement_potential: number;  // 0-100, higher = more room to improve
}

  1. Compute user's 14-day average for each available metric
  2. Compare against age-adjusted wellness ranges
  3. Score gap severity (how far below optimal)
  4. Rank metrics by improvement potential
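One possible implementation of steps 2-4 for a single metric. The severity cutoffs (fractions of the optimal-range width) and the improvement-potential formula are assumptions; "below" is read as "worse than optimal" in whichever direction applies, so an RHR above its range also scores a gap.

```typescript
type GapSeverity = "within_optimal" | "slightly_below" | "below" | "well_below";

// Matches the MetricGap interface above.
interface MetricGap {
  metric_key: string;
  user_value: number;
  optimal_low: number;
  optimal_high: number;
  gap_severity: GapSeverity;
  improvement_potential: number; // 0-100
}

function scoreGap(
  metric_key: string,
  user_value: number,
  optimal_low: number,
  optimal_high: number
): MetricGap {
  const width = optimal_high - optimal_low;
  // Distance outside the optimal range, in either direction.
  const distance =
    user_value < optimal_low ? optimal_low - user_value
    : user_value > optimal_high ? user_value - optimal_high
    : 0;
  const ratio = distance / width; // gap relative to range width
  const gap_severity: GapSeverity =
    ratio === 0 ? "within_optimal"
    : ratio < 0.25 ? "slightly_below"
    : ratio < 0.75 ? "below"
    : "well_below";
  const improvement_potential = Math.min(100, Math.round(ratio * 100));
  return { metric_key, user_value, optimal_low, optimal_high, gap_severity, improvement_potential };
}
```

For the hero example below, an RHR of 68 bpm against a 50-65 range scores "slightly_below" (3 bpm over a 15 bpm range).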

Starter Pack Selection

From the 8 starter pack candidates, select 4-8 based on:

  1. Metric gap targeting (40% weight): Prioritize experiments that target the user's weakest metrics
  2. Difficulty for first-timers (20% weight): Favor "easy" experiments
  3. Community impact data (20% weight): Favor experiments with high community impact rates
  4. Data measurability (20% weight): Only include experiments whose metrics the user can actually track
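The weighted selection above can be sketched as a simple scoring pass. The 40/20/20/20 weights mirror the list; the assumption that each factor is pre-normalized to 0-1, and that unmeasurable experiments are excluded outright rather than down-weighted, are choices of this sketch.

```typescript
interface StarterPackCandidate {
  slug: string;
  gapTargeting: number;     // 0-1: how well it targets the user's weakest metrics
  easeForBeginners: number; // 0-1: 1 = "easy" difficulty
  communityImpact: number;  // 0-1: share of participants with high/moderate impact
  measurability: number;    // 0 or 1: can the user's devices track its metrics?
}

// Rank starter pack candidates by weighted score; return top `limit` slugs.
function rankStarterPack(
  candidates: StarterPackCandidate[],
  limit: number
): string[] {
  return candidates
    .filter((c) => c.measurability > 0) // drop untrackable experiments entirely
    .map((c) => ({
      slug: c.slug,
      score:
        0.4 * c.gapTargeting +
        0.2 * c.easeForBeginners +
        0.2 * c.communityImpact +
        0.2 * c.measurability,
    }))
    .sort((a, b) => b.score - a.score)
    .slice(0, limit)
    .map((c) => c.slug);
}
```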

Presentation — "Your First Experiment" hero:

┌─────────────────────────────────────────┐
│  Based on your data, here's where       │
│  you have the most room to improve:     │
│                                         │
│  Resting Heart Rate: 68 bpm             │
│  ████████████░░░░░░  slightly above     │
│                 optimal (50-65 bpm)     │
│                                         │
│  Recommended first experiment:          │
│  ┌─────────────────────────────────┐   │
│  │  🚶 Post-Meal Walk              │   │
│  │  14 days · Easy · Auto-tracked  │   │
│  │                                  │   │
│  │  78% of participants with a     │   │
│  │  similar RHR saw a meaningful   │   │
│  │  reduction in resting heart     │   │
│  │  rate.                          │   │
│  │                                  │   │
│  │  [Start This Experiment]        │   │
│  └─────────────────────────────────┘   │
│                                         │
│  Other recommended experiments:         │
│  • Zone 2 Cardio (targets RHR)         │
│  • Earlier Bedtime (targets sleep)      │
│  • Caffeine Curfew (targets sleep)      │
└─────────────────────────────────────────┘

2.3 Discover Tab — Insight-First Design

The Discover tab is insight-first, not catalog-first. The most magical moment is: "We noticed something interesting in your data." That must appear before any experiment catalog. The product should feel like a system that understands your body, not a library of health hacks.

Main Screen (discover.tsx)

Layout (ordered by priority — insights first, catalog last):

  1. Insights Hero (ALWAYS first — the magic moment):

    • If unenrolled discoveries exist: Full-width card(s) showing AI-detected patterns
      • "We noticed something in your data..."
      • Pattern description with metric visualization
      • "Want to confirm this? Start a 7-day experiment →"
    • If no discoveries yet but data is loading: "Analyzing your data... We're looking for patterns."
    • If no wearable connected: "Connect a wearable to unlock your first discovery."
    • If wearable connected but <14 days data: "We're collecting data. Your first insight is coming soon."
    • This section is never empty — it always communicates what's happening
  2. Your First Experiment (for new users with no active/completed experiments):

    • Personalized recommendation from Metric Gap Analysis (Section 2.2)
    • Shows the single best experiment with personalized hook
    • "Based on your data, this experiment has the highest likelihood of impact for you."
  3. Recommended for You (for returning users with experiment history):

    • AI-powered recommendations based on data profile, past experiments, playbook
    • Includes "Confirm the Driver" suggestions when attribution is ambiguous (Section 5.3)
    • Horizontal scroll of experiment cards
  4. Experiment Catalog:

    • Full list of available experiments (4-8 in v1)
    • Category badges, difficulty, adherence type
    • Community data on each card

Experiment Card Component

Each card displays:

  • Experiment name + category badge
  • Duration (e.g., "14 days")
  • Difficulty badge (Easy / Moderate / Hard)
  • Adherence type indicator (Auto-tracked / One-tap / Daily check-in)
  • Primary metrics icons
  • Community data: "84% of 1,200 participants saw a 5%+ increase in HRV"
  • Data availability indicator (green check if user has required data, amber warning if not)

Experiment Detail Screen (experiment/[slug].tsx)

Sections:

  1. Hero: Name, category, difficulty, duration, adherence type
  2. Protocol: Full description of what to do
  3. Goal: What we're testing
  4. Why It Works: Science explanation (plain language, wellness-compliant)
  5. Metrics Tracked: Primary + secondary with data availability check
  6. Wisdom of the Lab (Community Data):
    • "84% of 1,200 participants saw a 5%+ increase in HRV using this protocol."
    • "Users with a similar baseline RHR to yours typically see a 'High' magnitude of impact."
  7. Evidence: Link to study
  8. Data Availability Warning (if applicable):
    • "This experiment tracks HRV, which requires an Oura or Whoop device. You don't currently have one connected. You can still run this experiment, but impact measurement will be limited."
    • [Start Anyway] [Connect a Device]
  9. Start Experiment CTA

2.4 Enrollment Flow

Steps:

  1. Baseline Preview: "Based on your last 14 days of data, here's your baseline:"

    • Show computed baseline for each primary metric
    • Quality indicators (excellent/good/limited/insufficient)
    • Data quality score from device_data_quality
    • If no historical data: "We'll collect baseline data for 7 days before your experiment starts"
  2. Duration Selection: Default from catalog, user can adjust (min: min_duration_days)

  3. Adherence Method Explanation:

    • If auto: "We'll automatically track your adherence using your wearable data. No daily check-ins needed."
    • If semi_auto: "We'll detect your activity and ask a quick confirmation question."
    • If manual: "We'll ask you a simple yes/no question each day. Takes less than 5 seconds."
  4. Concurrent Experiment Check:

    • If user has active experiments, show them
    • "Running multiple experiments simultaneously may make it harder to attribute changes to a specific experiment. We will account for this in the analysis."
  5. Confirm & Start:

    • Creates experiments row with catalog_experiment_id FK
    • Copies primary_metrics + secondary_metrics to experiment_metrics
    • Stores auto-computed baseline in baseline_metrics JSONB
    • Snapshots device_data_quality into data_quality_at_enrollment
    • Sets experiment_start = now (or baseline_end if baseline collection needed)
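The record created in step 5 can be sketched as a small helper. This is a minimal sketch, assuming simplified shapes for the catalog row and data-quality snapshot; `buildEnrollmentRecord` is a hypothetical name, and the real flow writes through Supabase rather than returning a plain object.

```typescript
// Hypothetical sketch of the enrollment payload described in step 5.
// Column names mirror the plan (catalog_experiment_id, baseline_metrics,
// data_quality_at_enrollment, experiment_start).

interface CatalogExperiment {
  id: string;
  primary_metrics: string[];
  secondary_metrics: string[];
}

interface EnrollmentInput {
  catalog: CatalogExperiment;
  baseline: Record<string, number> | null; // null → baseline collection needed
  dataQuality: Record<string, unknown>;
  baselineCollectionDays?: number;         // defaults to the 7 days in step 1
}

function buildEnrollmentRecord(input: EnrollmentInput, now = new Date()) {
  const needsBaseline = input.baseline === null;
  // experiment_start = now, or baseline_end when baseline collection is needed.
  const start = needsBaseline
    ? new Date(now.getTime() + (input.baselineCollectionDays ?? 7) * 86_400_000)
    : now;
  return {
    experiment: {
      catalog_experiment_id: input.catalog.id,
      status: needsBaseline ? "pending_baseline" : "active",
      baseline_metrics: input.baseline ?? {},
      data_quality_at_enrollment: input.dataQuality,
      experiment_start: start.toISOString(),
    },
    // Copied into experiment_metrics rows, per step 5.
    metrics: [...input.catalog.primary_metrics, ...input.catalog.secondary_metrics],
  };
}
```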

2.5 My Lab Tab

Main Screen (lab.tsx)

Layout:

  1. Active Experiments section:

    • Cards showing each active experiment with:
      • Progress bar (day X of Y)
      • Today's check-in prompt (only for manual adherence experiments, or when semi_auto needs confirmation)
      • Auto-detected adherence badge for auto experiments ("Adherence auto-detected today ✓")
      • Mid-experiment teaser ("Early signal: HRV trending 8% higher than baseline")
    • Tap → Active experiment detail
  2. Pending Baseline section (if any):

    • Experiments waiting for baseline data collection
    • Progress toward sufficient data
  3. Recently Completed section:

    • Experiments awaiting or showing analysis results
    • "Discovery Found!" badge for experiments with results
  4. Empty State: "Start your first experiment to begin discovering what works for your body."

Daily Check-in Flow — Adaptive by Adherence Type

auto experiments (Early Bedtime, Steps, Zone 2, etc.):

  • No manual check-in required. The app auto-detects adherence from wearable data.
  • After sync, the app checks auto_detect_config rules against the day's data.
  • Creates experiment_checkins row with auto_detected = true and auto_detect_data containing evidence.
  • User sees: "Day 8 of 14 — Adherence auto-detected ✓ (Bedtime: 10:22 PM, target: before 10:30 PM)"
  • If auto-detection can't determine adherence (e.g., missing data), fall back to manual prompt.
  • Confounder prompt still appears (briefly): "Anything unusual today?" → toggles for alcohol, illness, etc.

semi_auto experiments (Weighted Vest Walking, Morning Sunlight):

  • App detects the activity (e.g., a walk was logged).
  • Sends one-tap confirmation push notification: "We detected a 25-minute walk at 7:15 PM. Did you wear the weighted vest? [Yes] [No]"
  • Creates check-in with auto_detected = true + user confirmation.

manual experiments (Supplements, dietary changes, breathing):

  • Traditional check-in: "Did you follow the protocol today?" → [Yes] [Mostly] [No]
  • Confounder toggles
  • Optional note
  • Design principle: <10 seconds. This is not journaling.

Active Experiment Detail (experiment/active/[id].tsx)

Sections:

  1. Progress: Day X of Y, adherence rate, progress timeline
  2. Metric Trends: Small charts showing primary metrics over baseline + experiment period
  3. Mid-Experiment Teasers: Hints about emerging patterns
    • Only shown after day 5+ with sufficient data
    • "Early signal detected: Your deep sleep is trending 15% higher than your baseline average"
  4. Check-in History: Calendar view with adherence indicators (auto/manual/missed)
  5. Data Quality Indicator: Current sync health for relevant devices
  6. Actions: Pause, Extend, Complete Early, Abandon

2.6 Experiment Completion & Discovery

When an experiment ends (duration reached or user completes early):

  1. Status updates to completed, experiment_end set

  2. AI Analysis triggered via ai-engine/analyze:

    • Fetches baseline vs experiment period data
    • Computes statistical comparisons (existing logic)
    • Gemini generates narrative with Magnitude of Impact framing
    • Creates user_discoveries row
    • Creates experiment_outcomes row (normalized for data learning pipeline)
    • Updates experiment_results with magnitude scoring
    • Generates playbook suggestion
    • Generates "What's Next?" recommendations
    • Runs compliance validation on AI output
  3. Discovery Presentation Screen (discovery/[id].tsx):

┌─────────────────────────────────────┐
│         Discovery Found!            │
│                                     │
│    Post-Meal Walk                   │
│    14-day experiment                │
│                                     │
│  ┌─────────────────────────────┐   │
│  │ Magnitude of Impact: HIGH   │   │
│  └─────────────────────────────┘   │
│                                     │
│  Resting HR    -4 bpm  (62→58)     │
│  ████████████████████░░  -6.5%     │
│                                     │
│  Deep Sleep    +18 min (52→70)     │
│  ████████████████████░░  +34.6%    │
│                                     │
│  HRV           +8 ms  (44→52)     │
│  ████████████████░░░░░░  +18.2%    │
│                                     │
│  Confidence: Moderate               │
│  Attribution: Strong                │
│  12 valid days, 2 excluded          │
│  (1 alcohol, 1 illness)             │
│                                     │
│  "Walking after dinner was          │
│   associated with meaningful        │
│   improvements in your recovery     │
│   metrics. Your resting heart rate  │
│   and deep sleep showed the         │
│   strongest response."              │
│                                     │
│  [Add to Playbook]                  │
│                                     │
│  ─── What's Next? ───              │
│  Based on your results:             │
│  • Earlier Dinner (builds on this)  │
│  • Consistent Wake Time             │
│                                     │
│  For informational purposes only.   │
│  Not medical advice.                │
└─────────────────────────────────────┘

**When Attribution is Moderate or Low**, the discovery screen additionally shows:

┌─────────────────────────────────────┐
│  Attribution: Moderate              │
│                                     │
│  Possible contributors:             │
│  ├── Post-Meal Walk      Moderate   │
│  └── Magnesium           Moderate   │
│                                     │
│  ─── Confirm the Driver ───         │
│  Try pausing magnesium for 7 days   │
│  while keeping the walk.            │
│  [Start Isolation Experiment]       │
└─────────────────────────────────────┘


Key framing rules:

  • NEVER "Success" / "Failure"
  • ALWAYS "Magnitude of Impact": High / Moderate / Low / Minimal / Inconclusive
  • Each metric shows: label, absolute change, baseline→observed, bar chart, percentage
  • Confounders are noted transparently
  • FDA disclaimer at bottom

Null Results — "The Success of Elimination"

Many experiments will produce Minimal or Inconclusive magnitude — effectively 0% impact. If the UX treats this as a letdown, the user feels they wasted 14 days. Instead, frame null results as a valuable discovery: you've eliminated a variable and narrowed the search.

Discovery screen when magnitude is Minimal/Inconclusive:

┌─────────────────────────────────────┐
│  Magnesium Before Bed               │
│  14 days • 12 valid days            │
│                                     │
│  Magnitude of Impact: Minimal       │
│                                     │
│  RHR           -0.3 bpm (61→60.7)  │
│  ░░░░░░░░░░░░░░░░░░░░░  -0.5%     │
│                                     │
│  Deep Sleep    +2 min (48→50)      │
│  ░░░░░░░░░░░░░░░░░░░░░  +4.2%     │
│                                     │
│  ─── Discovery ───                  │
│                                     │
│  ✓ You've eliminated a variable.    │
│                                     │
│  "Magnesium doesn't appear to be    │
│   a meaningful lever for your       │
│   sleep or recovery. That's a       │
│   valuable finding — you just       │
│   narrowed the search for what      │
│   actually works for your body."    │
│                                     │
│  💰 Estimated savings: ~$30/month   │
│                                     │
│  ─── What's Next? ───              │
│  These experiments target the same  │
│  metrics with higher community      │
│  impact rates:                      │
│  • Caffeine Curfew (72% saw impact) │
│  • Earlier Bedtime (68% saw impact) │
│                                     │
│  For informational purposes only.   │
│  Not medical advice.                │
└─────────────────────────────────────┘

Framing principles for null results:

  • Lead with affirmation: "You've eliminated a variable" — this IS progress
  • Reframe the value: "You just narrowed the search for what actually works for your body"
  • Show concrete savings (when applicable): supplement cost, time saved, effort redirected
  • Immediately pivot to what's next: Recommend experiments with higher community impact rates for the same metrics — the user's momentum should carry forward, not stall
  • Playbook entry: Null results are recorded in the playbook as "Eliminated" with a strikethrough-style badge, visually showing progress through the search space
  • AI narrative tone: Curious and encouraging, never apologetic. "Your body didn't respond to X" is a finding, not a failure

2.7 Playbook Tab — Progression System

The Playbook is not just a list — it's a progression system that gives users a clear reason to run more experiments. Each category has a discovery count that fills up, creating a sense of exploration and completeness.

Main Screen (playbook.tsx)

Layout:

  1. Header: "Your Body's Operating Manual"

  2. Category Progression Cards (the key engagement driver):

    ┌─────────────────────────────────┐
    │  Sleep Playbook    2 / 5 ████░  │
    │  Recovery Playbook 1 / 4 ██░░░  │
    │  Metabolic Playbook 0 / 3 ░░░░  │
    │  HRV Playbook      0 / 2 ░░░░  │
    │  RHR Playbook      1 / 3 ██░░░  │
    └─────────────────────────────────┘
    
    • Each category maps to experiment categories in the catalog
    • Denominator = number of experiments available in that category (from catalog)
    • Numerator = number of completed experiments with discoveries in that category
    • Tap a category → see discoveries for that category + available experiments to fill gaps
    • Categories with 0 discoveries show: "Run your first [category] experiment →"
  3. Top Health Levers (ranked by magnitude):

    • Ranked list of all discovered health levers across categories
    • Each entry: rank, habit name, magnitude badge, impact summary, category icon
    • Example: "#1 — Post-Dinner Walk | HIGH | RHR -6.5%, Deep Sleep +34.6%"
  4. Eliminated Variables section:

    • Experiments that produced Minimal/Inconclusive magnitude
    • Displayed with strikethrough style and "Eliminated" badge
    • Shows what was ruled out: "Magnesium — not a lever for your sleep"
    • Reinforces progress: "3 eliminated, 2 confirmed — your search is narrowing"
    • These count toward category progression (denominator explored, not just successes)
  5. Unconfirmed Patterns section:

    • Patterns spotted by the Unenrolled Discovery Engine but not yet confirmed via formal experiment
    • "Unconfirmed" badge + "Confirm with an experiment →" CTA
  6. Empty State: "Your body has stories to tell. Run your first experiment to start building your playbook."

Progression Logic

The category/denominator counts are derived from the experiment catalog:

  • v1 (8 experiments): Sleep: 5, RHR: 2, Sleep/RHR overlap: 1 → adjust to avoid double-counting
  • As catalog expands in v2, denominators grow — users always have more to explore
  • Both confirmed levers AND eliminated variables count toward progression — running an experiment always moves you forward
  • Numerator display: "3 explored (2 confirmed, 1 eliminated)" to show both types of progress
  • When a user completes all experiments in a category: "Category Complete! You've mapped your [category] levers."
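The progression math above can be sketched directly: both confirmed levers and eliminated variables advance the numerator, and the catalog supplies the denominator. This is a minimal sketch; `categoryProgress` and the input shapes are hypothetical, and the overlap de-duplication mentioned for v1 is omitted.

```typescript
type Magnitude = "high" | "moderate" | "low" | "minimal" | "inconclusive";

interface CompletedExperiment {
  category: string;
  magnitude: Magnitude; // overall_magnitude from the experiment's analysis
}

function categoryProgress(
  category: string,
  catalogCountForCategory: number, // denominator, from the experiment catalog
  completed: CompletedExperiment[],
) {
  const inCategory = completed.filter((e) => e.category === category);
  // Minimal/Inconclusive outcomes count as "eliminated", everything else as
  // "confirmed" — both move progression forward.
  const eliminated = inCategory.filter(
    (e) => e.magnitude === "minimal" || e.magnitude === "inconclusive",
  ).length;
  const confirmed = inCategory.length - eliminated;
  return {
    explored: inCategory.length,
    denominator: catalogCountForCategory,
    label: `${inCategory.length} explored (${confirmed} confirmed, ${eliminated} eliminated)`,
    complete: inCategory.length >= catalogCountForCategory,
  };
}
```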

Phase 3: Intelligence Layer (AI-Powered Features)

3.1 Unenrolled Discovery Engine (Pattern Spotting)

This is in v1 and is built in Sprint 2. The AI engine analyzes historical wearable data to find "accidental experiments" — patterns the user didn't intentionally create. This is our competitive advantage: users see a discovery before they even pick an experiment.

How It Works

  1. Trigger: Runs when:

    • User first connects a wearable with 14+ days of history (immediate value)
    • Weekly cron job for users with active data
    • On-demand when user visits Discover tab (if last scan >7 days ago)
  2. Data Collection: Edge Function ai-engine/spot-patterns gathers:

    • Last 30-90 days of daily_summary, sleep_sessions, glucose_data, activities
    • Looks for natural variation in behaviors (walking frequency, sleep timing, activity patterns)
  3. Statistical Pre-Filtering (BEFORE AI):

    The AI should only see patterns that meet strict statistical thresholds. This prevents hallucinated correlations.

    Minimum requirements to surface a pattern:

    • 20+ data points in each comparison group (e.g., 20 days with the behavior, 20 without)
    • Effect size >10% difference between groups
    • Consistency across weeks: The pattern must hold across at least 3 separate weeks (not a one-time cluster)
    • Statistical significance: p-value < 0.05 using Mann-Whitney U test (non-parametric, handles non-normal wearable data)
    • Not explainable by day-of-week effects: Control for weekend vs weekday patterns

    Pre-filter pipeline:

    Raw data → Behavioral segmentation → Statistical comparison → Filter by thresholds → AI narrative generation
    

    The statistical engine (not AI) identifies candidate patterns. The AI only generates the user-facing narrative for patterns that pass all filters.

  4. Pattern Detection Categories:

    • Activity → Recovery: "Days with 8,000+ steps correlate with 15% higher next-night HRV"
    • Sleep timing → Sleep quality: "Nights with bedtime before 10:30 PM show 22 min more deep sleep"
    • Exercise frequency → RHR: "Weeks with 3+ workouts show 5 bpm lower average RHR"
    • Temporal patterns: "Your HRV has been trending upward over the last 3 weeks"
  5. AI Narrative Generation (Gemini):

    • Only runs on statistically validated patterns
    • Generates user-friendly description using wellness-compliant language
    • Maps pattern to a catalog experiment that could confirm it
    • Must use correlational language only (FDA compliance)
  6. Output: Creates user_discoveries rows with:

    • discovery_type: 'unenrolled_pattern'
    • suggested_experiment_id: links to catalog experiment that could confirm the pattern
    • Title: "We noticed that on days you walk 8,000+ steps, your overnight HRV is 15% higher."
    • CTA: "Want to turn this into a formal 7-day experiment to confirm it?"
  7. Conversion Flow: User taps "Start Experiment" on an unenrolled discovery →

    • Pre-fills enrollment with the suggested catalog experiment
    • Notes the discovery that inspired it

Anti-Spam Rules

  • Max 3 unenrolled discoveries surfaced at a time
  • Don't resurface dismissed discoveries
  • Only patterns meeting ALL statistical thresholds (20+ points, >10%, multi-week consistency)
  • Don't surface patterns that contradict existing playbook entries
  • Rate limit: max 2 new discoveries per week per user
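The core of the statistical pre-filter can be sketched as follows: minimum group size (20+), effect size (>10%), and a two-sided Mann-Whitney U test using the normal approximation, which is adequate at these sample sizes. The multi-week consistency and day-of-week checks are omitted for brevity, and `Candidate` is a hypothetical shape rather than the real pipeline type.

```typescript
interface Candidate {
  withBehavior: number[]; // e.g. next-night HRV on days with 8,000+ steps
  without: number[];      // same metric on days without the behavior
}

// 1-based ranks with average ranks across ties.
function averageRanks(values: number[]): number[] {
  const order = values.map((v, i) => [v, i] as const).sort((a, b) => a[0] - b[0]);
  const r = new Array<number>(values.length).fill(0);
  let i = 0;
  while (i < order.length) {
    let j = i;
    while (j + 1 < order.length && order[j + 1][0] === order[i][0]) j++;
    const avg = (i + j + 2) / 2;
    for (let k = i; k <= j; k++) r[order[k][1]] = avg;
    i = j + 1;
  }
  return r;
}

function mannWhitneyTwoSidedP(a: number[], b: number[]): number {
  const r = averageRanks(a.concat(b));
  const r1 = r.slice(0, a.length).reduce((s, x) => s + x, 0);
  const u1 = r1 - (a.length * (a.length + 1)) / 2;
  const mean = (a.length * b.length) / 2;
  const sd = Math.sqrt((a.length * b.length * (a.length + b.length + 1)) / 12);
  const x = Math.abs(u1 - mean) / sd / Math.SQRT2;
  // Abramowitz-Stegun erf approximation; two-sided p = 2*(1 - Phi(|z|)) = 1 - erf(x).
  const t = 1 / (1 + 0.3275911 * x);
  const poly = ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t -
    0.284496736) * t + 0.254829592) * t;
  return poly * Math.exp(-x * x);
}

function passesPreFilter(c: Candidate): boolean {
  if (c.withBehavior.length < 20 || c.without.length < 20) return false;
  const mean = (xs: number[]) => xs.reduce((s, v) => s + v, 0) / xs.length;
  const effect = Math.abs(mean(c.withBehavior) - mean(c.without)) / mean(c.without);
  if (effect <= 0.10) return false;
  return mannWhitneyTwoSidedP(c.withBehavior, c.without) < 0.05;
}
```

Only candidates for which `passesPreFilter` returns true would reach AI narrative generation.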

3.2 Experiment Recommender

Runs after each completed experiment and periodically.

Inputs:

  • User's completed experiments + results (from experiment_outcomes)
  • Current playbook entries
  • Available metrics (connected devices)
  • Current active experiments
  • Catalog of available experiments
  • User baseline profile (from most recent experiment_outcomes)

Logic:

  1. Complementary experiments: If earlier bedtime showed high impact on sleep, recommend Post-Dinner Walk or Caffeine Curfew
  2. Unexplored categories: If user has only done sleep experiments, suggest HRV or glucose experiments
  3. High-signal experiments: Prioritize experiments with high community impact rates
  4. Device-aware: Only recommend experiments the user can actually measure
  5. Personalized: "Users with a similar baseline RHR to yours typically see a 'High' magnitude of impact from this experiment."

3.3 Data Learning Pipeline

Every completed experiment feeds a pipeline that makes the system smarter over time.

Pipeline Steps

1. Experiment completes
   └─→ 2. Normalized outcome record created (experiment_outcomes table)
        └─→ 3. Community stats aggregation triggered
             └─→ 4. Cohort-level effect sizes recomputed
                  └─→ 5. Recommendation engine weights updated
                       └─→ 6. Starter pack priorities recalculated

Step Details

Step 1-2: Outcome Normalization

When an experiment completes, the analysis engine creates an experiment_outcomes row:

  • User baseline profile is bucketed (age range, metric ranges) for anonymous aggregation
  • All metric changes are stored with effect sizes
  • Adherence, confounders, and concurrent experiments are captured
  • This is the atomic unit of the learning pipeline

Step 3: Community Stats Aggregation

Runs as a batch job (daily cron or triggered on outcome creation):

-- Example aggregation query
SELECT
  catalog_experiment_id,
  COUNT(*) as total_completed,
  AVG((metric_changes->0->>'change_pct')::numeric) as avg_primary_metric_change,
  PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY (metric_changes->0->>'change_pct')::numeric) as median_change,
  COUNT(*) FILTER (WHERE overall_magnitude = 'high') * 100.0 / COUNT(*) as pct_high_impact
FROM experiment_outcomes
WHERE confidence IN ('strong', 'moderate')
GROUP BY catalog_experiment_id;

Step 4: Cohort Effect Estimation

Group outcomes by user baseline profile buckets:

  • "Users with RHR 60-70 who ran Post-Meal Walk" → average effect
  • "Users with HRV 30-40 who ran Alcohol Elimination" → average effect
  • Stored in community_experiment_stats.baseline_segment_stats

Step 5: Recommendation Engine Update

The recommender uses cohort-level data to personalize:

  • Match user's current baseline to closest cohort
  • Weight recommendations by that cohort's historical outcomes
  • v1: Simple heuristic matching; v2+: ML-based collaborative filtering

Step 6: Starter Pack Recalculation

As community data grows, starter pack priorities may shift:

  • If Post-Meal Walk shows consistently higher impact than Alcohol Elimination for RHR-focused users, reorder
  • Initially manual review; later automated with guardrails

Bootstrap Strategy (Pre-Community Data)

Until we have sufficient real user data (target: 100+ completed outcomes per experiment):

  1. Seed community_experiment_stats with estimates from published research
  2. Mark seeded data with source: 'research_estimate' in the JSONB
  3. Blend: as real data accumulates, weight shifts from research estimates to actual outcomes
  4. Transition threshold: when 50+ real outcomes exist for an experiment, deprecate research estimate
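The blend in step 3 can be sketched as a linear ramp toward the 50-outcome threshold. The linear schedule is an assumption; any monotone weighting that reaches 1.0 at the threshold satisfies the plan.

```typescript
// Blend a research-seeded estimate with the real-outcome average.
// Weight shifts linearly from the research estimate to real data as
// realCount approaches the deprecation threshold.
function blendedStat(
  researchEstimate: number,
  realAvg: number | null,
  realCount: number,
  threshold = 50,
): number {
  if (realAvg === null || realCount === 0) return researchEstimate;
  if (realCount >= threshold) return realAvg; // research estimate deprecated
  const w = realCount / threshold;
  return researchEstimate * (1 - w) + realAvg * w;
}
```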

3.4 Community Data Pipeline

Aggregation Job (runs daily via cron or on experiment completion):

  1. Query experiment_outcomes grouped by catalog_experiment_id
  2. For each catalog experiment:
    • Count total participants, total completed
    • Compute average change_pct per metric across all users
    • Compute magnitude distribution (% high, moderate, low, minimal)
    • Segment by baseline ranges (e.g., users with baseline HRV 30-40 vs 40-50 vs 50+)
  3. Update community_experiment_stats table
  4. All data is anonymized — no user IDs in the aggregated output

Display on Experiment Cards:

  • "84% of 1,200 participants saw a 5%+ increase in HRV using this protocol."
  • "Users with a similar baseline RHR to yours typically see a 'High' magnitude of impact."

3.5 Mid-Experiment Teasers

After day 5 of an active experiment:

  • Compare experiment-period-so-far metrics against baseline
  • If a primary metric is trending >5% different from baseline, surface a teaser
  • "Early signal detected: Your deep sleep is trending 15% higher than your baseline average"
  • Updates daily
  • Uses simple statistical comparison, not full AI analysis (save that for completion)
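The teaser comparison above is simple enough to sketch in full: mean of the experiment days so far versus the stored baseline, surfaced only after day 5 and only when the drift exceeds 5%. The function name and signature are illustrative.

```typescript
// Returns a teaser string, or null when no teaser should be shown.
function teaser(
  metricLabel: string,
  baselineAvg: number,
  experimentValues: number[], // daily values for the experiment period so far
  dayNumber: number,
): string | null {
  if (dayNumber < 5 || experimentValues.length === 0) return null;
  const avg = experimentValues.reduce((s, x) => s + x, 0) / experimentValues.length;
  const driftPct = ((avg - baselineAvg) / baselineAvg) * 100;
  if (Math.abs(driftPct) <= 5) return null; // below the surfacing threshold
  const dir = driftPct > 0 ? "higher" : "lower";
  return `Early signal detected: Your ${metricLabel} is trending ${Math.abs(Math.round(driftPct))}% ${dir} than your baseline average`;
}
```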

Phase 4: Migration & Polish

4.1 Navigation Migration

  1. Fold the current (tabs)/experiments.tsx into the new (tabs)/lab.tsx
  2. Create new (tabs)/discover.tsx and (tabs)/playbook.tsx
  3. Modify (tabs)/_layout.tsx:
    • New tab order: Discover, My Lab, Playbook, Profile
    • Hide: Home, History, Insights (remove from tab bar but keep route files)
  4. Add "Event Logger" and "Glucose Insights" links in Profile for backward access

4.2 Existing Experiment Migration

Users with existing custom experiments (from old create-experiment flow):

  • Keep them in experiments table with is_custom = true, catalog_experiment_id = null
  • Display in My Lab under "Custom Experiments" section
  • Can still be completed and analyzed
  • Remove create-experiment screen from primary navigation

4.3 Update analyze-experiment Edge Function

Either extend existing or redirect to new ai-engine/analyze:

  • Add overall_magnitude computation
  • Switch AI provider to Gemini (with OpenAI fallback)
  • Add concurrent experiment awareness
  • Generate user_discoveries row on completion
  • Generate experiment_outcomes row (normalized for data learning pipeline)
  • Generate playbook suggestion
  • Use Magnitude of Impact framing in all prompts
  • Run compliance validation on all AI output

4.4 Apple Health + Google Fit — Deferred to v2

Apple Health and Google Fit integrations are moved to v2. For v1, users connect Whoop, Fitbit, Oura, Libre, or Dexcom (existing sync infrastructure). See v2 Roadmap section for details.


Phase 5: Privacy, Consent & Data Governance

5.1 Community Data Opt-Out

Users must be able to opt out of having their anonymized experiment outcomes included in community aggregations.

Implementation:

  • Add community_data_opt_in BOOLEAN DEFAULT true to user_profiles
  • During onboarding (or in Profile > Privacy settings), explain:

    "Your experiment results help the community by contributing to anonymous statistics like '78% of participants saw improvement.' No personal data is ever shared — only anonymized, aggregated numbers. You can opt out at any time."

  • If opted out:
    • Their experiment_outcomes rows are excluded from community aggregation queries
    • They can still see community stats (they just don't contribute)
    • Opt-out is retroactive: existing outcomes are excluded from next aggregation run

5.2 Research Usage Consent

If we ever plan to use the dataset for published research or share with partners:

Implementation:

  • Add research_consent BOOLEAN DEFAULT false to user_profiles
  • Separate, explicit consent screen (not bundled with community opt-in):

    "Would you like to contribute to health research? If you consent, your fully anonymized experiment data may be used in aggregate research studies. Your identity is never associated with research data. You can withdraw consent at any time."

  • Consent must be affirmative (opt-in, not opt-out)
  • Consent timestamp and version tracked: research_consent_at, research_consent_version

5.3 Data Retention Policy

Define and display clear data retention rules:

| Data Type | Retention Period | Rationale |
| --- | --- | --- |
| Raw wearable data (daily_summary, sleep_sessions, etc.) | Indefinite (user-controlled) | Users need historical data for baselines and pattern detection |
| Experiment records | Indefinite (user-controlled) | Users need their experiment history |
| Experiment outcomes (anonymized) | Indefinite | Core to community intelligence |
| Check-in data | Indefinite (user-controlled) | Part of experiment record |
| AI analysis outputs | Indefinite (user-controlled) | Part of discovery/playbook |
| Device tokens (OAuth) | Until device disconnected or user deletes account | Required for sync |
| Account deletion | Full deletion within 30 days of request | GDPR/CCPA compliance |

Account deletion must:

  • Delete all PII (user_profiles, experiment records, discoveries, playbook)
  • Remove user from all community aggregations (re-aggregate without their data)
  • Revoke all OAuth tokens
  • Delete push tokens
  • Provide confirmation

5.4 Anonymization Policy

How experiment outcomes are anonymized for community use:

  1. No PII in aggregated data: community_experiment_stats contains only counts, averages, and percentiles — no user IDs, no individual records
  2. Baseline profiles are bucketed: Age ranges (20-29, 30-39, etc.), metric ranges (RHR 50-60, 60-70), never exact values
  3. Minimum aggregation threshold: Community stats only shown when 10+ completed outcomes exist for an experiment (prevents small-group identification)
  4. No temporal correlation: Aggregated stats are not timestamped to individual users
  5. Differential privacy (future): For very small cohorts, consider adding noise to aggregated values

5.5 User-Facing Privacy Documentation

Create an in-app "Data & Privacy" section (accessible from Profile):

  • How Your Data Is Used: Plain-language explanation of data flow
  • Community Data: Explanation of anonymization + opt-out toggle
  • Research Consent: Separate consent flow
  • Data Retention: What we keep and for how long
  • Delete My Data: Account deletion request flow
  • Export My Data: Download all personal data (GDPR right of portability)

Technical Architecture Decisions

AI Model Strategy

  • Primary: Google Gemini 2.0 Flash (fast, cost-effective for most analysis)
  • Upgrade: Gemini 2.0 Pro (for complex pattern detection, recommendations)
  • Fallback: OpenAI GPT-4o-mini (existing infrastructure, proven reliability)

Why Gemini first:

  • Competitive pricing for high-volume analysis
  • Strong structured output support
  • Good at pattern detection in numerical data
  • Swappable via provider abstraction if performance doesn't meet needs

Environment Variables:

AI_PROVIDER=gemini
GEMINI_API_KEY=<key>
GEMINI_FLASH_MODEL=gemini-2.0-flash
GEMINI_PRO_MODEL=gemini-2.0-pro
# Fallback
OPENAI_API_KEY=<existing>
OPENAI_CHAT_MODEL=gpt-4o-mini
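The provider abstraction implied by these variables can be sketched as a narrow interface plus a fallback wrapper. This is a minimal sketch: the interface shape is an assumption, and the provider objects below stand in for real Gemini/OpenAI HTTP clients.

```typescript
// Narrow surface so providers are swappable via AI_PROVIDER.
interface AIProvider {
  name: string;
  generate(prompt: string): Promise<string>;
}

// Gemini-first chain: try the primary, fall back on any error.
function withFallback(primary: AIProvider, fallback: AIProvider): AIProvider {
  return {
    name: `${primary.name}->${fallback.name}`,
    async generate(prompt) {
      try {
        return await primary.generate(prompt);
      } catch {
        return fallback.generate(prompt); // e.g. GPT-4o-mini when Gemini errors
      }
    },
  };
}
```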

Concurrent Experiment Handling & Attribution Confidence

Users can run multiple experiments simultaneously. Since this makes causality ambiguous, we use an Attribution Confidence Model that is honest about uncertainty and converts ambiguity into follow-up experiment opportunities.

Attribution Confidence Model

Instead of trying to determine causality, classify how confident the attribution is:

| Situation | Attribution Confidence | Label |
| --- | --- | --- |
| 1 experiment active during period | Strong | "This experiment was the primary variable during this period." |
| 2 experiments active | Moderate | "Multiple experiments were active. Improvements may be associated with more than one habit." |
| 3+ experiments active | Low | "Several experiments were active simultaneously. Individual attribution is uncertain." |
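The classification is a direct mapping from concurrent experiment count to a label:

```typescript
type Attribution = "strong" | "moderate" | "low";

// Count of experiments active during the analysis period → confidence label.
function attributionConfidence(activeExperimentCount: number): Attribution {
  if (activeExperimentCount <= 1) return "strong";
  if (activeExperimentCount === 2) return "moderate";
  return "low";
}
```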

Attribution confidence is surfaced on every discovery:

Post-Meal Walk Experiment
Magnitude of Impact: High
Attribution Confidence: Moderate

Multiple experiments were active during this period.
Improvements may be associated with more than one habit.

Attribution Map

When attribution confidence is Moderate or Low, show an Attribution Map — all experiments that were active during the period, ranked by plausibility:

Your recovery improved during this experiment period.

Possible contributors:
├── Post-Dinner Walk       Confidence: Moderate
├── Magnesium Before Bed   Confidence: Moderate
└── Earlier Bedtime        Confidence: Low (started mid-period)

Plausibility ranking factors:

  • Temporal overlap: Experiments active for the full period rank higher than those that started mid-way
  • Protocol relevance: Experiments whose primary metrics match the improved metrics rank higher
  • Adherence: Higher adherence = higher attribution plausibility
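The three factors can be combined into a plausibility score. The weights below are illustrative assumptions; the plan fixes the factors, not their relative weight.

```typescript
interface Contributor {
  name: string;
  overlapFraction: number; // 0-1: share of the period the experiment was active
  metricsMatch: boolean;   // primary metrics overlap the improved metrics
  adherenceRate: number;   // 0-1
}

// Sort contributors by a weighted plausibility score (weights are assumed).
function rankContributors(contributors: Contributor[]): Contributor[] {
  const score = (c: Contributor) =>
    0.4 * c.overlapFraction + 0.3 * (c.metricsMatch ? 1 : 0) + 0.3 * c.adherenceRate;
  return [...contributors].sort((a, b) => score(b) - score(a));
}
```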

"Confirm the Driver" Follow-Up Experiments

When attribution is ambiguous, the system converts uncertainty into the next experiment opportunity:

Your sleep improved during the last 14 days, but multiple habits changed.

Suggested next experiment:
🔬 Confirm the driver
Try pausing magnesium for 7 days while keeping everything else constant.
If your sleep stays improved, the Post-Meal Walk was likely the primary driver.

This creates a natural experiment chain:

  1. Run multiple experiments → see improvement → ambiguous attribution
  2. System suggests isolation experiment → user runs it
  3. Clear attribution → discovery confirmed with strong confidence

Implementation:

  • After analysis with Moderate/Low attribution, the AI generates a "confirm the driver" suggestion
  • Suggestion stored as a special recommendation type in user_discoveries
  • If user accepts, creates a new experiment that is a modified version (e.g., "Magnesium Pause" = keep everything else, remove one variable)
  • The follow-up experiment references the parent discovery for context

Technical Details

  1. Each experiment maintains its own baseline (computed at enrollment time)
  2. AI analysis prompt includes awareness of ALL concurrent experiments with their protocols
  3. experiment_outcomes records all concurrent experiment IDs for pipeline analysis
  4. Add to experiments table: concurrent_experiment_ids UUID[] — populated at completion time with IDs of all experiments that overlapped

Confounder Detection Strategy

At check-in (user-reported):

  • Alcohol, illness, travel, intense workout, poor sleep, significant stress

Automated (from wearable data):

  • Sleep duration outlier (< 4 hours)
  • Unusual activity level (>2 std dev from baseline)
  • New supplement/medication (from health events, if logged)

Excluded days: Days with reported confounders are flagged and optionally excluded from analysis. AI is told about excluded days and why.

Valid Day Computation

For experiment analysis, a "valid day" must:

  1. Have check-in data (adherence = yes or mostly, or auto-detected = true)
  2. Not be flagged with major confounders (illness, travel)
  3. Have metric data available from wearable

A minimum of 60% valid days is required for full analysis; otherwise confidence is downgraded to 'suggestive'.
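As a minimal sketch of the rule above (the day-record shape is an assumption for illustration):

```typescript
// Classifies days as valid and applies the 60% threshold for analysis confidence.

interface DayRecord {
  adherence: "yes" | "mostly" | "no" | null;
  autoDetected: boolean;
  majorConfounders: string[]; // e.g. ["illness", "travel"]
  hasMetricData: boolean;
}

function isValidDay(d: DayRecord): boolean {
  const adhered = d.autoDetected || d.adherence === "yes" || d.adherence === "mostly";
  return adhered && d.majorConfounders.length === 0 && d.hasMetricData;
}

// 'normal' when at least 60% of days are valid, else downgrade to 'suggestive'.
function analysisConfidenceTier(days: DayRecord[]): "normal" | "suggestive" {
  const valid = days.filter(isValidDay).length;
  return valid / days.length >= 0.6 ? "normal" : "suggestive";
}
```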

FDA Compliance Architecture

Language Rules (Enforced in AI Prompts)

See Section 1.1 for the complete Banned Words / Required Phrases dictionary.

Enforced via:

  1. Wellness Terminology Audit during Sprint 1 (before any content is written)
  2. validateWellnessCompliance() function used in:
    • Catalog seed data CI validation
    • AI output post-processing (runtime)
    • All user-facing text review
  3. System prompt in all AI calls (shared-guidelines.ts)
  4. Output validation — scan AI responses for banned terms before displaying
  5. App-wide disclaimer: "For general wellness purposes only. Not intended to diagnose, treat, cure, or prevent any disease. Consult a healthcare professional before making health decisions."
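To make the runtime output-validation step concrete, here is an illustrative reimplementation of a `validateWellnessCompliance()`-style check. The real dictionary lives in `banned-words.ts` (51 terms); the three entries below are placeholders:

```typescript
// Scans AI output for banned clinical-claim terms before display.
// BANNED_TERMS is a placeholder subset, not the production dictionary.

const BANNED_TERMS = ["cure", "treat", "diagnose"];

interface ComplianceResult {
  compliant: boolean;
  violations: string[];
}

function validateWellnessText(text: string): ComplianceResult {
  // Word-boundary prefix match so "treat" and "treatment" both flag,
  // but "retreat" does not.
  const violations = BANNED_TERMS.filter((t) =>
    new RegExp(`\\b${t}`, "i").test(text),
  );
  return { compliant: violations.length === 0, violations };
}
```

In the pipeline described above, a non-compliant result would either trigger auto-correction or block the text from rendering.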

Supplement/Medication Experiments — Extra Care

Experiments involving supplements (Magnesium, Creatine, ACV, Cinnamon):

  • Frame as "lifestyle experiments" not "therapeutic interventions"
  • Never claim dosing recommendations — use "amount" or "serving"
  • Include: "Consult your healthcare provider before starting any supplement"
  • Focus results on wearable metrics, not clinical outcomes

Implementation Order

Sprint 1: Foundation — Database + Compliance + Catalog (1-2 weeks) ✅ COMPLETE

  • Wellness Terminology Audit: Create banned words dictionary + validateWellnessCompliance() function
    • supabase/functions/_shared/compliance/banned-words.ts (51 banned terms, 5 required phrases)
    • supabase/functions/_shared/compliance/output-validator.ts (validate + auto-correct with case preservation)
    • 17/17 Deno tests pass including seed data compliance scan
  • Migration: experiment_catalog table (with adherence_detection, auto_detect_config fields)
  • Migration: experiment_outcomes table (normalized outcome records)
  • Migration: user_discoveries, user_playbook, community_experiment_stats tables
  • Migration: device_data_quality table
  • Migration: ALTER experiments (add catalog_experiment_id, baseline_metrics, baseline_quality, data_quality_at_enrollment, concurrent_experiment_ids, attribution_confidence, is_custom)
  • Migration: ALTER experiment_checkins (add confounders, auto_detected, auto_detect_data)
  • Migration: ALTER experiment_results (add overall_magnitude, ai_model, ai_prompt_version)
  • RLS policies for all new tables
  • Seed experiment catalog (8 high-impact experiments) — all descriptions pass compliance validation
    • supabase/migrations/20260312100000_seed_experiment_catalog.sql (8 experiments applied to remote Supabase)
  • Update TypeScript types (database.types.ts, experiment types)
    • mobile/src/types/database.types.ts — 6 new table types + updated existing tables
    • mobile/src/utils/experiments/types.ts — 20+ new interfaces
  • CI check: automated compliance scan on catalog seed data

Sprint 2: AI Engine + Pattern Spotting (1-2 weeks) ✅ COMPLETE

  • Create ai-engine Edge Function with provider abstraction
    • supabase/functions/ai-engine/index.ts — Router with CORS, JWT auth, 4 routes
    • supabase/functions/ai-engine/providers/types.ts — AIProvider interface
    • supabase/functions/ai-engine/providers/factory.ts — Provider factory (env-var selection)
  • Implement Gemini provider
    • supabase/functions/ai-engine/providers/gemini.ts
  • Implement OpenAI fallback provider
    • supabase/functions/ai-engine/providers/openai.ts
  • Compliance module: banned-words.ts + output-validator.ts (completed in Sprint 1)
  • Implement pattern-spotter engine (unenrolled discoveries)
    • supabase/functions/ai-engine/engines/pattern-spotter.ts
    • Statistical pre-filtering (20+ data points, >10% effect, multi-week consistency, p<0.05)
      • mobile/src/utils/experiments/mannWhitneyU.ts — Mann-Whitney U test (7 tests)
      • mobile/src/utils/experiments/patternFilters.ts — 5 filter functions (11 tests)
    • Behavioral segmentation (steps, sleep timing, activity frequency)
    • AI narrative generation for validated patterns
      • supabase/functions/ai-engine/prompts/pattern-detection.ts
    • Anti-spam rules (max 3, dismissed tracking, rate limiting)
      • mobile/src/utils/experiments/antiSpam.ts (12 tests)
  • Implement experiment-analyst engine (extends analyze-experiment logic)
    • supabase/functions/ai-engine/engines/experiment-analyst.ts
    • Magnitude of Impact scoring
      • mobile/src/utils/experiments/magnitudeScoring.ts (32 tests)
    • experiment_outcomes record creation
    • Attribution Confidence computation (strong/moderate/low based on concurrent experiment count)
    • Attribution Map generation for moderate/low confidence
    • "Confirm the Driver" follow-up experiment suggestions
  • Implement recommender engine
    • supabase/functions/ai-engine/engines/recommender.ts
    • supabase/functions/ai-engine/prompts/recommendation.ts
  • Implement starter-pack engine (metric gap analysis + personalized recommendations)
    • supabase/functions/ai-engine/engines/starter-pack.ts
    • mobile/src/utils/experiments/metricGapAnalysis.ts (41 tests)
    • mobile/src/utils/experiments/starterPackScoring.ts
    • mobile/src/utils/experiments/baselineComputation.ts (13 tests)
  • Mobile client: experimentAIClient.ts
    • mobile/src/utils/experiments/experimentAIClient.ts (13 tests)
  • Total: 132 new tests, all passing. Full suite: 2075 tests, 0 regressions.

Sprint 3: Discover Tab (Insight-First) + Catalog UI (1-2 weeks) ✅ COMPLETE

  • New (tabs)/discover.tsx — insight-first layout (discoveries before catalog)
    • mobile/src/app/(tabs)/discover.tsx — Insights Hero + Personalized/Static Starter Pack + Full Catalog sections
  • Insights Hero section (unenrolled discoveries, loading states, empty states)
    • Placeholder states: "Connect a Wearable" / "Analyzing Your Data"
  • Experiment card component with community data + adherence type indicator
    • mobile/src/components/Experiments/CatalogExperimentCard.tsx
    • Shows: name, category, difficulty, duration, adherence type, primary metrics, data availability
    • Community data display pending Sprint 2 aggregation pipeline
  • Experiment detail screen (catalog-experiment/[slug].tsx)
    • mobile/src/app/catalog-experiment/[slug].tsx — protocol, goal, why it works, metrics, evidence, confounders
  • Metric availability detection per experiment
    • Checks user's connected devices against required_data_sources
  • Data availability warnings + data quality indicators
    • Warning card with missing source count + "Connect Device" CTA
  • Personalized Starter Pack for new users (powered by Sprint 2 starter-pack engine)
    • mobile/src/hooks/usePersonalizedStarterPack.ts — React Query hook calling AI engine /starter-pack
    • Discover tab shows "Recommended For You" with hero experiment, personalized reasons, metric gap summary
    • Falls back to static "Start Here" section when AI is unavailable or no wearable connected
    • mobile/__tests__/hooks/usePersonalizedStarterPack.test.ts (10 tests)
    • mobile/__tests__/components/DiscoverPersonalized.test.tsx (10 tests)
  • Total: 20 new tests for Sprint 3 personalization. Full suite: 2095 tests, 0 regressions.

Sprint 4: Enrollment + My Lab (1-2 weeks) ✅ COMPLETE

  • Enrollment flow with auto-baseline computation
    • mobile/src/app/enroll-experiment/[slug].tsx — 3-step wizard (Protocol Review → Baseline Preview → Confirm & Start)
    • mobile/src/utils/experiments/enrollment.ts — validateEnrollment(), buildEnrollmentPayload(), computeBaselinePeriodDates(), checkConcurrentConflicts()
    • mobile/src/hooks/useEnrollExperiment.ts — orchestrates baseline fetch, validation, and enrollment mutation
    • mobile/src/utils/experiments/__tests__/enrollment.test.ts (24 tests)
  • Adherence method explanation in enrollment
    • mobile/src/components/Experiments/AdherenceMethodExplainer.tsx — explains auto/semi_auto/manual detection methods
  • Concurrent experiment handling + attribution warnings
    • mobile/src/components/Experiments/ConcurrentWarning.tsx — warning card with attribution confidence badge
    • Reuses computeAttributionConfidence() from magnitudeScoring.ts
  • New (tabs)/lab.tsx replacing old experiments tab
    • mobile/src/app/(tabs)/lab.tsx — active experiments, completed section, adopted habits, empty state
    • mobile/src/app/(tabs)/_layout.tsx — lab tab with FlaskConical icon, old experiments tab hidden via href: null
    • mobile/__tests__/components/TabBar.test.tsx updated for 7 tabs (6 visible + 1 hidden)
  • Active experiment cards with progress
    • mobile/src/components/Experiments/ActiveExperimentLabCard.tsx — progress bar, adherence badge, check-in CTA, teaser snippet
  • Adaptive check-in flow (auto / semi_auto / manual based on adherence_detection)
    • mobile/src/components/Experiments/AdaptiveCheckin.tsx — switches display based on adherence mode
    • Integrated into mobile/src/app/experiment/[id].tsx
  • Auto-adherence detection logic (check auto_detect_config against daily wearable data)
    • mobile/src/utils/experiments/autoAdherence.ts — evaluateAdherence() dispatcher + 4 evaluators (sleep start, activity after time, wake variance, morning activity)
    • mobile/src/hooks/useAutoAdherence.ts — fetches wearable data, auto-creates checkins for auto mode
    • mobile/src/utils/experiments/__tests__/autoAdherence.test.ts (25 tests)
  • Semi-auto one-tap confirmation flow
    • useAutoAdherence hook returns evaluation + confirmCheckin mutation for semi_auto mode
    • AdaptiveCheckin component renders one-tap confirm/override UI
  • Confounder tracking in check-ins
    • mobile/src/utils/experiments/confounderCheckin.ts — CONFOUNDER_LABELS, getConfounderOptions(), formatConfounderRecord(), parseConfounderRecord(), countActiveConfounders()
    • mobile/src/components/Experiments/ConfounderCheckboxes.tsx — horizontal-wrap toggle chips with icons
    • mobile/src/utils/experiments/__tests__/confounderCheckin.test.ts (16 tests)
  • Mid-experiment teaser insights
    • mobile/src/utils/experiments/teaserInsights.ts — computeTeaserInsights(), computeSingleTeaser(), classifyTeaserDirection()
    • mobile/src/hooks/useTeaserInsights.ts — fetches metric data with 6-hour stale time
    • mobile/src/components/Experiments/TeaserInsightsCard.tsx — direction indicators (↑/↓/→) with change percentages
    • mobile/src/utils/experiments/__tests__/teaserInsights.test.ts (18 tests)
  • Experiment completion trigger
    • mobile/src/utils/experiments/completionDetection.ts — shouldCompleteExperiment(), assessCompletionQuality(), getCompletionAction()
    • mobile/src/hooks/useCompletionCheck.ts — completion readiness + triggerCompletion mutation
    • mobile/src/components/Experiments/CompletionModal.tsx — bottom sheet adapting to action type (complete/extend/low quality)
    • mobile/src/utils/experiments/__tests__/completionDetection.test.ts (19 tests)
  • Data quality monitoring integration (warn on degraded sync during experiment)
    • mobile/src/hooks/useDataQualityMonitor.ts — fetches device_data_quality, surfaces warnings for failing/degraded sync or quality_score < 60
    • Integrated as banner in mobile/src/app/experiment/[id].tsx
  • Phase 1 (Pure Functions TDD): 5 modules, 102 new tests — all passing
  • Phase 2 (Hooks): 5 React Query hooks orchestrating Phase 1 functions
  • Phase 3 (UI): 3-step enrollment wizard, My Lab tab, 7 new components, experiment detail integration
  • Modified existing files: _layout.tsx (tab rename), catalog-experiment/[slug].tsx (CTA → enrollment), experiment/[id].tsx (Sprint 4 integrations), TabBar.test.tsx (updated assertions)
  • Total: 102 new tests for Sprint 4. Full suite: 2198 tests, 0 regressions.

Sprint 5: Discovery + Playbook (1 week) ✅ COMPLETE

  • Discovery presentation screen (Magnitude of Impact + Attribution Confidence)
    • mobile/src/app/discovery/[id].tsx — full discovery screen with magnitude badge, metric cards, AI summary
    • mobile/src/components/Experiments/MagnitudeBadge.tsx — colored badge per magnitude level
    • mobile/src/components/Experiments/DiscoveryMetricCard.tsx — metric label + absolute change + baseline→observed
  • Attribution Map display for moderate/low confidence discoveries
    • mobile/src/components/Experiments/AttributionMapCard.tsx — tree-style concurrent experiments + plausibility
    • mobile/src/utils/experiments/discoveryPresentation.ts — shouldShowAttributionMap(), formatAttributionConfidence()
  • "Confirm the Driver" follow-up suggestion on discovery screen
    • mobile/src/components/Experiments/ConfirmDriverCard.tsx — suggestion + isolation experiment CTA
  • "Add to Playbook" flow
    • mobile/src/utils/experiments/addToPlaybook.ts — buildPlaybookInsert(), determinePlaybookMagnitude(), computeNextRank()
    • mobile/src/hooks/useAddToPlaybook.ts — mutation hook with cache invalidation
  • (tabs)/playbook.tsx — progression system (category progress bars + ranked health levers)
    • mobile/src/utils/experiments/playbookProgression.ts — computeCategoryProgression(), rankHealthLevers(), classifyPlaybookEntries()
    • mobile/src/hooks/usePlaybook.ts — React Query hook computing progression, ranking, classification
    • mobile/src/components/Experiments/PlaybookCategoryCard.tsx — category progress bar + summary
    • mobile/src/components/Experiments/PlaybookEntryRow.tsx — ranked lever with magnitude badge
    • mobile/src/components/Experiments/EliminatedVariableRow.tsx — strikethrough + "Eliminated" badge
  • "What's Next?" recommendations on discovery screen
    • mobile/src/utils/experiments/whatsNextRecommendation.ts — selectWhatsNextExperiments() scoring engine
    • mobile/src/hooks/useWhatsNext.ts — React Query hook fetching catalog + community stats
    • mobile/src/components/Experiments/WhatsNextCard.tsx — recommendation cards with community impact %
  • Playbook empty state
  • Discovery + Playbook hooks: useDiscovery, usePlaybook, useAddToPlaybook, useWhatsNext
  • Null result framing: isNullResult(), getNullResultFraming() for minimal/inconclusive outcomes

Total: 73 new pure function tests for Sprint 5. Full suite: 2302 tests, 0 regressions.

Sprint 6: Navigation Migration + Data Pipeline (1 week) ✅ COMPLETE

  • Restructure tab bar: Discover, My Lab, Playbook, Profile
    • mobile/src/app/(tabs)/_layout.tsx — 4 visible + 4 hidden tabs
    • mobile/__tests__/components/TabBar.test.tsx — 20 tests (updated for new nav)
  • Hide event logging tabs (keep routes accessible via Profile)
    • Home, History, Insights hidden with href: null
  • Update _layout.tsx with new tab order
    • Order: Discover → My Lab → Playbook → Profile
  • Playbook tab placeholder (Sprint 5 will flesh out)
    • mobile/src/app/(tabs)/playbook.tsx — empty state with BookOpen icon
  • Community data bootstrap (seed from research estimates for 8 v1 experiments)
    • supabase/migrations/20260312200000_seed_community_stats_bootstrap.sql
    • Research-sourced impact stats for all 8 v1 catalog experiments
  • Data learning pipeline: aggregation utility (experiment_outcomes → community_experiment_stats)
    • mobile/src/utils/experiments/communityStatsAggregation.ts — aggregateOutcomesToCommunityStats()
    • mobile/__tests__/utils/communityStatsAggregation.test.ts — 11 tests
    • Computes: distinct participants, metric percentiles (p25/median/p75), impact distribution, baseline segment stratification
  • Community stats hook: useCommunityStats for catalog cards
    • mobile/src/hooks/useCommunityStats.ts — React Query hook with 15-min staleTime
    • mobile/__tests__/hooks/useCommunityStats.test.ts — 8 tests
  • Data quality assessment hook: useDataQuality for sync health monitoring
    • mobile/src/hooks/useDataQuality.ts — React Query hook with overallSyncHealth helper
    • mobile/__tests__/hooks/useDataQuality.test.ts — 9 tests
  • Update onboarding hints
    • mobile/src/hooks/useOnboardingHints.ts — AsyncStorage-backed first-run hint system
    • mobile/__tests__/hooks/useOnboardingHints.test.ts — 12 tests
    • Sequential progression: Discover → My Lab → Playbook → First Experiment
    • Dismiss all, reset, corrupted data recovery
  • Data learning pipeline cron: aggregation edge function + pg_cron
    • supabase/functions/aggregate-community-stats/index.ts — daily cron (3 AM UTC)
    • Fetches outcomes, groups by catalog ID, upserts aggregated stats
  • Data quality assessment cron: quality scoring edge function + pg_cron
    • supabase/functions/assess-data-quality/index.ts — hourly cron (:15 past, after sync)
    • Scores devices 0-100, classifies sync health, tracks metric availability
    • supabase/migrations/20260313000000_add_data_pipeline_cron_jobs.sql — pg_cron setup for both
  • Migrate existing custom experiments to new schema (deferred — no custom experiments in production yet)

Total: 60 new tests for Sprint 6. Full suite: 2396 tests, 0 regressions.

Sprint 7: Privacy + Testing + Polish (1-2 weeks) ✅ COMPLETE

Phase 1 — Pure Functions (TDD):

  • Privacy types + validation (mobile/src/utils/privacy/types.ts, privacyValidation.ts) — deletion request validation, retention days validation, consent change detection, account deletion summary, community exclusion logic (21 tests)
  • Compliance text constants (complianceText.ts) — FDA disclaimers, medical disclaimers, experiment disclaimers, AI disclaimers, community data disclaimers, data deletion warnings; context-based disclaimer selector (15 tests)
  • Community opt-out filtering (communityOptOut.ts) — filters outcomes by opted-out user IDs, computes opt-out impact on data sufficiency (9 tests)
  • Attribution model validation — 18-test validation suite for magnitudeScoring.ts with simulated concurrent experiment scenarios (0-3+ concurrent, overlap/adherence/metric relevance scoring, deterministic ordering, magnitude independence)
  • Performance benchmarks — 10 benchmark tests ensuring core functions scale linearly (computeOverallMagnitude, computeAttributionMap, computeTeaserInsights, validateEnrollment, filterOutcomes at 10k scale, selectStarterPack at 50 entries, computeBaselineFromValues at 1k points)
  • AI response schema validation — 9 contract tests validating ExperimentAnalysisResult, StarterPackResult, RecommendedExperiment shapes match mobile client expectations

Phase 2 — Database Migration:

  • supabase/migrations/20260313000000_privacy_consent_and_account_deletion.sql
    • user_privacy_settings table (community_data_opt_in, research_consent, data_retention_days with CHECK >= 30)
    • consent_audit_log table (immutable audit trail with consent_version, consent_type)
    • account_deletion_requests table (pending → processing → completed lifecycle)
    • community_data_opt_in column on experiment_outcomes
    • delete_account(p_user_id) RPC — SECURITY DEFINER, cascades through all user tables, deletes auth.users row
    • RLS policies and indexes for all new tables

Phase 3 — Hooks:

  • usePrivacySettings hook — React Query fetch + upsert + consent audit logging
  • useAccountDeletion hook — validates deletion request, calls delete_account RPC, clears SecureStore + Zustand auth state

Phase 4 — UI Screens + Components:

  • MedicalDisclaimer component — context-aware disclaimer text (experiment_result, teaser, discovery, ai_recommendation, community_stats), compact mode
  • Privacy & Data screen (mobile/src/app/privacy.tsx) — community data toggle, research consent toggle, data retention picker (30/60/90/180/365/Indefinite), delete account button, footer links
  • Account Deletion screen (mobile/src/app/delete-account.tsx) — multi-step flow: Summary → Reason (optional) → Type "DELETE" confirmation → Processing → Done
  • Profile screen updates — wired "Privacy & Security" and "Account Settings" to /privacy, added MedicalDisclaimer footer
  • Disclaimer additions — MedicalDisclaimer added to Discovery detail, Playbook tab, Discover tab

Phase 5 — Integration Tests:

  • Experiment lifecycle E2E (mobile/__tests__/integration/experiment/lifecycle.test.ts) — 8 tests: catalog creation, enrollment, check-ins, completion with outcome + discovery, add to playbook, cancellation, concurrent attribution, low adherence (requires local Supabase)
  • Privacy integration tests (mobile/__tests__/integration/privacy/account-deletion.test.ts) — 5 tests: privacy settings CRUD, retention constraint enforcement, consent audit logging, account deletion cascade, community opt-out flag on outcomes (requires local Supabase)

Total: 94 new tests for Sprint 7 (82 unit + 12 integration). Full suite: 2396 tests passing, 0 regressions.


v2 Roadmap

v2.1: Expanded Catalog + Wearable Integrations

  • Expand experiment catalog from 8 to ~50 experiments (full CSV)
  • Apple Health integration (read-only: HRV, RHR, sleep, steps, workouts, SpO2)
  • Google Fit integration (REST API, OAuth flow, sync function)
  • New experiment categories: Glucose, Metabolic, Body Composition, VO2 Max, Functional, Exercise
  • Category-specific wellness ranges for Metric Gap Analysis

v2.2: Shareable Discovery Cards (Virality Engine)

Generate beautiful, shareable images from experiment results and playbook entries.

Experiment Discovery Card:

┌─────────────────────────────────────┐
│         MY BODY EXPERIMENT          │
│                                     │
│      Alcohol Elimination            │
│           10 days                   │
│                                     │
│  Deep Sleep:      +31%             │
│  HRV:             +24%             │
│  Resting HR:      -5 bpm           │
│                                     │
│  Magnitude of Impact: HIGH          │
│                                     │
│  Your body is a lab.                │
│  Start the discovery.              │
│             Health Decoder          │
└─────────────────────────────────────┘

Top Health Levers Card:

┌─────────────────────────────────────┐
│   MY BODY'S TOP HEALTH LEVERS      │
│                                     │
│  1️⃣  Earlier bedtime               │
│     HRV +22%                        │
│                                     │
│  2️⃣  No alcohol                    │
│     Deep sleep +31%                 │
│                                     │
│  3️⃣  Evening walk                  │
│     Resting HR −4 bpm               │
│                                     │
│  Your body is a lab.                │
│  Start the discovery.              │
│             Health Decoder          │
└─────────────────────────────────────┘

Implementation:

  • Generate card as a rendered React Native view → export to image via react-native-view-shot
  • Share via native share sheet (iOS/Android)
  • Include app download link / deep link
  • Available from: Discovery result screen, Playbook screen
  • Watermarked with "Health Decoder" branding + tagline
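Before any rendering or capture, the card content can be assembled as plain data. A hypothetical sketch of that step — the payload shape and function name are assumptions, and the rendered view would then be exported via react-native-view-shot as described above:

```typescript
// Assembles the text payload for a discovery share card prior to rendering.
// All names here are illustrative, not the app's actual API.

interface MetricLine {
  label: string;
  change: string; // preformatted, e.g. "+31%"
}

interface ShareCardPayload {
  title: string;
  subtitle: string;
  lines: string[];
  footer: string;
}

function buildDiscoveryShareCard(
  experimentName: string,
  durationDays: number,
  metrics: MetricLine[],
  magnitude: string,
): ShareCardPayload {
  return {
    title: "MY BODY EXPERIMENT",
    subtitle: `${experimentName} — ${durationDays} days`,
    lines: [
      ...metrics.map((m) => `${m.label}: ${m.change}`),
      `Magnitude of Impact: ${magnitude.toUpperCase()}`,
    ],
    footer: "Your body is a lab. Start the discovery. — Health Decoder",
  };
}
```

Separating content assembly from rendering keeps the card testable without a React Native environment.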

v2.3: Smart Notifications (Experiment-Aware Push)

Design principle: every notification must feel like help, not spam. No generic reminders.

Notification Types:

| Trigger | Notification | Value |
| --- | --- | --- |
| Daily check-in due (manual experiments only) | "Quick check-in: Did you follow the Caffeine Curfew protocol today? [Yes] [Mostly] [No]" | Actionable one-tap |
| Semi-auto activity detected | "We detected a 25-min walk at 7:15 PM. Did you wear the weighted vest? [Yes] [No]" | One-tap confirmation |
| Mid-experiment teaser (day 5+) | "Early signal: Your deep sleep is trending 18% higher than baseline." | Motivation |
| Experiment completion | "Your Alcohol Elimination experiment is complete! Tap to see your discovery." | Excitement |
| Unenrolled discovery found | "We noticed a pattern in your data. Tap to see what we found." | Magic moment |
| Data quality degraded | "Your Oura hasn't synced in 48 hours. This may affect your active experiment." | Helpful warning |
| Playbook milestone | "You've completed 3 Sleep experiments! Your Sleep Playbook is 60% complete." | Progression |

Anti-Annoyance Rules:

  • Max 2 notifications per day
  • Never send between 10 PM and 7 AM (respect sleep experiments!)
  • Group related notifications
  • User can mute per-experiment or globally
  • If user ignores 3 consecutive notifications, reduce frequency
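The rules above amount to a simple send-gate. A minimal sketch, assuming a local hour and a small per-user state record; for brevity, "reduce frequency" after 3 ignores is modeled here as skipping entirely:

```typescript
// Gate applied before any push notification is dispatched.

interface NotificationState {
  sentToday: number;          // notifications already sent this calendar day
  consecutiveIgnores: number; // notifications ignored in a row
}

function shouldSendNotification(state: NotificationState, hourLocal: number): boolean {
  if (state.sentToday >= 2) return false;             // max 2 per day
  if (hourLocal >= 22 || hourLocal < 7) return false; // quiet hours 10 PM - 7 AM
  if (state.consecutiveIgnores >= 3) return false;    // back off after 3 ignores
  return true;
}
```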

v2.4: Automated Verification (Wearable Activity Detection)

Extend auto-adherence beyond simple threshold checks:

  • Detect specific activity types from wearable data (walk, run, strength training)
  • Cross-reference with experiment protocols
  • For semi_auto experiments, detect the activity and prompt one-tap confirmation
  • For auto experiments, silently verify and mark adherence
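The cross-referencing step can be sketched as matching a detected activity against a protocol rule, then branching on the experiment's adherence mode. The shapes below are assumptions for illustration:

```typescript
// Matches a wearable-detected activity against an experiment protocol rule.

interface DetectedActivity {
  type: "walk" | "run" | "strength";
  startHour: number;  // local hour, 0-23
  durationMin: number;
}

interface ProtocolRule {
  activityType: "walk" | "run" | "strength";
  minDurationMin: number;
  afterHour?: number; // e.g. a post-dinner walk must start after 18:00
  mode: "auto" | "semi_auto";
}

type AdherenceAction = "record_silently" | "prompt_confirmation" | "none";

function matchActivityToProtocol(a: DetectedActivity, rule: ProtocolRule): AdherenceAction {
  const matches =
    a.type === rule.activityType &&
    a.durationMin >= rule.minDurationMin &&
    (rule.afterHour === undefined || a.startHour >= rule.afterHour);
  if (!matches) return "none";
  // auto: silently verify; semi_auto: surface a one-tap confirmation
  return rule.mode === "auto" ? "record_silently" : "prompt_confirmation";
}
```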

v2.5: Subjective Wellness Check-ins

Allow users to report how they're subjectively feeling throughout the day. This data becomes a first-class metric in experiment analysis, complementing objective wearable data.

UX Design

Prompt: A small, non-intrusive floating card that appears at configurable times:

┌─────────────────────────────┐
│  How are you feeling?       │
│                             │
│  Energy                     │
│  😴  😐  🙂  😊  🔥       │
│                             │
│  Mood                       │
│  😞  😐  🙂  😊  😄       │
│                             │
│  Focus                      │
│  🌫️  😐  🙂  😊  🎯       │
│                             │
│  Physical Comfort           │
│  😣  😐  🙂  😊  💪       │
│                             │
│  [Skip]          [Save]     │
└─────────────────────────────┘

Key Design Decisions:

  • 4 dimensions: Energy, Mood, Focus, Physical Comfort
  • 5-point scale per dimension (1-5, displayed as emoji faces for instant comprehension)
  • One-tap per dimension: Tap the emoji, done. Entire check-in <5 seconds.
  • 3 prompts per day at configurable times:
    • Morning (default: 30 min after wake time detected from wearable)
    • Afternoon (default: 2 PM)
    • Evening (default: 8 PM)
  • Not mandatory: Users can skip or dismiss. No guilt mechanics.
  • Adaptive timing: If wearable detects wake time, adjust morning prompt accordingly

Data Model

CREATE TABLE subjective_checkins (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id UUID NOT NULL REFERENCES auth.users(id) ON DELETE CASCADE,
  checkin_time TIMESTAMPTZ NOT NULL,
  time_of_day TEXT NOT NULL CHECK (time_of_day IN ('morning', 'afternoon', 'evening')),
  energy INTEGER NOT NULL CHECK (energy BETWEEN 1 AND 5),
  mood INTEGER NOT NULL CHECK (mood BETWEEN 1 AND 5),
  focus INTEGER NOT NULL CHECK (focus BETWEEN 1 AND 5),
  physical_comfort INTEGER NOT NULL CHECK (physical_comfort BETWEEN 1 AND 5),
  created_at TIMESTAMPTZ DEFAULT now()
);

CREATE INDEX idx_subjective_user_time ON subjective_checkins(user_id, checkin_time DESC);

Integration with Experiments

Subjective data becomes a metric in experiment analysis:

  1. New metric registry entries:

    • avg_energy — Average daily energy score (source: subjective_checkins)
    • avg_mood — Average daily mood score
    • avg_focus — Average daily focus score
    • avg_comfort — Average daily physical comfort score
  2. Experiment results include subjective data:

    Post-Meal Walk Experiment — 14 days
    
    Objective Metrics:
    RHR: -4 bpm (62→58)    -6.5%
    Deep Sleep: +18 min      +34.6%
    
    Subjective Metrics:
    Afternoon Energy: +0.8   (3.2→4.0)
    Evening Mood: +0.5       (3.5→4.0)
    
  3. Pattern detection uses subjective data:

    • "Your energy is 35% higher on days following 7+ hours of sleep"
    • "Your focus score drops 0.8 points on days after alcohol consumption"
    • These become unenrolled discoveries
  4. Subjective data resolves "objective ambiguity": Sometimes wearable metrics show modest change but subjective improvement is dramatic. The discovery can note: "While your HRV showed a modest 5% improvement, your self-reported energy increased 40% during this experiment."
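Deriving the daily metrics in step 1 is a per-day average over the check-in rows. A minimal sketch for avg_energy (column names follow the table definition above; the aggregation shape is an assumption):

```typescript
// Computes the avg_energy daily metric from subjective check-in rows.

interface SubjectiveCheckin {
  checkinDate: string; // "YYYY-MM-DD", derived from checkin_time
  energy: number;      // 1-5
}

function dailyAvgEnergy(rows: SubjectiveCheckin[]): Map<string, number> {
  const sums = new Map<string, { total: number; n: number }>();
  for (const r of rows) {
    const s = sums.get(r.checkinDate) ?? { total: 0, n: 0 };
    s.total += r.energy;
    s.n += 1;
    sums.set(r.checkinDate, s);
  }
  const out = new Map<string, number>();
  for (const [date, s] of sums) out.set(date, s.total / s.n);
  return out;
}
```

avg_mood, avg_focus, and avg_comfort would follow the same pattern over their respective columns.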

Privacy Consideration

  • Subjective data is deeply personal
  • Included in data export (GDPR)
  • Excluded from community aggregation by default (user must explicitly opt in)
  • Never shared in share cards

v2.6: Additional Features

  • ABAB Experiment Design — Advanced mode for power users to run alternating phases (A=normal, B=intervention, A=normal, B=intervention) for stronger evidence
  • Community Experiment Cohorts — Users running the same experiment see anonymized group progress
  • Counterfactual Estimation — "What would have happened without this experiment?" using baseline trend projection
  • Custom Experiment Creation — Allow users to design their own experiments
  • ML-Based Recommendation — Replace heuristic recommender with collaborative filtering as dataset grows
  • Data Export (GDPR) — Download all personal data in machine-readable format
  • Differential Privacy — Add noise to small-cohort aggregations to prevent re-identification