This document centralizes all knowledge about the project for developers, operators, and AI agents. It replaces scattered docs; treat this file as the single source of truth.
Training Personal Data collects and analyzes personal health/fitness data, with first-class support for Oura Ring. It stores normalized daily metrics, produces Weekly Health Insights, and persists AI-generated analysis in a structured format for future querying and dashboards.
- Language/runtime: Clojure (Babashka)
- Database: PostgreSQL (Supabase-compatible)
- Sources: Oura Ring REST APIs (daily activity, sleep, readiness, heart rate, workouts, tags); Wahoo (workouts)
- Storage: Normalized tables (+ raw JSON)
- Insights: Weekly aggregation + GPT analysis persisted to DB
- Automation: GitHub Actions for periodic sync and weekly insights
```text
Oura API → ETL (bb tasks) → Postgres (normalized + raw JSON)
         → Weekly insights (SQL + GPT) → ouraring_weekly_insights
```
```mermaid
flowchart LR
  subgraph Oura Cloud
    OURA[Oura REST APIs]
  end
  subgraph ETL
    BB["bb tasks (Clojure/Babashka)"]
    NORM[Normalize records]
  end
  subgraph DB["PostgreSQL"]
    RAW[Raw JSON tables]
    NORMTBL[Normalized daily tables]
    WEEK[ouraring_weekly_insights]
  end
  OURA --> BB --> NORM --> DB
  DB <--> WEEK
```
```mermaid
flowchart LR
  CFG["Endpoint Config (data)"] --> PIPE[Generic Pipeline]
  PIPE -->|fetch-fn| FETCH[Fetch endpoint data]
  PIPE -->|normalize-fn| TRANSFORM[Normalize records]
  PIPE -->|ensure-table| SCHEMA[Create-if-not-exists]
  PIPE -->|extract-values-fn| SAVE[Batch save]
  SAVE --> DB[(PostgreSQL)]
```
```mermaid
flowchart LR
  RANGE[Date inside week] --> WEEKCALC["Compute Mon→Sun"]
  WEEKCALC --> Q1["SQL: Sleep (avg, arrays)"]
  WEEKCALC --> Q2["SQL: Readiness avg"]
  WEEKCALC --> Q3["SQL: Activity avg"]
  Q1 --> FORMAT["Format payload (units + availability)"]
  Q2 --> FORMAT
  Q3 --> FORMAT
  FORMAT --> GPT[GPT Analysis]
  GPT --> PARSE[Parse text/table/cross-insight]
  PARSE --> SAVE["Persist insight (dates, numbers, JSONB)"]
  SAVE --> WEEK[(ouraring_weekly_insights)]
```
Create a `.env` file at the repository root:

```bash
OURA_TOKEN=your_oura_ring_api_token
WAHOO_TOKEN=your_wahoo_api_token
SUPABASE_HOST=your_database_host
SUPABASE_PORT=5432
SUPABASE_USER=postgres
SUPABASE_PASSWORD=your_database_password
SUPABASE_DB_NAME=your_database_name
OPENAI_API_KEY=your_openai_api_key
```

Wahoo OAuth auto-refresh (optional but recommended for CLI tasks):

```bash
# If set, the runner will refresh an access token at startup using these:
WAHOO_CLIENT_ID=your_wahoo_client_id
WAHOO_CLIENT_SECRET=your_wahoo_client_secret
WAHOO_REFRESH_TOKEN=your_long_lived_refresh_token
# Refresh tokens rotate on each refresh. The sync stores the latest tokens in Postgres
# (`wahoo_oauth_tokens` table) so no manual secret updates are required after bootstrap.
```

Configuration loader: `src/training_personal_data/config.clj` (reads env vars, validates presence).
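A minimal sketch of what such a loader can look like (illustrative only; the real implementation lives in `src/training_personal_data/config.clj`, and the function names here are assumptions):

```clojure
;; Illustrative env-var loader with presence validation; function names
;; are hypothetical stand-ins for src/training_personal_data/config.clj.
(defn require-env [k]
  (let [v (System/getenv k)]
    (when (or (nil? v) (empty? v))
      (throw (ex-info (str "Missing required environment variable: " k) {:var k})))
    v))

(defn load-db-spec []
  {:host     (require-env "SUPABASE_HOST")
   :port     (Integer/parseInt (or (System/getenv "SUPABASE_PORT") "5432"))
   :user     (require-env "SUPABASE_USER")
   :password (require-env "SUPABASE_PASSWORD")
   :dbname   (require-env "SUPABASE_DB_NAME")})
```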
- Tokens are cached in Postgres (`wahoo_oauth_tokens`, columns: `id`, `provider`, `access_token`, `refresh_token`, `expires_at_epoch`, `raw_json`).
- Sync order: use the persisted row first (logs show `:token_source :stored-cache`), then fall back to `WAHOO_REFRESH_TOKEN`, then `WAHOO_AUTH_CODE`, and finally a one-off `WAHOO_TOKEN`. The fallback order is sketched after this list.
- First-time/repair flow: set `WAHOO_CLIENT_ID`, `WAHOO_CLIENT_SECRET`, `WAHOO_REDIRECT_URI`, and a fresh single-use `WAHOO_AUTH_CODE`, run `bb run:wahoo ...`, and the pipeline will exchange and persist the rotating tokens. Remove `WAHOO_AUTH_CODE` afterwards.
- Routine runs (local or cron) only require the client id/secret (+ redirect URI). You may unset `WAHOO_REFRESH_TOKEN`; the job will refresh from the database on each execution.
- To reset, delete the `default` row and bootstrap again.
- Observability: look for `:token_source` in the logs to confirm the credential path; `:wahoo-token-save` events include the computed expiry epoch.
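The credential fallback order can be expressed roughly like this (a sketch; all helper functions here are hypothetical stand-ins, not the actual API):

```clojure
;; Illustrative fallback order; helper fns are hypothetical stand-ins.
(declare stored-token refresh-with-token exchange-auth-code)

(defn resolve-wahoo-token [db-spec env]
  (or (when-let [row (stored-token db-spec)]          ; 1. persisted row (:stored-cache)
        (refresh-with-token env (:refresh_token row)))
      (when-let [rt (get env "WAHOO_REFRESH_TOKEN")]  ; 2. refresh token from the environment
        (refresh-with-token env rt))
      (when-let [code (get env "WAHOO_AUTH_CODE")]    ; 3. one-time authorization code
        (exchange-auth-code env code))
      (get env "WAHOO_TOKEN")))                       ; 4. one-off static token
```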
- Workout ingestion retries transient Supabase errors (`connection attempt failed`, `connection reset`, etc.) three times with exponential backoff; failures emit `:db-transient-error` logs before a retry. A sketch of this retry loop follows below.
- All DB responses are normalized before persistence, ensuring JSON `raw_json` columns stay consistent across refresh cycles.
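A minimal sketch of the retry behavior described above (illustrative; the helper name is hypothetical, and logging is shown as a plain `println` of the structured map):

```clojure
;; Illustrative transient-error retry loop with exponential backoff.
(defn with-retry [f {:keys [max-attempts base-delay-ms]
                     :or   {max-attempts 3 base-delay-ms 500}}]
  (loop [attempt 1]
    (let [result (try
                   {:ok (f)}
                   (catch Exception e
                     (if (and (< attempt max-attempts)
                              (re-find #"connection (attempt failed|reset)"
                                       (str (ex-message e))))
                       {:retry e}
                       (throw e))))]
      (if (:retry result)
        (do (println {:event :db-transient-error :attempt attempt})
            (Thread/sleep (* base-delay-ms (long (Math/pow 2 (dec attempt)))))
            (recur (inc attempt)))
        (:ok result)))))
```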
- Sync Oura Ring data for a date range:

```bash
bb run:oura "YYYY-MM-DD" "YYYY-MM-DD"
```

- Sync Wahoo workouts for a date range (prefers persisted OAuth tokens). If you provide `WAHOO_REFRESH_TOKEN`, `WAHOO_CLIENT_ID`, and `WAHOO_CLIENT_SECRET`, the runner refreshes the token automatically and persists the rotated credentials to Postgres for subsequent runs.

```bash
bb run:wahoo "YYYY-MM-DD" "YYYY-MM-DD"
```

- First run without a refresh token: provide an authorization code to bootstrap and persist the refresh token.
  - Required envs for bootstrap: `WAHOO_CLIENT_ID`, `WAHOO_CLIENT_SECRET`, `WAHOO_AUTH_CODE`, `WAHOO_REDIRECT_URI`
- Generate weekly insights (for a date inside the target week):

```bash
bb -m training-personal-data.insights.week YYYY-MM-DD
```

- Run tests:

```bash
bb test
```
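For orientation, the task wiring in `bb.edn` might look roughly like this (an illustrative sketch; the namespace names are assumptions, so check the repository's actual `bb.edn`):

```clojure
;; bb.edn (illustrative sketch; namespace names are assumptions)
{:tasks
 {run:oura  {:doc  "Sync Oura Ring data for a date range"
             :task (apply (requiring-resolve 'training-personal-data.ouraring/-main)
                          *command-line-args*)}
  run:wahoo {:doc  "Sync Wahoo workouts for a date range"
             :task (apply (requiring-resolve 'training-personal-data.wahoo/-main)
                          *command-line-args*)}
  test      {:doc  "Run the test suite"
             :task (apply (requiring-resolve 'training-personal-data.test-runner/-main)
                          *command-line-args*)}}}
```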
Created automatically on demand. Core tables:
- `ouraring_daily_activity`
- `ouraring_daily_sleep`
- `ouraring_daily_readiness`
- `ouraring_heart_rate`
- `ouraring_workout`
- `ouraring_tags`
- `wahoo_workout`
Weekly insights output table:
`ouraring_weekly_insights` with columns (subset shown):

- `id text` (e.g., `week_2024-12-30`)
- `week_start date`, `week_end date`, `week_range text`
- `avg_sleep_score double precision`
- `avg_sleep_duration double precision` (hours)
- `avg_sleep_quality double precision` (percentage)
- `avg_readiness_score double precision`
- `avg_active_calories double precision`
- `avg_activity_score double precision`
- `gpt_analysis text`
- `gpt_metrics_table jsonb` (structured table of metrics)
- `gpt_cross_data_insight text`
- `raw_data jsonb` (raw weekly aggregates for traceability)
Wahoo tables (subset):
`wahoo_workout` with columns (subset shown):

- `id text` (primary key)
- `starts timestamp`, `created_at timestamp`, `updated_at timestamp`
- `minutes integer`, `name text`, `plan_id text NULL`, `workout_token text NULL`, `workout_type_id integer`
- `workout_summary jsonb` (structured per workout when available)
- `raw_json jsonb` (raw workout record)
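The create-if-not-exists pattern behind "created automatically on demand" can be sketched as follows (illustrative; the DDL uses the column subset above, and `execute!` is passed in because the exact DB client is an assumption):

```clojure
;; Illustrative create-if-not-exists DDL for wahoo_workout; execute! stands
;; in for a next.jdbc-style call (the exact DB library is an assumption).
(defn ensure-wahoo-workout-table! [db execute!]
  (execute! db
    ["CREATE TABLE IF NOT EXISTS wahoo_workout (
        id text PRIMARY KEY,
        starts timestamp,
        created_at timestamp,
        updated_at timestamp,
        minutes integer,
        name text,
        plan_id text NULL,
        workout_token text NULL,
        workout_type_id integer,
        workout_summary jsonb,
        raw_json jsonb)"]))
```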
Namespace: `src/training_personal_data/insights/week.clj`
- Week range: Monday → Sunday. The date passed to `-main` is any day within the target week; the system computes `week_start` (Monday) and `week_end` (Sunday). See the sketch at the end of this section.
- Sleep duration unit: Oura stores `total_sleep` in minutes (in our dataset). We:
  - Exclude zeros from the average: `AVG(CASE WHEN total_sleep > 0 THEN total_sleep ELSE NULL END)`
  - Convert minutes → hours by dividing by 60: `... / 60 AS avg_sleep_duration`
- Readiness & Activity: average scores/calories computed for the week; may be missing if no records.
- Types when saving weekly insight:
  - `week_start` and `week_end` are `java.sql.Date`
  - JSON fields are stored as JSONB with explicit `?::jsonb` casts in SQL
- English-only outputs (MDC): all logs, prompts, and terminal messages are in English.
Observability (sample events): `:date-range-calculation`, `:query-sleep-data-range`, `:individual-sleep-records`, `:sleep-data-retrieved`, `:readiness-data-retrieved`, `:activity-data-retrieved`, `:sleep-duration-analysis`, `:formatted-data-for-gpt`, `:db-save-insight`, `:complete`.
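The Monday → Sunday computation referenced above can be sketched with `java.time` (illustrative; the real logic lives in this namespace):

```clojure
(import '(java.time LocalDate DayOfWeek)
        '(java.time.temporal TemporalAdjusters))

;; Given any date inside the target week, derive the Monday..Sunday bounds
;; and convert them to java.sql.Date as required on save.
(defn week-range [^String date-str]
  (let [d          (LocalDate/parse date-str)
        week-start (.with d (TemporalAdjusters/previousOrSame DayOfWeek/MONDAY))
        week-end   (.plusDays week-start 6)]
    {:week-start (java.sql.Date/valueOf week-start)
     :week-end   (java.sql.Date/valueOf week-end)}))

(week-range "2025-01-01")
;; => {:week-start #inst "2024-12-30", :week-end #inst "2025-01-05"}
```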
Namespaces:

- `training-personal-data.insights.prompt`: builds the prompt and calls the GPT API.
- `training-personal-data.insights.week`: constructs the input payload and saves outputs.
Prompt content:
- Explicit units (scores 0–100, sleep duration in hours, sleep quality %).
- Data availability flags (available/missing) to avoid hallucinations.
- Response structure requested:
  - A concise summary of the week
  - A metrics table: Metric | Value | Interpretation | Recommendation
  - A cross-data insight paragraph
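The availability flags reduce to simple formatting; a minimal sketch (the helper name `format-metric` and the exact wording are assumptions):

```clojure
;; Illustrative formatting of a metric with units and an availability flag.
(defn format-metric [label value unit]
  (if (some? value)
    (format "%s: %.1f %s (available)" label (double value) unit)
    (format "%s: data missing" label)))

(format-metric "Average sleep duration" 7.4 "hours")
;; => "Average sleep duration: 7.4 hours (available)"
(format-metric "Average readiness score" nil "points (0-100)")
;; => "Average readiness score: data missing"
```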
Persistence:
- Raw GPT text → `gpt_analysis`
- Parsed metrics table (when available) → `gpt_metrics_table` (JSONB)
- Cross-metric narrative → `gpt_cross_data_insight`
- Fallback behavior when GPT errors occur: save available data, log errors.
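Putting the type rules together, the save might look roughly like this (an illustrative sketch, not the verbatim implementation; `execute!` stands in for a next.jdbc-style call, and the column subset matches the schema above):

```clojure
(require '[cheshire.core :as json])

;; Illustrative persistence of a weekly insight with explicit jsonb casts
;; and java.sql.Date values; see insights/week.clj for the real code.
(defn save-insight! [db execute! {:keys [id week-start week-end analysis metrics raw]}]
  (execute! db
    ["INSERT INTO ouraring_weekly_insights
        (id, week_start, week_end, gpt_analysis, gpt_metrics_table, raw_data)
      VALUES (?, ?, ?, ?, ?::jsonb, ?::jsonb)"
     id
     (java.sql.Date/valueOf ^String week-start)  ; dates as java.sql.Date
     (java.sql.Date/valueOf ^String week-end)
     analysis
     (json/generate-string metrics)              ; JSONB via JSON string + ?::jsonb cast
     (json/generate-string raw)]))
```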
- `cron.yml`: syncs Oura data periodically (every 3 hours); supports manual dispatch.
- `weekly-insights.yml`: runs every Monday at 08:00 UTC; computes the previous Sunday and runs weekly insights. Requires secrets: `OPENAI_API_KEY`, `SUPABASE_DB_NAME`, `SUPABASE_HOST`, `SUPABASE_USER`, `SUPABASE_PASSWORD`.
- Execute all tests: `bb test`
- Coverage includes:
  - Date range computation (Mon→Sun)
  - `safe-double` conversion (numbers, numeric strings; nil/empty/invalid → nil; zero → nil)
  - Formatting for GPT (units and availability flags)
- All user-facing text must be in English (logs, CLI output, prompts, comments intended for readers).
- Logging: structured maps with `:event` plus salient fields.
- SQL: cast parameter types explicitly when comparing with `timestamp` (e.g., `?::timestamp`).
- Persist JSON as JSONB with explicit `?::jsonb` casts.
- DB layer enforces casts automatically for common columns (see the sketch after this list):
  - Timestamps: `starts`, `created_at`, `updated_at` → `?::timestamp`
  - JSON: `raw_json`, `workout_summary`, `raw_data`, `gpt_metrics_table`, and any `*_json` → `?::jsonb`
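The automatic cast selection could be as simple as a lookup on the column name (illustrative sketch mirroring the rules above; the function name is hypothetical):

```clojure
(require '[clojure.string :as str])

;; Sketch of per-column placeholder selection mirroring the rules above.
(defn placeholder-for [column]
  (cond
    (#{"starts" "created_at" "updated_at"} column)   "?::timestamp"
    (or (#{"raw_json" "workout_summary"
           "raw_data" "gpt_metrics_table"} column)
        (str/ends-with? column "_json"))             "?::jsonb"
    :else                                            "?"))

(placeholder-for "starts")    ;; => "?::timestamp"
(placeholder-for "raw_json")  ;; => "?::jsonb"
(placeholder-for "minutes")   ;; => "?"
```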
- Error: `column "week_start" is of type date but expression is of type character varying`
  - Cause: inserting a string instead of a SQL date
  - Fix: convert with `java.sql.Date/valueOf` before saving
- Error: `column "raw_data"/"gpt_metrics_table" is of type jsonb but expression is of type character varying`
  - Cause: string not cast to JSONB
  - Fix: use `?::jsonb` in SQL and provide a valid JSON string
- Error: `column "workout_summary" is of type jsonb but expression is of type character varying`
  - Cause: providing a JSON string without the JSONB cast
  - Fix: the DB layer already casts this column to `?::jsonb`; ensure the value is a valid JSON string
- Error: `operator does not exist: timestamp without time zone >= character varying`
  - Cause: comparing a timestamp column with a string parameter
  - Fix: cast parameters: `WHERE timestamp >= ?::timestamp AND timestamp <= ?::timestamp`
- Error: `column "starts" is of type timestamp without time zone but expression is of type character varying`
  - Cause: inserting an ISO8601 string without a cast
  - Fix: the DB layer now casts `starts`/`created_at`/`updated_at` to `?::timestamp`
- Sleep duration shows ~0 hours
  - Causes: zeros included in the average; minutes not converted to hours
  - Fix: `AVG(CASE WHEN total_sleep > 0 THEN total_sleep ELSE NULL END) / 60`
- Branch: `feature/<short-name>`
- Run tests locally: `bb test`
- Ensure English-only text and structured logging
- Open a PR with a clear title/summary (see examples in previous PR templates)
- Add monthly insights and trend analysis
- Build lightweight dashboard (charts of scores/duration)
- Extend parsing of GPT outputs into more granular JSON schemas
- Add alerts (e.g., if sleep duration below threshold)
This AGENT guide is authoritative. If you change behavior, update this file in the same PR.
The Oura ingestion moved from near-duplicate endpoint code to a generic, data-driven pipeline. This removes duplication, centralizes behavior, and simplifies tests.
- Eliminate copy/paste across endpoints (activity, sleep, readiness, tags, workouts, heart-rate)
- Single point for error handling, logging, batching, and persistence
- Configuration-as-data to add/modify endpoints without new code
```text
Endpoint Config (data) → Generic Pipeline → DB
```
- Endpoint Config fields (example):
  - `name`: "activity"
  - `table-name`, `columns`, `schema`
  - `fetch-fn` (calls the Oura API)
  - `normalize-fn` (maps an API record → a normalized map)
  - `extract-values-fn` (normalized map → DB values vector)
- Generic pipeline stages (sketched below):
  - fetch (with token and date range)
  - transform each record (normalize)
  - ensure the table exists (create-if-not-exists)
  - save records (batch-friendly, consistent logging)
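A condensed sketch of the generic pipeline under these assumptions (function keys follow the config above; `ensure-table` and `save-record` are hypothetical stand-ins, and the real version adds batching and richer error handling):

```clojure
;; Condensed sketch of the generic pipeline; not the verbatim implementation.
(declare ensure-table save-record)

(defn execute-pipeline
  [{:keys [name table-name columns schema
           fetch-fn normalize-fn extract-values-fn]}
   token start-date end-date db-spec]
  (println {:event :endpoint-sync :status :start :endpoint name})
  (ensure-table db-spec table-name schema)              ; create-if-not-exists
  (let [records (map normalize-fn (fetch-fn token start-date end-date))]
    (doseq [record records]
      (save-record db-spec table-name columns (extract-values-fn record)))
    (println {:event :endpoint-sync :status :complete
              :endpoint name :count (count records)})))
```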
- Maintenance: changes to flow live in one place
- Testability: test pipeline once; endpoints test only config/transform
- Performance: supports batching/parallelization; connection reuse-friendly
- Consistency: uniform logging/events across endpoints
```clojure
(def activity-config
  {:name "activity"
   :table-name activity-db/table-name
   :columns activity-db/columns
   :schema activity-db/schema
   :fetch-fn activity-api/fetch
   :normalize-fn activity-api/normalize
   :extract-values-fn activity-db/extract-values})

(pipeline/execute-pipeline activity-config token start-date end-date db-spec)
```

- Generic pipeline available and integrated
- Legacy per-endpoint cores can be deleted once configs and tests are fully aligned
- `:endpoint-sync` events: start/process/complete
- Counts of processed records
- Structured error events with endpoint identifiers
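Concretely, each event is a structured map written to the log; for example (illustrative shape, field names beyond `:event` are assumptions):

```clojure
;; Illustrative structured log events for an endpoint sync.
(println {:event :endpoint-sync :status :start    :endpoint "activity"})
(println {:event :endpoint-sync :status :complete :endpoint "activity" :records 7})
```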