docs: Add ROADMAP for analytics engine#8

Merged
StuMason merged 3 commits into main from docs/roadmap-analytics-engine
Jan 13, 2026

Conversation

@StuMason
Owner

Comprehensive roadmap for transforming polar-flow-server into a health analytics engine for AI coaching applications.

Phases:

  • Phase 1: Derived Metrics Engine (baselines, rolling averages)
  • Phase 2: Pattern Detection (correlations, anomalies)
  • Phase 3: ML Models (optional - predictions, forecasting)
  • Phase 4: Insights API (unified endpoint for coaching layer)


Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@StuMason
Owner Author

PR Review: Analytics Engine ROADMAP

This PR adds a comprehensive 867-line roadmap outlining the transformation of polar-flow-server from a data sync/storage layer into a health analytics engine.

Key Findings

1. Documentation Quality ✅

  • Excellent structure with clear phases and priorities
  • Comprehensive code examples for all proposed features
  • ASCII diagrams help visualize architecture

2. Technical Architecture - Phase 1 (Derived Metrics) ✅

  • UserBaseline model design follows existing patterns well
  • Baseline calculation logic is sound
  • Concern: Performance - 90-day queries could be expensive, consider adding indices and materialized views
  • Concern: Data quality - should filter outliers and handle missing data gaps
  • Concern: metric_name as string - consider using enum for consistency

3. Phase 2 (Pattern Detection) ⚠️

  • Correlation analysis approach is appropriate
  • Concern: Sample size of 14 may be too small for statistical significance
  • Concern: Z-score anomaly detection may not work well for non-normal distributions like HRV
  • Suggestion: Consider Spearman correlation and robust statistics (median, IQR)

4. Phase 3 (ML Models) 🚨 SECURITY ISSUE

  • CRITICAL: Storing pickled ML models in database is a security risk
  • joblib uses pickle protocol which can execute arbitrary code during deserialization
  • If attacker gains DB write access, they can inject malicious serialized objects
  • Recommendation: Store model parameters as JSON instead, or use ONNX format, or add HMAC signing
  • Other concerns: 30 days training data may cause overfitting, Prophet requires 10GB+ RAM

5. Phase 4 (Insights API) ✅

  • Natural language observations are excellent for LLM consumption
  • Suggestions: Add API versioning from start, implement caching (5-15 min TTL), make thresholds configurable
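To make the caching suggestion concrete, here is a minimal sketch of a TTL cache decorator. The `get_insights` function and its return shape are placeholders, not code from the roadmap; the 300-second TTL matches the low end of the suggested 5-15 minute range:

```python
import time
from functools import wraps

def ttl_cache(ttl_seconds=300):
    """Cache results per argument tuple for ttl_seconds (5 min default)."""
    def decorator(fn):
        cache = {}
        @wraps(fn)
        def wrapper(*args):
            now = time.monotonic()
            hit = cache.get(args)
            if hit is not None and now - hit[0] < ttl_seconds:
                return hit[1]          # still fresh: return cached result
            result = fn(*args)
            cache[args] = (now, result)
            return result
        return wrapper
    return decorator

@ttl_cache(ttl_seconds=300)
def get_insights(user_id):
    # Placeholder for the expensive insights aggregation call.
    return {"user_id": user_id, "observations": []}
```

In production you would want per-user invalidation on new data sync, but a simple TTL already shields the expensive aggregation from repeated coach-layer polling.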

6. Missing Considerations

  • Testing strategy for ML models and statistical calculations
  • GDPR/HIPAA considerations and right to deletion
  • Rate limiting for computationally expensive endpoints
  • Monitoring/observability metrics
  • Database migration strategy for backfilling baselines

Overall Assessment: ✅ APPROVE with Comments

This is an excellent roadmap with clear vision and sound technical approach. Since this is documentation-only, it's safe to merge.

MUST ADDRESS before Phase 3 implementation:

  • Model serialization security issue

Recommended before Phase 1:

  • Create GitHub issues for concerns raised
  • Add database schema migration plan
  • Write performance benchmarks

Great work! This transformation is exactly what the project needs. 🚀


Reviewed by: Claude Sonnet 4.5 (via GitHub Actions)
Files: 1 (ROADMAP.md - new file)
Lines: +867, -0

@StuMason
Owner Author

Full Detailed Review (from CI)

1. Documentation Quality ✅

Strengths:

  • Excellent structure with clear phases and priorities
  • Comprehensive code examples for all proposed features
  • ASCII diagrams help visualize architecture
  • Implementation priorities clearly defined (Immediate → Long-term)
  • Success criteria and tech stack choices documented

Suggestions:

  • Consider adding estimated complexity/effort for each phase (e.g., Small/Medium/Large)
  • Add references to relevant academic papers or industry standards for metrics like training load ratios (e.g., the 1.3 acute:chronic threshold)

2. Technical Architecture 🔍

Phase 1 - Derived Metrics Engine:

Good:

  • UserBaseline model design is solid and follows existing patterns (UserScopedMixin)
  • Rolling averages (7d/30d/90d) are standard practice for health analytics
  • Baseline calculation logic is reasonable (line 142-180)

Concerns:

  1. Performance consideration (line 125-180): Calculating baselines for all metrics could be expensive. Consider:

    • Adding database indices on (user_id, date) for all source tables
    • Using materialized views or incremental calculations rather than recalculating from scratch
    • Document expected query performance for 90-day lookbacks
  2. Data quality (line 159): The code checks len(data) < 7, but it should also validate data quality:

    • Filter outliers (z-score > 3) before calculating baselines
    • Handle missing data gaps (e.g., user didn't wear device for a week)
    • Consider weighted averages for more recent data
  3. Schema design (line 91-132): The metric_name as a string is flexible but:

    • Consider an enum to prevent typos and ensure consistency
    • Add validation to prevent invalid metric names
    • Document the canonical list of metric names
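To illustrate the outlier-filter and enum suggestions together, here is a sketch. The class and function names (`MetricName`, `filter_outliers`, `calculate_baseline`) are illustrative, not from the roadmap, and the metric list is a guess at the canonical set:

```python
import statistics
from enum import Enum

class MetricName(str, Enum):
    """Canonical metric names -- prevents typos in the metric_name column."""
    HRV = "hrv"
    RESTING_HR = "resting_hr"
    SLEEP_DURATION = "sleep_duration"

def filter_outliers(values, z_threshold=3.0):
    """Drop points more than z_threshold standard deviations from the mean.

    Note: with very small samples a single extreme point inflates the
    stdev enough that z > 3 can never trigger, so this is only useful
    once a reasonable window (30+ points) is available.
    """
    if len(values) < 2:
        return list(values)
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)
    if stdev == 0:
        return list(values)
    return [v for v in values if abs(v - mean) / stdev <= z_threshold]

def calculate_baseline(values, min_samples=7):
    """Mean of outlier-filtered values, or None if too few clean samples."""
    clean = filter_outliers(values)
    if len(clean) < min_samples:
        return None
    return statistics.mean(clean)
```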

Phase 2 - Pattern Detection:

Good:

  • Correlation analysis using scipy.stats.pearsonr is appropriate
  • Overtraining risk scoring with multiple factors is a solid approach

Concerns:

  1. Statistical validity (line 383-404):

    • Minimum sample size of 14 for correlation may be too small for significance
    • Consider using Spearman correlation for non-linear relationships
    • Multiple comparison correction (Bonferroni) if testing many correlations simultaneously
  2. Cross-correlation lag detection (line 362): This is mentioned in the table but not implemented. Training-recovery lag is valuable but complex:

    • Consider adding example implementation
    • Document expected lag windows (typically 1-3 days for recovery)
  3. Anomaly detection (line 521-551): Z-score based anomalies work but:

    • 2 standard deviations may be too sensitive (consider 2.5 or 3)
    • May not work well for non-normal distributions (HRV is often right-skewed)
    • Consider using robust statistics (median, IQR) instead
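To show what the robust-statistics alternative looks like, here is an IQR-based anomaly detector using only the stdlib (for Spearman correlation, scipy.stats.spearmanr is the practical choice). The function name and the 1.5 fence multiplier are conventional, not roadmap code:

```python
import statistics

def iqr_anomalies(values, k=1.5):
    """Flag points outside [Q1 - k*IQR, Q3 + k*IQR].

    Unlike z-scores, quartiles are robust to the right skew typical of
    HRV data: one extreme value barely moves Q1/Q3, so it cannot widen
    its own acceptance band the way it inflates a standard deviation.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]
```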

3. Phase 3 - ML Models 🚨

SECURITY CONCERN ⚠️ (line 665-690):

Storing pickled ML models in the database is a security risk. The roadmap mentions using joblib and notes it is "safe for sklearn models" (line 698), but:

Problem:

  • joblib uses Python's pickle protocol, which can execute arbitrary code during deserialization
  • If an attacker gains database write access (SQL injection, compromised admin), they could inject malicious serialized objects
  • Loading the model with joblib.load() would execute the malicious code

Recommendation:

  1. Better approach: Store model parameters/weights in JSON format, not serialized objects
  2. If you must use serialization:
    • Never deserialize models from untrusted sources
    • Add cryptographic signing (HMAC) to verify model integrity before loading
    • Isolate model loading in a sandboxed environment
  3. Add security warning in documentation about database access control
  4. Consider using ONNX format which is safer and framework-agnostic
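To make recommendation 1 concrete, here is a sketch of the JSON-parameter approach for a simple linear model. The function names and schema fields are illustrative; the point is that loading JSON can never execute code, unlike joblib.load():

```python
import json

def export_linear_model(coefficients, intercept):
    """Serialize model parameters as JSON -- no code runs on load."""
    return json.dumps({
        "model_type": "linear",
        "version": 1,
        "coefficients": list(coefficients),
        "intercept": float(intercept),
    })

def load_linear_model(payload):
    """Rebuild a predict function from schema-validated JSON parameters."""
    params = json.loads(payload)
    if params.get("model_type") != "linear":
        raise ValueError("unsupported model type")
    coefs = params["coefficients"]
    intercept = params["intercept"]
    def predict(features):
        return intercept + sum(c * f for c, f in zip(coefs, features))
    return predict
```

For sklearn estimators this means storing coef_/intercept_ (or tree structures) explicitly rather than the fitted object; more complex models are where ONNX earns its keep.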

Other ML Concerns:

  1. Overfitting (line 591-642): Training on only 30 days of data for personalized models is risky - consider requiring 60-90 days minimum
  2. Model retraining (line 742): Not addressed - need strategy for when/how to retrain
  3. Dependencies (line 704-712): prophet is heavyweight and requires 10GB+ RAM during training

4. Missing Considerations 📝

  1. Testing strategy: No mention of how to test ML models, statistical calculations, or integration tests
  2. Data privacy: GDPR/HIPAA considerations for pattern storage and right to deletion
  3. Rate limiting: New endpoints will be computationally expensive - need separate rate limits
  4. Monitoring: Add Prometheus/OpenTelemetry metrics for baseline calculation latency, ML prediction latency, etc.
  5. Documentation for coach layer: How should Laravel SaaS consume insights? Webhooks? Streaming API?
  6. Database migrations: Backfilling baselines for existing users could take hours - need batch processing strategy

5. Specific Line-by-Line Issues

Line 142-180 (BaselineService.calculate_hrv_baseline):
Missing: Handle timezone issues - date.today() uses server timezone. Use UTC consistently.
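The fix is a one-liner; a sketch with an illustrative helper name:

```python
from datetime import datetime, timezone

def utc_today():
    """Server-timezone-independent 'today' for baseline window bounds."""
    return datetime.now(timezone.utc).date()
```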

Line 410-425 (Overtraining risk scoring):
Hardcoded weights (25 points each). Consider making this a weighted model based on research - HRV decline is the strongest predictor and should be weighted higher.
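A sketch of what an unequal weighting could look like. The weight values and flag names here are purely illustrative, not research-derived; they only show the shape of the change:

```python
# Illustrative weights only -- HRV decline weighted highest per the comment
# above; actual values should come from the literature.
RISK_WEIGHTS = {
    "hrv_decline": 40,
    "resting_hr_elevated": 25,
    "sleep_quality_drop": 20,
    "training_load_spike": 15,
}

def overtraining_risk_score(flags):
    """Sum the weights of triggered risk flags, capped at 100."""
    score = sum(RISK_WEIGHTS[name] for name, hit in flags.items() if hit)
    return min(score, 100)
```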

Line 535-546 (Z-score anomaly detection):
Will trigger false positives for non-normal distributions. Add note about distributional assumptions.

Line 816-890 (ObservationGenerator):
Observation text could be more specific - consider adding context like "for X consecutive days" or "a Y% decline from last week"
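For example, a formatter along these lines (name and signature are hypothetical, not from the roadmap's ObservationGenerator):

```python
def format_observation(metric, pct_change, consecutive_days):
    """Observation string carrying the context the review asks for."""
    direction = "decline" if pct_change < 0 else "increase"
    return (
        f"{metric} shows a {abs(pct_change):.0f}% {direction} from last week, "
        f"sustained for {consecutive_days} consecutive days"
    )
```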

Line 919-925 (Success Criteria table):
Baseline calculation latency < 500ms seems aggressive for 90-day queries across multiple metrics. Test with production data volumes.


6. Recommendations

Before implementing Phase 1:

  • Create GitHub issues for each concern raised above
  • Add database schema migration plan
  • Write performance benchmarks for baseline calculations

Before implementing Phase 3:

  • Resolve model serialization security issue
  • Document model lifecycle (training, versioning, deprecation)
  • Add model monitoring/alerting

Reviewed by: Claude Sonnet 4.5 (GitHub Actions CI)

StuMason and others added 2 commits January 13, 2026 05:00
Key changes based on CI review:

Security (Critical):
- Replace pickle/joblib with JSON params or ONNX for ML model storage
- Add whitelist of allowed model classes
- Document security rationale

Statistical correctness:
- Use Spearman correlation instead of Pearson (robust to non-normal)
- Replace Z-score anomaly detection with IQR method (HRV is right-skewed)
- Increase minimum sample size from 14 to 21 for correlation
- Increase minimum training data from 30 to 60 days for ML

Performance:
- Add required database indices for (user_id, date) lookups
- Document incremental calculation strategy
- Fix timezone handling (use UTC, not date.today())

New sections:
- Testing Strategy with unit/integration test examples
- Data Privacy & Compliance (GDPR, right to deletion)
- Minimum data requirements table for ML models

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Data Readiness Convention:
- Feature unlock timeline (7/14/21/30/60/90 days)
- Consistent API response structure with status, feature_availability
- unlock_progress for gamification ("2 more days until patterns!")
- Coach integration notes for adjusting language based on data age
- New /users/{id}/status endpoint spec

Implementation Plan:
- Sprint 1: Foundation (baselines, status endpoint)
- Sprint 2: Patterns (correlations, anomalies)
- Sprint 3: Insights API (aggregation, observations)
- Sprint 4: ML (optional predictions)
- Clear task dependencies for each sprint

Test Data Seeding:
- Realistic data generators with weekly patterns
- generate_realistic_hrv_data() with Monday dips, gradual trends
- generate_sleep_data() with weekend variations
- generate_overtraining_scenario() for pattern detection tests
- generate_anomaly_scenario() for IQR edge cases
- Pytest fixtures for 7/14/30/60/90 day scenarios
- Data age scenario test matrix

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@StuMason StuMason merged commit e9b0f8e into main Jan 13, 2026
4 of 5 checks passed
@StuMason StuMason deleted the docs/roadmap-analytics-engine branch January 13, 2026 05:09