docs: Add ROADMAP for analytics engine#8

Merged
StuMason merged 3 commits into main from docs/roadmap-analytics-engine
Jan 13, 2026

Conversation

@StuMason
Owner

Comprehensive roadmap for transforming polar-flow-server into a health analytics engine for AI coaching applications.

Phases:

  • Phase 1: Derived Metrics Engine (baselines, rolling averages)
  • Phase 2: Pattern Detection (correlations, anomalies)
  • Phase 3: ML Models (optional - predictions, forecasting)
  • Phase 4: Insights API (unified endpoint for coaching layer)


Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@StuMason
Owner Author

PR Review: Analytics Engine ROADMAP

This PR adds a comprehensive 867-line roadmap outlining the transformation of polar-flow-server from a data sync/storage layer into a health analytics engine.

Key Findings

1. Documentation Quality ✅

  • Excellent structure with clear phases and priorities
  • Comprehensive code examples for all proposed features
  • ASCII diagrams help visualize architecture

2. Technical Architecture - Phase 1 (Derived Metrics) ✅

  • UserBaseline model design follows existing patterns well
  • Baseline calculation logic is sound
  • Concern: Performance - 90-day queries could be expensive, consider adding indices and materialized views
  • Concern: Data quality - should filter outliers and handle missing data gaps
  • Concern: metric_name as string - consider using enum for consistency

3. Phase 2 (Pattern Detection) ⚠️

  • Correlation analysis approach is appropriate
  • Concern: Sample size of 14 may be too small for statistical significance
  • Concern: Z-score anomaly detection may not work well for non-normal distributions like HRV
  • Suggestion: Consider Spearman correlation and robust statistics (median, IQR)

4. Phase 3 (ML Models) 🚨 SECURITY ISSUE

  • CRITICAL: Storing pickled ML models in database is a security risk
  • joblib uses pickle protocol which can execute arbitrary code during deserialization
  • If attacker gains DB write access, they can inject malicious serialized objects
  • Recommendation: Store model parameters as JSON instead, or use ONNX format, or add HMAC signing
  • Other concerns: 30 days training data may cause overfitting, Prophet requires 10GB+ RAM

5. Phase 4 (Insights API) ✅

  • Natural language observations are excellent for LLM consumption
  • Suggestions: Add API versioning from start, implement caching (5-15 min TTL), make thresholds configurable
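To make the caching suggestion concrete, here is a minimal sketch of a TTL cache decorator. The `get_insights` function and its return shape are placeholders, not code from the roadmap; the 300-second TTL matches the low end of the suggested 5-15 minute range:

```python
import time
from functools import wraps

def ttl_cache(ttl_seconds=300):
    """Cache results per argument tuple for ttl_seconds (5 min default)."""
    def decorator(fn):
        cache = {}
        @wraps(fn)
        def wrapper(*args):
            now = time.monotonic()
            hit = cache.get(args)
            if hit is not None and now - hit[0] < ttl_seconds:
                return hit[1]          # still fresh: return cached result
            result = fn(*args)
            cache[args] = (now, result)
            return result
        return wrapper
    return decorator

@ttl_cache(ttl_seconds=300)
def get_insights(user_id):
    # Placeholder for the expensive insights aggregation call.
    return {"user_id": user_id, "observations": []}
```

In production you would want per-user invalidation on new data sync, but a simple TTL already shields the expensive aggregation from repeated coach-layer polling.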

6. Missing Considerations

  • Testing strategy for ML models and statistical calculations
  • GDPR/HIPAA considerations and right to deletion
  • Rate limiting for computationally expensive endpoints
  • Monitoring/observability metrics
  • Database migration strategy for backfilling baselines

Overall Assessment: ✅ APPROVE with Comments

This is an excellent roadmap with clear vision and sound technical approach. Since this is documentation-only, it's safe to merge.

MUST ADDRESS before Phase 3 implementation:

  • Model serialization security issue

Recommended before Phase 1:

  • Create GitHub issues for concerns raised
  • Add database schema migration plan
  • Write performance benchmarks

Great work! This transformation is exactly what the project needs. 🚀


Reviewed by: Claude Sonnet 4.5 (via GitHub Actions)
Files: 1 (ROADMAP.md - new file)
Lines: +867, -0

@StuMason
Owner Author

Full Detailed Review (from CI)

1. Documentation Quality ✅

Strengths:

  • Excellent structure with clear phases and priorities
  • Comprehensive code examples for all proposed features
  • ASCII diagrams help visualize architecture
  • Implementation priorities clearly defined (Immediate → Long-term)
  • Success criteria and tech stack choices documented

Suggestions:

  • Consider adding estimated complexity/effort for each phase (e.g., Small/Medium/Large)
  • Add references to relevant academic papers or industry standards for metrics like training load ratios (e.g., the 1.3 acute:chronic threshold)

2. Technical Architecture 🔍

Phase 1 - Derived Metrics Engine:

Good:

  • UserBaseline model design is solid and follows existing patterns (UserScopedMixin)
  • Rolling averages (7d/30d/90d) are standard practice for health analytics
  • Baseline calculation logic is reasonable (line 142-180)

Concerns:

  1. Performance consideration (line 125-180): Calculating baselines for all metrics could be expensive. Consider:

    • Adding database indices on (user_id, date) for all source tables
    • Using materialized views or incremental calculations rather than recalculating from scratch
    • Document expected query performance for 90-day lookbacks
  2. Data quality (line 159): The code checks len(data) < 7, but it should also validate data quality:

    • Filter outliers (z-score > 3) before calculating baselines
    • Handle missing data gaps (e.g., user didn't wear device for a week)
    • Consider weighted averages for more recent data
  3. Schema design (line 91-132): The metric_name as a string is flexible but:

    • Consider an enum to prevent typos and ensure consistency
    • Add validation to prevent invalid metric names
    • Document the canonical list of metric names
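To illustrate the outlier-filter and enum suggestions together, here is a sketch. The class and function names (`MetricName`, `filter_outliers`, `calculate_baseline`) are illustrative, not from the roadmap, and the metric list is a guess at the canonical set:

```python
import statistics
from enum import Enum

class MetricName(str, Enum):
    """Canonical metric names -- prevents typos in the metric_name column."""
    HRV = "hrv"
    RESTING_HR = "resting_hr"
    SLEEP_DURATION = "sleep_duration"

def filter_outliers(values, z_threshold=3.0):
    """Drop points more than z_threshold standard deviations from the mean.

    Note: with very small samples a single extreme point inflates the
    stdev enough that z > 3 can never trigger, so this is only useful
    once a reasonable window (30+ points) is available.
    """
    if len(values) < 2:
        return list(values)
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)
    if stdev == 0:
        return list(values)
    return [v for v in values if abs(v - mean) / stdev <= z_threshold]

def calculate_baseline(values, min_samples=7):
    """Mean of outlier-filtered values, or None if too few clean samples."""
    clean = filter_outliers(values)
    if len(clean) < min_samples:
        return None
    return statistics.mean(clean)
```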

Phase 2 - Pattern Detection:

Good:

  • Correlation analysis using scipy.stats.pearsonr is appropriate
  • Overtraining risk scoring with multiple factors is a solid approach

Concerns:

  1. Statistical validity (line 383-404):

    • Minimum sample size of 14 for correlation may be too small for significance
    • Consider using Spearman correlation for non-linear relationships
    • Multiple comparison correction (Bonferroni) if testing many correlations simultaneously
  2. Cross-correlation lag detection (line 362): This is mentioned in the table but not implemented. Training-recovery lag is valuable but complex:

    • Consider adding example implementation
    • Document expected lag windows (typically 1-3 days for recovery)
  3. Anomaly detection (line 521-551): Z-score based anomalies work but:

    • 2 standard deviations may be too sensitive (consider 2.5 or 3)
    • May not work well for non-normal distributions (HRV is often right-skewed)
    • Consider using robust statistics (median, IQR) instead
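To show what the robust-statistics alternative looks like, here is an IQR-based anomaly detector using only the stdlib (for Spearman correlation, scipy.stats.spearmanr is the practical choice). The function name and the 1.5 fence multiplier are conventional, not roadmap code:

```python
import statistics

def iqr_anomalies(values, k=1.5):
    """Flag points outside [Q1 - k*IQR, Q3 + k*IQR].

    Unlike z-scores, quartiles are robust to the right skew typical of
    HRV data: one extreme value barely moves Q1/Q3, so it cannot widen
    its own acceptance band the way it inflates a standard deviation.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]
```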

3. Phase 3 - ML Models 🚨

SECURITY CONCERN ⚠️ (line 665-690):

Storing pickled ML models in the database is a security risk. The roadmap mentions using joblib and notes it is "safe for sklearn models" (line 698), but:

Problem:

  • joblib uses Python's pickle protocol, which can execute arbitrary code during deserialization
  • If an attacker gains database write access (SQL injection, compromised admin), they could inject malicious serialized objects
  • Loading the model with joblib.load() would execute the malicious code

Recommendation:

  1. Better approach: Store model parameters/weights in JSON format, not serialized objects
  2. If you must use serialization:
    • Never deserialize models from untrusted sources
    • Add cryptographic signing (HMAC) to verify model integrity before loading
    • Isolate model loading in a sandboxed environment
  3. Add security warning in documentation about database access control
  4. Consider using ONNX format which is safer and framework-agnostic
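To make recommendation 1 concrete, here is a sketch of the JSON-parameter approach for a simple linear model. The function names and schema fields are illustrative; the point is that loading JSON can never execute code, unlike joblib.load():

```python
import json

def export_linear_model(coefficients, intercept):
    """Serialize model parameters as JSON -- no code runs on load."""
    return json.dumps({
        "model_type": "linear",
        "version": 1,
        "coefficients": list(coefficients),
        "intercept": float(intercept),
    })

def load_linear_model(payload):
    """Rebuild a predict function from schema-validated JSON parameters."""
    params = json.loads(payload)
    if params.get("model_type") != "linear":
        raise ValueError("unsupported model type")
    coefs = params["coefficients"]
    intercept = params["intercept"]
    def predict(features):
        return intercept + sum(c * f for c, f in zip(coefs, features))
    return predict
```

For sklearn estimators this means storing coef_/intercept_ (or tree structures) explicitly rather than the fitted object; more complex models are where ONNX earns its keep.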

Other ML Concerns:

  1. Overfitting (line 591-642): Training on only 30 days of data for personalized models is risky - consider requiring 60-90 days minimum
  2. Model retraining (line 742): Not addressed - need strategy for when/how to retrain
  3. Dependencies (line 704-712): prophet is heavyweight and requires 10GB+ RAM during training

4. Missing Considerations 📝

  1. Testing strategy: No mention of how to test ML models, statistical calculations, or integration tests
  2. Data privacy: GDPR/HIPAA considerations for pattern storage and right to deletion
  3. Rate limiting: New endpoints will be computationally expensive - need separate rate limits
  4. Monitoring: Add Prometheus/OpenTelemetry metrics for baseline calculation latency, ML prediction latency, etc.
  5. Documentation for coach layer: How should Laravel SaaS consume insights? Webhooks? Streaming API?
  6. Database migrations: Backfilling baselines for existing users could take hours - need batch processing strategy

5. Specific Line-by-Line Issues

Line 142-180 (BaselineService.calculate_hrv_baseline):
Missing: Handle timezone issues - date.today() uses server timezone. Use UTC consistently.
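The fix is a one-liner; a sketch with an illustrative helper name:

```python
from datetime import datetime, timezone

def utc_today():
    """Server-timezone-independent 'today' for baseline window bounds."""
    return datetime.now(timezone.utc).date()
```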

Line 410-425 (Overtraining risk scoring):
Hardcoded weights (25 points each). Consider making this a weighted model based on research - HRV decline is the strongest predictor and should be weighted higher.
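A sketch of what an unequal weighting could look like. The weight values and flag names here are purely illustrative, not research-derived; they only show the shape of the change:

```python
# Illustrative weights only -- HRV decline weighted highest per the comment
# above; actual values should come from the literature.
RISK_WEIGHTS = {
    "hrv_decline": 40,
    "resting_hr_elevated": 25,
    "sleep_quality_drop": 20,
    "training_load_spike": 15,
}

def overtraining_risk_score(flags):
    """Sum the weights of triggered risk flags, capped at 100."""
    score = sum(RISK_WEIGHTS[name] for name, hit in flags.items() if hit)
    return min(score, 100)
```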

Line 535-546 (Z-score anomaly detection):
Will trigger false positives for non-normal distributions. Add note about distributional assumptions.

Line 816-890 (ObservationGenerator):
Observation text could be more specific - consider adding context like "for X consecutive days" or "a Y% decline from last week"
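For example, a formatter along these lines (name and signature are hypothetical, not from the roadmap's ObservationGenerator):

```python
def format_observation(metric, pct_change, consecutive_days):
    """Observation string carrying the context the review asks for."""
    direction = "decline" if pct_change < 0 else "increase"
    return (
        f"{metric} shows a {abs(pct_change):.0f}% {direction} from last week, "
        f"sustained for {consecutive_days} consecutive days"
    )
```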

Line 919-925 (Success Criteria table):
Baseline calculation latency < 500ms seems aggressive for 90-day queries across multiple metrics. Test with production data volumes.


6. Recommendations

Before implementing Phase 1:

  • Create GitHub issues for each concern raised above
  • Add database schema migration plan
  • Write performance benchmarks for baseline calculations

Before implementing Phase 3:

  • Resolve model serialization security issue
  • Document model lifecycle (training, versioning, deprecation)
  • Add model monitoring/alerting

Reviewed by: Claude Sonnet 4.5 (GitHub Actions CI)

StuMason and others added 2 commits January 13, 2026 05:00
Key changes based on CI review:

Security (Critical):
- Replace pickle/joblib with JSON params or ONNX for ML model storage
- Add whitelist of allowed model classes
- Document security rationale

Statistical correctness:
- Use Spearman correlation instead of Pearson (robust to non-normal)
- Replace Z-score anomaly detection with IQR method (HRV is right-skewed)
- Increase minimum sample size from 14 to 21 for correlation
- Increase minimum training data from 30 to 60 days for ML

Performance:
- Add required database indices for (user_id, date) lookups
- Document incremental calculation strategy
- Fix timezone handling (use UTC, not date.today())

New sections:
- Testing Strategy with unit/integration test examples
- Data Privacy & Compliance (GDPR, right to deletion)
- Minimum data requirements table for ML models

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Data Readiness Convention:
- Feature unlock timeline (7/14/21/30/60/90 days)
- Consistent API response structure with status, feature_availability
- unlock_progress for gamification ("2 more days until patterns!")
- Coach integration notes for adjusting language based on data age
- New /users/{id}/status endpoint spec

Implementation Plan:
- Sprint 1: Foundation (baselines, status endpoint)
- Sprint 2: Patterns (correlations, anomalies)
- Sprint 3: Insights API (aggregation, observations)
- Sprint 4: ML (optional predictions)
- Clear task dependencies for each sprint

Test Data Seeding:
- Realistic data generators with weekly patterns
- generate_realistic_hrv_data() with Monday dips, gradual trends
- generate_sleep_data() with weekend variations
- generate_overtraining_scenario() for pattern detection tests
- generate_anomaly_scenario() for IQR edge cases
- Pytest fixtures for 7/14/30/60/90 day scenarios
- Data age scenario test matrix

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@StuMason StuMason merged commit e9b0f8e into main Jan 13, 2026
4 of 5 checks passed
@StuMason StuMason deleted the docs/roadmap-analytics-engine branch January 13, 2026 05:09