Skip to content

Feature/analytics engine#9

Merged
StuMason merged 6 commits intomainfrom
feature/analytics-engine
Jan 13, 2026
Merged

Feature/analytics engine#9
StuMason merged 6 commits intomainfrom
feature/analytics-engine

Conversation

@StuMason
Copy link
Copy Markdown
Owner

No description provided.

StuMason and others added 3 commits January 13, 2026 05:24
- Add UserBaseline model with IQR-based anomaly detection
- Add Alembic migration for user_baselines table and analytics indices
- Implement BaselineService with calculations for:
  - HRV RMSSD (from nightly recharge)
  - Sleep score
  - Resting heart rate
  - Training load and ratio
- Add baselines API endpoints:
  - GET /users/{id}/baselines - List all baselines
  - GET /users/{id}/baselines/{metric} - Get specific baseline
  - POST /users/{id}/baselines/calculate - Trigger calculation
  - GET /users/{id}/baselines/check/{metric}/{value} - Anomaly check
  - GET /users/{id}/analytics/status - Data readiness and feature unlocks

Features available based on data days:
- 7 days: Basic stats
- 14 days: Trend analysis
- 21 days: Personalized baselines
- 30 days: Predictive models
- 60 days: Advanced ML
- 90 days: Long-term patterns

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…1.9)

- Add automatic baseline recalculation after sync operations
- Create test data seeding fixtures with 90 days of realistic data
  - Weekly HRV patterns (Monday dips, Friday peaks)
  - Sleep score variations
  - Training load periodization (4-week cycles)
  - Occasional anomalies (2% chance)
- Add analytics fixtures for different data scenarios (90d, 21d, 7d, 3d)
- Write 13 comprehensive tests covering:
  - Baseline status levels (ready/partial/insufficient)
  - IQR calculations and bounds
  - Anomaly detection (warning/critical thresholds)
  - Multi-metric baseline calculation

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@claude
Copy link
Copy Markdown

claude bot commented Jan 13, 2026

Test comment from review

- Add type params to ASGIConnection[Any, Any, Any, Any]
- Add type params to Response[Any]
- Add type params to Request[Any, Any, Any]
- Import ASGIApp from litestar.types
- Add type: ignore for scope comparison (litestar typing issue)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@StuMason
Copy link
Copy Markdown
Owner Author

PR Review: Feature/analytics engine

Executive Summary

This PR adds a comprehensive analytics baseline system for health metrics. Overall implementation is solid and well-architected, with proper multi-tenancy, good statistical methods, and security-conscious design. Found several issues that should be addressed before merge.


🔴 Critical Issues

1. Authorization Bypass: Missing user_id Validation

Location: src/polar_flow_server/core/auth.py:265-271

Issue: In per_user_api_key_guard, user-scoped API keys are validated ONLY when api_key.user_id is not None. However, if an attacker uses a service-level API key (where user_id=None), they can access ANY user's data by changing the path parameter.

Current logic:

if api_key.user_id is not None and path_user_id:
    if api_key.user_id != path_user_id:
        raise NotAuthorizedException(...)

Problem: When api_key.user_id is None (service-level key), the check is skipped entirely.

Impact: Anyone with a service-level API key can access all users' baseline data.

Recommendation: Add explicit service-level key validation or require admin privileges for cross-user access.

Severity: 🔴 HIGH - Authorization bypass in multi-tenant system


2. Type Ignore Comments Hide Real Issue

Location: src/polar_flow_server/models/baseline.py:182-183

extreme_lower = self.q1 - 3 * self.iqr  # type: ignore
extreme_upper = self.q3 + 3 * self.iqr  # type: ignore

Issue: The type ignores are suppressing legitimate type errors. self.q1 and self.q3 can be None, but the check on line 181 only validates self.iqr.

Correct fix:

if self.iqr is not None and self.q1 is not None and self.q3 is not None:
    extreme_lower = self.q1 - 3 * self.iqr
    extreme_upper = self.q3 + 3 * self.iqr

Severity: 🟠 Medium (could cause runtime errors if q1/q3 are None)


⚠️ Security Concerns

3. SQL Injection Risk via User-Controlled user_id

Location: All API endpoints in src/polar_flow_server/api/baselines.py

Issue: The user_id path parameter is passed directly to SQLAlchemy queries without validation. While SQLAlchemy's parameterization provides protection, there's no validation that user_id is well-formed.

Recommendation: Add validation or use Litestar's path parameter constraints.

Severity: Medium-High (mitigated by SQLAlchemy but defense-in-depth is important)


4. Race Condition in Baseline Calculation

Location: src/polar_flow_server/services/baseline.py:71

Issue: calculate_all_baselines() commits once at the end, but concurrent requests for the same user could cause lost updates or database deadlocks.

Recommendation: Add user-level locking using PostgreSQL advisory locks.

Severity: Medium


5. No Input Validation on Float Values

Location: src/polar_flow_server/api/baselines.py:149-155

The check_anomaly endpoint accepts value: float without bounds checking. Could pass Infinity, -Infinity, or NaN.

Recommendation: Add validation using math.isfinite(value).

Severity: Low-Medium


🐛 Bugs & Logic Issues

6. Incorrect Quantile Handling

Location: src/polar_flow_server/services/baseline.py:309-312

if sample_count >= 4:
    qs = quantiles(values, n=4)
    q1 = qs[0]
    q3 = qs[2] if len(qs) > 2 else None

Issue: The if len(qs) > 2 check is redundant—quantiles(n=4) always returns exactly 3 values.

Recommendation: Simplify to q1, q3 = qs[0], qs[2]

Severity: Low (no functional impact)


✅ What's Done Well

  1. Excellent multi-tenancy: UserScopedMixin and user_id on all queries
  2. Robust statistics: IQR-based anomaly detection is appropriate for skewed health metrics
  3. Good migration: Proper indices, composite keys, and comments
  4. Idempotent upserts: ON CONFLICT DO UPDATE prevents duplicates
  5. Rate limiting: Proper implementation in auth guard
  6. Clear API documentation: Well-documented endpoints

📋 Recommendations Summary

Must Fix Before Merge:

  1. ✅ Fix authorization bypass for service-level API keys (Add Claude Code GitHub Workflow #1)
  2. ✅ Fix type ignore issues in baseline.py is_anomaly method (feat: Complete V3 API Integration with HTMX Admin Panel #2)

Should Fix:

  1. Add user_id validation across all endpoints (feat: Complete Polar Experience - Dashboard, API, Biosensing & Analytics #3)
  2. Add race condition protection for concurrent calculations (ci(deps)(deps): Bump actions/checkout from 4 to 6 #4)
  3. Add input validation for float values (feat: API key authentication and Alembic migrations #5)

Nice to Have:

  1. Remove redundant quantile check (feat: Admin authentication and dynamic OAuth URLs #6)

Overall Assessment

Recommendation: Approve with required changes

The implementation demonstrates strong architectural design and appropriate statistical methods. The critical authorization issue (#1) and type safety issue (#2) should be addressed. Once fixed, this will be a solid foundation for the analytics engine.

Great work on the comprehensive baseline infrastructure! 🎉


Review generated by Claude Code Review workflow

StuMason and others added 2 commits January 13, 2026 05:55
- Enable use_sticky_comment for automatic PR comment posting
- Remove gh pr comment from allowed tools (was causing shell escaping issues)
- Add Write tool for flexibility
- Update prompt to output review directly instead of posting manually

The previous approach had Claude try to post via gh pr comment, but
markdown with code blocks caused shell escaping failures.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Fix type safety in is_anomaly(): Remove type: ignore comments by
  checking all required values (q1, q3, iqr) before calculations
- Add user_id format validation: Regex-based validation prevents
  injection attacks, limits to alphanumeric with _ and - only
- Add float input validation: Reject NaN, inf, -inf values in
  check_anomaly endpoint to prevent edge case errors
- Remove redundant quantile check: quantiles(n=4) always returns
  exactly [Q1, Q2, Q3], no length check needed
- Document service-level API key authorization model: Clarify that
  user_id=None keys intentionally have admin access for SaaS backends

Security improvements:
- ValidationException for invalid user_id format (1-100 chars)
- ValidationException for non-finite float values
- Don't reveal all valid metric names in error messages

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@StuMason StuMason merged commit 9016c3c into main Jan 13, 2026
4 of 5 checks passed
@StuMason StuMason deleted the feature/analytics-engine branch January 13, 2026 06:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant