feat: Pattern Detection & Anomaly Scanning (Sprint 2) by StuMason · Pull Request #10 · StuMason/polar-flow-server

StuMason · 2026-01-13T06:29:37Z

Summary

Add pattern detection service with Spearman correlation analysis
Implement overtraining risk scoring (multi-metric composite 0-100)
Add trend analysis (7-day vs 30-day baseline comparison)
Create bulk anomaly scanning across all metrics
Add comprehensive API documentation for patterns

New Endpoints

Method	Endpoint	Description
GET	`/users/{id}/patterns`	List all detected patterns
GET	`/users/{id}/patterns/{name}`	Get specific pattern
POST	`/users/{id}/patterns/detect`	Trigger pattern detection
GET	`/users/{id}/anomalies`	Bulk scan for anomalies

Pattern Types

sleep_hrv_correlation - Spearman correlation with p-value confidence
overtraining_risk - Multi-metric composite score with recovery recommendations
hrv_trend - 7-day vs 30-day HRV baseline comparison
sleep_trend - 7-day vs 30-day sleep score comparison

Test plan

13 new tests for pattern detection (all passing)
58 total tests passing
mypy strict mode passes
ruff linting passes
Manual API testing with real data

🤖 Generated with Claude Code

- Document all 5 baselines endpoints with examples - Explain IQR-based anomaly detection (warning/critical) - Document feature unlock timeline (7/14/21/30/60/90 days) - Update index.md with analytics overview and examples Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Add pattern detection service with Spearman correlation, overtraining risk scoring, and trend analysis. Includes bulk anomaly scanning across all metrics. New endpoints: - GET /users/{id}/patterns - List all detected patterns - GET /users/{id}/patterns/{name} - Get specific pattern - POST /users/{id}/patterns/detect - Trigger pattern detection - GET /users/{id}/anomalies - Bulk scan for anomalies Pattern types: - sleep_hrv_correlation: Spearman correlation with p-value confidence - overtraining_risk: Multi-metric composite score (0-100) - hrv_trend: 7-day vs 30-day baseline comparison - sleep_trend: 7-day vs 30-day baseline comparison Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Fixes from code review: - Fix baseline calculation to exclude recent 7 days from baseline (was including recent data which diluted trend sensitivity) - Add NaN validation for spearmanr results (handles constant arrays) - Use explicit lambda for metrics_involved default in model - Fix type safety in interpretation property (check isinstance) - Pin scipy version to >=1.14.0,<2.0.0 - Add comment clarifying migration ID generation approach Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

StuMason · 2026-01-13T06:41:35Z

@claude please re-review

claude · 2026-01-13T06:41:50Z

Claude finished @StuMason's task in 1m 35s —— View job

Re-reviewing PR #10

Tasks

Code Review Summary

Great work on the pattern detection implementation! The code is well-structured, thoroughly tested, and follows the project's architectural patterns. I found several issues that should be addressed before merging.

Critical Issues

1. SQL Injection Risk in Migration (src/polar_flow_server/services/pattern.py:542-548)

The upsert uses string-based constraint name which could be vulnerable. While the current implementation is safe, consider using the ORM constraint object for better type safety:

stmt = stmt.on_conflict_do_update(
    index_elements=['user_id', 'pattern_type', 'pattern_name'],
    set_=pattern_data,
)

This is more explicit and prevents potential issues if the constraint name changes.

2. Missing Session Rollback on Error (src/polar_flow_server/services/pattern.py:64-99)

The detect_all_patterns method commits at the end but doesn't handle rollback on exceptions. If one pattern detection fails, you'll have partial data:

try:
    # ... pattern detection ...
    await self.session.commit()
except Exception as e:
    await self.session.rollback()
    self.logger.error("Pattern detection failed", error=str(e))
    raise

Fix this →

High Priority

3. Hardcoded Confidence Value (src/polar_flow_server/services/pattern.py:393)

The trend confidence is hardcoded at 0.8. This should be calculated based on sample size or data quality:

# Current:
confidence=0.8,  # Trend confidence is high with sufficient data

# Better:
confidence=min(0.9, 0.5 + (sample_count / 30) * 0.4)  # Scale with sample size

4. Potential Division by Zero (src/polar_flow_server/services/pattern.py:463-465)

While there's a check for baseline_avg == 0, what about negative values? HRV should never be negative, but defensive coding suggests:

if baseline_avg <= 0:
    return None

5. Missing API Guard Configuration (src/polar_flow_server/api/patterns.py:178)

The patterns router uses per_user_api_key_guard, but from the API docs, it's not clear if API keys are required. The CLAUDE.md says "Optional API key via X-API-Key header". If truly optional, this guard will block unauthenticated requests in default configs.

Verify: Is this intentional? If API keys are optional for other endpoints, why require them for patterns?

6. Inefficient Trend Calculation (src/polar_flow_server/services/pattern.py:449-466)

The _get_metric_trend loads 30 days of data every time but only needs 7 recent + 7-23 baseline days minimum. This could be optimized with two separate queries:

# Query 1: Recent 7 days
# Query 2: Days 8-30 for baseline

This matters for users with sparse data.

Medium Priority

7. Inconsistent Sample Count (src/polar_flow_server/services/pattern.py:395)

The _detect_metric_trend returns sample_count=7 regardless of actual data points. Should be:

sample_count=len(recent),  # Actual count

8. Magic Numbers (Multiple locations)

Consider extracting these to class constants:

MIN_SAMPLES_CORRELATION = 21 ✓ (already done)
MIN_SAMPLES_TREND = 7 ✓ (already done)
Risk thresholds: 10%, 5%, 1.5, 1.3 (lines 252-290)
Correlation strength: 0.7, 0.4, 0.2 (lines 189-196)
Significance thresholds: 50, 25 (lines 295-300)

9. Missing Type Hints for Details Dict (src/polar_flow_server/services/pattern.py:41)

details: dict[str, object] | None = None  # Too generic

Consider a TypedDict for better type checking:

class CorrelationDetails(TypedDict, total=False):
    correlation_coefficient: float
    p_value: float
    strength: str
    direction: str
    interpretation: str

10. Unused Baseline Model Import (src/polar_flow_server/services/pattern.py:14)

from polar_flow_server.models.baseline import UserBaseline

This import is never used in the PatternService. It's only used by AnomalyService. Clean imports.

11. SQL Query Efficiency (src/polar_flow_server/services/pattern.py:672-737)

The _get_latest_metric_values runs 5 separate queries. Consider a more efficient approach:

# Use CTEs or subqueries to fetch all latest values in 2-3 queries max

With appropriate indexes, this could be significantly faster for large datasets.

Low Priority / Style

12. Inconsistent Docstring Style

Most docstrings are excellent, but some are missing return type descriptions (e.g., _get_recovery_recommendations:489).

13. Test Coverage - Edge Cases

Missing test cases for:

What happens when all sleep scores are identical? (zero variance)
Negative correlation interpretation
Exactly 21 samples (boundary case)
Data with gaps (missing dates in the middle)

14. API Response - Missing HTTP 404

get_pattern_by_name returns None for invalid pattern names (line 96), which becomes HTTP 200 with null body. Should return 404:

from litestar.exceptions import NotFoundException

if not pattern:
    raise NotFoundException("Pattern not found")

15. Validation Function Placement (src/polar_flow_server/api/patterns.py:19-35)

The validate_user_id function is duplicated across API modules (I assume). Consider moving to a shared validation module:

# src/polar_flow_server/api/validation.py

Positive Highlights

✅ Excellent Statistical Approach: Using Spearman correlation over Pearson is the right choice for health metrics
✅ Robust NaN Handling: Proper checks for NaN results from scipy (line 164)
✅ Clear Significance Thresholds: Well-documented p-value cutoffs
✅ Comprehensive Tests: 13 new tests with good coverage of happy paths
✅ IQR-Based Anomaly Detection: Robust choice for non-normal distributions
✅ Multi-Tenancy Support: Proper use of UserScopedMixin
✅ Idempotent Upserts: ON CONFLICT DO UPDATE pattern
✅ Clear Recommendations: Actionable recovery advice based on risk scores

Migration Review

The migration looks good overall:

✅ Proper UUID primary key
✅ Indexes on query paths
✅ JSON columns for flexibility
✅ Timezone-aware timestamps
✅ Helpful column comments

Minor suggestion: Consider adding an index on analyzed_at for time-based queries:

op.create_index('ix_pattern_analyzed_at', 'pattern_analyses', ['analyzed_at'])

Documentation

The API documentation is thorough and includes:

✅ Clear endpoint descriptions
✅ Pattern type explanations
✅ Example requests/responses
✅ Significance level descriptions

Recommendations

Before Merge:

Fix issue feat: Complete V3 API Integration with HTMX Admin Panel #2 (missing rollback)
Address issue feat: API key authentication and Alembic migrations #5 (API guard clarification)
Fix issue feat: add GitHub templates (CODEOWNERS, PR template, issue templates) #14 (404 for missing patterns)

Post-Merge (Technical Debt):

Optimize queries (issues feat: Admin authentication and dynamic OAuth URLs #6, feat: v1.0.0 - Unified Insights API (Sprint 3) #11)
Extract magic numbers to constants
Add edge case tests
Create shared validation module

Summary

This is a solid implementation with good statistical foundations and thorough testing. The main concerns are around error handling and some API design decisions. Once the critical issues are addressed, this will be ready to merge.

Verdict: Approve with changes requested for issues #2, #5, and #14.

| Branch: feature/sprint-2-trends

- Add try/except with rollback to detect_all_patterns - Return 404 for invalid/missing patterns instead of null - API guard verified as consistent with all other endpoints Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

StuMason and others added 3 commits January 13, 2026 06:14

fix: Address re-review issues

79ab222

- Add try/except with rollback to detect_all_patterns - Return 404 for invalid/missing patterns instead of null - API guard verified as consistent with all other endpoints Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

StuMason merged commit f21a849 into main Jan 13, 2026
5 checks passed

StuMason deleted the feature/sprint-2-trends branch January 13, 2026 06:52

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: Pattern Detection & Anomaly Scanning (Sprint 2)#10

feat: Pattern Detection & Anomaly Scanning (Sprint 2)#10
StuMason merged 4 commits intomainfrom
feature/sprint-2-trends

StuMason commented Jan 13, 2026

Uh oh!

StuMason commented Jan 13, 2026

Uh oh!

claude bot commented Jan 13, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

StuMason commented Jan 13, 2026

Summary

New Endpoints

Pattern Types

Test plan

Uh oh!

StuMason commented Jan 13, 2026

Uh oh!

claude bot commented Jan 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Re-reviewing PR #10

Tasks

Code Review Summary

Critical Issues

1. SQL Injection Risk in Migration (src/polar_flow_server/services/pattern.py:542-548)

2. Missing Session Rollback on Error (src/polar_flow_server/services/pattern.py:64-99)

High Priority

3. Hardcoded Confidence Value (src/polar_flow_server/services/pattern.py:393)

4. Potential Division by Zero (src/polar_flow_server/services/pattern.py:463-465)

5. Missing API Guard Configuration (src/polar_flow_server/api/patterns.py:178)

6. Inefficient Trend Calculation (src/polar_flow_server/services/pattern.py:449-466)

Medium Priority

7. Inconsistent Sample Count (src/polar_flow_server/services/pattern.py:395)

8. Magic Numbers (Multiple locations)

9. Missing Type Hints for Details Dict (src/polar_flow_server/services/pattern.py:41)

10. Unused Baseline Model Import (src/polar_flow_server/services/pattern.py:14)

11. SQL Query Efficiency (src/polar_flow_server/services/pattern.py:672-737)

Low Priority / Style

12. Inconsistent Docstring Style

13. Test Coverage - Edge Cases

14. API Response - Missing HTTP 404

15. Validation Function Placement (src/polar_flow_server/api/patterns.py:19-35)

Positive Highlights

Migration Review

Documentation

Recommendations

Summary

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

claude bot commented Jan 13, 2026 •

edited

Loading