Skip to content

feat: Pattern Detection & Anomaly Scanning (Sprint 2)#10

Merged
StuMason merged 4 commits intomainfrom
feature/sprint-2-trends
Jan 13, 2026
Merged

feat: Pattern Detection & Anomaly Scanning (Sprint 2)#10
StuMason merged 4 commits intomainfrom
feature/sprint-2-trends

Conversation

@StuMason
Copy link
Copy Markdown
Owner

Summary

  • Add pattern detection service with Spearman correlation analysis
  • Implement overtraining risk scoring (multi-metric composite 0-100)
  • Add trend analysis (7-day vs 30-day baseline comparison)
  • Create bulk anomaly scanning across all metrics
  • Add comprehensive API documentation for patterns

New Endpoints

Method Endpoint Description
GET /users/{id}/patterns List all detected patterns
GET /users/{id}/patterns/{name} Get specific pattern
POST /users/{id}/patterns/detect Trigger pattern detection
GET /users/{id}/anomalies Bulk scan for anomalies

Pattern Types

  • sleep_hrv_correlation - Spearman correlation with p-value confidence
  • overtraining_risk - Multi-metric composite score with recovery recommendations
  • hrv_trend - 7-day vs 30-day HRV baseline comparison
  • sleep_trend - 7-day vs 30-day sleep score comparison

Test plan

  • 13 new tests for pattern detection (all passing)
  • 58 total tests passing
  • mypy strict mode passes
  • ruff linting passes
  • Manual API testing with real data

🤖 Generated with Claude Code

StuMason and others added 3 commits January 13, 2026 06:14
- Document all 5 baselines endpoints with examples
- Explain IQR-based anomaly detection (warning/critical)
- Document feature unlock timeline (7/14/21/30/60/90 days)
- Update index.md with analytics overview and examples

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add pattern detection service with Spearman correlation, overtraining
risk scoring, and trend analysis. Includes bulk anomaly scanning across
all metrics.

New endpoints:
- GET /users/{id}/patterns - List all detected patterns
- GET /users/{id}/patterns/{name} - Get specific pattern
- POST /users/{id}/patterns/detect - Trigger pattern detection
- GET /users/{id}/anomalies - Bulk scan for anomalies

Pattern types:
- sleep_hrv_correlation: Spearman correlation with p-value confidence
- overtraining_risk: Multi-metric composite score (0-100)
- hrv_trend: 7-day vs 30-day baseline comparison
- sleep_trend: 7-day vs 30-day baseline comparison

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Fixes from code review:
- Fix baseline calculation to exclude recent 7 days from baseline
  (was including recent data which diluted trend sensitivity)
- Add NaN validation for spearmanr results (handles constant arrays)
- Use explicit lambda for metrics_involved default in model
- Fix type safety in interpretation property (check isinstance)
- Pin scipy version to >=1.14.0,<2.0.0
- Add comment clarifying migration ID generation approach

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@StuMason
Copy link
Copy Markdown
Owner Author

@claude please re-review

@claude
Copy link
Copy Markdown

claude bot commented Jan 13, 2026

Claude finished @StuMason's task in 1m 35s —— View job


Re-reviewing PR #10

Tasks

  • Read pattern service implementation
  • Read pattern API endpoints
  • Read pattern model
  • Review tests
  • Check migration
  • Review documentation
  • Provide comprehensive feedback

Code Review Summary

Great work on the pattern detection implementation! The code is well-structured, thoroughly tested, and follows the project's architectural patterns. I found several issues that should be addressed before merging.


Critical Issues

1. SQL Injection Risk in Migration (src/polar_flow_server/services/pattern.py:542-548)

The upsert uses string-based constraint name which could be vulnerable. While the current implementation is safe, consider using the ORM constraint object for better type safety:

stmt = stmt.on_conflict_do_update(
    index_elements=['user_id', 'pattern_type', 'pattern_name'],
    set_=pattern_data,
)

This is more explicit and prevents potential issues if the constraint name changes.

2. Missing Session Rollback on Error (src/polar_flow_server/services/pattern.py:64-99)

The detect_all_patterns method commits at the end but doesn't handle rollback on exceptions. If one pattern detection fails, you'll have partial data:

try:
    # ... pattern detection ...
    await self.session.commit()
except Exception as e:
    await self.session.rollback()
    self.logger.error("Pattern detection failed", error=str(e))
    raise

Fix this →


High Priority

3. Hardcoded Confidence Value (src/polar_flow_server/services/pattern.py:393)

The trend confidence is hardcoded at 0.8. This should be calculated based on sample size or data quality:

# Current:
confidence=0.8,  # Trend confidence is high with sufficient data

# Better:
confidence=min(0.9, 0.5 + (sample_count / 30) * 0.4)  # Scale with sample size

4. Potential Division by Zero (src/polar_flow_server/services/pattern.py:463-465)

While there's a check for baseline_avg == 0, what about negative values? HRV should never be negative, but defensive coding suggests:

if baseline_avg <= 0:
    return None

5. Missing API Guard Configuration (src/polar_flow_server/api/patterns.py:178)

The patterns router uses per_user_api_key_guard, but from the API docs, it's not clear if API keys are required. The CLAUDE.md says "Optional API key via X-API-Key header". If truly optional, this guard will block unauthenticated requests in default configs.

Verify: Is this intentional? If API keys are optional for other endpoints, why require them for patterns?

6. Inefficient Trend Calculation (src/polar_flow_server/services/pattern.py:449-466)

The _get_metric_trend loads 30 days of data every time but only needs 7 recent + 7-23 baseline days minimum. This could be optimized with two separate queries:

# Query 1: Recent 7 days
# Query 2: Days 8-30 for baseline

This matters for users with sparse data.


Medium Priority

7. Inconsistent Sample Count (src/polar_flow_server/services/pattern.py:395)

The _detect_metric_trend returns sample_count=7 regardless of actual data points. Should be:

sample_count=len(recent),  # Actual count

8. Magic Numbers (Multiple locations)

Consider extracting these to class constants:

  • MIN_SAMPLES_CORRELATION = 21 ✓ (already done)
  • MIN_SAMPLES_TREND = 7 ✓ (already done)
  • Risk thresholds: 10%, 5%, 1.5, 1.3 (lines 252-290)
  • Correlation strength: 0.7, 0.4, 0.2 (lines 189-196)
  • Significance thresholds: 50, 25 (lines 295-300)

9. Missing Type Hints for Details Dict (src/polar_flow_server/services/pattern.py:41)

details: dict[str, object] | None = None  # Too generic

Consider a TypedDict for better type checking:

class CorrelationDetails(TypedDict, total=False):
    correlation_coefficient: float
    p_value: float
    strength: str
    direction: str
    interpretation: str

10. Unused Baseline Model Import (src/polar_flow_server/services/pattern.py:14)

from polar_flow_server.models.baseline import UserBaseline

This import is never used in the PatternService. It's only used by AnomalyService. Clean imports.

11. SQL Query Efficiency (src/polar_flow_server/services/pattern.py:672-737)

The _get_latest_metric_values runs 5 separate queries. Consider a more efficient approach:

# Use CTEs or subqueries to fetch all latest values in 2-3 queries max

With appropriate indexes, this could be significantly faster for large datasets.


Low Priority / Style

12. Inconsistent Docstring Style

Most docstrings are excellent, but some are missing return type descriptions (e.g., _get_recovery_recommendations:489).

13. Test Coverage - Edge Cases

Missing test cases for:

  • What happens when all sleep scores are identical? (zero variance)
  • Negative correlation interpretation
  • Exactly 21 samples (boundary case)
  • Data with gaps (missing dates in the middle)

14. API Response - Missing HTTP 404

get_pattern_by_name returns None for invalid pattern names (line 96), which becomes HTTP 200 with null body. Should return 404:

from litestar.exceptions import NotFoundException

if not pattern:
    raise NotFoundException("Pattern not found")

15. Validation Function Placement (src/polar_flow_server/api/patterns.py:19-35)

The validate_user_id function is duplicated across API modules (I assume). Consider moving to a shared validation module:

# src/polar_flow_server/api/validation.py

Positive Highlights

Excellent Statistical Approach: Using Spearman correlation over Pearson is the right choice for health metrics
Robust NaN Handling: Proper checks for NaN results from scipy (line 164)
Clear Significance Thresholds: Well-documented p-value cutoffs
Comprehensive Tests: 13 new tests with good coverage of happy paths
IQR-Based Anomaly Detection: Robust choice for non-normal distributions
Multi-Tenancy Support: Proper use of UserScopedMixin
Idempotent Upserts: ON CONFLICT DO UPDATE pattern
Clear Recommendations: Actionable recovery advice based on risk scores


Migration Review

The migration looks good overall:

  • ✅ Proper UUID primary key
  • ✅ Indexes on query paths
  • ✅ JSON columns for flexibility
  • ✅ Timezone-aware timestamps
  • ✅ Helpful column comments

Minor suggestion: Consider adding an index on analyzed_at for time-based queries:

op.create_index('ix_pattern_analyzed_at', 'pattern_analyses', ['analyzed_at'])

Documentation

The API documentation is thorough and includes:

  • ✅ Clear endpoint descriptions
  • ✅ Pattern type explanations
  • ✅ Example requests/responses
  • ✅ Significance level descriptions

Recommendations

Before Merge:

  1. Fix issue feat: Complete V3 API Integration with HTMX Admin Panel #2 (missing rollback)
  2. Address issue feat: API key authentication and Alembic migrations #5 (API guard clarification)
  3. Fix issue feat: add GitHub templates (CODEOWNERS, PR template, issue templates) #14 (404 for missing patterns)

Post-Merge (Technical Debt):

  1. Optimize queries (issues feat: Admin authentication and dynamic OAuth URLs #6, feat: v1.0.0 - Unified Insights API (Sprint 3) #11)
  2. Extract magic numbers to constants
  3. Add edge case tests
  4. Create shared validation module

Summary

This is a solid implementation with good statistical foundations and thorough testing. The main concerns are around error handling and some API design decisions. Once the critical issues are addressed, this will be ready to merge.

Verdict: Approve with changes requested for issues #2, #5, and #14.


| Branch: feature/sprint-2-trends

- Add try/except with rollback to detect_all_patterns
- Return 404 for invalid/missing patterns instead of null
- API guard verified as consistent with all other endpoints

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@StuMason StuMason merged commit f21a849 into main Jan 13, 2026
5 checks passed
@StuMason StuMason deleted the feature/sprint-2-trends branch January 13, 2026 06:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant