
feat(predict): AR features, multi-target diagnostic, and CLI #23

Open

TimeToLearnAlice wants to merge 7 commits into ErikBjare:master from TimeToLearnAlice:feat/predict-ar-features-cli

Conversation

@TimeToLearnAlice (Contributor) commented Apr 13, 2026

Summary

Phase 1 improvements to the predictive QS framework (building on #22):

  • Autoregressive features: Target variable's own lags (d-1, d-2, d-3, d-7) and rolling stats (7d/14d mean, 7d std) — the target's own recent history is often the strongest predictor for habitual behaviors
  • Multi-target diagnostic (run_diagnostic()): Compare baseline performance across all viable screen time categories in one run, with cross-target feature importance aggregation
  • CLI module (python -m quantifiedme.predict.cli): Three subcommands — baseline, diagnostic, features — for convenient exploration
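A minimal pandas sketch of how such AR features can be constructed (illustrative only — the actual `build_autoregressive_features` in `features.py` may differ in details; the `ar:` prefix, lags, and rolling windows follow the description above):

```python
import pandas as pd

def build_autoregressive_features(df: pd.DataFrame, target_col: str,
                                  lags: tuple[int, ...] = (1, 2, 3, 7)) -> pd.DataFrame:
    """Lagged values and rolling stats of the target's own history (sketch)."""
    if target_col not in df.columns:
        return pd.DataFrame(index=df.index)  # real code warns instead of returning silently
    name = target_col.replace(":", "_")  # normalize to avoid colon collisions
    y = df[target_col]
    feats = {}
    for lag in lags:
        feats[f"ar:{name}:d-{lag}"] = y.shift(lag)
    # rolling stats include the current day, which is safe here because the
    # prediction target is the *next* day's value (y_target = y.shift(-1))
    feats[f"ar:{name}:7d_mean"] = y.rolling(7).mean()
    feats[f"ar:{name}:14d_mean"] = y.rolling(14).mean()
    feats[f"ar:{name}:7d_std"] = y.rolling(7).std()
    return pd.DataFrame(feats, index=df.index)
```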

Results (730 days, 88 features)

| Target | Test R² | Test RMSE | Notes |
|---|---|---|---|
| time:Uncategorized | 0.127 | 1.356 | Best — habitual/autoregressive |
| time:Work | 0.034 | 1.445 | Mild improvement from AR features |
| time:Programming | 0.017 | 1.400 | Weak signal |
| time:Media | -2.385 | 0.726 | Overfits — bursty target |

Top cross-target features: day-of-week (dominant), cannabinoids trailing count, AR 7d_std, social media lag, programming volatility.

Key insight: With only daily binary substance data (no dosages/timestamps), substance kernels provide modest signal. The model's weakness is expected — this baseline establishes the floor for comparison with the Bayesian model in Phase 2.

Test plan

Summary by Sourcery

Add multi-target diagnostics, autoregressive features, and a command-line interface to the predictive baseline framework.

New Features:

  • Introduce a multi-target diagnostic runner that evaluates baseline models across multiple screen time targets and aggregates feature importance.
  • Add autoregressive features based on the target variable’s recent history into the feature frame used by the baseline model.
  • Provide a CLI module with subcommands to run single-target baselines, multi-target diagnostics, and feature inspection from the command line.

Enhancements:

  • Extend the baseline reporting to summarize per-target metrics and highlight cross-target important features.

- Add autoregressive features (target lags, rolling mean/std) to feature frame
  for stronger baseline signal — AR features capture habitual patterns
- Add run_diagnostic() for multi-target comparison across screen time categories
- Add CLI module (quantifiedme.predict.cli) with baseline, diagnostic, and
  features inspection subcommands
- Results: time:Uncategorized R²=0.127 (best), time:Work R²=0.034;
  day-of-week dominates feature importance across all targets
@sourcery-ai Bot commented Apr 13, 2026

Reviewer's Guide

Adds autoregressive (AR) features to the baseline predictor, introduces a multi-target diagnostic runner with cross-target feature importance aggregation, and exposes baseline, diagnostic, and feature inspection flows via a new CLI module.

Sequence diagram for CLI baseline command flow

sequenceDiagram
    actor User
    participant Shell
    participant Module_predict as quantifiedme_predict
    participant CLI as cli_main
    participant Baseline as baseline
    participant Features as features
    participant Model as train_baseline

    User->>Shell: python -m quantifiedme.predict baseline data.csv --target time:Work
    Shell->>Module_predict: locate module quantifiedme.predict
    Module_predict->>CLI: call main(argv)
    CLI->>CLI: argparse parse_args
    CLI->>CLI: select subcommand baseline
    CLI->>CLI: cmd_baseline(args)

    CLI->>Baseline: run_baseline(csv_path=args.csv, target_col=args.target)
    Baseline->>Baseline: load_csv_export(csv_path)
    Baseline->>Features: build_feature_frame(df, target_col)
    Features-->>Baseline: X, y
    Baseline->>Model: train_baseline(df, target_col)
    Model-->>Baseline: BaselineResult
    Baseline-->>CLI: BaselineResult
    CLI-->>User: print result.summary()

Class diagram for predictive baseline, features, and CLI modules

classDiagram
    class baseline {
        +run_baseline(csv_path, target_col)
        +run_diagnostic(csv_path, targets)
        +load_csv_export(csv_path)
        +train_baseline(df, target_col)
    }

    class BaselineResult {
        +float train_r2
        +float test_r2
        +float test_rmse
        +float test_mae
        +float target_mean
        +float target_std
        +int n_train
        +int n_val
        +int n_test
        +feature_importance
    }

    class features {
        +build_feature_frame(df, target_col, lag_days, require_complete_rows)
        +build_temporal_features(df)
        +build_screentime_features(df, lag_days)
        +build_substance_features(df, lag_days)
        +build_autoregressive_features(df, target_col, lags)
    }

    class cli {
        +main(argv)
        +cmd_baseline(args)
        +cmd_diagnostic(args)
        +cmd_features(args)
    }

    baseline --> BaselineResult : returns
    baseline --> features : uses
    features --> BaselineResult : populates feature_importance

    cli --> baseline : imports_run_baseline
    cli --> baseline : imports_run_diagnostic
    cli --> baseline : imports_load_csv_export
    cli --> features : imports_build_feature_frame

Flow diagram for multi-target diagnostic with cross-target feature importance

flowchart TD
    A[start run_diagnostic] --> B[load_csv_export csv_path]
    B --> C{targets is None?}
    C -- Yes --> D[select candidates time:* columns]
    D --> E[filter by std > 0.1 and mean > 0.05]
    E --> F[sort by descending mean]
    F --> G[keep top 8 targets]
    C -- No --> H[use provided targets]
    G --> I[initialize results dict]
    H --> I[initialize results dict]

    I --> J[print diagnostic header]
    J --> K{for each target in targets}

    K --> L[try train_baseline df, target]
    L --> M[store BaselineResult in results]
    M --> N[print per-target metrics]
    N --> K

    K --> O[except Exception]
    O --> P[logger warning Failed for target]
    P --> K

    K --> Q{results not empty?}
    Q -- No --> R[return results]
    Q -- Yes --> S[initialize all_importances dict]

    S --> T[for each BaselineResult in results]
    T --> U[for top 10 features in feature_importance]
    U --> V[accumulate importance per feature]
    V --> T
    T --> W[sort features by total importance]
    W --> X[print top 15 features with aggregated importance]
    X --> R[return results]

File-Level Changes

Add multi-target baseline diagnostic runner with auto target selection and cross-target feature importance aggregation.
  • Implement run_diagnostic to load CSV data and optionally auto-detect viable time:* targets based on variance and mean thresholds.
  • Run train_baseline per target, collecting BaselineResult objects and printing a tabular per-target metrics summary.
  • Aggregate top-10 feature importances across all successful targets and print the most important cross-target features.
  • Handle per-target training failures gracefully via warnings and return the results mapping.
src/quantifiedme/predict/baseline.py
Introduce autoregressive feature construction for the target and integrate into the feature frame.
  • Implement build_autoregressive_features to create lagged target values and rolling statistics with configurable lags and safe handling when the target column is missing.
  • Name AR features with an ar: prefix and target-name normalization to avoid colon collisions.
  • Call build_autoregressive_features from build_feature_frame and append AR features to the combined feature matrix alongside existing feature groups.
src/quantifiedme/predict/features.py
Provide a CLI entry point for running baselines, diagnostics, and feature inspections from the command line.
  • Add cli.py with argparse-based subcommands baseline, diagnostic, and features under the quantifiedme.predict module.
  • Wire baseline subcommand to run_baseline with CSV path and optional target argument.
  • Wire diagnostic subcommand to run_diagnostic with CSV path and optional comma-separated targets list.
  • Wire features subcommand to build_feature_frame, print dataset/feature summary, and optionally show per-feature statistics grouped by prefix.
  • Implement a main() dispatcher compatible with python -m quantifiedme.predict.cli and guard it with main check.
src/quantifiedme/predict/cli.py


@sourcery-ai sourcery-ai Bot left a comment


Hey - I've found 2 issues, and left some high level feedback:

  • In run_diagnostic, consider avoiding a blanket except Exception (or at least logging the full stack trace) so that unexpected errors are easier to debug rather than being silently swallowed per target.
  • In build_autoregressive_features, returning an empty DataFrame when target_col is missing may hide configuration issues; it could be safer to raise or log a warning so the caller knows AR features were not created.
  • For the CLI --targets argument, it may be helpful to strip whitespace around comma-separated target names (args.targets.split(',')) to be more forgiving of user input like "time:Work, time:Media".

Comment thread: src/quantifiedme/predict/cli.py (Outdated)

```python
    """Run multi-target diagnostic comparison."""
    from .baseline import run_diagnostic

    targets = args.targets.split(",") if args.targets else None
```

suggestion (bug_risk): Strip whitespace from comma-separated targets to avoid mismatches with column names.

If a user passes `--targets` with spaces (e.g. `"time:Work, time:Play"`), entries after the comma will include a leading space and likely won’t match any column. Consider normalizing with:

```python
targets = [t.strip() for t in args.targets.split(',')] if args.targets else None
```

Suggested implementation:

```python
def cmd_diagnostic(args: argparse.Namespace) -> None:
    """Run multi-target diagnostic comparison."""
    from .baseline import run_diagnostic

    targets = [t.strip() for t in args.targets.split(",")] if args.targets else None
    run_diagnostic(args.csv, targets=targets)
```

TimeToLearnAlice (Contributor, Author): Already addressed in prior commit — line 31 uses `t.strip()`.

Comment thread: src/quantifiedme/predict/baseline.py (Outdated)
Comment on lines +239 to +240

```python
        except Exception as e:
            logger.warning(f"Failed for {target}: {e}")
```

suggestion (bug_risk): Avoid a broad `Exception` catch or at least surface more debugging information.

This broad `Exception` catch can mask programming errors (e.g., incorrect `BaselineResult` attributes) as mere warnings. Please either catch only the expected runtime errors (e.g., data/shape issues) or re-raise after logging unexpected ones. If you keep this pattern, at least log the traceback (`exc_info=True`) so failures remain diagnosable.

Suggested change:

```python
        except Exception:
            logger.warning("Failed for %s", target, exc_info=True)
```

TimeToLearnAlice (Contributor, Author): Narrowed to (ValueError, KeyError) in 25f2b79. Kept exc_info=True for traceability.

@greptile-apps

greptile-apps Bot commented Apr 13, 2026

Greptile Summary

This PR extends the predictive QS framework with autoregressive features for the target variable, a multi-target run_diagnostic() helper, and a new CLI module. The temporal split, early-stopping discipline, and collision-detection logic from #22 are preserved, and AR lag alignment (lag-1 = yesterday, target = tomorrow) is correct with no data leakage.

Confidence Score: 5/5

Safe to merge; all findings are P2 quality/style suggestions with no correctness or data-integrity impact.

No P0 or P1 issues found. The CLI docstring, duplicate feature redundancy, missing lags forwarding, and broad exception catch are all non-blocking improvement suggestions that don't affect model correctness or runtime behavior.

No files require special attention for merge safety.

Important Files Changed

| Filename | Overview |
|---|---|
| src/quantifiedme/predict/features.py | Adds build_autoregressive_features and integrates it into build_feature_frame; introduces duplicate features when the target is in key_categories, and lag_days is not forwarded to the AR builder. |
| src/quantifiedme/predict/baseline.py | Adds run_diagnostic for multi-target comparison; broad exception swallowing in the diagnostic loop could silently hide real failures. |
| src/quantifiedme/predict/cli.py | New CLI module; the module-level docstring documents python -m quantifiedme.predict, which won't work without a __main__.py — the correct invocation is python -m quantifiedme.predict.cli. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Raw DataFrame\ntime:* + tag:* columns] --> B[build_substance_features\ndecay kernels, trailing counts]
    A --> C[build_screentime_features\nlag d-1/2/3/7 + roll 7d mean/std]
    A --> D[build_temporal_features\ndow, weekend, month, cyclical]
    A --> E[build_autoregressive_features\ntarget lags d-1/2/3/7\nroll 7d/14d mean, 7d std]
    B --> F[pd.concat → X]
    C --> F
    D --> F
    E --> F
    A --> G["y = target.shift(-1)\nnext-day value"]
    F --> H[valid mask\ndrop NaN rows]
    G --> H
    H --> I[X_train / X_val / X_test\ntime-based 60/20/20 split]
    I --> J[LightGBM train\nearly stopping on val set]
    J --> K[BaselineResult\nmetrics + feature importance]
    style E fill:#d4edda,stroke:#28a745
    style K fill:#cce5ff,stroke:#004085

Comments Outside Diff (1)

  1. src/quantifiedme/predict/features.py, line 251-260 (link)

    P2 Duplicate features when target is a key_category

    When target_col is one of the hard-coded key_categories (e.g. "time:Work"), build_screentime_features and build_autoregressive_features produce identically-valued columns under different names. For example, with target_col="time:Work":

    • lag:Work:d-1 and ar:time_Work:d-1 both equal df["time:Work"].shift(1)
    • roll:Work:7d_mean and ar:time_Work:7d_mean both equal df["time:Work"].rolling(7).mean()

    The collision check in train_baseline won't catch this because the sanitized names differ. LightGBM will split importance across both features, making the importance report misleading. Consider skipping AR features for columns already covered by build_screentime_features, or removing those columns from key_categories when they are the target.

Reviews (1): Last reviewed commit: "feat(predict): add AR features, multi-ta..."

Comment on lines +3 to +8 (src/quantifiedme/predict/cli.py)

```python
Usage:
    python -m quantifiedme.predict baseline data.csv
    python -m quantifiedme.predict baseline data.csv --target time:Programming
    python -m quantifiedme.predict diagnostic data.csv
    python -m quantifiedme.predict features data.csv
"""
```

P2 Incorrect module invocation in docstring

The docstring shows `python -m quantifiedme.predict baseline data.csv`, but there is no `__main__.py` in the predict package. Running that command will raise No module named quantifiedme.predict.__main__. The working command is `python -m quantifiedme.predict.cli baseline data.csv` (as the PR description correctly states).

Either fix the docstring or add a `__main__.py` that delegates to `cli.main()`.

Suggested change:

```python
"""CLI for the predictive QS framework.

Usage:
    python -m quantifiedme.predict.cli baseline data.csv
    python -m quantifiedme.predict.cli baseline data.csv --target time:Programming
    python -m quantifiedme.predict.cli diagnostic data.csv
    python -m quantifiedme.predict.cli features data.csv
"""
```

TimeToLearnAlice (Contributor, Author): __main__.py already exists — python -m quantifiedme.predict works correctly. No change needed.

Comment thread: src/quantifiedme/predict/features.py (Outdated)

```python
    )
    screentime_features = build_screentime_features(df, lag_days=lag_days)
    temporal_features = build_temporal_features(df)
    ar_features = build_autoregressive_features(df, target_col)
```

P2 lag_days not forwarded to AR feature builder

build_feature_frame accepts a lag_days parameter and passes it to build_screentime_features, but calls build_autoregressive_features(df, target_col) without it. If a caller passes a custom lag_days (e.g. [1, 14, 30]), the AR lags will silently remain at the default [1, 2, 3, 7], producing an inconsistency between the two lag feature sets.

Suggested change:

```python
    ar_features = build_autoregressive_features(df, target_col, lags=lag_days)
```

TimeToLearnAlice (Contributor, Author): Good catch — fixed in 25f2b79. lag_days now forwarded to build_autoregressive_features.

Comment thread: src/quantifiedme/predict/baseline.py (Outdated)
Comment on lines +239 to +240

```python
        except Exception as e:
            logger.warning(f"Failed for {target}: {e}")
```

P2 Broad exception swallowing in diagnostic loop

Catching all Exception types and logging a warning means failures like a missing target column, a pandas shape mismatch, or a LightGBM error will be silently skipped. Consider narrowing the catch or at least re-raising unexpected errors so data pipeline problems don't go undetected during a diagnostic run.

Suggested change:

```python
        except (ValueError, KeyError) as e:
            logger.warning(f"Failed for {target}: {e}")
```

TimeToLearnAlice (Contributor, Author): Narrowed to (ValueError, KeyError) in 25f2b79.

PyMC-based linear model with regularized priors for predicting
next-day work hours from substance decay kernels, temporal patterns,
and autoregressive features. Outputs posterior predictive distributions
with credible intervals rather than point estimates.

- models/work.py: Bayesian model with feature selection, standardization,
  manual out-of-sample posterior predictive computation
- cli.py: `bayesian` subcommand with --samples, --tune, --max-features
- __main__.py: enable `python -m quantifiedme.predict` execution

Initial results on 730-day AW+QSlang dataset:
- time:Work R²=0.057 (test), 94% CI coverage=91.7%
- Top predictors: day-of-week (cyclical), psychedelics:count_7d,
  Programming:7d_std, alcohol:kernel (negative)

- Log full stack trace in diagnostic exception handler (was swallowing)
- Warn when AR target column is missing instead of silent empty return
- Strip whitespace from comma-separated --targets input

Implements simulate.py with intervention mapping that translates
human-readable substance additions/removals into feature vector
modifications for the Bayesian model. Adds CLI command:

    python -m quantifiedme.predict simulate data.csv --add caffeine
    python -m quantifiedme.predict simulate data.csv --remove alcohol --add nicotine

Uses query_intervention() from the existing work model to compare
baseline vs intervention posterior predictive distributions, reporting
effect size, credible intervals, and P(positive effect).
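The comparison step can be sketched with plain numpy (the draw arrays below are synthetic stand-ins for query_intervention()'s posterior predictive output, whose actual format the commit message doesn't show):

```python
import numpy as np

rng = np.random.default_rng(0)
baseline_draws = rng.normal(4.0, 1.0, size=4000)      # predicted next-day hours
intervention_draws = rng.normal(4.3, 1.0, size=4000)  # e.g. after --add caffeine

diff = intervention_draws - baseline_draws
effect = float(diff.mean())                      # effect size (hours/day)
ci_low, ci_high = np.percentile(diff, [3, 97])   # 94% credible interval
p_positive = float((diff > 0).mean())            # P(intervention helps)
```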
Cast pandas .values (which returns ndarray | ExtensionArray) to
np.ndarray via np.asarray() for type-safe numpy operations. Also
fix Hashable → str index type for feature importance dict.
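The cast amounts to something like this (a minimal sketch; the actual call sites changed by this commit aren't shown in the thread):

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0])
# Series.values is typed as ndarray | ExtensionArray; np.asarray pins it down
arr: np.ndarray = np.asarray(s.values)
rmse = float(np.sqrt(np.mean((arr - 2.0) ** 2)))
```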
When target_col is in key_categories (e.g. time:Work), both
build_screentime_features and build_autoregressive_features produced
identical lag/rolling columns under different names, splitting
feature importance and misleading the model.

Now build_screentime_features excludes the target column, letting
AR features (which include additional 14d_mean) handle it exclusively.

Addresses Greptile P2 review feedback on PR ErikBjare#23.
- Forward lag_days to build_autoregressive_features (Greptile P2)
- Narrow diagnostic exception catch to ValueError/KeyError (Sourcery + Greptile P2)