feat(predict): AR features, multi-target diagnostic, and CLI #23

TimeToLearnAlice wants to merge 7 commits into ErikBjare:master from
Conversation
- Add autoregressive features (target lags, rolling mean/std) to the feature frame for stronger baseline signal — AR features capture habitual patterns
- Add run_diagnostic() for multi-target comparison across screen time categories
- Add CLI module (quantifiedme.predict.cli) with baseline, diagnostic, and features inspection subcommands
- Results: time:Uncategorized R²=0.127 (best), time:Work R²=0.034; day-of-week dominates feature importance across all targets
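The AR features described above (target lags plus rolling mean/std) can be sketched roughly as follows. This is a minimal illustration assuming a daily-indexed DataFrame; the helper name `make_ar_features` and the exact lag/window choices are stand-ins, not the PR's actual implementation:

```python
import pandas as pd

def make_ar_features(df: pd.DataFrame, target_col: str, lags=(1, 2, 3, 7)) -> pd.DataFrame:
    """Build lagged and rolling features of the target itself."""
    out = pd.DataFrame(index=df.index)
    s = df[target_col]
    for lag in lags:
        out[f"{target_col}:lag_{lag}d"] = s.shift(lag)
    # Rolling stats over the *past* week (shift(1) avoids leaking today's value)
    out[f"{target_col}:7d_mean"] = s.shift(1).rolling(7).mean()
    out[f"{target_col}:7d_std"] = s.shift(1).rolling(7).std()
    return out
```

Features like these encode the "habitual patterns" signal mentioned in the description: yesterday's and last week's values of the target become predictors of tomorrow's.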
Reviewer's Guide

Adds autoregressive (AR) features to the baseline predictor, introduces a multi-target diagnostic runner with cross-target feature importance aggregation, and exposes baseline, diagnostic, and feature inspection flows via a new CLI module.

Sequence diagram for CLI baseline command flow

sequenceDiagram
actor User
participant Shell
participant Module_predict as quantifiedme_predict
participant CLI as cli_main
participant Baseline as baseline
participant Features as features
participant Model as train_baseline
User->>Shell: python -m quantifiedme.predict baseline data.csv --target time:Work
Shell->>Module_predict: locate module quantifiedme.predict
Module_predict->>CLI: call main(argv)
CLI->>CLI: argparse parse_args
CLI->>CLI: select subcommand baseline
CLI->>CLI: cmd_baseline(args)
CLI->>Baseline: run_baseline(csv_path=args.csv, target_col=args.target)
Baseline->>Baseline: load_csv_export(csv_path)
Baseline->>Features: build_feature_frame(df, target_col)
Features-->>Baseline: X, y
Baseline->>Model: train_baseline(df, target_col)
Model-->>Baseline: BaselineResult
Baseline-->>CLI: BaselineResult
CLI-->>User: print result.summary()
Class diagram for predictive baseline, features, and CLI modules

classDiagram
class baseline {
+run_baseline(csv_path, target_col)
+run_diagnostic(csv_path, targets)
+load_csv_export(csv_path)
+train_baseline(df, target_col)
}
class BaselineResult {
+float train_r2
+float test_r2
+float test_rmse
+float test_mae
+float target_mean
+float target_std
+int n_train
+int n_val
+int n_test
+feature_importance
}
class features {
+build_feature_frame(df, target_col, lag_days, require_complete_rows)
+build_temporal_features(df)
+build_screentime_features(df, lag_days)
+build_substance_features(df, lag_days)
+build_autoregressive_features(df, target_col, lags)
}
class cli {
+main(argv)
+cmd_baseline(args)
+cmd_diagnostic(args)
+cmd_features(args)
}
baseline --> BaselineResult : returns
baseline --> features : uses
features --> BaselineResult : populates feature_importance
cli --> baseline : imports_run_baseline
cli --> baseline : imports_run_diagnostic
cli --> baseline : imports_load_csv_export
cli --> features : imports_build_feature_frame
Flow diagram for multi-target diagnostic with cross-target feature importance

flowchart TD
A[start run_diagnostic] --> B[load_csv_export csv_path]
B --> C{targets is None?}
C -- Yes --> D[select candidates time:* columns]
D --> E[filter by std > 0.1 and mean > 0.05]
E --> F[sort by descending mean]
F --> G[keep top 8 targets]
C -- No --> H[use provided targets]
G --> I[initialize results dict]
H --> I[initialize results dict]
I --> J[print diagnostic header]
J --> K{for each target in targets}
K --> L[try train_baseline df, target]
L --> M[store BaselineResult in results]
M --> N[print per-target metrics]
N --> K
K --> O[except Exception]
O --> P[logger warning Failed for target]
P --> K
K --> Q{results not empty?}
Q -- No --> R[return results]
Q -- Yes --> S[initialize all_importances dict]
S --> T[for each BaselineResult in results]
T --> U[for top 10 features in feature_importance]
U --> V[accumulate importance per feature]
V --> T
T --> W[sort features by total importance]
W --> X[print top 15 features with aggregated importance]
X --> R[return results]
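The auto-selection branch in the flow above (candidate `time:*` columns, variance and mass filters, top 8 by mean) can be sketched like this. The thresholds are taken from the diagram; the function name `pick_targets` is illustrative, not the repo's:

```python
import pandas as pd

def pick_targets(df: pd.DataFrame, max_targets: int = 8) -> list[str]:
    """Select viable time:* targets: enough variance and enough mass."""
    candidates = [c for c in df.columns if c.startswith("time:")]
    viable = [c for c in candidates if df[c].std() > 0.1 and df[c].mean() > 0.05]
    # Highest-mean categories first, capped at max_targets
    viable.sort(key=lambda c: df[c].mean(), reverse=True)
    return viable[:max_targets]
```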
Hey - I've found 2 issues, and left some high level feedback:
- In `run_diagnostic`, consider avoiding a blanket `except Exception` (or at least logging the full stack trace) so that unexpected errors are easier to debug rather than being silently swallowed per target.
- In `build_autoregressive_features`, returning an empty DataFrame when `target_col` is missing may hide configuration issues; it could be safer to raise or log a warning so the caller knows AR features were not created.
- For the CLI `--targets` argument, it may be helpful to strip whitespace around comma-separated target names (`args.targets.split(',')`) to be more forgiving of user input like `"time:Work, time:Media"`.
## Individual Comments
### Comment 1
<location path="src/quantifiedme/predict/cli.py" line_range="26" />
<code_context>
+ """Run multi-target diagnostic comparison."""
+ from .baseline import run_diagnostic
+
+ targets = args.targets.split(",") if args.targets else None
+ run_diagnostic(args.csv, targets=targets)
+
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Strip whitespace from comma-separated targets to avoid mismatches with column names.
If a user passes `--targets` with spaces (e.g. `"time:Work, time:Play"`), entries after the comma will include a leading space and likely won’t match any column. Consider normalizing with:
```python
targets = [t.strip() for t in args.targets.split(',')] if args.targets else None
```
Suggested implementation:
```python
def cmd_diagnostic(args: argparse.Namespace) -> None:
"""Run multi-target diagnostic comparison."""
from .baseline import run_diagnostic
```
```python
targets = [t.strip() for t in args.targets.split(",")] if args.targets else None
run_diagnostic(args.csv, targets=targets)
```
</issue_to_address>
### Comment 2
<location path="src/quantifiedme/predict/baseline.py" line_range="239-240" />
<code_context>
+ f"{result.target_mean:>8.2f} {result.target_std:>8.2f} "
+ f"{result.n_train + result.n_val + result.n_test:>6}"
+ )
+ except Exception as e:
+ logger.warning(f"Failed for {target}: {e}")
+
+ # Summary: which features appear most across targets
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Avoid a broad `Exception` catch or at least surface more debugging information.
This broad `Exception` catch can mask programming errors (e.g., incorrect `BaselineResult` attributes) as mere warnings. Please either catch only the expected runtime errors (e.g., data/shape issues) or re-raise after logging unexpected ones. If you keep this pattern, at least log the traceback (`exc_info=True`) so failures remain diagnosable.
```suggestion
except Exception:
logger.warning("Failed for %s", target, exc_info=True)
```
</issue_to_address>
```python
    """Run multi-target diagnostic comparison."""
    from .baseline import run_diagnostic

    targets = args.targets.split(",") if args.targets else None
```
**suggestion (bug_risk):** Strip whitespace from comma-separated targets to avoid mismatches with column names.

If a user passes `--targets` with spaces (e.g. `"time:Work, time:Play"`), entries after the comma will include a leading space and likely won't match any column. Consider normalizing with:

```python
targets = [t.strip() for t in args.targets.split(',')] if args.targets else None
```

Suggested implementation:

```python
def cmd_diagnostic(args: argparse.Namespace) -> None:
    """Run multi-target diagnostic comparison."""
    from .baseline import run_diagnostic

    targets = [t.strip() for t in args.targets.split(",")] if args.targets else None
    run_diagnostic(args.csv, targets=targets)
```
Already addressed in prior commit — line 31 uses `t.strip()`.
```python
    except Exception as e:
        logger.warning(f"Failed for {target}: {e}")
```

**suggestion (bug_risk):** Avoid a broad `Exception` catch or at least surface more debugging information.

This broad `Exception` catch can mask programming errors (e.g., incorrect `BaselineResult` attributes) as mere warnings. Please either catch only the expected runtime errors (e.g., data/shape issues) or re-raise after logging unexpected ones. If you keep this pattern, at least log the traceback (`exc_info=True`) so failures remain diagnosable.

```python
except Exception:
    logger.warning("Failed for %s", target, exc_info=True)
```
Narrowed to `(ValueError, KeyError)` in 25f2b79. Kept `exc_info=True` for traceability.
Greptile Summary

This PR extends the predictive QS framework with autoregressive features for the target variable, a multi-target diagnostic runner, and a CLI.

Confidence Score: 5/5

Safe to merge; all findings are P2 quality/style suggestions with no correctness or data-integrity impact. No P0 or P1 issues found. The CLI docstring, duplicate feature redundancy, and missing `lag_days` forwarding are minor quality issues. No files require special attention for merge safety.
Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[Raw DataFrame\ntime:* + tag:* columns] --> B[build_substance_features\ndecay kernels, trailing counts]
A --> C[build_screentime_features\nlag d-1/2/3/7 + roll 7d mean/std]
A --> D[build_temporal_features\ndow, weekend, month, cyclical]
A --> E[build_autoregressive_features\ntarget lags d-1/2/3/7\nroll 7d/14d mean, 7d std]
B --> F[pd.concat → X]
C --> F
D --> F
E --> F
A --> G["y = target.shift(-1)\nnext-day value"]
F --> H[valid mask\ndrop NaN rows]
G --> H
H --> I[X_train / X_val / X_test\ntime-based 60/20/20 split]
I --> J[LightGBM train\nearly stopping on val set]
J --> K[BaselineResult\nmetrics + feature importance]
style E fill:#d4edda,stroke:#28a745
style K fill:#cce5ff,stroke:#004085
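The `shift(-1)` target construction and the time-ordered 60/20/20 split shown in the chart amount to something like the following sketch (function name and exact ratios are assumptions read off the diagram, not the repo's code):

```python
import pandas as pd

def split_time_ordered(X: pd.DataFrame, y: pd.Series, train=0.6, val=0.2):
    """Chronological split: no shuffling, so the test set is strictly 'the future'."""
    n = len(X)
    i_train, i_val = int(n * train), int(n * (train + val))
    return (X.iloc[:i_train], y.iloc[:i_train],
            X.iloc[i_train:i_val], y.iloc[i_train:i_val],
            X.iloc[i_val:], y.iloc[i_val:])

# Next-day target: predict tomorrow's value from today's features
df = pd.DataFrame({"time:Work": [1.0, 2.0, 3.0, 4.0, 5.0]})
y = df["time:Work"].shift(-1)   # y[i] is the value on day i+1
mask = y.notna()                # the last day has no 'tomorrow'
X, y = df[mask], y[mask]
```

A chronological split matters here: shuffling daily rows would leak future AR lags into the training set and inflate R².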
```python
Usage:
    python -m quantifiedme.predict baseline data.csv
    python -m quantifiedme.predict baseline data.csv --target time:Programming
    python -m quantifiedme.predict diagnostic data.csv
    python -m quantifiedme.predict features data.csv
"""
```
**Incorrect module invocation in docstring**

The docstring shows `python -m quantifiedme.predict baseline data.csv`, but there is no `__main__.py` in the predict package. Running that command will raise `No module named quantifiedme.predict.__main__`. The working command is `python -m quantifiedme.predict.cli baseline data.csv` (as the PR description correctly states).

Either fix the docstring or add a `__main__.py` that delegates to `cli.main()`.
Suggested change:

```python
"""CLI for the predictive QS framework.

Usage:
    python -m quantifiedme.predict.cli baseline data.csv
    python -m quantifiedme.predict.cli baseline data.csv --target time:Programming
    python -m quantifiedme.predict.cli diagnostic data.csv
    python -m quantifiedme.predict.cli features data.csv
"""
```
`__main__.py` already exists — `python -m quantifiedme.predict` works correctly. No change needed.
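For reference, a package `__main__.py` that enables `python -m quantifiedme.predict` only needs to delegate to the CLI entry point. This is a module fragment (not runnable standalone because of the relative import), shown as the standard Python pattern rather than the repo's exact file:

```python
# src/quantifiedme/predict/__main__.py  (illustrative path)
from .cli import main

if __name__ == "__main__":
    raise SystemExit(main())
```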
```python
    )
    screentime_features = build_screentime_features(df, lag_days=lag_days)
    temporal_features = build_temporal_features(df)
    ar_features = build_autoregressive_features(df, target_col)
```
**`lag_days` not forwarded to AR feature builder**

`build_feature_frame` accepts a `lag_days` parameter and passes it to `build_screentime_features`, but calls `build_autoregressive_features(df, target_col)` without it. If a caller passes a custom `lag_days` (e.g. `[1, 14, 30]`), the AR lags will silently remain at the default `[1, 2, 3, 7]`, producing an inconsistency between the two lag feature sets.

Suggested change:

```python
ar_features = build_autoregressive_features(df, target_col, lags=lag_days)
```
Good catch — fixed in 25f2b79. `lag_days` is now forwarded to `build_autoregressive_features`.
```python
    except Exception as e:
        logger.warning(f"Failed for {target}: {e}")
```
**Broad exception swallowing in diagnostic loop**

Catching all `Exception` types and logging a warning means failures like a missing target column, a pandas shape mismatch, or a LightGBM error will be silently skipped. Consider narrowing the catch or at least re-raising unexpected errors so data pipeline problems don't go undetected during a diagnostic run.

Suggested change:

```python
except (ValueError, KeyError) as e:
    logger.warning(f"Failed for {target}: {e}")
```
Narrowed to `(ValueError, KeyError)` in 25f2b79.
PyMC-based linear model with regularized priors for predicting next-day work hours from substance decay kernels, temporal patterns, and autoregressive features. Outputs posterior predictive distributions with credible intervals rather than point estimates.

- models/work.py: Bayesian model with feature selection, standardization, manual out-of-sample posterior predictive computation
- cli.py: `bayesian` subcommand with --samples, --tune, --max-features
- __main__.py: enable `python -m quantifiedme.predict` execution

Initial results on 730-day AW+QSlang dataset:

- time:Work R²=0.057 (test), 94% CI coverage=91.7%
- Top predictors: day-of-week (cyclical), psychedelics:count_7d, Programming:7d_std, alcohol:kernel (negative)
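The substance decay kernels mentioned in this commit (daily binary use flags smoothed into an exponentially decaying exposure signal) can be sketched as below. The half-life value and the helper name `decay_kernel` are assumptions for illustration, not values from the repo:

```python
import numpy as np

def decay_kernel(use_flags: np.ndarray, half_life_days: float = 2.0) -> np.ndarray:
    """Exponentially decaying exposure: each day of use adds 1.0, which then decays."""
    decay = 0.5 ** (1.0 / half_life_days)   # per-day retention factor
    out = np.zeros_like(use_flags, dtype=float)
    level = 0.0
    for i, used in enumerate(use_flags):
        level = level * decay + float(used)
        out[i] = level
    return out
```

This turns a sparse 0/1 series into a smooth regressor, which is why kernels can carry signal even when only daily binary substance data (no dosages or timestamps) is available.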
- Log full stack trace in diagnostic exception handler (was swallowing)
- Warn when AR target column is missing instead of silent empty return
- Strip whitespace from comma-separated --targets input
Implements simulate.py with intervention mapping that translates human-readable substance additions/removals into feature vector modifications for the Bayesian model.

Adds CLI commands:

    python -m quantifiedme.predict simulate data.csv --add caffeine
    python -m quantifiedme.predict simulate data.csv --remove alcohol --add nicotine

Uses query_intervention() from the existing work model to compare baseline vs intervention posterior predictive distributions, reporting effect size, credible intervals, and P(positive effect).
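The baseline-vs-intervention comparison described above reduces to comparing two arrays of posterior predictive samples. A minimal sketch of the reported statistics (the name `effect_stats` and the 94% interval width are illustrative assumptions):

```python
import numpy as np

def effect_stats(baseline: np.ndarray, intervention: np.ndarray) -> dict:
    """Effect size, 94% credible interval, and P(positive effect) from paired posterior draws."""
    diff = intervention - baseline             # per-draw difference
    lo, hi = np.percentile(diff, [3.0, 97.0])  # central 94% credible interval
    return {
        "effect": float(diff.mean()),
        "ci_94": (float(lo), float(hi)),
        "p_positive": float((diff > 0).mean()),
    }
```

Working on the difference of paired draws (rather than comparing the two marginals) keeps between-draw correlation, which tightens the interval on the intervention effect.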
Cast pandas .values (which returns ndarray | ExtensionArray) to np.ndarray via np.asarray() for type-safe numpy operations. Also fix Hashable → str index type for feature importance dict.
When target_col is in key_categories (e.g. time:Work), both build_screentime_features and build_autoregressive_features produced identical lag/rolling columns under different names, splitting feature importance and misleading the model. Now build_screentime_features excludes the target column, letting AR features (which include additional 14d_mean) handle it exclusively. Addresses Greptile P2 review feedback on PR ErikBjare#23.
- Forward lag_days to build_autoregressive_features (Greptile P2)
- Narrow diagnostic exception catch to ValueError/KeyError (Sourcery + Greptile P2)
Summary
Phase 1 improvements to the predictive QS framework (building on #22):
- Multi-target diagnostic (`run_diagnostic()`): Compare baseline performance across all viable screen time categories in one run, with cross-target feature importance aggregation
- CLI (`python -m quantifiedme.predict.cli`): Three subcommands — `baseline`, `diagnostic`, `features` — for convenient exploration

Results (730 days, 88 features)
Top cross-target features: day-of-week (dominant), cannabinoids trailing count, AR 7d_std, social media lag, programming volatility.
Key insight: With only daily binary substance data (no dosages/timestamps), substance kernels provide modest signal. The model's weakness is expected — this baseline establishes the floor for comparison with the Bayesian model in Phase 2.
Test plan
Summary by Sourcery
Add multi-target diagnostics, autoregressive features, and a command-line interface to the predictive baseline framework.
New Features:
Enhancements: