Add bank-invariant solver experiment with differential features#5
Co-authored-by: zfifteen <221906715+zfifteen@users.noreply.github.com> Agent-Logs-Url: https://github.com/zfifteen/shape-budget/sessions/36dc4936-e048-421a-a7c5-b32763e92cf7
…results Co-authored-by: zfifteen <221906715+zfifteen@users.noreply.github.com> Agent-Logs-Url: https://github.com/zfifteen/shape-budget/sessions/fac03558-583b-4ac6-ab01-23010a56fc36
Pull request overview
Adds a new “bank-invariant solver” experiment to the pose-anisotropy interventions suite to test whether routing stability across fresh banks improves when the ridge chooser uses joint-minus-support differential features instead of absolute, bank-shifting score features.
Changes:
- Introduces a new experiment driver (`run.py`) implementing differential-feature ridge routing and the same evaluation ladder (calibration → frozen holdout → confirmation, with one fallback branch).
- Adds an experiment write-up (`README.md`) and links it from the top-level experiments index.
- Commits generated artifacts (cache tables, frozen model JSON, and report CSV/JSON summaries) under `outputs/`.
Reviewed changes
Copilot reviewed 20 out of 20 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| experiments/pose-anisotropy-interventions/bank-invariant-solver/run.py | New experiment driver; builds differential feature vectors and fits/evaluates a frozen ridge chooser. |
| experiments/pose-anisotropy-interventions/bank-invariant-solver/README.md | Documents motivation, feature set, evaluation ladder, and baseline results for the new experiment. |
| experiments/README.md | Adds index links for the bank-adaptive and bank-invariant solver experiments. |
| experiments/pose-anisotropy-interventions/bank-invariant-solver/outputs/models/baseline__frozen_ridge_chooser.json | Stored frozen ridge chooser artifact for the baseline variant. |
| experiments/pose-anisotropy-interventions/bank-invariant-solver/outputs/reports/baseline__calibration_predictions.csv | Calibration-set per-trial predictions from the frozen chooser. |
| experiments/pose-anisotropy-interventions/bank-invariant-solver/outputs/reports/baseline__confirmation_block_predictions.csv | Confirmation-block per-trial predictions from the frozen chooser. |
| experiments/pose-anisotropy-interventions/bank-invariant-solver/outputs/reports/baseline__fit_eval_summary.json | Summary metrics for calibration/holdout (and confirmation if present). |
| experiments/pose-anisotropy-interventions/bank-invariant-solver/outputs/reports/baseline__full_plan_result.json | Full-plan result payload including final interpretation string. |
| experiments/pose-anisotropy-interventions/bank-invariant-solver/outputs/reports/baseline__holdout_block_1_predictions.csv | Holdout-block per-trial predictions from the frozen chooser. |
| experiments/pose-anisotropy-interventions/bank-invariant-solver/outputs/reports/baseline__ladder_summary.json | Ladder summary across smoke/calibration/holdout/confirmation. |
| experiments/pose-anisotropy-interventions/bank-invariant-solver/outputs/reports/baseline__smoke_calibration_block_1_predictions.csv | Smoke-check predictions on calibration block 1. |
| experiments/pose-anisotropy-interventions/bank-invariant-solver/outputs/reports/baseline__smoke_calibration_block_1_summary.json | Smoke-check summary metrics on calibration block 1. |
| experiments/pose-anisotropy-interventions/bank-invariant-solver/outputs/cache/baseline__calibration_block_1.csv | Cached trial table for calibration block 1. |
| experiments/pose-anisotropy-interventions/bank-invariant-solver/outputs/cache/baseline__calibration_block_1.json | Cache metadata + per-cell/per-condition summary for calibration block 1. |
| experiments/pose-anisotropy-interventions/bank-invariant-solver/outputs/cache/baseline__calibration_block_2.csv | Cached trial table for calibration block 2. |
| experiments/pose-anisotropy-interventions/bank-invariant-solver/outputs/cache/baseline__calibration_block_2.json | Cache metadata + per-cell/per-condition summary for calibration block 2. |
| experiments/pose-anisotropy-interventions/bank-invariant-solver/outputs/cache/baseline__holdout_block_1.csv | Cached trial table for holdout block 1. |
| experiments/pose-anisotropy-interventions/bank-invariant-solver/outputs/cache/baseline__holdout_block_1.json | Cache metadata + per-cell/per-condition summary for holdout block 1. |
| experiments/pose-anisotropy-interventions/bank-invariant-solver/outputs/cache/baseline__confirmation_block.csv | Cached trial table for confirmation block. |
| experiments/pose-anisotropy-interventions/bank-invariant-solver/outputs/cache/baseline__confirmation_block.json | Cache metadata + per-cell/per-condition summary for confirmation block. |
| | `cv_score_diff` | `joint_cv − support_cv` | First-order invariant | | ||
| | `abs_log_alpha_diff` | `|log(joint_α) − log(support_α)|` | Magnitude of disagreement | | ||
| | `abs_t_diff` | `|joint_t − support_t|` | Magnitude of disagreement | | ||
| | `score_sign` | `sign(joint_score − support_score)` | Rank-based (fully invariant) | |
In the feature table, cv_score_diff is defined as joint_cv − support_cv and score_sign as sign(joint_score − support_score), but the code uses the *_cv_score fields and implements score_sign as +1 when joint_score < support_score (ties go to -1). Please update these definitions to match the implementation (or change the implementation and regenerate outputs/models) so the README matches what was actually trained/evaluated.
| | `cv_score_diff` | `joint_cv_score − support_cv_score` | First-order invariant | | |
| | `abs_log_alpha_diff` | `|log(joint_α) − log(support_α)|` | Magnitude of disagreement | | |
| | `abs_t_diff` | `|joint_t − support_t|` | Magnitude of disagreement | | |
| | `score_sign` | `+1 if support_cv_score > joint_cv_score else −1` (ties → −1) | Rank-based (fully invariant) | |
| differentials and ratios. Differentials between support and joint candidates | ||
| are first-order invariant to additive bank-induced shifts, and ratios are | ||
| first-order invariant to multiplicative shifts. The routing decision only | ||
| needs to know which candidate is better, not the absolute quality level, | ||
| so differential features retain all relevant information. | ||
|
|
||
| Key design change vs the bank-adaptive solver: | ||
|
|
||
| - Feature vector uses only candidate differentials, not absolute scores. | ||
| The cell one-hot encoding is retained for support-type conditioning. |
The module docstring claims the feature set includes “ratios” and is invariant to multiplicative shifts, but the implemented feature vector only uses differentials/abs-diffs plus a sign feature (no ratio features are constructed). This makes the experiment description misleading—either remove the ratio/multiplicative-shift claims or add the corresponding ratio features and include them in FEATURE_NAMES/build_feature_vector.
| differentials. Differentials between support and joint candidates are | |
| first-order invariant to additive bank-induced shifts. The routing decision | |
| only needs to know which candidate is better, not the absolute quality level, | |
| so differential features retain all relevant information. | |
| Key design change vs the bank-adaptive solver: | |
| - Feature vector uses only candidate differentials (plus any derived sign | |
| indicators), not absolute scores. The cell one-hot encoding is retained | |
| for support-type conditioning. |
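The distinction raised in this comment can be checked numerically. The sketch below is illustrative only and is not taken from `run.py`: `diff_feature` and `log_ratio_feature` are hypothetical helpers, and the score values are made up. It shows that a plain difference survives an additive shift but not a multiplicative one, whereas a log-ratio (which the implemented feature vector does not include) would survive a multiplicative shift.

```python
import math


def diff_feature(joint: float, support: float) -> float:
    """Hypothetical differential feature: joint minus support."""
    return joint - support


def log_ratio_feature(joint: float, support: float) -> float:
    """Hypothetical ratio feature in log form; assumes both inputs are positive."""
    return math.log(joint) - math.log(support)


# Made-up candidate scores and bank-induced perturbations.
joint, support = 0.8, 0.5
shift, scale = 0.3, 2.0

# Additive shift: both candidates move by the same offset, the difference is kept.
additive_ok = math.isclose(
    diff_feature(joint + shift, support + shift), diff_feature(joint, support)
)

# Multiplicative shift: the difference is rescaled, so it is not invariant.
multiplicative_breaks_diff = not math.isclose(
    diff_feature(joint * scale, support * scale), diff_feature(joint, support)
)

# Multiplicative shift: the log-ratio is unchanged.
multiplicative_ok_for_log_ratio = math.isclose(
    log_ratio_feature(joint * scale, support * scale),
    log_ratio_feature(joint, support),
)

print(additive_ok, multiplicative_breaks_diff, multiplicative_ok_for_log_ratio)
```

Under these assumptions, a docstring that mentions only differentials and additive shifts describes what the feature vector can actually guarantee.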
| cv_diff, | ||
| abs(log_alpha_diff), | ||
| abs(t_diff), | ||
| 1.0 if score_diff < 0.0 else -1.0, |
The score_sign feature is documented as sign(joint_score − support_score), but the implementation returns +1 when joint_score < support_score and -1 otherwise (i.e., it’s closer to sign(support_score − joint_score) and also maps ties to -1). This mismatch makes the feature semantics hard to reason about and can lead to incorrect interpretation of learned weights; please align the computation with the documented definition (including a tie behavior), or update the docs/feature name to reflect the actual meaning.
| 1.0 if score_diff > 0.0 else (-1.0 if score_diff < 0.0 else 0.0), |
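To make the difference between the two expressions concrete, here is a minimal side-by-side sketch. It assumes `score_diff` is `joint_score − support_score`, as the rest of the PR describes; the function names `score_sign_original` and `score_sign_suggested` are hypothetical wrappers around the two diff lines, not names from `run.py`.

```python
def score_sign_original(score_diff: float) -> float:
    """Expression from the original diff line: +1 when joint < support, ties map to -1."""
    return 1.0 if score_diff < 0.0 else -1.0


def score_sign_suggested(score_diff: float) -> float:
    """Expression from the suggested change: the mathematical sign, ties map to 0."""
    return 1.0 if score_diff > 0.0 else (-1.0 if score_diff < 0.0 else 0.0)


# score_diff is assumed to be joint_score - support_score.
for score_diff in (-0.2, 0.0, 0.2):
    print(score_diff, score_sign_original(score_diff), score_sign_suggested(score_diff))
```

The two versions return opposite signs for every nonzero input and differ on ties, so a ridge weight fitted under one convention has the opposite meaning under the other. Adopting the suggested expression would therefore require refitting the chooser and regenerating the stored artifacts rather than reusing the frozen weights.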
| features (support minus joint, or joint minus support). Differentials are | ||
| first-order invariant to additive bank-induced shifts: when the bank changes, | ||
| both candidates' scores shift together, but their difference is preserved. |
This section says the differential features can be “support minus joint, or joint minus support”, but the actual implementation (and later table) consistently uses joint − support. Consider tightening the wording to a single convention so readers don’t invert feature signs when comparing against the model artifact/results.
| features, consistently defined as `joint − support` (joint candidate metric | |
| minus support candidate metric). These differentials are first-order | |
| invariant to additive bank-induced shifts: when the bank changes, both | |
| candidates' scores shift together, but their difference is preserved. |
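A single `joint − support` convention is easy to enforce if every differential is built by one helper. The following is a rough sketch under stated assumptions, not the code in `run.py`: `build_differentials` is a hypothetical function, only three of the metrics are shown, and the `joint_entropy`/`support_entropy` field names and all numeric values are invented for illustration.

```python
# Subset of metrics for illustration; field names are assumed, not confirmed.
METRICS = ("score", "entropy", "cv_score")


def build_differentials(row: dict[str, float]) -> dict[str, float]:
    """Return joint-minus-support differentials for each metric in METRICS."""
    return {f"{m}_diff": row[f"joint_{m}"] - row[f"support_{m}"] for m in METRICS}


# A made-up trial row.
row = {
    "joint_score": 1.25, "support_score": 1.0,
    "joint_entropy": 0.5, "support_entropy": 0.75,
    "joint_cv_score": 2.0, "support_cv_score": 1.5,
}

# Simulate a bank change that adds the same offset to every field.
shifted = {name: value + 0.5 for name, value in row.items()}

features = build_differentials(row)
shifted_features = build_differentials(shifted)

print(features)
print(features == shifted_features)
```

Because the subtraction order lives in one place, the sign of each feature in the model artifact can be read off without checking each call site.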
The bank-adaptive solver cleared holdout but failed fresh-bank confirmation because its ridge chooser uses absolute score features (`support_score`, `joint_score`, etc.) that shift with the reference bank, breaking frozen routing on new banks.

This experiment replaces absolute features with bank-invariant differentials to test whether removing the bank-dependent baseline from the feature vector is sufficient for confirmation stability.
Design
`score_diff`, `entropy_diff`, `cv_score_diff`, `log_alpha_diff`, `t_diff`, `rho_diff`, `h_diff`, `w1_diff`, `w2_diff`: all joint-minus-support. First-order invariant to additive bank shifts.

Results (baseline, bank_size=300)
The bank-invariant baseline clears holdout, which the bank-adaptive baseline could not (it lost to joint at holdout). Confirmation still fails. The differential redesign reduces bank sensitivity but does not eliminate it.
Files
- `experiments/pose-anisotropy-interventions/bank-invariant-solver/run.py`: experiment driver
- `experiments/pose-anisotropy-interventions/bank-invariant-solver/README.md`: writeup with results
- `experiments/pose-anisotropy-interventions/bank-invariant-solver/outputs/`: cache, models, reports
- `experiments/README.md`: index updated with new experiment link