Add bank-invariant solver experiment with differential features #5

Open
Copilot wants to merge 2 commits into main from copilot/work-on-solver-challenges

Copilot AI commented Mar 25, 2026

The bank-adaptive solver cleared holdout but failed fresh-bank confirmation because its ridge chooser uses absolute score features (support_score, joint_score, etc.) that shift with the reference bank, breaking frozen routing on new banks.

This experiment replaces absolute features with bank-invariant differentials to test whether removing the bank-dependent baseline from the feature vector is sufficient for confirmation stability.

Design

  • Differential features only: score_diff, entropy_diff, cv_score_diff, log_alpha_diff, t_diff, rho_diff, h_diff, w1_diff, w2_diff — all joint-minus-support. First-order invariant to additive bank shifts.
  • Cell one-hot retained for support-type conditioning (6 cells: 2 conditions × 3 skew bins).
  • Same evaluation ladder as bank-adaptive: disjoint calibration → frozen chooser → holdout → confirmation. One density fallback branch.
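
The design bullets above can be sketched as a small feature builder. This is a minimal illustration, not the code in run.py: the function name `sketch_feature_vector`, the dictionary keys, the feature ordering, and the helper `shift_scores` are all hypothetical, and the sketch assumes a bank change shifts only the score-like fields by a common constant.

```python
import math

N_CELLS = 6  # 2 conditions x 3 skew bins, per the design notes

# Hypothetical candidate keys; the PR names only the resulting *_diff features.
PLAIN_KEYS = ["score", "entropy", "cv_score", "t", "rho", "h", "w1", "w2"]


def sketch_feature_vector(support, joint, cell_index):
    """Return joint-minus-support differentials followed by a cell one-hot."""
    diffs = [joint[k] - support[k] for k in PLAIN_KEYS]
    # log_alpha_diff is a difference of logs, so alpha must be positive.
    diffs.append(math.log(joint["alpha"]) - math.log(support["alpha"]))
    one_hot = [1.0 if i == cell_index else 0.0 for i in range(N_CELLS)]
    return diffs + one_hot


def shift_scores(candidate, delta):
    """Simulate an additive bank shift on the score-like fields (assumption)."""
    shifted = dict(candidate)
    for key in ("score", "cv_score"):
        shifted[key] += delta
    return shifted


# Illustrative values only.
support = {"score": 0.30, "entropy": 1.2, "cv_score": 0.33, "t": 0.5,
           "rho": 0.1, "h": 0.2, "w1": 0.4, "w2": 0.6, "alpha": 2.0}
joint = {"score": 0.25, "entropy": 1.0, "cv_score": 0.29, "t": 0.7,
         "rho": 0.2, "h": 0.1, "w1": 0.5, "w2": 0.5, "alpha": 1.0}

before = sketch_feature_vector(support, joint, cell_index=2)
after = sketch_feature_vector(shift_scores(support, 0.05),
                              shift_scores(joint, 0.05), cell_index=2)
```

Under that assumption, `before` and `after` agree up to floating-point rounding, which is the first-order invariance the design relies on. A shift that affects the two candidates differently would not cancel.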

Results (baseline, bank_size=300)

| Split | Support | Joint | Chooser |
|---|---|---|---|
| Calibration | 0.1862 | 0.1596 | 0.1403 (beats both) |
| Holdout | 0.1273 | 0.1180 | 0.1152 (beats both) |
| Confirmation | 0.1319 | 0.1773 | 0.1674 (beats joint, loses to support) |

The bank-invariant baseline clears holdout (the bank-adaptive baseline could not — it lost to joint at holdout). Confirmation still fails. The differential redesign reduces bank sensitivity but does not eliminate it.
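
The verdicts in the results table can be re-derived from the reported numbers. The sketch below assumes lower values are better, which is inferred from the "beats both" labels rather than stated in the PR; the names `RESULTS`, `verdict`, and `confirmation_gap` are illustrative.

```python
# Values copied from the results table (assumed: lower is better).
RESULTS = {
    "calibration": {"support": 0.1862, "joint": 0.1596, "chooser": 0.1403},
    "holdout": {"support": 0.1273, "joint": 0.1180, "chooser": 0.1152},
    "confirmation": {"support": 0.1319, "joint": 0.1773, "chooser": 0.1674},
}


def verdict(row):
    """Compare the chooser against each fixed branch for one split."""
    beats_support = row["chooser"] < row["support"]
    beats_joint = row["chooser"] < row["joint"]
    if beats_support and beats_joint:
        return "beats both"
    if beats_joint:
        return "beats joint, loses to support"
    if beats_support:
        return "beats support, loses to joint"
    return "loses to both"


def confirmation_gap(results):
    """Chooser value minus the better fixed branch on the confirmation split."""
    row = results["confirmation"]
    return row["chooser"] - min(row["support"], row["joint"])


for split, row in RESULTS.items():
    print(split, verdict(row))
print(round(confirmation_gap(RESULTS), 4))
```

On confirmation the chooser trails the support branch by 0.1674 − 0.1319 = 0.0355, which is the margin a follow-up experiment would need to close.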

Files

  • experiments/pose-anisotropy-interventions/bank-invariant-solver/run.py — experiment driver
  • experiments/pose-anisotropy-interventions/bank-invariant-solver/README.md — writeup with results
  • experiments/pose-anisotropy-interventions/bank-invariant-solver/outputs/ — cache, models, reports
  • experiments/README.md — index updated with new experiment link

@zfifteen marked this pull request as ready for review March 25, 2026 08:24
Copilot AI review requested due to automatic review settings March 25, 2026 08:24

Copilot AI left a comment

Pull request overview

Adds a new “bank-invariant solver” experiment to the pose-anisotropy interventions suite to test whether routing stability across fresh banks improves when the ridge chooser uses joint-minus-support differential features instead of absolute, bank-shifting score features.

Changes:

  • Introduces a new experiment driver (run.py) implementing differential-feature ridge routing and the same evaluation ladder (calibration → frozen holdout → confirmation, with one fallback branch).
  • Adds an experiment write-up (README.md) and links it from the top-level experiments index.
  • Commits generated artifacts (cache tables, frozen model JSON, and report CSV/JSON summaries) under outputs/.

Reviewed changes

Copilot reviewed 20 out of 20 changed files in this pull request and generated 4 comments.

| File | Description |
|---|---|
| `experiments/pose-anisotropy-interventions/bank-invariant-solver/run.py` | New experiment driver; builds differential feature vectors and fits/evaluates a frozen ridge chooser. |
| `experiments/pose-anisotropy-interventions/bank-invariant-solver/README.md` | Documents motivation, feature set, evaluation ladder, and baseline results for the new experiment. |
| `experiments/README.md` | Adds index links for the bank-adaptive and bank-invariant solver experiments. |
| `experiments/pose-anisotropy-interventions/bank-invariant-solver/outputs/models/baseline__frozen_ridge_chooser.json` | Stored frozen ridge chooser artifact for the baseline variant. |
| `experiments/pose-anisotropy-interventions/bank-invariant-solver/outputs/reports/baseline__calibration_predictions.csv` | Calibration-set per-trial predictions from the frozen chooser. |
| `experiments/pose-anisotropy-interventions/bank-invariant-solver/outputs/reports/baseline__confirmation_block_predictions.csv` | Confirmation-block per-trial predictions from the frozen chooser. |
| `experiments/pose-anisotropy-interventions/bank-invariant-solver/outputs/reports/baseline__fit_eval_summary.json` | Summary metrics for calibration/holdout (and confirmation if present). |
| `experiments/pose-anisotropy-interventions/bank-invariant-solver/outputs/reports/baseline__full_plan_result.json` | Full-plan result payload including final interpretation string. |
| `experiments/pose-anisotropy-interventions/bank-invariant-solver/outputs/reports/baseline__holdout_block_1_predictions.csv` | Holdout-block per-trial predictions from the frozen chooser. |
| `experiments/pose-anisotropy-interventions/bank-invariant-solver/outputs/reports/baseline__ladder_summary.json` | Ladder summary across smoke/calibration/holdout/confirmation. |
| `experiments/pose-anisotropy-interventions/bank-invariant-solver/outputs/reports/baseline__smoke_calibration_block_1_predictions.csv` | Smoke-check predictions on calibration block 1. |
| `experiments/pose-anisotropy-interventions/bank-invariant-solver/outputs/reports/baseline__smoke_calibration_block_1_summary.json` | Smoke-check summary metrics on calibration block 1. |
| `experiments/pose-anisotropy-interventions/bank-invariant-solver/outputs/cache/baseline__calibration_block_1.csv` | Cached trial table for calibration block 1. |
| `experiments/pose-anisotropy-interventions/bank-invariant-solver/outputs/cache/baseline__calibration_block_1.json` | Cache metadata + per-cell/per-condition summary for calibration block 1. |
| `experiments/pose-anisotropy-interventions/bank-invariant-solver/outputs/cache/baseline__calibration_block_2.csv` | Cached trial table for calibration block 2. |
| `experiments/pose-anisotropy-interventions/bank-invariant-solver/outputs/cache/baseline__calibration_block_2.json` | Cache metadata + per-cell/per-condition summary for calibration block 2. |
| `experiments/pose-anisotropy-interventions/bank-invariant-solver/outputs/cache/baseline__holdout_block_1.csv` | Cached trial table for holdout block 1. |
| `experiments/pose-anisotropy-interventions/bank-invariant-solver/outputs/cache/baseline__holdout_block_1.json` | Cache metadata + per-cell/per-condition summary for holdout block 1. |
| `experiments/pose-anisotropy-interventions/bank-invariant-solver/outputs/cache/baseline__confirmation_block.csv` | Cached trial table for confirmation block. |
| `experiments/pose-anisotropy-interventions/bank-invariant-solver/outputs/cache/baseline__confirmation_block.json` | Cache metadata + per-cell/per-condition summary for confirmation block. |

Comment on lines +44 to +47
| `cv_score_diff` | `joint_cv − support_cv` | First-order invariant |
| `abs_log_alpha_diff` | `|log(joint_α) − log(support_α)|` | Magnitude of disagreement |
| `abs_t_diff` | `|joint_t − support_t|` | Magnitude of disagreement |
| `score_sign` | `sign(joint_score − support_score)` | Rank-based (fully invariant) |

Copilot AI Mar 25, 2026

In the feature table, cv_score_diff is defined as joint_cv − support_cv and score_sign as sign(joint_score − support_score), but the code uses the *_cv_score fields and implements score_sign as +1 when joint_score < support_score (ties go to -1). Please update these definitions to match the implementation (or change the implementation and regenerate outputs/models) so the README matches what was actually trained/evaluated.

Suggested change
- | `cv_score_diff` | `joint_cv − support_cv` | First-order invariant |
- | `abs_log_alpha_diff` | `|log(joint_α) − log(support_α)|` | Magnitude of disagreement |
- | `abs_t_diff` | `|joint_t − support_t|` | Magnitude of disagreement |
- | `score_sign` | `sign(joint_score − support_score)` | Rank-based (fully invariant) |
+ | `cv_score_diff` | `joint_cv_score − support_cv_score` | First-order invariant |
+ | `abs_log_alpha_diff` | `|log(joint_α) − log(support_α)|` | Magnitude of disagreement |
+ | `abs_t_diff` | `|joint_t − support_t|` | Magnitude of disagreement |
+ | `score_sign` | `+1 if support_cv_score > joint_cv_score else −1` (ties → −1) | Rank-based (fully invariant) |

Comment on lines +10 to +19
differentials and ratios. Differentials between support and joint candidates
are first-order invariant to additive bank-induced shifts, and ratios are
first-order invariant to multiplicative shifts. The routing decision only
needs to know which candidate is better, not the absolute quality level,
so differential features retain all relevant information.

Key design change vs the bank-adaptive solver:

- Feature vector uses only candidate differentials, not absolute scores.
The cell one-hot encoding is retained for support-type conditioning.

Copilot AI Mar 25, 2026

The module docstring claims the feature set includes “ratios” and is invariant to multiplicative shifts, but the implemented feature vector only uses differentials/abs-diffs plus a sign feature (no ratio features are constructed). This makes the experiment description misleading—either remove the ratio/multiplicative-shift claims or add the corresponding ratio features and include them in FEATURE_NAMES/build_feature_vector.

Suggested change
- differentials and ratios. Differentials between support and joint candidates
- are first-order invariant to additive bank-induced shifts, and ratios are
- first-order invariant to multiplicative shifts. The routing decision only
- needs to know which candidate is better, not the absolute quality level,
- so differential features retain all relevant information.
- Key design change vs the bank-adaptive solver:
- - Feature vector uses only candidate differentials, not absolute scores.
- The cell one-hot encoding is retained for support-type conditioning.
+ differentials. Differentials between support and joint candidates are
+ first-order invariant to additive bank-induced shifts. The routing decision
+ only needs to know which candidate is better, not the absolute quality level,
+ so differential features retain all relevant information.
+ Key design change vs the bank-adaptive solver:
+ - Feature vector uses only candidate differentials (plus any derived sign
+ indicators), not absolute scores. The cell one-hot encoding is retained
+ for support-type conditioning.
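
The distinction this comment draws between additive and multiplicative shifts can be checked numerically. The following is a small sketch with made-up scores, not code from run.py; in particular `ratio` is a hypothetical feature that, per the comment, the implementation does not construct.

```python
import math


def differential(joint: float, support: float) -> float:
    """Joint-minus-support difference, the convention used in this PR."""
    return joint - support


def ratio(joint: float, support: float) -> float:
    """Joint-over-support ratio; hypothetical, not present in run.py."""
    return joint / support


# Illustrative values only.
joint_score, support_score = 0.40, 0.25
shift, scale = 0.10, 1.5

base_diff = differential(joint_score, support_score)
base_ratio = ratio(joint_score, support_score)

# Additive bank shift: both scores move by the same constant.
add_diff = differential(joint_score + shift, support_score + shift)
add_ratio = ratio(joint_score + shift, support_score + shift)

# Multiplicative bank shift: both scores are scaled by the same factor.
mul_diff = differential(joint_score * scale, support_score * scale)
mul_ratio = ratio(joint_score * scale, support_score * scale)

print(math.isclose(add_diff, base_diff))    # difference under the additive shift
print(math.isclose(mul_ratio, base_ratio))  # ratio under the multiplicative shift
print(math.isclose(mul_diff, base_diff))    # difference under the multiplicative shift
```

Differences survive the additive shift but are rescaled by the multiplicative one, so a docstring that describes only differentials should claim only additive invariance.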

cv_diff,
abs(log_alpha_diff),
abs(t_diff),
1.0 if score_diff < 0.0 else -1.0,

Copilot AI Mar 25, 2026

The score_sign feature is documented as sign(joint_score − support_score), but the implementation returns +1 when joint_score < support_score and -1 otherwise (i.e., it’s closer to sign(support_score − joint_score) and also maps ties to -1). This mismatch makes the feature semantics hard to reason about and can lead to incorrect interpretation of learned weights; please align the computation with the documented definition (including a tie behavior), or update the docs/feature name to reflect the actual meaning.

Suggested change
- 1.0 if score_diff < 0.0 else -1.0,
+ 1.0 if score_diff > 0.0 else (-1.0 if score_diff < 0.0 else 0.0),
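
To make the mismatch concrete, the two expressions from this suggestion can be evaluated side by side. This sketch assumes `score_diff` means `joint_score − support_score`, as the comment describes; the function names are illustrative.

```python
def implemented_score_sign(score_diff: float) -> float:
    """Expression currently in the diff: +1 only when joint < support."""
    return 1.0 if score_diff < 0.0 else -1.0


def documented_score_sign(score_diff: float) -> float:
    """sign(joint_score - support_score), with ties mapped to 0."""
    return 1.0 if score_diff > 0.0 else (-1.0 if score_diff < 0.0 else 0.0)


# Negative, tied, and positive differentials.
samples = [-0.1, 0.0, 0.1]
implemented = [implemented_score_sign(d) for d in samples]
documented = [documented_score_sign(d) for d in samples]

print(implemented)
print(documented)
```

The two versions disagree on every input, including ties, so a frozen model trained with one cannot be evaluated with the other without retraining. This is why the earlier comment asks for outputs/models to be regenerated if the implementation changes.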

Comment on lines +22 to +24
features (support minus joint, or joint minus support). Differentials are
first-order invariant to additive bank-induced shifts: when the bank changes,
both candidates' scores shift together, but their difference is preserved.

Copilot AI Mar 25, 2026

This section says the differential features can be “support minus joint, or joint minus support”, but the actual implementation (and later table) consistently uses joint − support. Consider tightening the wording to a single convention so readers don’t invert feature signs when comparing against the model artifact/results.

Suggested change
- features (support minus joint, or joint minus support). Differentials are
- first-order invariant to additive bank-induced shifts: when the bank changes,
- both candidates' scores shift together, but their difference is preserved.
+ features, consistently defined as `joint − support` (joint candidate metric
+ minus support candidate metric). These differentials are first-order
+ invariant to additive bank-induced shifts: when the bank changes, both
+ candidates' scores shift together, but their difference is preserved.
