Address residual P1+P2s from re-audit of PR #412 by igerber · Pull Request #422 · igerber/diff-diff

igerber · 2026-05-13T00:14:37Z

Summary

Audit follow-up to PR #412. The restored CI reviewer surfaced findings the degraded reviewer missed across all 5 prior rounds:

P1 (REGISTRY + code comment) - the claim that "R does not ship per-path `predict_het` on placebos either, so parity is preserved by deferral" contradicts what R's `did_multiplegt_dyn(..., by_path, predict_het)` dispatcher does: it forwards `predict_het` into each per-path `did_multiplegt_main` call alongside `placebo`, so R may emit per-path placebo heterogeneity rows we do not yet mirror. Rewrite both surfaces as an explicit Python-side deferral, NOT a verified R-parity. TODO row added to track validating R's actual output and either implementing parity or documenting the deviation explicitly.
P2 (REGISTRY rtol claim) - the per-path heterogeneity R-parity paragraph claimed `rtol ~1e-6 on point estimates AND SE`, but parity tests use `BETA_RTOL=1e-6` and `SE_RTOL=1e-5`. Split the claim and note the WLS-denominator/cohort-recentering numerical drift motivating the looser SE bound.
P2 (replicate-weight df_survey refresh test gap) - the existing test `test_per_path_heterogeneity_replicate_weights_propagates_n_valid` would have passed if the new dedicated refresh loop failed to recompute `t_stat` / `p_value` / `conf_int` at the final `df_survey`. Strengthen to call `safe_inference(beta, se, df=df_survey)` on the first finite entry and assert the stored inference fields match.
P2 (paths_of_interest survey gap) - the documented composability of `paths_of_interest + heterogeneity + survey_design` was not regression-locked (all survey-specific tests used `by_path=k`). Add two new tests: analytical Binder TSL coverage with selector-ordering preservation, and the multiplier-bootstrap gate under `paths_of_interest`.

No estimator behavior, weighting, variance/SE, identification, or default statistical surface changed in source - documentation accuracy plus expanded regression coverage only.

Test plan

CI - new tests run as part of the standard slow-test matrix.
Locally verified: `pytest tests/test_chaisemartin_dhaultfoeuille.py::TestByPathHeterogeneity::test_paths_of_interest_heterogeneity_{survey_design_analytical,survey_n_bootstrap_gate} tests/test_chaisemartin_dhaultfoeuille.py::TestByPathHeterogeneity::test_per_path_heterogeneity_replicate_weights_propagates_n_valid -m ''` all pass.

🤖 Generated with Claude Code

The restored CI reviewer surfaced findings the degraded reviewer missed across all 5 prior rounds on PR #412: P1 (REGISTRY + code comment): the claim that "R does not ship per-path predict_het on placebos either, so parity is preserved by deferral" contradicts what R's `did_multiplegt_dyn(..., by_path, predict_het)` dispatcher actually does - it forwards `predict_het` into each per-path `did_multiplegt_main` call along with `placebo`, so R may emit per-path placebo heterogeneity rows we do not yet mirror. Rewrite both surfaces (chaisemartin_dhaultfoeuille_results.py code comment and REGISTRY.md DataFrame-integration paragraph) as an explicit Python- side deferral rather than a verified R-parity. Add a TODO row to track validating R's actual placebo predict_het output and either implementing parity or documenting the deviation explicitly. P2 (REGISTRY rtol claim): the per-path heterogeneity R-parity paragraph claimed "rtol ~1e-6 on point estimates AND SE", but the parity tests use BETA_RTOL=1e-6 and SE_RTOL=1e-5 (one decade looser on SE). Split the claim into the two separate tolerances and note the WLS-denominator/cohort-recentering numerical drift that motivates the looser SE bound. P2 (replicate-weight df_survey refresh): the existing test only checked finite SE; it would have passed if the new dedicated heterogeneity refresh loop failed to recompute t_stat / p_value / conf_int at the final df_survey. Strengthen the test to call `safe_inference(beta, se, df=df_survey)` on the first finite entry and assert the stored inference fields match - this anti-regression covers the dedicated post-call refresh added for path_heterogeneity_ effects. P2 (paths_of_interest survey gap): the documented composability of `paths_of_interest + heterogeneity + survey_design` was not regression- locked - all existing survey-specific tests used `by_path=k`. Add test_paths_of_interest_heterogeneity_survey_design_analytical (verify analytical Binder TSL fits, selector ordering preserved, finite SE per populated (path, l)) and test_paths_of_interest_heterogeneity_ survey_n_bootstrap_gate (verify the multiplier-bootstrap gate applies under paths_of_interest too). No estimator behavior, weighting, variance/SE, identification, or default statistical surface changed in source - documentation accuracy plus expanded regression coverage only. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-13T00:19:36Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 7e9f1aa861fd043ea6b276b57c7ec4363d1e5bbe

Overall Assessment

✅ Looks good — no unmitigated P0/P1 findings.

Executive Summary

The diff does not change dCDH estimator logic, weighting, variance/SE computation, identification assumptions, or defaults; it changes documentation/comments and adds regression coverage around by_path heterogeneity.
Affected methodology surface: dCDH per-path heterogeneity testing (predict_het, Web Appendix Section 1.5 / Lemma 7) and the by-path placebo-row export path.
The registry/comment updates correctly stop claiming verified R parity for per-path placebo heterogeneity and instead document the current Python behavior as a tracked deferral. docs/methodology/REGISTRY.md:L643-L645, diff_diff/chaisemartin_dhaultfoeuille_results.py:L1862-L1903, TODO.md:L63
The rewritten parity-tolerance language matches the actual parity test constants (BETA_RTOL=1e-6, SE_RTOL=1e-5). tests/test_chaisemartin_dhaultfoeuille_parity.py:L1404-L1469
The new tests target the right regressions: final df_survey refresh on path_heterogeneity_effects, paths_of_interest + heterogeneity + survey_design, and the survey + multiplier-bootstrap gate under paths_of_interest. tests/test_chaisemartin_dhaultfoeuille.py:L10577-L10703
I could not execute the tests in this environment because pytest is not installed.

Methodology

Severity: P3. Impact: The remaining per-path placebo heterogeneity gap is now accurately documented as a Python-side deferral rather than an asserted R-parity fact; this matches the live behavior, where placebo rows in to_dataframe(level="by_path") still emit NaN het_* fields. Concrete fix: None required for approval; future parity work is already tracked. docs/methodology/REGISTRY.md:L643-L645, diff_diff/chaisemartin_dhaultfoeuille_results.py:L1862-L1903, TODO.md:L63
No P0/P1 methodology findings.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

Severity: P3. Impact: The new TODO entry properly tracks the unresolved placebo-heterogeneity parity work, so it is mitigated under the project’s deferred-work policy rather than remaining implicit. Concrete fix: None required in this PR. TODO.md:L63

Security

No findings.

Documentation/Tests

No findings. The added tests are well-targeted to the touched risks, especially the dedicated final-df refresh loop in fit(). diff_diff/chaisemartin_dhaultfoeuille.py:L3957-L4070, tests/test_chaisemartin_dhaultfoeuille.py:L10577-L10703
Verification note: test execution was not possible here because neither pytest nor python -m pytest is available in the review environment.

igerber and others added 2 commits May 12, 2026 20:14

Fix PR reference in deferred-placebo-het TODO row (#421 -> #422)

7e9f1aa

igerber added the ready-for-ci Triggers CI test workflows label May 13, 2026

igerber merged commit 9a2435a into main May 13, 2026
31 of 32 checks passed

igerber deleted the fix-audit-412 branch May 13, 2026 10:25

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Address residual P1+P2s from re-audit of PR #412#422

Address residual P1+P2s from re-audit of PR #412#422
igerber merged 2 commits into
mainfrom
fix-audit-412

igerber commented May 13, 2026

Uh oh!

github-actions Bot commented May 13, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

igerber commented May 13, 2026

Summary

Test plan

Uh oh!

github-actions Bot commented May 13, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant